
AI detects drug dealers on Instagram with 95% accuracy


A multimodal machine learning system identifies drug dealers on Instagram by analyzing various content.

American researchers have developed a multimodal machine learning system that identifies drug dealers' accounts and posts on Instagram by analyzing several types of content, including photographs.

The study, titled "Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion," was carried out by a team from West Virginia University and Case Western Reserve University.

As part of the project, the researchers created a dataset called Identifying Drug Dealers on Instagram (IDDIG), which includes 4,000 user accounts. Of these, 1,400 belong to drug dealers and the rest serve as a control group.

In initial tests, the system identified drug dealers on Instagram with 95% accuracy. The researchers also developed a hashtag-based community detection method designed to track how drug-related activity changes, in particular with respect to geography and specific types of drugs.

Drug dealers' activity on Instagram is not always obvious. They often advertise their services in comments and through hashtags rather than in the posts themselves, which would be much easier for both machines and people to find. For this reason, the system also analyzes hashtags and comments.

In addition to text analysis with the BERT language model and image classification with the ResNet neural network, the system fuses the two modalities at the feature level, as proposed in the 2016 IEEE paper "Discriminant Correlation Analysis: Real-Time Feature Level Fusion for Multimodal Biometric Recognition."
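Feature-level fusion with discriminant correlation analysis (DCA) can be illustrated with a minimal NumPy sketch. It assumes each modality has already been turned into a feature matrix with one column per sample (for example, BERT text embeddings and ResNet image embeddings) and that class labels are available. The function names `between_class_transform` and `dca_fuse` are hypothetical, the whitening is written out directly rather than copied from the paper's notation, and this is not the IDDIG authors' code.

```python
import numpy as np


def between_class_transform(X, labels, r):
    """Return W (p x r) with W.T @ S_b @ W = I, where S_b is the
    between-class scatter of the columns of X (p features x n samples)."""
    classes = np.unique(labels)
    overall_mean = X.mean(axis=1)
    # Phi_b: one column per class, sqrt(n_i) * (class mean - overall mean)
    phi = np.column_stack([
        np.sqrt(np.sum(labels == c)) * (X[:, labels == c].mean(axis=1) - overall_mean)
        for c in classes
    ])
    # Eigen-decompose the small c x c matrix Phi_b^T Phi_b instead of the p x p S_b
    evals, evecs = np.linalg.eigh(phi.T @ phi)
    top = np.argsort(evals)[::-1][:r]  # indices of the r largest eigenvalues
    Q, lam = evecs[:, top], evals[top]
    # Phi_b Q spans the leading eigenvectors of S_b; dividing by lam whitens S_b
    return (phi @ Q) / lam


def dca_fuse(X, Y, labels, r=None):
    """Fuse two modalities X (p x n) and Y (q x n) at the feature level."""
    if r is None:
        r = len(np.unique(labels)) - 1  # at most c - 1 discriminant directions
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Xp = between_class_transform(Xc, labels, r).T @ Xc  # r x n
    Yp = between_class_transform(Yc, labels, r).T @ Yc  # r x n
    # Diagonalize the between-set covariance of the projected features
    U, s, Vt = np.linalg.svd(Xp @ Yp.T)
    Xs = (U / np.sqrt(s)).T @ Xp     # (U Sigma^{-1/2})^T X'
    Ys = (Vt.T / np.sqrt(s)).T @ Yp  # (V Sigma^{-1/2})^T Y'
    # Fuse by concatenation (summing Xs + Ys is the other DCA option)
    return np.vstack([Xs, Ys]), Xs, Ys
```

After the second transform the cross-covariance `Xs @ Ys.T` is the identity, so the fused vector keeps the directions that both separate the classes and correlate across modalities. With only two classes (dealer versus control), DCA keeps at most one direction per modality.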

The system starts by using Instagram's hashtag search API to track posts that contain one or more of 200 drug-related hashtags.
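The matching step can be sketched in plain Python. This skips the API call entirely and assumes posts have already been fetched as dictionaries; the field names `caption` and `comments`, the placeholder seed tags, and the function names are hypothetical. Comments are scanned as well as captions, since dealers often advertise there.

```python
import re

# Hypothetical placeholder seed list; the real system used about 200 hashtags
SEED_HASHTAGS = {"seedtag1", "seedtag2", "seedtag3"}

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of lowercase hashtags (without '#') found in text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def matching_posts(posts, seeds=SEED_HASHTAGS):
    """Keep posts whose caption or comments contain at least one seed hashtag."""
    hits = []
    for post in posts:
        text = " ".join([post.get("caption", "")] + post.get("comments", []))
        if extract_hashtags(text) & seeds:
            hits.append(post)
    return hits
```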

Photos in posts with these hashtags are then classified by a binary classification model based on VGG-16. If an image is classified as showing drugs, it is stored and the post is converted to a JSON object for later retrieval.
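The store-and-serialize step might look like the following sketch. The VGG-16 classifier itself is not shown: `drug_probability` stands in for its output, and the 0.5 threshold, the record fields, and the function name are hypothetical.

```python
import json

# Hypothetical cut-off on the image classifier's "drug" probability
THRESHOLD = 0.5


def post_to_record(post, drug_probability, threshold=THRESHOLD):
    """Serialize a post as a JSON object if its image was classified as drug-related.

    drug_probability stands in for the output of a VGG-16-based binary
    classifier, which is not implemented here.
    """
    if drug_probability < threshold:
        return None  # image not flagged, so the post is not stored
    record = {
        "id": post["id"],
        "user": post["user"],
        "hashtags": sorted(post["hashtags"]),
        "image_url": post["image_url"],
        "drug_probability": drug_probability,
    }
    return json.dumps(record)
```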

Next, the system examines comments and other content (both text and images) on the pages of users who posted the tracked hashtags and whose content was flagged as drug-related. In this way, 10,000 posts and more than 23,000 user accounts were added to the dataset.
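Expanding from flagged posts to their authors' pages amounts to a one-hop crawl. In this sketch `fetch_user_posts` is a hypothetical callable standing in for whatever crawler retrieves a user's page, and the record fields are assumptions.

```python
from collections import defaultdict


def expand_to_users(flagged_posts, fetch_user_posts):
    """Collect every post (text, comments, image) from the authors of flagged posts.

    fetch_user_posts is a hypothetical callable: user -> list of post dicts.
    """
    dataset = defaultdict(list)
    for user in {p["user"] for p in flagged_posts}:  # each author visited once
        for post in fetch_user_posts(user):
            dataset[user].append({
                "id": post["id"],
                "caption": post.get("caption", ""),
                "comments": post.get("comments", []),
                "image_url": post.get("image_url"),
            })
    return dict(dataset)
```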

To evade detection by law enforcement, drug dealers constantly change the hashtags they use. The system therefore records every hashtag in a flagged post that is not already on the list of drug-related hashtags and adds it to the system for later use.
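Recording unseen hashtags is essentially a set difference per flagged post. This sketch assumes each post carries a `hashtags` list and returns both the updated list and how often each new tag appeared; the names are hypothetical.

```python
from collections import Counter


def update_hashtag_list(flagged_posts, known):
    """Return (known tags plus newly seen tags, counts of the new tags)."""
    new_counts = Counter()
    for post in flagged_posts:
        # Tags in a flagged post that are not yet on the drug-related list
        new_counts.update(set(post["hashtags"]) - known)
    return known | set(new_counts), new_counts
```

Returning a new set instead of mutating `known` keeps the previous list available, for example to compare how quickly tags turn over.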

Finally, the dataset is processed with NetworkX, a Python package for graph analysis. By linking hashtags that appear in the same post, the researchers built an undirected graph of drug-related hashtags for analysis with NetworkX.
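A hashtag co-occurrence graph of this kind can be built with NetworkX as sketched below, with edge weights counting how many posts share each pair of tags. The article does not say which community detection algorithm the researchers used; greedy modularity maximization appears here only as a stand-in, and the function names are hypothetical.

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def build_hashtag_graph(posts):
    """Undirected graph: nodes are hashtags, edge weights count co-occurrences."""
    G = nx.Graph()
    for post in posts:
        tags = sorted(set(post["hashtags"]))
        G.add_nodes_from(tags)  # keeps tags that appear alone in a post
        for u, v in combinations(tags, 2):
            if G.has_edge(u, v):
                G[u][v]["weight"] += 1
            else:
                G.add_edge(u, v, weight=1)
    return G


def hashtag_communities(G):
    """Group hashtags into communities by greedy modularity maximization."""
    return greedy_modularity_communities(G, weight="weight")
```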

The researchers evaluated several protocols on the IDDIG dataset, including multimodal data fusion, multisource data fusion, and quadruple fusion, and identified drug-related posts and users with up to 95% accuracy measured against human-labeled data.

BERT (Bidirectional Encoder Representations from Transformers) is a Transformer-based neural network language model developed by Google; many modern natural language processing tools are built on it.

IEEE (Institute of Electrical and Electronics Engineers) is an international non-profit professional association of engineers and one of the leading developers of standards for electronics, electrical engineering, and computer hardware and networks.

VGG16 is a convolutional neural network for image classification that is also widely used to extract image features. It was proposed by K. Simonyan and A. Zisserman of the University of Oxford and achieves a top-5 accuracy of 92.7% on the ImageNet object recognition benchmark.

JSON is a text-based data interchange format derived from JavaScript. Like many other text formats, it is easy for humans to read. Although it originated in JavaScript (more precisely, as a subset of the 1999 ECMA-262 standard), the format is language-independent and can be used with almost any programming language.
