How AI Helps Illuminate the Dark Web
The public face of the internet, where we log in daily to check email, social media and news, exists alongside the so-called ‘dark web’. Here you find anonymous sites and password-protected areas that play host to criminals who use them to sell drugs, guns and sometimes people.
Law enforcement agencies fight a running battle to stop the activities that take place here by tracking down the users in real life. It has been an uphill struggle, but they are now using AI to help find users by looking at similarities in profiles, usernames, content and sentence structure.
Dark web markets open and close quickly because they are hacked, raided, or were designed to shut down as soon as customers had paid for non-existent goods. Having operated for a few months or years, they close and the owners disappear without a trace.
MIT Lincoln Laboratory has taken on the challenges posed by the dark web and is now using AI to examine the connections between users across forums on both the light and dark web, in order to establish links between profiles.
Switching sites is a key part of how dark web markets operate. By constantly creating new profiles and changing usernames, users can keep their connections alive and signal their new location to others. The tools produced by MIT focus on linking users to their multiple identities.
Law enforcement’s past attempts to do this have run into the obvious difficulty of sifting through a huge amount of data. With 500,000 phone numbers and over 2 million sex ads posted every single month, it is simply not possible to keep pace manually. The process is now a lot faster thanks to MIT Lincoln Laboratory, which has trained machine learning models to look for similarities between users on different forums.
The machine learning system looks at three main aspects of a user’s communication: identity, subject matter and the people they connect with. For example:
- Data from Forum A is used to build an authorship model.
- Data from Forum B is run against the model for Forum A.
- The algorithm looks for basic clues like changes in username spelling and other subtle username changes occurring between the forums.
- Content similarity is the next focus. Unique phrases that appear in multiple places, or text that has been copied and pasted, could indicate that the same user wrote both.
- The system looks at who the user interacts with and the topics discussed.
- The system combines all these factors and produces a single probability score showing how likely it is that the different users are the same person in real life.
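The steps above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Lincoln Laboratory's actual method: the `Profile` class, the three similarity functions, the weights and the bias are all hypothetical, and a real system would learn its parameters from labelled data rather than hard-code them.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Profile:
    """Hypothetical container for what we know about one forum account."""
    username: str
    posts: list[str]
    contacts: set[str] = field(default_factory=set)


def jaccard(a: set, b: set) -> float:
    """Set overlap from 0.0 (nothing shared) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def username_similarity(a: Profile, b: Profile) -> float:
    """Catches small spelling changes such as added underscores."""
    return SequenceMatcher(None, a.username.lower(), b.username.lower()).ratio()


def word_trigrams(posts: list[str]) -> set[tuple[str, str, str]]:
    """All runs of three consecutive words across a user's posts."""
    grams = set()
    for post in posts:
        words = post.lower().split()
        grams.update(zip(words, words[1:], words[2:]))
    return grams


def content_similarity(a: Profile, b: Profile) -> float:
    """Shared phrases hint at copy-and-paste or a common author."""
    return jaccard(word_trigrams(a.posts), word_trigrams(b.posts))


def network_similarity(a: Profile, b: Profile) -> float:
    """Overlap in the accounts each user interacts with."""
    return jaccard(a.contacts, b.contacts)


# Assumed values for illustration only; not taken from the research.
WEIGHTS = (4.0, 4.0, 3.0)
BIAS = -4.0


def match_score(a: Profile, b: Profile) -> float:
    """Combine the three signals into one score between 0 and 1."""
    features = (
        username_similarity(a, b),
        content_similarity(a, b),
        network_similarity(a, b),
    )
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


forum_a = Profile(
    "darkvendor99",
    ["best prices on the market ships worldwide fast"],
    {"buyer1", "buyer2", "mod_x"},
)
forum_b = Profile(
    "dark_vendor_99",
    ["new shop open best prices on the market ships worldwide"],
    {"buyer1", "buyer2", "newbie"},
)
unrelated = Profile(
    "quietreader",
    ["does anyone know when the forum comes back online"],
    {"someone"},
)

print(f"likely same user: {match_score(forum_a, forum_b):.2f}")
print(f"unrelated user:   {match_score(forum_a, unrelated):.2f}")
```

With these made-up profiles, the pair that shares a near-identical username, a repeated phrase and two contacts scores well above the unrelated pair. The logistic function is used here simply because it squeezes any weighted sum into the 0 to 1 range, which makes the result easy to read as a probability-style score.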
Testing has taken place combining Twitter, Instagram and dark web data, all via authorized means, with promising results. The AI shows a 95% success rate in finding the correct match.
This new algorithm adds to research that Lincoln Laboratory conducted from 2014 to 2017 under the Memex program run by DARPA (Defense Advanced Research Projects Agency), which has spawned a whole suite of web data analysis software in collaboration with many universities, laboratories and companies. Memex is now available as open-source software, and over 30 agencies globally use it to investigate criminal activities online. The biggest user is currently the Manhattan District Attorney’s Office, as part of its Human Trafficking Response Unit (HTRU), where it assisted in over 6,000 arrests in 2017 and has significantly increased the investigation rate of sex trafficking cases.
Researchers are continuing to develop new ways for these technologies to assist agencies, particularly on the dark web. The dark web economy has increasingly been used to fund terrorism, slavery and other criminal activities, and the goal is that linking real-life personas to online profiles will lead to more successful prosecutions.