Search engines such as Google index only about 5% of the Internet. The websites that we can reach through search engines are known as the Surface Web. The remaining 95% of the Internet is called the Deep Web. Deep Web pages can't be indexed by conventional search engines, and a lot of illegal activity takes place on the Dark Web websites hidden within it. To address these problems, the US government is funding a project to create a search engine for the Deep Web. This will not only allow the US government to control the Internet completely but also help stop crimes like human trafficking, says Mike Stevens, an information security training professor at IICS.
The search engine is called Memex. It indexes websites that normal search engines can't index and presents the results graphically so that hidden links between them can be identified.
The US government is focusing Memex on the problem of human trafficking, which relies largely on the Internet to attract clients. However, the government also plans to use it against other cyber crime in the Deep Web.
The Dark Web could soon be a lot brighter with this new search engine aimed at criminals. Memex depends heavily on indexing forums, chat services, job postings and other hidden services that enable trade on the Dark Web. Memex will track and map the connections between illicit advertisements and the suspected criminals who post them.
Memex scans the Deep Web for ads that point users to sites where child pornography or other forms of human slavery exist. It indexes those images, sources and websites so that the information can be used to map the criminals. It collects phone numbers and email addresses to track them. Memex has been designed for normal users without a technical background. The image search focuses on image metadata, such as the camera serial number, and on image comparison to find an exact match.
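The idea of using phone numbers and email addresses to connect ads can be illustrated with a minimal Python sketch. This is not Memex code: the regular expressions, the function names (`extract_contacts`, `group_ads_by_contact`) and the sample ads are all assumptions made for illustration, and real ads would need far more robust parsing.

```python
import re
from collections import defaultdict

# Deliberately simple patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def normalize_phone(raw):
    """Keep digits only, so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", raw)


def extract_contacts(text):
    """Return the set of email addresses and normalized phone numbers in text."""
    emails = {m.lower() for m in EMAIL_RE.findall(text)}
    phones = {normalize_phone(m) for m in PHONE_RE.findall(text)}
    return emails | phones


def group_ads_by_contact(ads):
    """Map each contact detail to the ads that mention it, keeping shared ones."""
    index = defaultdict(set)
    for ad_id, text in ads.items():
        for contact in extract_contacts(text):
            index[contact].add(ad_id)
    return {c: ids for c, ids in index.items() if len(ids) > 1}


# Hypothetical sample data.
ads = {
    "ad-1": "Call (555) 010-2345 or write to seller@example.com",
    "ad-2": "New listing, phone 555.010.2345",
    "ad-3": "Contact other@example.org",
}
shared = group_ads_by_contact(ads)
print(shared)
```

Here the two ads that share a phone number end up grouped together even though the number is formatted differently in each, which is the kind of link an investigator would want surfaced.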
Memex has two crawlers, Ache and Nutch. Each crawler uses the data it collects in its own way. Both require a list of URLs to start crawling from, which is called a seeds list.
Nutch is developed by Apache and integrates with both Solr and Elasticsearch, which sets it apart from Ache. Nutch crawls in rounds that cannot be interrupted, and it will keep running rounds indefinitely until asked to stop.
The number of pages left to crawl in a Nutch crawl increases significantly after each round. With Nutch, you can begin with a seeds list of 100 pages, and it can find over 1000 pages to crawl in the next round.
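Why the list of pages to crawl grows from round to round can be shown with a small, self-contained Python sketch. It does not use Nutch or its API; the `crawl` and `crawl_round` functions, the in-memory `LINKS` graph and the `fake_fetch` helper are hypothetical stand-ins for real page fetching.

```python
def crawl_round(frontier, visited, fetch_links):
    """Visit every URL in the frontier and return the newly discovered URLs."""
    discovered = set()
    for url in frontier:
        visited.add(url)
        discovered.update(fetch_links(url))
    # Only URLs not seen before go into the next round.
    return discovered - visited


def crawl(seeds, fetch_links, max_rounds):
    """Run up to max_rounds rounds, recording the frontier size of each round."""
    visited = set()
    frontier = set(seeds)
    sizes = []
    for _ in range(max_rounds):
        if not frontier:
            break
        sizes.append(len(frontier))
        frontier = crawl_round(frontier, visited, fetch_links)
    return visited, sizes


# Hypothetical link graph standing in for real web pages.
LINKS = {
    "a": ["b", "c", "d"],
    "b": ["e", "f"],
    "c": ["g", "h"],
    "d": ["a", "i"],
}


def fake_fetch(url):
    """Return the outgoing links of a page; pages without an entry have none."""
    return LINKS.get(url, [])


visited, sizes = crawl(["a"], fake_fetch, max_rounds=3)
print(sizes)
```

Each page usually links to several pages that have not been seen yet, so the frontier for the next round tends to be larger than the one just crawled, at least until the crawler starts running into pages it has already visited.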
Ache is developed by NYU. Ache differs from Nutch in that you have to create a crawl model before you can run a crawl. Unlike Nutch, Ache can be stopped at any time.
According to information security training experts, the Memex project is still under development, but it is already available for the general public to use. You can download and install Memex from GitHub by searching for Memex-Explorer.