Every once in a while Lennart Haagsma (a colleague and friend) and I think of new projects to work on. Anything that piques our interest at the time gets added to a list. When we actually end up building one of the projects on that list, we usually stop at the proof-of-concept stage, mostly because the fun of building and/or researching is gone once it is fully functioning.
About 2 years ago we decided we wanted to do something with the Tor network. One of the things we had been interested in at work was passively monitoring and researching DNS on the 'normal' internet. This started our quest to figure out how DNS works for hidden services inside the Tor network. For this project we actually ended up with a set of completed tools, including a modified Tor client, to do our work. The end product of everything combined, including a small web interface to explore the collected data, currently looks like this:
The code for this project can be found here: https://www.github.com/0x3a/tor-dns/. How to set up and run this project properly is explained in the rest of this blog.
Introduction
Although we call it DNS because the current implementation is quite similar to how the DNS system works in general, resolving hidden services within Tor works quite differently in its details. The 'resolving' of hidden services, just like every other aspect of the Tor network, is designed to ensure the anonymity of the user as well as of the hidden service itself. Our research (and this article) concludes that while hidden services have anonymity in the sense that nobody knows their real IP address or location, they have no anonymity regarding the fact that they exist and how to reach them. Mass collection of onion addresses already occurs and has been possible for a long time. Most recently there was a talk at DEFCON 24 titled 'Honey Onions: Exposing Snooping Tor HSDir Relays' which investigated this subject. The researchers set up hidden services, without advertising their onion addresses anywhere, purely to see who would end up on them. In some tests, almost immediately after announcing the hidden service to HSDirs (the 'DNS servers' of Tor), it would be visited and in some cases attacked. This gives a clear indication that onion address collection is already occurring on a large scale, in some cases with malicious intent.
A good thing to know is that the people from the Tor Project have been actively working to improve this system and have been working on 'Next-Generation Hidden Services in Tor' since late 2013. This new generation of hidden services will not allow mass collection/harvesting of onion addresses, as the onion address will only be known to the hidden service and the client trying to visit it. Exact details of the improvements and design can be found in the proposals section of the Tor Project gitweb: 224-rend-spec-ng.txt. After the completion of this system, any newly generated hidden services following this spec can no longer be captured using the logging methods described here. Hidden service operators that do not generate a new hidden service will still follow the old system for compatibility.
Hidden Services, Directories and Onion addresses
This section is a high-level overview of how the Tor network works in terms of hidden services and hidden service directories in relation to onion address harvesting. If you want the exact technical details, you will have to read the official Tor rendezvous protocol design document: rend-spec.txt.
As described on the “Tor: Hidden Service Protocol” page on the Tor Project website, a hidden service is made reachable for a client via:
- Introduction points by which connectivity is arranged to actually reach the service.
- Hidden Service Directories, which allow clients to figure out the introduction points for a specific hidden service. (At a high level they function as DNS servers for hidden services.)
Hidden services generate a descriptor file which clients can use to reach them. These descriptors contain the introduction points for the hidden service as well as some additional information. A hidden service publishes this descriptor file to a fixed number of Hidden Service Directories at a regular interval. The exact hidden service directories a hidden service publishes this file to change with every publication. Clients perform lookup requests to hidden service directories to get the descriptor file for the hidden service they want to reach.
The request the client performs does not contain the actual onion address, but because the onion address can be calculated from the descriptor, any successfully 'resolved' request from a client can be logged together with the actual requested onion address.
To log these requests we patched the rendezvous code in directory.c (around line #3006) in the Tor client source code. Additionally, an extra function was added to convert the descriptor information to an actual onion address, as well as an extra log type named HSDIR_REQUEST to specify the log reason. With this simple modification we can now log any request a client makes. Examples of the logs can be seen in the section below called 'Running your own onion address collection system'.
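To illustrate the conversion step, here is a minimal Python sketch of how a (current-generation) onion address is derived from the hidden service's permanent identity public key, as laid out in rend-spec.txt. The actual patch is C code inside the Tor client; this sketch, including the function name, is only an illustration of the derivation.

# Minimal sketch: derive an onion address from a hidden service's
# DER-encoded permanent (identity) public key, per rend-spec.txt.
# This is not the directory.c patch itself, just the conversion idea.
import base64
import hashlib

def onion_address_from_public_key(der_public_key):
    digest = hashlib.sha1(der_public_key).digest()
    # The onion address is the base32 encoding of the first 80 bits
    # (10 bytes) of the SHA-1 hash of the public key.
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"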
One thing that is left open, and that we will not discuss in detail (or release code for), is the other way to log/harvest onion addresses. A hidden service, as explained, announces itself to hidden service directories, and such a directory could log any published descriptor file. It's trivial to implement and gives you a lot of additional data (not every hidden service you have in your database will actually be requested by a client). We'll leave this as an exercise for the reader.
Using the collected onion addresses
Besides it simply being interesting to browse around and look at what kind of services are available in the Tor network, there were some active use cases for this collection. I've documented a few examples below.
Darkweb market monitoring
At the end of 2014 the FBI, together with other agencies, took down Silkroad 2.0, the famous darkweb drugs and services market. About 2 days later a new market appeared called 'Silkroad Reloaded', which was tweeted about a day after it came online:
#Silkroad 3 is has already appeared “Silk Road Reloaded” (http://qxvfcavhse45ckpw.onion) pic.twitter.com/lvc3SCpoYu
— Yonathan Klijnsma (@ydklijnsma), November 8, 2014
The only reason this was found was by simply searching for any onion addresses that appeared after the takedown date and contained 'silkroad' in their title or address in some form. This allowed the new market site and forum to be picked up right after they were set up.
Of course we weren't the first to know about it; the only reason we saw it is that other people were browsing for it, meaning someone knew about it before us.
Monitoring criminal infrastructure
Since 2013 I had, in personal time and during work, been monitoring a group of criminals that were running a ransomware operation named 'CryptoWall’. Over the years I followed their developments and documented this on the CryptoWall Tracker.
One of the things I was able to do for this group was maintain a good view of their infrastructure inside the Tor network. For this I used my own onion collection server, which would retrieve the favicon.ico file from the web server root of every collected hidden service (provided the service was running a web server). The returned data was hashed and compared against a set of known values. These known values were the favicons the criminals, for some weird reason, always had custom-set on their servers. The icons can be seen on the infrastructure subpage of the CryptoWall Tracker: [Infrastructure].
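As an illustration, a minimal Python sketch of such a favicon check could look like the following. This is not my actual tooling; it assumes the requests library with SOCKS support (requests[socks]) and a Tor client listening on 127.0.0.1:9050, and the known-hash set is a placeholder.

# Minimal sketch (not the actual collection server code): fetch favicon.ico
# from a hidden service over Tor's SOCKS proxy and compare its hash against
# a set of known values.
import hashlib
import requests

TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
               "https": "socks5h://127.0.0.1:9050"}
KNOWN_FAVICON_HASHES = {"<sha256 of a known favicon>"}  # placeholder values

def favicon_matches(onion_address):
    response = requests.get("http://%s/favicon.ico" % onion_address,
                            proxies=TOR_PROXIES, timeout=60)
    return hashlib.sha256(response.content).hexdigest() in KNOWN_FAVICON_HASHES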
This position allowed me to quickly investigate any new developments for this group and share this amongst peers for further investigation.
Finding new criminal infrastructure: Ransomware
Back in 2015 I published a blog about a new ransomware strain named 'CryptoApp'. I found this ransomware by checking the page content of the scraped onion addresses in my personal Onion Viewer instance. This viewer is slightly modified: it also stores full page content so I can do raw searches on HTML content. I do not, however, store any remotely linked resources, for various reasons. You can read the full analysis of the ransomware on my blog: Analysis of a piece of ransomware in development: the story of 'CryptoApp'.
Interestingly, the hidden service went offline before my blog was published. However, a few days later I was pointed to an active spam campaign in Italy in which the ransomware was actually used to target people's machines; I wrote about it in a follow-up: Development of the 'CryptoApp' ransomware finished; changes & active campaign.
Monitoring occurrences inside the Tor network
In February of this year there was an unusual spike in announced hidden services inside the Tor network.
A lot of news agencies picked up on this, but nobody seemed to have an explanation for what was going on. From my own probe all I could see was the announcement of these services; none of them ever actually got resolved by a client. Interestingly, when I tried to resolve any of these addresses myself it was never possible to rendezvous with them, meaning I could never connect to them. An interesting attack on the Tor network (some operators complained their HSDirs started crashing) or a research project to be presented at a conference next year? Who knows.
Previous work
One of the things we did while reading the Tor documentation was check whether anybody else had done something like this. It turned out someone actually had, back in 2013, which could well have been the reason the design of the next-generation hidden services protocol was started that same year.
The research was done by Donncha O'Cearbhaill and published on his blog in May 2013; you can read it here (it's really good): Trawling Tor Hidden Service – Mapping the DHT. Parts of the modified Tor client source code were based on his work.
Running your own onion address collection system
As said in the introduction, we ended up with a set of tools that combine to form our onion collection system, and we've decided to make the code public. The system consists of 4 parts for real-time aggregation of onion addresses; there is one additional tool available that will be described in a subsection. The 4 'core' parts of the system:
- Modified Tor client
- onion-publisher (collects hsdir requests from the running client and forwards them to the receiver)
- onion-receiver (receives hsdir requests, processes them and puts them in a database)
- 'Onion Viewer’ web interface (allows you to browse the collected onion address data)
While everything could have been put in one application, it was split up into different components. This was done for a simple reason: the flexibility to split up infrastructure. It means you are able to run multiple Tor nodes on different servers, stream the requests from those Tor nodes to one backend server, and ingest the data in one place to make analysis easier.
You can get all the components to run your own collection system here: https://www.github.com/0x3a/tor-dns/. The next sections explain how to set up these modules.
Tor client
The modified Tor client source can be compiled by simply running two commands. First, run ./configure:
user@tornode:~/tor-0.2.6.1-alpha$ ./configure
checking for gethostbyname_r... yes
checking how many arguments gethostbyname_r() wants... 6
checking whether the C compiler supports __func__... yes
checking whether the C compiler supports __FUNC__... no
checking whether the C compiler supports __FUNCTION__... yes
checking whether we have extern char **environ already declared... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Doxyfile
config.status: creating Makefile
config.status: creating contrib/dist/suse/tor.sh
config.status: creating contrib/operator-tools/tor.logrotate
config.status: creating contrib/dist/tor.sh
config.status: creating contrib/dist/torctl
After having confirmed you have all the prerequisites you can build the Tor binary:
user@tornode:~/tor-0.2.6.1-alpha$ make
make all-am
make[1]: Entering directory `/home/user/tor-0.2.6.1-alpha'
CC src/or/src_or_libtor_testing_a-rendclient.o
After compiling you should have a binary located at src/or/tor which you can run:
user@tornode:~/tor-0.2.6.1-alpha/src/or$ ./tor -v
Oct 26 12:00:00.000 [notice] Tor v0.2.6.10 (git-58c51dc6087b0936) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1f and Zlib 1.2.8.
Oct 26 12:00:00.001 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Oct 26 12:00:00.001 [warn] Command-line option '-v' with no value. Failing.
After this you will have to create a torrc file, which is the configuration file for the Tor client. While you can keep the default settings for most options, there are 2 things you need to set to ensure that your node will become a Hidden Service Directory and that the correct log output is generated.
The first option that needs to be set is logging. Within the Tor client there are different levels of log entries. For the specific 'DNS' requests we need the 'notice' level to be logged to a file. You can do this by adding the following line to the torrc config file (the path after file is where the log will be written; the other tools in this post default to a logfile named notice.log):
Log notice file notice.log
The second option is to tell the Tor network you would like to function as a directory, you can do this by specifying the following line in the torrc config:
DirPort 9030
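Put together, a minimal torrc for this purpose could look something like the sketch below. The nickname and log path are placeholders, and the SocksPort/ORPort values simply match the defaults visible in the log output later in this post.

# Minimal torrc sketch (nickname and paths are placeholders)
Nickname mytornode
SocksPort 9050
ORPort 9001                               # required to act as a relay
DirPort 9030                              # advertise a directory port
Log notice file /home/user/notice.log     # notice-level log containing the HSDIR_REQUEST entries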
After setting up your configuration file (you can simply grab the one located in src/config/torrc.example.in and edit it) you can run the Tor client. You specify your torrc file with the -f flag:
user@tornode:~$ ~/tor-0.2.6.1-alpha/src/or/tor -f /home/user/torrc
Oct 26 12:00:00.000 [notice] Tor v0.2.6.10 (git-58c51dc6087b0936) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1f and Zlib 1.2.8.
Oct 26 12:00:00.000 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Oct 26 12:00:00.000 [notice] Read configuration file "/home/user/torrc".
Oct 26 12:00:00.005 [notice] Opening Socks listener on 127.0.0.1:9050
Oct 26 12:00:00.005 [notice] Opening OR listener on 0.0.0.0:9001
Oct 26 12:00:00.005 [notice] Opening Directory listener on 0.0.0.0:9030
Now you will have to wait for the directory authorities to vote on your status so that you become a hidden service directory; this takes up to 96 hours. You can check your status on the Tor Atlas website by searching for your IP address. Once you have received the HSDir flag you will start to see requests like this come in:
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6eoyaaogpxkipv6lelirmerpbm|76qugh5bey5gum7l
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6eoyaaogpz4bf6x6kipvmerpbm|76qugh5bey5gum7l
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|xsgossyphypyqpyz6vhnnix2t3|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6erx3mhciaxtcfpdbpklq6hwif|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6eospmff2w7mbclhid6gxnjq2r|m6rgrem34sednagr
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6exjuuhzp2aeu4vm3o76xt6adq|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6erx3uiizwmmhciaxtcfpdbpkf|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6exv6xsq36kbjlgf3b7fk4mzrv|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6eospmff2w77dclhid6gxnjq2r|m6rgrem34sednagr
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6erx3uiizwmmhciaxtclq6hwif|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6erx3uiciaxtcfpdbpklq6hwif|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6eospmff2w77dclhid6gxnjq2r|m6rgrem34sednagr
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6exyzl2c2cwrln4fs6gj4kvahg|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6exv6xslgf3qqd6zjb7fk4mzrv|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6eos77d3ggrmbclhid6gxnjq2r|m6rgrem34sednagr
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6erx3uiizwxtcfpdbpklq6hwif|None
Oct 26 12:00:00.000 [notice] HSDIR_REQUEST|6eppb73j2n5zk5bk7kc4yrqick|None
The format of these logs is HSDIR_REQUEST|<descriptor id>|<onion address>; None is shown for requests you cannot 'resolve' for a client. On average you will not be able to resolve most of the requests you receive.
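As an illustration, here is a minimal Python sketch of how such a log line could be parsed; the function name and returned fields are my own and not part of the released tools.

# Minimal sketch (not part of the released tools): parse one HSDIR_REQUEST
# notice-log line into its timestamp, descriptor id and onion address.
def parse_hsdir_request(line):
    if "HSDIR_REQUEST|" not in line:
        return None
    prefix, descriptor_id, onion = line.rstrip().split("|", 2)
    timestamp = " ".join(prefix.split()[:3])  # month, day, time (Tor logs omit the year)
    return {
        "timestamp": timestamp,
        "descriptor_id": descriptor_id,
        "onion": None if onion == "None" else onion + ".onion",
    }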
If you have this running, your Tor client setup is done; now you will have to start processing the logs.
onion-publisher
The onion-publisher simply tails the logfile specified in the torrc configuration file you created earlier. It supports commandline arguments to specify networking setup and which file to monitor:
user@tornode:~$ python onion-publisher.py -h
usage: onion-publisher.py [-h] [-l LISTEN] [-p PORT] [-n NOTICELOG]
optional arguments:
-h, --help show this help message and exit
-l LISTEN, --listen LISTEN address to listen on (default: 127.0.0.1)
-p PORT, --port PORT port to listen on (default: 5556)
-n NOTICELOG, --noticelog NOTICELOG notice logfile from the Tor client to tail (default: notice.log)
The tool has defaults for all arguments, which make it publish locally and read the 'normal' default Tor logfile named notice.log. If you plan to run everything locally (client, publisher, receiver & onion viewer) you can simply run the script without any arguments.
Once the tool is running, in this example with default settings, its output should look something like this:
user@tornode:~$ python onion-publisher.py
[+] ---- onion.watch ZMQ publisher started on tcp://127.0.0.1:5556 ----
[+] Tailing notice log 'notice.log'
[+] Sending new request: 1477483200000.000000|76qugh5bey5gum7l
This tool sets up a local ZMQ publisher to which the onion-receiver tool can subscribe to start receiving the hidden service queries.
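For illustration, a stripped-down version of this tail-and-publish loop could look like the sketch below. It is not the actual onion-publisher code, and the message format is only approximated from the example output above.

# Minimal sketch (not the actual onion-publisher): tail the Tor notice log
# and publish every resolved HSDIR_REQUEST over a ZMQ PUB socket as
# "<timestamp>|<onion address>".
import time
import zmq

context = zmq.Context()
publisher = context.socket(zmq.PUB)
publisher.bind("tcp://127.0.0.1:5556")

with open("notice.log") as log:
    log.seek(0, 2)  # start tailing at the end of the file
    while True:
        line = log.readline()
        if not line:
            time.sleep(0.5)
            continue
        if "HSDIR_REQUEST|" not in line:
            continue
        _, _, onion = line.rstrip().split("|", 2)
        if onion != "None":
            publisher.send_string("%f|%s" % (time.time(), onion))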
onion-receiver
The onion-receiver tool subscribes to the onion-publisher ZMQ node to receive all the published onion addresses. It parses these addresses and inserts them into a SQLite database with a first-seen timestamp, a last-seen timestamp and the number of times the address was requested. Just like the publisher it has a set of commandline flags to control input and output:
user@tornode:~$ python onion-receiver.py -h
usage: onion-receiver.py [-h] [-a ADDRESS] [-p PORT] [-d DBFILE] [-t TITLE]
optional arguments:
-h, --help show this help message and exit
-a ADDRESS, --address ADDRESS address to connect on (default: 127.0.0.1)
-p PORT, --port PORT port to connect on (default: 5556)
-d DBFILE, --dbfile DBFILE SQLite database filename to save results in (default: onion.db)
-t TITLE, --title TITLE Sets a flag to grab page titles from hidden services (default: True)
The address and port flags are the ones set on the onion-publisher, as the receiver connects to it. The database file flag is the output file for the SQLite database; if the file already exists the tool simply updates the entries in it. This database is what the Onion Viewer component uses to display the collected data.
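A stripped-down sketch of this database update logic could look like the following; the table and column names are my own assumptions, not necessarily those used by onion-receiver.

# Minimal sketch (table/column names are assumptions): record one received
# onion address, tracking first seen, last seen and a request counter.
import sqlite3
import time

def record_onion(db_file, onion_address):
    conn = sqlite3.connect(db_file)
    conn.execute("""CREATE TABLE IF NOT EXISTS onions (
                        address TEXT PRIMARY KEY,
                        first_seen REAL,
                        last_seen REAL,
                        request_count INTEGER)""")
    now = time.time()
    conn.execute("""INSERT INTO onions (address, first_seen, last_seen, request_count)
                    VALUES (?, ?, ?, 1)
                    ON CONFLICT(address) DO UPDATE SET
                        last_seen = excluded.last_seen,
                        request_count = request_count + 1""",
                 (onion_address, now, now))
    conn.commit()
    conn.close()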
The default settings will ensure it connects locally to the onion-publisher, which has matching defaults. Running the tool without any flags (together with an already running publisher and Tor client) gives you the following output:
user@tornode:~$ python onion-receiver.py
[+] ---- onion.watch importer connecting to publisher at tcp://127.0.0.1:5556 ----
[+] ---- connected to onion publisher, awaiting messages ---
[+] Received new request: 1477483200000.000000|76qugh5bey5gum7l
[+] Obtaining page title for: 76qugh5bey5gum7l.onion
[+] Added 76qugh5bey5gum7l.onion
As you can see, there is a log entry about obtaining a page title. We assume the hidden service is a web service (many are) and grab the page title automatically. The way we do this is really, really hacky and ugly: we use torsocks to proxy wget over Tor, grab the whole index page of the hidden service, and then parse it with a regex to find the page title. This is done by spawning a separate commandline process; the command that is executed looks like this:
torsocks wget -qO- '<onion address>.onion' | perl -l -0777 -ne 'print $1 if /<title[^>]*>\s*(.*?)\s*<\/title/si'
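If you want to avoid shelling out, a rough Python equivalent could look like the sketch below. This is my own sketch, not what the released tool does; it assumes requests[socks] and a Tor client listening on 127.0.0.1:9050.

# Rough sketch of a Python alternative to the torsocks/wget/perl pipeline:
# fetch the index page over Tor's SOCKS proxy and pull out the <title>.
import re
import requests

TOR_PROXIES = {"http": "socks5h://127.0.0.1:9050",
               "https": "socks5h://127.0.0.1:9050"}

def get_page_title(onion_address):
    html = requests.get("http://%s.onion/" % onion_address,
                        proxies=TOR_PROXIES, timeout=60).text
    match = re.search(r"<title[^>]*>\s*(.*?)\s*</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None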
You can disable the automatic fetching of page titles by setting the -t or --title flag to False.
Onion Viewer
The Onion Viewer component is a web app written using the Flask web development framework. It is a simple web GUI for the collected onion data: the SQLite database generated by the onion-receiver can be browsed with it. It can be configured from the commandline; if you specify no arguments it uses default settings that match those of the other components (the only prerequisite is that the receiver's output SQLite database is in the same directory):
user@tornode:~$ python onion-viewer.py -h
usage: onion-viewer.py [-h] [-l LISTEN] [-p PORT] [-d DATABASE]
optional arguments:
-h, --help show this help message and exit
-l LISTEN, --listen LISTEN address to listen on (default: 127.0.0.1)
-p PORT, --port PORT port to listen on (default: 8081)
-d DATABASE, --database DATABASE database file (default: onion.db)
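To give an idea of how little is needed for such a viewer, here is a minimal Flask sketch over the same assumed onions table from the receiver sketch above. It returns JSON rather than the actual HTML GUI and is not the released onion-viewer code.

# Minimal Flask sketch (not the released onion-viewer): expose the collected
# onion data from the assumed 'onions' table as JSON.
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_FILE = "onion.db"

@app.route("/onions")
def list_onions():
    conn = sqlite3.connect(DB_FILE)
    rows = conn.execute("SELECT address, first_seen, last_seen, request_count "
                        "FROM onions ORDER BY last_seen DESC").fetchall()
    conn.close()
    return jsonify([{"address": a, "first_seen": f, "last_seen": l,
                     "request_count": c} for a, f, l, c in rows])

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8081)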
When it is up and running (with some data in the database) it will look something like this:
Additionally, if you click the information (i) symbol you will be taken to a subview of only that onion address (this also shows the page title):
noticelog-import
This tool was added as an extra. It is similar to the onion-receiver tool, but instead of receiving new onion addresses from a publisher over ZMQ it parses a Tor client notice log file directly. From this it generates a database identical to the one produced by the onion-receiver utility. This tool allows you to replay a notice log, should you ever corrupt your database or have old notice logs you want to add to your data set. You run the tool from the commandline by specifying the correct settings as shown in the help:
user@tornode:~$ python noticelog-import.py -h
usage: noticelog-import.py [-h] [-n NOTICEFILE] [-y LOGYEAR] [-d DBFILE] [-t TITLE]
optional arguments:
-h, --help show this help message and exit
-n NOTICEFILE, --noticefile NOTICEFILE Tor notice log to parse (default: notice.log)
-y LOGYEAR, --logyear LOGYEAR Year the notice log was generated in (notice log itself doesnt specify this) (default: 2016)
-d DBFILE, --dbfile DBFILE SQLite database filename to save results in (default: onion.db)
-t TITLE, --title TITLE Sets a flag to grab page titles from hidden services (default: True)
One thing to keep in mind is that by default the Tor logging system does not include a year in log entries. For this reason the -y or --logyear flag was added, in case your log is not from the year in which you are importing it (by default it just picks the current year).
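For example, importing an old notice log from 2015 into the existing database could look like this (the filenames here are placeholders):

user@tornode:~$ python noticelog-import.py -n old-notice.log -y 2015 -d onion.db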