May 2002 Wildlist
Unfortunately, while there are lots of charts and distribution statistics available for Win32 viruses, the same cannot be said about Unix viruses. Actually, at the time of writing, there is no reliable (and constantly updated) source of information regarding the distribution of Linux/FreeBSD malware, not to mention other Unix flavors which are less appealing to virus writers.
Given the relative lack of information from this point of view, it would be very interesting to compare the spreading of Linux and FreeBSD malware with their Win32 counterparts as a means of evaluating the security and "virus-proofness" of these platforms. Such data may also help to develop a model for predicting the spreading of future Unix malware.
But how exactly can this be done? For Win32 viruses, the large amount of AV software deployed at workstation and server level provides a reliable source of reports, giving antivirus companies a simple way to create statistics on viral spread. Unfortunately, the same is not true for Unix viruses. First of all, these are not commonly distributed as infected e-mails, so gateway reports are almost non-existent. Similarly, the Unix versions of AV products rarely have remote reporting capabilities, so reports of Unix infections are scarce. So, again, how does one track the proliferation of Unix viruses?
Well, two interesting examples of this recent chain of worms and viruses - FreeBSD/Scalper and Linux/Slapper - may offer some insights. Since both exploit a popular service (HTTP) and both probe random machines on the Internet with specific packets of data, it would be possible to use a simple honeypot to build a list of machines presumably infected by Slapper or Scalper, and analyze their distribution and proliferation.
Unfortunately, a single honeypot will not provide accurate results, as the attacks of both Scalper and Slapper are not fully random, but instead follow predefined patterns. However, a set of honeypots spread around the world will provide a broader view of the number of infected machines and the geographical distribution of attacks. Therefore, in order to build any reliable statistics regarding the distribution of these two viruses, one needs access to the data collected by a reasonably large honeypot network, covering most areas of the world.
Here's where the Smallpot Project comes in handy. Smallpot is a generic honeypot that was initially designed to track the spreading of the CodeRed worm on Win32 systems, and has slowly grown into a means of monitoring almost any kind of Internet malware (tracking e-mail worms is one thing not covered by the Smallpot Project) or even hacking attempts. Smallpot nodes are distributed all around the world, in countries such as the United States, South Korea, Romania, Germany, the Philippines, and Taiwan. Moreover, since all Smallpot reports are collected on a daily basis on a central node, it is relatively easy to process and analyze all the logs.
Thus, by combining the results provided by the Smallpot network over the past year, we can attempt to track the spread of Slapper and Scalper since their initial release. Building a geographic distribution map will also provide a simple view of the countries and areas that have been most affected by these two worms, allowing us to compare them to popular Win32 worms such as CodeRed, Nimda or Spida, and so to draw a parallel between the general spreading behavior of Win32 and Unix malware.
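As a rough illustration of how the centrally collected reports could be turned into a spread curve, the sketch below counts how many previously unseen source addresses appear each day across all nodes. The record layout (an ISO date string, a node name, and a source IP) is an assumption made for this example, not the actual Smallpot log format.

```python
from collections import defaultdict

def new_sources_per_day(reports):
    """Count source IPs by the day on which they were first seen.

    reports: iterable of (day, node, source_ip) tuples, where day is an
    ISO date string such as "2002-09-14". This layout is hypothetical.
    """
    first_seen = {}
    for day, _node, ip in reports:
        # ISO date strings sort chronologically, so plain comparison works.
        if ip not in first_seen or day < first_seen[ip]:
            first_seen[ip] = day

    per_day = defaultdict(int)
    for day in first_seen.values():
        per_day[day] += 1
    return dict(sorted(per_day.items()))
```

Counting each address once, on its first appearance at any node, avoids inflating the figures when the same infected machine probes several honeypots.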
Scalper and Slapper
Our two subjects of interest, Scalper and Slapper, rely on different vulnerabilities to replicate, but probe potential victims in a similar manner. Both viruses first send the following HTTP request on port 80, and then scan the reply for various known Apache versions. Here is what the initial HTTP request string looks like:
GET / HTTP/1.1\r\n\r\n
If the probed host looks vulnerable, Scalper will send a 32314-byte packet that is a common buffer overflow exploit. Likewise, Slapper connects on port 443 (SSL) and tries to exploit a bug present in OpenSSL versions older than 0.9.6e and 0.9.7-beta.
Just a small note regarding the initial probe used by Slapper and Scalper: a host can be hit by port 80 requests coming from a variety of tools, browsers, agents, and viruses. The tools usually include security scanners and HTTP exploits used by hackers to find and compromise machines for use as DDoS bots, for storage space, or for stealing data. Fortunately, the request used by Slapper and Scalper on port 80 is not one of these and, to my knowledge, is not used in any security scanner or stand-alone exploit. The same applies to browsers, which usually identify themselves with a host of additional HTTP headers. Other viruses (the major offender here obviously being Nimda) do not send such plain HTTP/1.1 requests either. Because of that, one can assume, within a reasonable margin of error, that all the requests of the form mentioned above were due to either Slapper or Scalper. Of course, we could further refine the process by also requiring the second packet for Scalper and the port 443 data for Slapper but, unfortunately, because of time-outs and network errors, these do not always arrive after the initial HTTP request.
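The filtering rule described above can be sketched as follows. This is a minimal illustration, assuming each honeypot record is available as a (source IP, raw request bytes) pair; that record shape and the function names are hypothetical and are not taken from Smallpot itself.

```python
# The bare probe quoted in the article: a request line followed directly
# by the empty line, with no Host header or any other header.
BARE_PROBE = b"GET / HTTP/1.1\r\n\r\n"

def is_bare_probe(raw_request: bytes) -> bool:
    """Return True if the payload is exactly the header-less probe."""
    return raw_request == BARE_PROBE

def probe_sources(records):
    """Collect the distinct source IPs that sent the bare probe.

    records: iterable of (source_ip, raw_request) pairs (assumed format).
    """
    return {ip for ip, raw in records if is_bare_probe(raw)}
```

An exact match is deliberately strict: a request that carries even a single extra header, as a browser's would, is excluded.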
The Smallpot Project
In July 2001, the Net witnessed the emergence of the first fileless, self-propagating Windows viral code sequence, now known as CodeRed. Due to its "fileless" nature, CodeRed brought at least two new problems for antivirus developers. The first, of course, was detection, which requires more than the usual file scanning methods. The second was the need for tools to capture and study the movements of such threats directly on the Internet.
As has already been mentioned, Smallpot, short for "Small Honeypot", was designed with exactly the latter purpose in mind: to simplify the collection and classification of Internet malware, as well as tracking their spread and studying hacker attacks. It should be noted that besides HTTP, Smallpot also tries to fake various other Internet services such as FTP, POP3, SMTP, SUN-RPC, Telnet, UPnP, MS-SQL, SSH and backdoor servers such as NetBus or SubSeven. However, in this discussion of Slapper and Scalper, we will only take into consideration the HTTP packets, which should provide an acceptable degree of precision. It should also be mentioned that the data collected by Smallpot on Slapper and Scalper begins in September 2002, when the first reports (and samples) started to arrive.
A Problem: Mapping IP Addresses to Countries
The IP addresses around the world hitting Smallpot nodes rarely have a reverse DNS lookup entry. Because of that, constructing a country-based statistic on the spread of Slapper/Scalper is no easy task. To further complicate the problem, at the time of writing there is no absolute authority on the Internet that can be automatically queried for the geographical position of a machine with a given IP address. Even worse, sometimes the reverse DNS lookup entry for an IP address will not tell much about the location of the attacking machine. For instance, a machine with a name ending in .com can be anywhere.
Fortunately, there are a couple of solutions that address this problem by maintaining a database of known networks and associated country codes. Amongst these services, MaxMind's GeoIP (*) and JufSoft's ActiveTarget (*) seem to be the easiest to automate and to operate with a large list of IP addresses. Unfortunately, because they operate on static lists and perform no further checks on the exactness of the location guess, both tools have a small error factor in the precision of their reports. Still, during my experimental tests, the error proved small enough not to affect the results by any relevant measure. Therefore, the final results and figures included in this article should reflect reality with a degree of precision of about 97%.
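To make the idea of a static network-to-country database concrete, here is a minimal sketch using Python's standard ipaddress module. It does not use the GeoIP or ActiveTarget interfaces; the table, the placeholder country codes ("XA", "XB") and the function names are all invented for illustration, and the networks are reserved documentation ranges.

```python
import ipaddress
from collections import Counter

# Hypothetical static table of (network, country code) entries.
# The ranges are documentation networks and the codes are placeholders.
NETWORKS = [
    (ipaddress.ip_network("192.0.2.0/24"), "XA"),
    (ipaddress.ip_network("198.51.100.0/24"), "XB"),
]

def country_of(ip, table=NETWORKS):
    """Return the code of the most specific matching network, or "??"."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, code in table:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, code)
    return best[1] if best else "??"

def tally_by_country(ips, table=NETWORKS):
    """Count distinct IP addresses per country code."""
    return Counter(country_of(ip, table) for ip in set(ips))
```

Addresses that match no known network are reported as "??" rather than guessed, which keeps the error introduced by an incomplete table visible in the final tally.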
Results
The following list of results has been obtained by parsing all Smallpot reports caused by Slapper and Scalper between September 2002 and January 2003 (the two-letter country codes used in the following tables refer to the standard English ISO 3166 country codes):