SecurityFocus 2006-06-08
The reliance on humans for analyzing malware bothers Thomas Dullien.
The reverse engineer, better known among security researchers by his nom de plume, Halvar Flake, created an automated system for classifying software into groups, a task for which he believes machines are much better suited. Research using the system has underscored the sometimes-arbitrary decisions humans make in classifying malicious programs, he said. Among other anomalies, he found that Sasser.D has only a 69 percent correlation to previous members of the Sasser family, while two examples of bot software, Gobot and Ghostbot, are more similar.
"It's like putting donkeys and bunnies in the same class because they both have long ears," Dullien, the founder and CEO of reverse-engineering tool maker Sabre Security, said in a recent interview.
The current problems with classifying and naming viruses are among the reasons that automated classification technology has once again become a focus of research. The plethora of names for specific malicious programs has caused confusion among consumers, despite a project that seeks to provide guidance, if not to consumers, then at least to software analysts and incident responders. In January, when a new computer virus appeared on the Internet, antivirus companies rushed to issue alerts and inundated consumers with a confusing array of names: Blackmal, Nyxem, MyWife, KamaSutra, Blackworm, Tearec and Worm_Grew all describe the same mass-mailing computer virus.
Several research projects hope to improve upon that record.
Last month, at the annual conference of the European Institute for Computer Anti-Virus Research (EICAR), Microsoft released early results of its development of a system to automate classification of malicious software based on the actions performed by the code at runtime.
"A significant challenge we have today is the large number of active malware samples, totaling on the order of tens of thousands, and increasing rapidly," Tony Lee, a virus researcher at Microsoft, said in a recent blog posting following the conference. "It has become apparent to us that the traditional manual analysis process is not adequate in dealing with malware of this order of magnitude, and that we should seek automation technologies to aid human analysts."
The researchers modeled a piece of malicious software as the series of actions that the software takes at the operating system level. Referred to as "events" in a paper written by Lee and anti-malware program team manager Jigar Mody, the actions can include data copying, changing registry keys and opening network connections.
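To make the idea of an event series concrete, here is a minimal Python sketch of how a runtime trace might be represented. The `Event` type, its field names, the action labels and the sample paths are all hypothetical and chosen for illustration; the article does not describe the data format the Microsoft researchers actually used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One operating-system-level action observed at runtime (hypothetical schema)."""
    action: str  # e.g. "copy_file", "set_registry_key", "open_connection"
    target: str  # the object the action was performed on


def first_events(trace: List[Event], limit: int) -> Tuple[Event, ...]:
    """Keep only the first `limit` events of a trace, in the order they occurred."""
    return tuple(trace[:limit])


# A made-up trace covering the three kinds of action the article mentions.
trace = [
    Event("copy_file", "C:\\WINDOWS\\system32\\example.exe"),
    Event("set_registry_key", "HKLM\\Software\\Example\\Run"),
    Event("open_connection", "192.0.2.10:445"),
]

head = first_events(trace, 2)
print([event.action for event in head])
```

Keeping the events in order, rather than as an unordered set, preserves the sequence of actions, which is what "series" suggests in the description above.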
The researchers then trained a recognition engine using an adaptive clustering algorithm, similar to self-organizing maps, and classified a previously unseen subset of malware using the trained system. Using more clusters typically resulted in better classification. When the software samples were classified based on 100 events, accuracy fell below 80 percent, while classification based on 500 and 1000 events typically had accuracy rates above 90 percent.
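The train-then-classify workflow can be illustrated with a much simpler stand-in than the adaptive clustering the researchers describe. The Python sketch below assumes each sample is a list of event names, measures dissimilarity with a plain edit distance, picks one representative trace (a medoid) per known family, and assigns an unseen trace to the family with the nearest representative. The distance measure, the one-medoid-per-family scheme, the family names and the toy traces are all assumptions made for illustration, not details taken from the article.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x with y
        prev = cur
    return prev[-1]


def medoid(traces: List[List[str]]) -> List[str]:
    """Return the trace with the smallest total distance to the others."""
    return min(traces, key=lambda s: sum(edit_distance(s, t) for t in traces))


def train(labelled: Dict[str, List[List[str]]]) -> Dict[str, List[str]]:
    """Pick one representative trace per family (hypothetical training step)."""
    return {family: medoid(traces) for family, traces in labelled.items()}


def classify(trace: List[str], medoids: Dict[str, List[str]], max_events: int) -> str:
    """Assign a trace to the nearest family, comparing only the first max_events events."""
    head = trace[:max_events]
    return min(medoids,
               key=lambda fam: edit_distance(head, medoids[fam][:max_events]))


# Toy, made-up training data: two families with two traces each.
training = {
    "family_a": [["copy", "reg", "net", "net"], ["copy", "reg", "net"]],
    "family_b": [["net", "irc", "irc", "copy"], ["net", "irc", "copy"]],
}
medoids = train(training)

unknown = ["copy", "reg", "net", "net", "net"]
print(classify(unknown, medoids, max_events=4))
```

The `max_events` parameter mirrors the 100, 500 and 1000 event cut-offs mentioned above: a shorter prefix gives the classifier less behavior to compare, which is one plausible reason accuracy would drop.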