2002-10-17
Introduction
Polymorphic viruses change their code in fundamental ways with each replication in order to avoid detection by anti-virus scanners. This may mean changing the encryption routine, the sequence of instructions, or other aspects of the virus's behaviour. This article is the first of a two-part series that will offer a brief overview of the use of polymorphic strategies in macro viruses. This installment focuses on some early examples of polymorphic techniques. The first question to answer when it comes to macro viruses is whether any of them can be qualified as polymorphic. Most macro viruses are very simple and would not be polymorphic even if VBA were a compiled language. However, there are several more complicated, encrypted viruses, some of which even have polymorphic encryptors.
Early Ridiculo-morphs
The first baby step towards polymorphism was Outlaw. Strictly speaking, it should not be called a polymorph at all: the virus code itself did not change a bit; only the name of the macro carrying it changed. The only reason Outlaw was considered a polymorph (not that it deserved to be) was that, in the early months right after the appearance of Concept, several (mostly WordBasic-based) macro virus protection products based their detection solely on macro names. As these products were superseded by scanning routines that parse the OLE2 document structure, this group of viruses sank back to non-morphism.
Junk Instruction Inserters
Many polymorphic macro viruses insert junk code into the virus source. In the simplest case, the inserted lines contain random comments. The WM97.Class family is a good example of this technique. In these viruses, every other line of the macro code is filled with a random comment, usually combining the user name, the date, the installed printer, etc. One infected sample looks like the following:
Another infected sample looks like this:
In response to this, virus scanners developed code normalization techniques that remove the comment lines from the analyzed code. An old virus, FutureNot, consists of two modules, AutoOpen and FileSave. The latter, which is responsible for infecting further documents, is gathered during the infection of the normal template and is not present in the infected document. The original AutoOpen macro is saved into the global template under a randomly selected 5-character name. Additionally, the virus inserts the text "1 Gen" at a random location within the code. This serves as a generation counter: the number of such comments equals the number of infected computers along the current infection chain. The virus gathers the FileSaveAs macro line by line, slightly mutating its code in the process: it inserts two random numbers as comments, and a couple of extra line feeds. The resulting macro looks like the following fragment:
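The comment and white-space normalization that scanners apply to viruses such as WM97.Class and FutureNot can be sketched in a few lines of Python. This is a simplified illustration, not any particular scanner's implementation: it assumes a comment is either a line starting with Rem or an apostrophe outside a string literal, folds case because VBA is case-insensitive, and uses the standard `zlib.crc32` as the static checksum. The function names and the sample macros are hypothetical.

```python
import zlib


def strip_comment(line: str) -> str:
    """Remove a VBA comment: a line starting with Rem, or an apostrophe outside a string."""
    head = line.strip().lower()
    if head == "rem" or head.startswith("rem "):
        return ""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            # VBA escapes a quote inside a string by doubling it (""), which
            # toggles the flag twice, so a simple toggle stays correct.
            in_string = not in_string
        elif ch == "'" and not in_string:
            return line[:i]
    return line


def normalize(source: str) -> str:
    """Drop comments, all white space and empty lines, and fold case."""
    kept = []
    for raw in source.split("\n"):
        code = "".join(strip_comment(raw).split()).lower()
        if code:
            kept.append(code)
    return "\n".join(kept)


def fingerprint(source: str) -> int:
    """CRC-32 over the normalized source, usable as a static detection signature."""
    return zlib.crc32(normalize(source).encode("utf-8"))


# Two hypothetical replicas that differ only in comments, blank lines and indentation.
replica_1 = "Sub Demo()\n' 48213\n\n  x = \"it's\" ' 9034\nEnd Sub"
replica_2 = "Sub Demo()\n    x = \"it's\"\n\n' 777\nEnd Sub"
print(fingerprint(replica_1) == fingerprint(replica_2))  # True
```

Because white space inside string literals is removed as well, the normalized text is only a detection fingerprint, not valid VBA; that is acceptable here, since it is never executed.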
To defeat these simple viruses, scanners introduced several code normalization measures: comments are removed from the code, along with white-space characters and empty lines. After this transformation, all replicas of these polymorphic viruses look exactly the same, so identification works as for any other macro virus; even a static CRC can be used. However, these code normalization measures are not effective against junk-code inserters. One example is Polymac.A (also known as Chydow.A), which also inserts nested junk-code program structures, as illustrated below, with the meaningful instructions in green:
In addition to inserting junk code intermittently, this virus changes the capitalization of letters in crucial strings such as InsertLines or VBComponents. This effect is usually already taken into account by source code normalizers. The virus is quite successful at changing itself, as its replicas differ considerably from one another. Traditional code normalization techniques are not applicable in this case; it is still possible to find some good scan strings, but alternatively, more advanced scanning techniques can be used in the Polymac example. First, the code has to be precompiled, at which stage the VBA/application object variables are separated from the program variables. Both have to be replaced with variable tokens; this simple step eliminates all the variable name changer polymorphs (which are discussed below). Next, the code flow has to be analyzed, identifying the basic code structures (conditionals, loops, etc.). If all variables within a structure are internal (meaning that no document/VBA objects are accessed and the variables used are not used in any other part of the code), then that code fragment is declared intrinsic junk code and can be eliminated. This defeats macro viruses that insert random code lines or structures into the virus code. If we remove the intrinsic junk code from the quoted code fragment, only the following remains:
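The intrinsic-junk test described above can be sketched as follows. The sketch assumes that a hypothetical pre-compilation and flow-analysis pass has already split the macro into its top-level structures and recorded, for each one, the program variables it uses and whether it accesses any document/VBA object. The `Structure` class, the `remove_intrinsic_junk` function and the sample fragment are illustrative names, not part of any real scanner.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """One top-level code structure (statement, loop, If block, ...) as reported
    by a hypothetical pre-compilation and flow-analysis pass."""
    text: str
    variables: set = field(default_factory=set)  # program variables read or written
    touches_objects: bool = False                # accesses any document/VBA object?


def remove_intrinsic_junk(structures):
    """Keep only structures that are not intrinsic junk.

    A structure is junk when it accesses no document/VBA object and none of
    its variables is used by any other structure in the code."""
    kept = []
    for i, s in enumerate(structures):
        used_elsewhere = set()
        for j, other in enumerate(structures):
            if j != i:
                used_elsewhere |= other.variables
        if s.touches_objects or s.variables & used_elsewhere:
            kept.append(s)
    return kept


# Hypothetical Polymac-like fragment: two junk structures around two real ones.
code = [
    Structure("For q = 1 To 7: w = q * 3: Next q", {"q", "w"}),
    Structure("Set m = ThisDocument.VBProject.VBComponents(1).CodeModule", {"m"}, True),
    Structure("If z > 2 Then z = z - 1", {"z"}),
    Structure("m.InsertLines 1, src", {"m", "src"}, True),
]
for s in remove_intrinsic_junk(code):
    print(s.text)
```

Only the two structures that access VBA objects survive; the loop and the If block use variables that appear nowhere else. Note that a single pass of this rule cannot remove two junk structures that share a variable with each other, which is one reason the depth of the analysis matters.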
The remaining code is invariant enough to be detected with a CRC. The speed of the junk code removal procedure depends largely on how far it has to look: it may be very slow if long structures are processed, but junk code may remain if only short structures are handled.
Code Collectors
Viruses can randomly gather not only junk code but also parts of the meaningful virus code; such viruses are known as code collectors. Hope.AF is an example: its propagation code is gathered randomly upon infection. The trick is that the virus uses equivalent alternative representations of the object variables in the code. For example, the ActiveDocument object reference can appear as any of the following:
The replication picks one of them randomly, thus gathering lines such as:
Or alternatively:
The bulk of the virus code, the alternative variable definitions, and the code generation procedure are invariant, while the actually mutated code is relatively small compared to the fixed code. There is no way to transform the code into an invariant form (except by replacing the object references with normalized ones, which requires precompilation). However, the invariant part of the code is long enough for CRC detection (though it does not actually contain the viral functionality).
Static Encryption
Some macro viruses use static encryption, in which most of the virus code is encrypted with a simple algorithm (XOR or shift). The encryption key may change, but this makes little difference. One example is Antisocial.F, in which the encrypted code, along with the encryption key, is stored as a comment in the code. The encryption routine is therefore the same in all replicated samples. Since comments are not taken into account, the remaining code is exactly the same in all samples: the usual code normalization techniques, which discard comment lines, generate an invariant form of the code, so CRC detection is possible. Note that in this case the CRC is calculated over the decryptor and not over the actual virus code. Although this is not good practice in general, there is no easy way to get to the actual virus code without some emulation of the macro code, or a combined algorithmic detection that first detects the decryptor, then decodes the remaining code and calculates the CRC over the decrypted virus body. While this is possible, it is time consuming and, in practically all cases, decryptor-based detection is sufficient.
Variable Name Changers
Variable name changers were another interesting early attempt at creating a polymorphic virus. The best pure examples of this type are the members of the IIS family. For instance, IIS.I uses the variable name changing method with some interesting twists.
The author's intent was to change every variable name used in the code but, as usual, some were forgotten. The virus creates variable names from high-ASCII characters (codes 130 to 204). The resulting code is rather difficult to read, as the following code snippet illustrates:
In the preparation stage, the virus collects the virus code from the Flitnic module in the global template into a temporary string buffer. It then collects the variable names to be modified into a variable name array, which holds both the old names of the variables and the new names acquired during the mutation process. The variable names are collected from the "variable pool" located in the middle of the virus code: a series of commented lines, each containing only a changeable variable name preceded by a single comment character.
After collecting all the variable names, the virus mutates them. The new names have a variable length (randomly selected between 2 and 22 characters) and are built from randomly selected characters with codes between 130 and 204. The new names are also stored in the variable name array. Unlike most polymorphic macro viruses, IIS.I checks for variable name conflicts. It searches the variable name array for two basic types of conflict, either of which could lead to errors:
If either type of conflict is found, the virus resolves it by generating a new name for the variable with the higher index in the variable name array. This new name uses the same character range but is only one character long. As all variable names created in the first round of the polymorphic generation are at least 2 characters long, this should not lead to any new problems. After the new variable names have been checked for consistency, the virus searches through the buffer containing the virus code in memory. It processes the code line by line, breaking each line into tokens. Tokens are delimited by characters with ASCII codes below 65 or by the end of the line. If a token matches one of the old variable names, the virus replaces it with the new name of that variable and continues processing.
The polymorphic nature of this virus depends a great deal on whether the virus scanner uses the source code or the p-code representation for detection. If p-code is used, the replicas differ little, so the virus is not much of a polymorph. At the source code level, however, it is polymorphic enough to cause headaches for virus scanners that base their detection on the macro source code, so it still has to be dealt with for their sake. The appropriate solution for detecting this type of virus would be to precompile the code, determining which tokens are variables, which are VBA commands, and which are object references. The variables would then be replaced by variable tokens. The above code would look like:
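The renaming scheme and the variable-token countermeasure can be illustrated together in Python. This is a sketch under stated assumptions, not IIS.I's actual code or any scanner's: `KNOWN_NAMES` is a tiny hypothetical stand-in for the pre-compiler's knowledge of VBA keywords and object names, the conflict check is reduced to detecting duplicate new names, string literals get no special treatment, and every unrecognized name (procedure names included) is treated as a variable. `mutate` imitates the described tokenizer (every character with a code below 65 is a delimiter) and the 2 to 22 character names drawn from codes 130 to 204; `normalize_variables` replaces each unrecognized name with a positional token.

```python
import random

# Hypothetical stand-in for the pre-compiler's list of VBA keywords and object
# names (lower case); a real one would cover the full VBA and Word object model.
KNOWN_NAMES = {"sub", "end", "dim", "as", "string", "activedocument", "name"}


def tokenize(line):
    """Split a line as described for IIS.I: any character with code < 65 is a
    delimiter. Delimiters are kept as one-character tokens so the line can be rebuilt."""
    tokens, current = [], ""
    for ch in line:
        if ord(ch) < 65:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def rebuild(source, replace):
    """Apply `replace` to every name token; delimiters pass through unchanged."""
    # split("\n") rather than splitlines(): the latter also breaks on chr(133),
    # which lies inside the 130-204 range the new names are drawn from.
    lines = []
    for line in source.split("\n"):
        lines.append("".join(replace(t) if ord(t[0]) >= 65 else t for t in tokenize(line)))
    return "\n".join(lines)


def mutate(source, variables, rng):
    """Virus side (IIS.I-like): give each listed variable a random name of
    2-22 characters with codes 130-204."""
    new_names = {}
    for old in variables:
        name = "".join(chr(rng.randint(130, 204)) for _ in range(rng.randint(2, 22)))
        if name in new_names.values():
            # Simplified conflict handling (assumption): fall back to a
            # one-character name, which cannot clash with the 2+ character ones.
            name = chr(rng.randint(130, 204))
        new_names[old] = name
    return rebuild(source, lambda tok: new_names.get(tok, tok))


def normalize_variables(source):
    """Scanner side: fold case (VBA is case-insensitive) and replace every name
    that is not a known keyword or object with a token V1, V2, ..."""
    tokens = {}

    def replace(tok):
        key = tok.lower()
        if key in KNOWN_NAMES:
            return key
        if key not in tokens:
            tokens[key] = "V%d" % (len(tokens) + 1)
        return tokens[key]

    return rebuild(source, replace)


original = "Sub Demo()\nDim body As String\nbody = ActiveDocument.Name\nEnd Sub"
replica = mutate(original, ["body"], random.Random(2002))
print(replica != original)                                            # True
print(normalize_variables(replica) == normalize_variables(original))  # True
```

Numbering the tokens in order of first appearance, rather than mapping every variable to one shared token, keeps the data flow between variables visible in the normalized text; either choice defeats pure renaming.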
Is that enough? Are we ready now? Not exactly. There is no guarantee that the virus will add itself to an empty code module. In fact, there are parasitic viruses that attach themselves to the end of an existing module and add only a call to the virus code in the macro, much like Win32 EPO viruses. In that case the variable token
Conclusion
This concludes the first part of this two-part series examining polymorphic macro viruses. This installment has looked at some early examples of polymorphism, along with some of the early polymorphic techniques. The next article in this series will discuss the first serious polymorphic macro viruses, as well as the evolution of viruses into true polymorphs and, ultimately, metamorphic viruses.