2005-08-04
Operating system differences
Windows 2000 and the earlier NT-based systems do not use the SYSENTER instruction. Starting with Windows XP, "int 2eh" (our old way) was replaced by the SYSENTER instruction. The following listing shows the syscall implementation for Windows 2000:
MOV EAX, SyscallNumber ; requested syscall number
LEA EDX, [ESP+4] ; EDX = pointer to the parameters
INT 2Eh ; transfer execution to the kernel-mode handler
RET 4*NUMBER_OF_PARAMS ; return and pop the parameters

We already know the Windows XP way; however, here is the variant I use in shellcode:
push fn ; push syscall number
pop eax ; EAX = syscall number
push eax ; this one makes no diff
call b ; put caller address on stack
b: add [esp],(offset r - offset b) ; adjust it so the return address is r
mov edx, esp ; EDX = stack
db 0fh, 34h ; SYSENTER instruction
r: add esp, (param*4) ; normalize stack

It seems that SYSENTER was first introduced in the Intel Pentium II processors. This author has not verified whether it is supported by AMD Athlon processors, so support should be checked rather than assumed. To determine whether the instruction is available on a particular processor, use the CPUID instruction and check the SEP flag together with some specific family/model/stepping values. Here is how Intel performs this check:
IF (CPUID SEP bit is set)
    THEN IF (Family = 6) AND (Model < 3) AND (Stepping < 3)
        THEN
            SYSENTER/SYSEXIT_NOT_SUPPORTED
        FI;
    ELSE SYSENTER/SYSEXIT_SUPPORTED
FI;

Of course, this is not the only difference between the various Windows operating systems. System call numbers also change between Windows versions, as the following table shows:
| Operating system | Service pack | NtAddAtom | NtAdjustPrivilegesToken | NtAlertThread |
| --- | --- | --- | --- | --- |
| Windows NT | SP 3 | 0x3 | 0x5 | 0x7 |
| Windows NT | SP 4 | 0x3 | 0x5 | 0x7 |
| Windows NT | SP 5 | 0x3 | 0x5 | 0x7 |
| Windows NT | SP 6 | 0x3 | 0x5 | 0x7 |
| Windows 2000 | SP 0 | 0x8 | 0xa | 0xc |
| Windows 2000 | SP 1 | 0x8 | 0xa | 0xc |
| Windows 2000 | SP 2 | 0x8 | 0xa | 0xc |
| Windows 2000 | SP 3 | 0x8 | 0xa | 0xc |
| Windows 2000 | SP 4 | 0x8 | 0xa | 0xc |
| Windows XP | SP 0 | 0x8 | 0xb | 0xd |
| Windows XP | SP 1 | 0x8 | 0xb | 0xd |
| Windows XP | SP 2 | 0x8 | 0xb | 0xd |
| Windows 2003 Server | SP 0 | 0x8 | 0xc | 0xe |
| Windows 2003 Server | SP 1 | 0x8 | 0xc | 0xe |
Syscall number tables are available on the Internet. The reader is advised to look at the one from metasploit.com; other sources may also be useful.
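The two version-dependent decisions in this section (whether SYSENTER is usable, and which number a given syscall has) can be modelled outside of assembly. The following is a minimal Python sketch, not part of the original shellcode. It assumes the caller already holds the raw EAX and EDX values returned by CPUID leaf 1, reads the SEP flag from EDX bit 11, and uses only the base family, model, and stepping fields of EAX. The names `sysenter_supported`, `decode_signature`, `syscall_number`, and `SYSCALL_NUMBERS` are hypothetical, and the table holds only the three syscalls listed above.

```python
SEP_BIT = 1 << 11  # CPUID leaf 1, EDX bit 11: SEP (SYSENTER/SYSEXIT present)


def decode_signature(eax):
    """Split the CPUID leaf 1 EAX value into (family, model, stepping).

    Assumption: only the base fields are used (bits 11:8, 7:4 and 3:0);
    the extended family/model fields are ignored in this sketch.
    """
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    return family, model, stepping


def sysenter_supported(eax, edx):
    """Apply the Intel check quoted earlier to raw CPUID leaf 1 values."""
    if not (edx & SEP_BIT):
        return False  # SEP flag clear: no SYSENTER/SYSEXIT
    family, model, stepping = decode_signature(eax)
    if family == 6 and model < 3 and stepping < 3:
        return False  # SEP flag is set, but the instruction is not usable
    return True


# Values copied from the table in this section. Every listed service pack
# of a given operating system uses the same numbers, so the key is the OS.
SYSCALL_NUMBERS = {
    "Windows NT": {
        "NtAddAtom": 0x3, "NtAdjustPrivilegesToken": 0x5, "NtAlertThread": 0x7},
    "Windows 2000": {
        "NtAddAtom": 0x8, "NtAdjustPrivilegesToken": 0xA, "NtAlertThread": 0xC},
    "Windows XP": {
        "NtAddAtom": 0x8, "NtAdjustPrivilegesToken": 0xB, "NtAlertThread": 0xD},
    "Windows 2003 Server": {
        "NtAddAtom": 0x8, "NtAdjustPrivilegesToken": 0xC, "NtAlertThread": 0xE},
}


def syscall_number(os_name, symbol):
    """Look up a syscall number; raises KeyError for unknown entries."""
    return SYSCALL_NUMBERS[os_name][symbol]
```

For example, `syscall_number("Windows XP", "NtAlertThread")` returns 0xd, while the same symbol on Windows 2000 returns 0xc. This is why a number hardcoded for one version cannot be assumed to work on another.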
Syscall shellcode advantages
There are several advantages when using this approach:
- The shellcode does not call any APIs, so it does not have to locate API addresses (no kernel address finding, no export section or import section parsing, and so on). Because of this "feature" it is able to bypass most ring3 "buffer overflow prevention systems." Such protection mechanisms usually do not stop the buffer overflow attack itself; instead, they mainly hook the most commonly used APIs and check the caller address. Here, such checking would be of no use.
- Since you are sending the requests directly to the kernel handler and you "jump over" all of the instructions in the Win32 subsystem, execution speed increases considerably (although in the era of modern processors, who truly cares about the speed of shellcode?).
Syscall shellcode disadvantages
There are also several disadvantages to this approach:
- Size -- this is the main disadvantage. Because we are "jumping over" all of the subsystem wrappers, we need to code our own, and this increases the size of the shellcode.
- Compatibility -- as described above, implementations range from "int 2eh" to "sysenter," depending on the operating system version. System call numbers can also change from one Windows version to the next (for more, see the References section).
The ideas
The shellcode at the end of this article drops a file and then writes a registry key. This causes the dropped file to be executed after the computer reboots. Many readers may ask why we do not execute the file directly, without writing the registry key. Executing a Win32 application through syscalls is not a simple task, and NtCreateProcess alone will not do the job. Consider what the CreateProcess API must do to execute an application:
- Open the image file (.exe) to be executed inside the process.
- Create the Windows executive process object.
- Create the initial thread (stack, context, and Windows executive thread object).
- Notify the Win32 subsystem of the new process so that it can set up for the new process and thread.
- Start execution of the initial thread (unless the CREATE_SUSPENDED flag was specified).
- In the context of the new process and thread, complete the initialization of the address space (such as load required DLLs) and begin execution of the program.
Therefore, it is clearly much easier and quicker to use the registry method. The shellcode that concludes this article drops a sample MessageBox application (mainly a PE structure, which is large in itself, so the size increases); however, there are plenty of other solutions. An attacker can drop a script file (batch, VBS, or others) and download a trojan or backdoor file from an FTP server, or just execute various commands such as "net user /add piotr test123" and "net localgroup /add administrators piotr". This idea should help the reader with optimizations. Now enjoy the proof-of-concept shellcode.
