Saturday, September 24, 2011

The basics of Computer Architecture for Reverse Engineering

Hello friends. I have been busy these days with some Metasploit stuff, so I was not able to continue my material on Reverse Engineering and Assembly. In my previous post I provided some quick material for learning the basics of Assembly language. The pdf is a useful handbook and will help you as a quick reference. Now I will take the tutorial to the next step. In the previous tutorial, Fast draft Assembly, I focused on Assembly basics. Here I will throw some light on some more concepts which I found useful during my course of learning reverse engineering.

So in case you missed my previous tutorial, please go back, download the pdf and read it once. It will give you a quick idea about assembly. Don't worry if you don't understand it completely. Assembly is too big a topic to learn in one go, so if you want to dig deeper you can refer to a good book. Here I will cover the things you will need for learning reverse engineering.

So let's start with the second part.

Some basics about Computer Architecture

What the 'F' about x86 and x64 ?

Interesting question. x86 is a very old architecture that started with the 8086 family of processors. It has since evolved into the 32-bit version (x86-32), which is the most common one, and its successor x86-64, more commonly known as x64.

The architecture was called x86 because the model numbers of the early processors all ended in 86 (8086, 80186, 80286, 80386, 80486). The name has nothing to do with the bit width (it is definitely not 86-bit!). x64 is a contraction of x86-64. For example, Core 2 Duo processors actually use a 64-bit version of the older x86 architecture, and they are backwards compatible. Notice how 64-bit Windows can run 32-bit programs without any problem? The processors support this natively! x64 is just what people write when they are too lazy to write x86-64.

The fuss about 32 bit and 64 bit

You must have heard your friends or colleagues shouting about this: "mine is 64 bit, yours is 32 bit". But what exactly does it mean? Again a good question (by me). Let's solve the puzzle today.
A register is a small amount of storage inside the CPU, where the CPU keeps the data it needs to access the quickest. The bit designation refers to the width of the register, so a 64-bit register can hold more data than a 32-bit register, which in turn holds more than 16-bit and 8-bit registers. The wider the CPU's registers, the more it can handle at once, especially in terms of addressing system memory.
The major difference lies in how much system memory can be addressed. A 32-bit register can hold 2^32 (2 raised to the power of 32) different values, so a CPU that keeps memory addresses in 32-bit registers can address at most 2^32 bytes. Hence it cannot directly address more than 4 GB of system memory or RAM. A 64-bit register, on the other hand, can hold 2^64 different addresses, although real processors implement fewer address bits than that.
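The arithmetic behind the 4 GB limit can be checked with a few lines of Python. This is only a sketch: the helper name addressable_bytes is made up for illustration, and it assumes that every address refers to one byte of memory.

```python
def addressable_bytes(register_bits):
    """Number of distinct byte addresses an N-bit register can hold."""
    return 2 ** register_bits

GIB = 2 ** 30  # number of bytes in one gigabyte (binary)

print(addressable_bytes(32))         # 4294967296
print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(64) // GIB)  # 17179869184
```

So a 32-bit address gives exactly 4 GB, while a full 64-bit address would in theory cover billions of gigabytes.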
Hope you got answers to some questions about terms you use frequently but might never have thought about.

Understanding the CPU Registers

Assembly language is a low-level language, a human-readable form of machine language, made up of machine instructions. Assembly language is specific to the processor architecture; for example, it is different for the x86 architecture than for the SPARC architecture. Assembly language consists of assembly instructions and CPU registers.

Registers are small storage locations inside the CPU that are used for storing temporary data. Some registers have specific functions, others are just used for general data storage. I am assuming that you are all using x86 machines. There are two types of processors: 32 bit and 64 bit. In a 32 bit processor, each register can hold 32 bits of data. A 64 bit register, on the other hand, can hold 64 bits of data. We will mostly focus on 32 bit machines in our tutorial.
Registers play a key role in the reverse engineering process, so an overview of registers is necessary. We will look at the general purpose registers which are mostly used in reverse engineering.

EAX - This is the accumulator register, which is used to store the results of calculations. The letter E stands for "extended". I will explain this concept later on in the tutorial.

EDX - The data register is an extension to the accumulator. It is most useful for storing data related to the accumulator's current calculation.

ECX - The count register is the universal loop counter. It works like the variable we use to hold our loop counter value.

EDI - EDI points to the location where the result of data operation is stored, or the destination index. Every loop must store its result somewhere, and the destination index points to that place. With a single-byte STOS instruction to write data out of the accumulator, this register makes data operations much more size-efficient.

ESI - In loops that process data, the source index holds the location of the input data stream. Like the destination index, ESI has a convenient one-byte instruction (LODS) for loading data out of memory into the accumulator.

ESP - ESP is the sacred stack pointer. With the important PUSH, POP, CALL, and RET instructions requiring its value, there is never a good reason to use the stack pointer for anything else.

EBP - In functions that store parameters or variables on the stack, the base pointer holds the location of the current stack frame. In other situations, however, EBP is a free data-storage register.

EBX - In 16-bit mode, the base register was useful as a pointer. In 32-bit code it has no dedicated role any more, so it is completely free for extra storage.
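To see how ESI, EDI, ECX and the accumulator work together, here is a toy byte-copy loop modelled in Python. It is only a sketch, not real assembly: the variables are named after the registers, memory is a plain bytearray, and the addresses 0 and 5 are made-up values. It also assumes the usual case where LODSB and STOSB move forward through memory.

```python
# Toy memory: source bytes at addresses 0..4, empty destination at 5..9.
memory = bytearray(b"HELLO" + bytes(5))

esi = 0  # source index: where the input data starts
edi = 5  # destination index: where the result is written
ecx = 5  # count register: number of bytes left to copy

while ecx != 0:
    al = memory[esi]  # like LODSB: load one byte into the accumulator
    esi += 1          # ...and advance the source index
    memory[edi] = al  # like STOSB: store the accumulator
    edi += 1          # ...and advance the destination index
    ecx -= 1          # count down, as the LOOP instruction does

print(memory[5:10])   # bytearray(b'HELLO')
```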

The 'E' at the beginning of each register name stands for Extended. When a register is referred to by its extended name, it indicates that all 32 bits of the register are being addressed. An interesting thing about registers is that they can be broken down into smaller subsets of themselves; the low sixteen bits (bits 0-15) of each register can be referenced by simply removing the 'E' from the name. For example, if you wanted to manipulate only the low sixteen bits of the EAX register, you would refer to it as the AX register. Additionally, the registers AX, BX, CX and DX can be further broken down into two eight bit parts. So, if you wanted to manipulate only the low eight bits (bits 0-7) of the AX register, you would refer to the register as AL; if you wanted to manipulate the high eight bits (bits 8-15) of the AX register, you would refer to the register as AH ('L' standing for Low and 'H' standing for High).
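The relationship between EAX, AX, AL and AH is just bit masking and shifting, which can be tried out in Python. This is a minimal sketch: the value 0x12345678 is an arbitrary example, and set_al is a hypothetical helper that shows what writing to AL does to the rest of EAX.

```python
eax = 0x12345678

ax = eax & 0xFFFF       # low 16 bits (bits 0-15)
al = eax & 0xFF         # low 8 bits (bits 0-7)
ah = (eax >> 8) & 0xFF  # bits 8-15, the high half of AX

print(hex(ax), hex(al), hex(ah))  # 0x5678 0x78 0x56

def set_al(eax, value):
    """Writing to AL replaces only bits 0-7 and leaves the rest of EAX alone."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)

print(hex(set_al(eax, 0xFF)))     # 0x123456ff
```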

Understanding Memory and Stacks

There are three main sections of memory:

1. Stack Section - Where the stack is located, stores local variables and function arguments.

2. Data Section - Where the heap is located, stores static and dynamic variables.

3. Code Section - Where the actual program instructions are located.

The stack section starts at the high memory addresses and grows downwards, towards the lower memory addresses; conversely, the data section (heap) starts at the lower memory addresses and grows upwards, towards the high memory addresses. Therefore, the stack and the heap grow towards each other as more variables are placed in each of those sections. 

High Memory Addresses (0xFFFFFFFF)
------------------------ <---- Bottom of the stack
|                      |
|                      |   |
|        Stack         |   |  Stack grows down
|                      |   v
|                      |
|----------------------| <---- Top of the stack (ESP points here)
|                      |
|                      |
|                      |
|----------------------| <---- Top of the heap
|                      |
|                      |   ^
|         Heap         |   |  Heap grows up
|                      |   |
|                      |
|----------------------| <---- Bottom of the heap
|                      |
|     Instructions     |
|                      |
------------------------
Low Memory Addresses (0x00000000)
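The "stack grows down" idea is easier to believe once you watch ESP move. The following Python sketch models a 32-bit stack; the class name Stack32 and the starting address 0xFFFFFFF0 are made up for illustration, and a real program's stack starts wherever the operating system places it.

```python
class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0xFFFFFFF0):
        self.esp = top    # stack pointer (the starting value is arbitrary)
        self.memory = {}  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4     # make room: ESP moves down by 4 bytes
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.esp)  # read the value ESP points at
        self.esp += 4     # release it: ESP moves back up
        return value

stack = Stack32()
stack.push(0x5)
stack.push(0x4)
print(hex(stack.esp))  # 0xffffffe8
print(stack.pop())     # 4
print(stack.pop())     # 5
print(hex(stack.esp))  # 0xfffffff0
```

Each push lowers ESP by four bytes and each pop raises it again, so the last value pushed is the first one popped.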

So now let us relate these concepts to Assembly. We will see how they are actually used in machine code.
In our previous post we studied some concepts of Assembly language. Assembly language is based on machine instructions, so proper knowledge of computer architecture is essential to understand the overall operation.
With both posts in mind, let us look at some important assembly instructions.

Some Important Assembly instructions

push eax
Pushes the value stored in EAX onto the stack
pop eax
Pops a value off of the stack and stores it in EAX
call 0x08abcdef
Calls a function located at 0x08abcdef
mov eax,0x5
Moves the value of 5 into the EAX register
sub eax,0x4
Subtracts 4 from the value in the EAX register
add eax,0x1
Adds 1 to the value in the EAX register
inc eax
Increases the value stored in EAX by one
dec eax
Decreases the value stored in EAX by one
cmp eax,edx
Compares the values in EAX and EDX (by subtracting EDX from EAX without storing the result); if they are equal, sets the zero flag to 1
test eax,edx
Performs an AND operation on the values in EAX and EDX; if the result is zero, sets the zero flag to 1
jmp 0x08abcde
Jump to the instruction located at 0x08abcde
jnz 0x08ffff01
Jump to 0x08ffff01 if the zero flag is not set (that is, if it is 0)
jne 0x08ffff01
Jump to 0x08ffff01 if a comparison is not equal
and eax,ebx
Performs a bit wise AND operation on the values stored in EAX and EBX; the result is saved in EAX
or eax,ebx
Performs a bit wise OR operation on the values stored in EAX and EBX; the result is saved in EAX
xor eax,ebx
Performs a bit wise XOR operation on the values stored in EAX and EBX; the result is saved in EAX (the common special case xor eax,eax sets EAX to zero)
leave
Remove data from the stack before returning
ret
Return to a parent function
nop
No operation (a 'do nothing' instruction)

If you have understood the things we have learned so far, then you are halfway through. The rest depends on how much you explore yourself. All you need is a debugger. I prefer using OllyDbg, but you can choose any debugger you are comfortable with. Try to open any exe file in the debugger and analyse the code. You will find several things which you have learned here, and that will help you understand them in detail. In my next post I will show you how we can reverse engineer different software using the knowledge we have learned so far. So stay tuned. In case you have any doubts or queries you can comment here.
Happy Hacking!!



  1. i like this verry much ... good
    See check spell mistake in XOR XOR eax ebx.....
