Blog04: Misc — Assembly’s ABCs
Note: This blog assumes that you have a basic knowledge of how a computer works, as well as some coding experience. Now, let’s get right into it.
If you have ever played CTFs, dived into cybersecurity or simply been learning how to code, you have surely come across something that looks a bit like this:
If you have no idea what this is, or if you do know but think “there is no way I can understand it”, this blog is made just for you.
The image you just saw is a snippet of Assembly language code. Now you might be asking yourself…
What is Assembly Language?
Assembly language is a low-level programming language whose instructions correspond almost one-to-one to the machine instructions that a CPU (central processing unit) executes directly.
Running assembly code on a processor involves several important components. Let us go through some of the most elementary and essential ones.
The arithmetic logic unit (ALU)
This is the operational part of the processor, and the only one capable of performing calculations.
The operations of an ALU include additions/subtractions, multiplications/divisions and all the bit-level logical operations (AND, OR, shifts, rotations, complements, extractions, etc.). All of these operations are typically performed only on integers of a fixed size (8, 16 or 32 bits on the processor discussed here), which limits its calculation capabilities.
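To make the "fixed-size integers" point concrete, here is a minimal Python sketch that imitates a few ALU operations on 32-bit values by masking every result to 32 bits. It is only a model: a real ALU is hardware, and the helper names (`alu_add`, `alu_and`, `shl32`, `rol32`, `not32`) are hypothetical. Inputs are assumed to already fit in 32 bits.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a 32-bit register


def alu_add(a, b):
    """32-bit addition: a carry out of bit 31 is simply lost."""
    return (a + b) & MASK32


def alu_and(a, b):
    """Bitwise AND, bit by bit."""
    return a & b & MASK32


def shl32(a, n):
    """Shift left: bits pushed past bit 31 are discarded."""
    return (a << n) & MASK32


def rol32(a, n):
    """Rotate left: bits pushed out on the left come back on the right."""
    n %= 32
    return ((a << n) | (a >> (32 - n))) & MASK32


def not32(a):
    """Complement: flip all 32 bits."""
    return ~a & MASK32


print(hex(alu_add(0xFFFFFFFF, 1)))   # 0x0: the result wrapped around
print(hex(alu_and(0xF0F0, 0xFF00)))  # 0xf000
print(hex(shl32(0x80000001, 1)))     # 0x2: the top bit fell off
print(hex(rol32(0x80000001, 1)))     # 0x3: the top bit re-entered at bit 0
print(hex(not32(0x0000FFFF)))        # 0xffff0000
```

The shift and the rotation differ only in what happens to the bit that leaves the register, which is exactly the kind of detail the fixed width makes visible.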
Generally, the ALU does not work directly on memory to do its calculations. To gain speed and efficiency, it is connected to registers: small, very fast storage locations inside the processor.
Registers come in different types depending on their use. Among them, we find:
- a register that contains the address of the instruction to be executed. As the code runs, it is updated so that the processor fetches the next instruction to carry out. It is very often called PC (for Program Counter); on x86 it is called EIP (Instruction Pointer).
- a register containing an address in RAM where temporary information is stored. This memory area is the system stack, and this register is most often called the Stack Pointer (SP); on 32-bit x86 it is ESP.
- registers that can be used for almost any purpose (EAX, EBX, …).
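The role of the program counter can be pictured with a toy fetch-execute loop. This Python sketch is purely illustrative. The "program" is a hypothetical list of Python tuples rather than real machine code, and `pc` is an index into that list instead of a memory address.

```python
def run(program):
    """Toy fetch-execute loop: `pc` plays the role of the Program Counter."""
    regs = {"eax": 0}
    pc = 0
    trace = []                        # order in which instructions ran
    while pc < len(program):
        op, *args = program[pc]       # fetch the instruction pc points to
        trace.append(pc)
        pc += 1                       # by default, move on to the next one
        if op == "add":
            regs[args[0]] += args[1]
        elif op == "jmp":
            pc = args[0]              # a jump simply overwrites the PC
        elif op == "halt":
            break
    return regs, trace


program = [
    ("add", "eax", 5),    # 0
    ("jmp", 3),           # 1: skip instruction 2
    ("add", "eax", 100),  # 2: never executed
    ("add", "eax", 1),    # 3
    ("halt",),            # 4
]
regs, trace = run(program)
print(regs["eax"], trace)  # 6 [0, 1, 3, 4]
```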
Read-only memory (ROM or FLASH)
Above all, these memories retain their contents when the power supply is cut, so they naturally contain the code to be executed.
Random Access Memory (RAM)
This type of memory can be both read and written by the processor. All the variables you create in your program are stored in it. Unlike read-only memory, RAM loses its contents when it loses power.
The stack
The stack is an area of RAM set aside by the program when it starts. As mentioned above, this area is identified (pointed to) by a special register (SP). This pointer moves with every write and read, following the principle of a LIFO (Last In, First Out) structure.
This data structure works just like a pile of plates in a cupboard. When we save a piece of data (a write), we put a plate on the pile, necessarily on top. When we need a piece of data (a read), we take the plate on top. You can try to do otherwise, but it is risky.
The last data deposited is therefore the first data retrieved, hence the name.
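The plate analogy maps directly onto a Python list used as a stack. `append` puts a plate on top and `pop` takes the top one back. This models the LIFO principle only, not how the processor actually lays the data out in RAM.

```python
stack = []
for plate in ["first", "second", "third"]:
    stack.append(plate)              # put a plate on top of the pile

retrieved = []
while stack:
    retrieved.append(stack.pop())    # always take the plate on top

print(retrieved)  # ['third', 'second', 'first']: last in, first out
```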
A stack is essential to a processor. For example, when the processor reaches a function call, it must stop its current calculations and branch to the function’s code to do the work the call requires. So as not to lose its calculations in progress, and to know where to resume them, it must save a whole series of information that it restores when it comes back.
It saves this information on the system stack: it writes its essential data to the stack in a certain order and, thanks to the LIFO structure, retrieves it in the reverse order.
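Here is a minimal sketch of that save-and-restore idea. It assumes a simplified model in which a call saves only a return address and the value of one register. Real calling conventions save more, and which registers get saved varies. The `call`/`ret` functions and the addresses are hypothetical, and the comments only loosely mirror what the x86 `call` and `ret` instructions do with the return address.

```python
def call(stack, regs, return_address, target):
    """Model of a call: remember where to resume, save state, then branch."""
    stack.append(return_address)   # like x86 `call`: push the return address
    stack.append(regs["eax"])      # save a value we do not want to lose
    regs["pc"] = target


def ret(stack, regs):
    """Model of the return: restore in the reverse order of saving."""
    regs["eax"] = stack.pop()      # last saved, first restored
    regs["pc"] = stack.pop()       # like x86 `ret`: pop the return address


stack = []
regs = {"pc": 0x100, "eax": 42}
call(stack, regs, return_address=0x105, target=0x200)
regs["eax"] = 7                    # the function overwrites eax while working
ret(stack, regs)
print(hex(regs["pc"]), regs["eax"])  # 0x105 42
```

Because the restore order is the exact reverse of the save order, the stack is empty again once the function has returned.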
We now have everything we need to understand Assembly language. Next, we will take a look at one of the most commonly used instruction set architectures, x86, and its assembly language.
The x86 assembly language
There are several processor architectures (x86, ARM, AVR…), and each architecture has its own assembly language. In this blog we will focus on the 32-bit x86 architecture. Other architectures use different names for their registers and instructions, but they generally follow the same logic.
General Purpose Registers
These are 32-bit registers that can be used for almost any purpose, but each one is conventionally associated with a specific use:
- EAX — Accumulator Register. Primary purpose: Math calculations
- EBX — Base Address Register. Primary purpose: Indirectly access memory through a base address.
- ECX — Counter Register. Primary purpose: Used in counting and looping.
- EDX — Data Register. Primary purpose: well… storing data. Yep, that's about it :)
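Each of these 32-bit registers has two parts: the high-order word, which is the upper 16 bits, and the low-order word, which is the lower 16 bits.
The upper 16 bits do not have a special name. The lower 16 bits, however, do: for EAX they are called AX. Each byte of this low word also has its own name, with an appended ‘H’ for the higher 8 bits or an appended ‘L’ for the lower 8 bits. For example, in EAX, we have:
EAX: bits 31 to 0 (32 bits)
AX: bits 15 to 0 (16 bits, the low-order word of EAX)
AH: bits 15 to 8 (8 bits) and AL: bits 7 to 0 (8 bits)
The same pattern gives BX/BH/BL, CX/CH/CL and DX/DH/DL.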
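The key point is that AX, AH and AL are not separate registers. They are names for parts of EAX. This Python sketch stores only the 32-bit value and computes the smaller views from it. The class and attribute names are hypothetical and chosen for readability.

```python
class Eax:
    """Model of the EAX register: only the 32-bit value is stored,
    AX, AH and AL are computed views of it."""

    def __init__(self, value=0):
        self.eax = value & 0xFFFFFFFF      # all 32 bits

    @property
    def ax(self):
        return self.eax & 0xFFFF           # lower 16 bits (low-order word)

    @property
    def ah(self):
        return (self.eax >> 8) & 0xFF      # bits 8 to 15: high byte of AX

    @property
    def al(self):
        return self.eax & 0xFF             # bits 0 to 7: low byte of AX

    @al.setter
    def al(self, byte):
        # Writing AL changes bits 0 to 7 only; the other 24 bits are kept.
        self.eax = (self.eax & 0xFFFFFF00) | (byte & 0xFF)


reg = Eax(0x12345678)
print(hex(reg.ax), hex(reg.ah), hex(reg.al))  # 0x5678 0x56 0x78
reg.al = 0xAB
print(hex(reg.eax))                            # 0x123456ab
```

Writing to AL therefore also changes what you read back from AX and EAX.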
The copy instruction:
mov dst, src — copies a value from one location to another. However, it is not possible to copy a value from memory to memory.
mov eax, ebx ;copy the value in ebx into eax
mov byte ptr [var], 5 ;store the value 5 into the byte at location var
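To see what the two mov lines above do, and why the memory-to-memory restriction matters, here is a small Python model. It assumes a tiny hypothetical RAM of 8 bytes and places `var` at address 2. The register values are arbitrary. Copying one memory byte to another has to go through a register, here DL (the low byte of EDX).

```python
regs = {"eax": 0, "ebx": 0x1234, "edx": 0}
memory = bytearray(8)   # 8 bytes of hypothetical RAM, all zero
var = 2                 # hypothetical address of the variable `var`

# mov eax, ebx: copy ebx into eax (ebx keeps its value)
regs["eax"] = regs["ebx"]

# mov byte ptr [var], 5: write exactly one byte at address var
memory[var] = 5

# No memory-to-memory mov, so copying [var] to [var+1] goes through a register:
#   mov dl, byte ptr [var]
#   mov byte ptr [var+1], dl
regs["edx"] = (regs["edx"] & 0xFFFFFF00) | memory[var]   # DL = low byte of EDX
memory[var + 1] = regs["edx"] & 0xFF

print(hex(regs["eax"]), list(memory))  # 0x1234 [0, 0, 5, 5, 0, 0, 0, 0]
```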
Arithmetic and logic instructions:
add — Integer Addition
The add instruction adds together its two operands and stores the result in its first operand. Note that while both operands may be registers, at most one of them may be a memory location.
add eax, 10 ;EAX ← EAX + 10
add BYTE PTR [var], 10 ;add 10 to the single byte stored at memory address var
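Because the destination has a fixed size, an addition can overflow it. This Python sketch models `add` on an unsigned operand of a given width. It returns the truncated result and whether a carry left the top bit, which x86 records in its carry flag (CF). The function name and the example values are hypothetical.

```python
def add(dst, src, bits):
    """Model of `add dst, src` on an unsigned operand of `bits` bits.

    Returns the truncated result and whether a carry left the top bit.
    """
    full = dst + src
    result = full & ((1 << bits) - 1)   # keep only what fits in the operand
    return result, full != result


print(add(5, 10, 32))    # (15, False)  add eax, 10 with EAX = 5
print(add(250, 10, 8))   # (4, True)    add BYTE PTR [var], 10 with [var] = 250
```

The second line shows why the size specifier matters: with BYTE PTR, 250 + 10 = 260 does not fit in 8 bits and wraps around to 4.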
sub — Integer Subtraction
The sub instruction subtracts the value of its second operand from the value of its first operand and stores the result in the first operand. As with add, at most one operand may be a memory location.
sub al, ah ;AL ← AL - AH
sub eax, 216 ;subtract 216 from the value stored in EAX
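Here are the two examples above, modelled in Python with arbitrary starting values (EAX = 0x305 for the first, EAX = 200 for the second). `sub al, ah` works on the two low bytes of EAX and writes back only AL. A 32-bit subtraction that goes below zero wraps around to a large unsigned value, which is how a negative number is represented in two's complement.

```python
eax = 0x00000305                 # AH = 0x03, AL = 0x05

# sub al, ah: an 8-bit subtraction on the two low bytes of EAX
al = eax & 0xFF
ah = (eax >> 8) & 0xFF
al = (al - ah) & 0xFF            # the result is kept to 8 bits
eax = (eax & 0xFFFFFF00) | al    # only AL (bits 0 to 7) is written back
print(hex(eax))                  # 0x302

# sub eax, 216 with EAX = 200: the 32-bit result wraps around
eax2 = (200 - 216) & 0xFFFFFFFF
print(hex(eax2))                 # 0xfffffff0, i.e. -16 in two's complement
```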
inc, dec — Increment, Decrement
The inc instruction increments the contents of its operand by one. The dec instruction decrements the contents of its operand by one.
dec eax ;subtract one from the contents of EAX.
inc DWORD PTR [var] ;add one to the 32-bit integer stored at location var
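The DWORD PTR in the second example means the increment applies to all 4 bytes at `var`, read as one 32-bit number (x86 stores it little-endian, least significant byte first). This Python sketch uses a hypothetical 4-byte memory where `var` is address 0. It shows the carry spreading into the next byte, and `dec eax` wrapping below zero.

```python
memory = bytearray([0xFF, 0x00, 0x00, 0x00])  # 4 bytes at hypothetical address var = 0
var = 0

# inc DWORD PTR [var]: read 4 bytes (little-endian), add 1, write 4 bytes back
value = int.from_bytes(memory[var:var + 4], "little")
value = (value + 1) & 0xFFFFFFFF
memory[var:var + 4] = value.to_bytes(4, "little")
print(list(memory))   # [0, 1, 0, 0]: 255 + 1 = 256 carries into the second byte

# dec eax with EAX = 0 wraps around to the largest 32-bit value
eax = (0 - 1) & 0xFFFFFFFF
print(hex(eax))       # 0xffffffff
```

With BYTE PTR instead, only the first byte would be touched, and it would simply wrap from 255 to 0.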
Control Flow Instructions:
cmp — compare:
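Compares the values of the two specified operands. This instruction is equivalent to the sub instruction, except that the result of the subtraction is discarded instead of replacing the first operand; only the processor’s status flags are updated, so that a following instruction can act on the outcome of the comparison.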
cmp eax,0x7 ; compare eax with 7, performing the subtraction eax-7 without storing the result
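The flags are what make cmp useful. This Python sketch models `cmp a, b` on 32-bit operands. It computes a - b, throws the result away, and keeps four of the x86 status flags: zero (ZF), carry (CF), sign (SF) and overflow (OF). The function is a hypothetical model. The real instruction updates a few more flags that are not shown here.

```python
def cmp(a, b, bits=32):
    """Model of `cmp a, b`: compute a - b, keep only the flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = a & mask, b & mask
    result = (a - b) & mask            # what sub would store; cmp discards it
    return {
        "ZF": result == 0,             # zero flag: the operands are equal
        "CF": a < b,                   # carry flag: unsigned a < b (a borrow)
        "SF": bool(result & sign),     # sign flag: top bit of the result
        # overflow flag: the signed result does not fit (a and b have
        # different signs and the result's sign differs from a's)
        "OF": bool((a ^ b) & (a ^ result) & sign),
    }


print(cmp(7, 7))  # {'ZF': True, 'CF': False, 'SF': False, 'OF': False}
print(cmp(3, 7))  # {'ZF': False, 'CF': True, 'SF': True, 'OF': False}
```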
jmp — Jump:
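Transfers the program’s control flow to the instruction at the memory location indicated by the operand. jmp always jumps; its conditional variants (such as je or jne) jump only when a condition holds and are typically used right after a cmp instruction.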
jmp start ;Jump to the instruction labeled start.
We use the notation <label> to refer to labeled locations in the program text. Labels can be inserted anywhere in x86 assembly code text by entering a label name followed by a colon. For example:
In the second line, we create a label named start, which marks the location of the two instructions that follow it.
The table below summarizes the different jump instructions that can be used:
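As a complement, here is a small Python sketch of when some common jumps are taken right after `cmp a, b`. It assumes both operands are 32-bit values. je, jne, jg, jge, jl and jle treat them as signed, while ja and jb treat them as unsigned. The function `jump_taken` is hypothetical. The processor actually decides from the flags set by cmp, but the outcome is the same comparison.

```python
def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def jump_taken(mnemonic, a, b):
    """Would `mnemonic label` jump right after `cmp a, b`?"""
    sa, sb = to_signed32(a), to_signed32(b)      # signed view
    ua, ub = a & 0xFFFFFFFF, b & 0xFFFFFFFF      # unsigned view
    conditions = {
        "jmp": True,          # unconditional
        "je": ua == ub,       # jump if equal
        "jne": ua != ub,      # jump if not equal
        "jg": sa > sb,        # jump if greater (signed)
        "jge": sa >= sb,      # jump if greater or equal (signed)
        "jl": sa < sb,        # jump if less (signed)
        "jle": sa <= sb,      # jump if less or equal (signed)
        "ja": ua > ub,        # jump if above (unsigned)
        "jb": ua < ub,        # jump if below (unsigned)
    }
    return conditions[mnemonic]


# cmp eax, 0x7 with EAX = 0xFFFFFFFF (-1 when read as signed)
print(jump_taken("jl", 0xFFFFFFFF, 7))  # True: -1 < 7
print(jump_taken("jb", 0xFFFFFFFF, 7))  # False: 4294967295 is not below 7
```

The same bit pattern can be "less than 7" or "not below 7" depending on whether the jump reads it as signed or unsigned. That is why x86 has both families of conditional jumps.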
PUSH and POP — Stack manipulation:
When the processor performs a push eax, the stack pointer SP is first decremented by 4 addresses, then the 32-bit content of eax is written to these 4 addresses. Symmetrically, pop eax reads the 32 bits stored at the address contained in SP, then increments SP by 4.
After the two PUSHes, the stack pointer SP has been reduced by 8. Accesses are always made in 32 bits, that is, 4 consecutive addresses. Since each write happens after the pointer is decremented, the previously pointed value (xxxxxxxx) is still present in the structure, and the pointer now identifies the last value written, namely the content of ecx.
The two POPs are deliberately not symmetrical with the PUSHes in this example. Consistency would require restoring each saved value into the register it came from, but nothing prevents restoring it into another register. So in this example, ebx receives the old value of ecx, while the second POP restores the previous value of eax. Note that ecx is shown with an unknown value because we assume it may have changed between the PUSH and the POP. After these two reads (each made before the pointer moves), the pointer is back 8 addresses higher. The data is not actually erased; only the pointer has moved. The data will only be overwritten by a later write.
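The whole sequence (push eax, push ecx, then pop ebx, pop eax) can be replayed in a short Python model. The starting value of ESP (0x1000) and the register contents are arbitrary assumptions. Memory is a dict of individual bytes, and each 32-bit value is stored least significant byte first, as on x86.

```python
memory = {}   # address -> byte, a sparse model of RAM
regs = {"esp": 0x1000, "eax": 0x12345678, "ebx": 0, "ecx": 0xCAFEBABE}


def push(reg):
    regs["esp"] -= 4                               # decrement first...
    value = regs[reg]
    for i in range(4):                             # ...then write 4 bytes
        memory[regs["esp"] + i] = (value >> (8 * i)) & 0xFF


def pop(reg):
    value = 0
    for i in range(4):                             # read 4 bytes first...
        value |= memory[regs["esp"] + i] << (8 * i)
    regs[reg] = value
    regs["esp"] += 4                               # ...then increment


push("eax")
push("ecx")
print(hex(regs["esp"]))          # 0xff8: SP went down by 8
regs["ecx"] = 0x33333333         # ecx may change between PUSH and POP
pop("ebx")                       # ebx receives the old value of ecx
pop("eax")                       # eax gets its own saved value back
print(hex(regs["ebx"]), hex(regs["eax"]), hex(regs["esp"]))
# 0xcafebabe 0x12345678 0x1000
print(len(memory))               # 8: the popped bytes are still in memory
```

The last line matches the point made above: popping only moves the pointer, and the 8 bytes stay in memory until a later push overwrites them.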
This blog has introduced the basics of Assembly in a very simplified way. In future blogs, we will get into more advanced aspects of Assembly. But if you managed to understand everything we covered, you are ready to crack your first program.
Facebook: INSEC Ensias
Instagram: INSEC Ensias
Linkedin: INSEC Ensias
Youtube: INSEC Club
Don’t forget to drop us a follow on social media to stay up to date with everything the club is doing. Looking forward to sharing more knowledge with all the readers and we welcome your feedback at email@example.com
Editors: berradAtay, F3nn3C