Assembly language (ASM) is the lowest-level programming language you are likely to write by hand. It maps very closely to the CPU's machine instructions, so there are many assembly languages, each designed for a specific computer architecture. ASM is used in various situations, such as performance-sensitive programs, system boot code, or reverse engineering.
In today's article, we will study NASM ("Netwide Assembler"), an assembler for the Intel x86 architecture. It can be used to write 16-bit, 32-bit, and 64-bit programs and is considered one of the most popular assemblers for Linux (although it can also be used on Mac and Windows).
This article will go through most of the things you need to know to be able to start coding in NASM or to understand NASM code if you want to reverse engineer programs. Although some things will change depending on the operating system for which you are coding, the general concepts stay the same so this tutorial can still be of use to you, even if you are not planning to use Linux.
As you can guess, assembly is a long and complex topic, so it is not possible to cover everything here. If you want to know more about NASM, you can refer to this documentation.
NASM Program Structure
First, let's study an actual piece of code written in NASM. The snippet below presents a "Hello World" program and will be used to introduce the main concepts and how NASM code is structured. Some of these notions will be discussed in more depth in the following parts.
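A minimal sketch of such a program, assuming 64-bit Linux, could look like the following. The system call numbers (1 for `write`, 60 for `exit`) are those of the Linux x86-64 ABI; the file name `hello.asm` and the build commands in the comments are assumptions.

```nasm
; hello.asm: "Hello World" sketch for 64-bit Linux
; Assumed build commands:
;   nasm -f elf64 hello.asm -o hello.o
;   ld hello.o -o hello

global _start               ; make the entry point visible to the linker

section .text
_start:
    mov rax, 1              ; system call number 1 = write
    mov rdi, 1              ; 1st parameter: file descriptor 1 (stdout)
    mov rsi, msg            ; 2nd parameter: address of the string
    mov rdx, msg.len        ; 3rd parameter: number of bytes to write
    syscall                 ; ask the kernel to perform the write

    mov rax, 60             ; system call number 60 = exit
    xor rdi, rdi            ; 1st parameter: exit code 0
    syscall

section .data
msg:    db "Hello, world!", 10  ; the string followed by a newline (10)
.len:   equ $ - msg             ; local label: length of msg in bytes
```
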
First, we can observe that the code is divided into sections. There are only two in our example (`.text` and `.data`), but there are a couple more that you can use (note that the names can change depending on the target OS and toolchain: for example, some Windows assemblers such as MASM use `.code` instead of `.text`):
- `.text` is the section where you will write your ASM code. Names such as `rdx` are called registers, and keywords such as `xor` are instructions. We will go through them in the next part
- `.data` allows you to statically allocate initialized global and static objects for the duration of the program execution
- `.bss` allows you to reserve space for uninitialized global and static objects
- `.rodata` is the same as `.data`, with the difference that the variables declared here are read-only (i.e. constant)
The next things to stand out are the `_start` and `msg` elements. They are what is called labels. `_start` is the equivalent of the main function in a higher-level language like C or C++: this is where the program starts its execution. Labels can also be used to define functions and as points to jump to (a bit like `goto` in C, more on that later). Labels are also used to define variables, like `msg` in our example; we will see more about these later as well.
One thing you will notice under `msg` is `.len`. A label that begins with a period is called a local label and is associated with the previous non-local label. In our example, it is referred to as `msg.len`.
The code is probably explicit enough, but `;` is used to write comments: everything after it on the same line is ignored by the assembler.
Registers are storage locations kept inside the processor, which makes them very fast. There are 17 of them, and some have a specific usage attribution (for example, passing arguments to a function). Technically, all of them can be modified at will (except `rip`), even if there are conventions stating how it should be done.
| 64-bit | 32-bit | 16-bit | 8-bit | Usage | Saved by |
|---|---|---|---|---|---|
| rax | eax | ax | al | System call number; function return value | Caller |
| rcx | ecx | cx | cl | 4th function parameter | Caller |
| rdx | edx | dx | dl | 3rd function parameter | Caller |
| rsi | esi | si | sil | 2nd function parameter; source pointer for string instructions | Caller |
| rdi | edi | di | dil | 1st function parameter | Caller |
| rsp | esp | sp | spl | Stack pointer (top element) | Callee |
| rbp | ebp | bp | bpl | Stack base pointer | Callee |
| r8 | r8d | r8w | r8b | 5th function parameter | Caller |
| r9 | r9d | r9w | r9b | 6th function parameter | Caller |
| rip | eip | | | Next instruction to be executed (rip can't be accessed directly by the programmer) | |
You will notice that only 6 registers can be used to pass parameters to a function. They can hold integers or pointers only. To pass parameters larger than 64 bits, or to pass more than 6 of them, the extra values are pushed onto the stack, with the first of them on top.
You will also notice that there are two types of register: "Callee-saved" and "Caller-saved". This is actually a convention rather than something strict, but what it means is that:
- Caller-saved (volatile) registers are meant to be general-purpose and to hold temporary information. They can be rewritten by any subroutine
- Callee-saved (non-volatile) registers are meant to hold long-lived values and should be preserved across calls, i.e. a function that wants to use these registers is supposed to back them up on the stack at its beginning and restore them from there at the end
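As a minimal sketch of this convention, the hypothetical function below (the name `sum_and_double` is an assumption made for illustration) uses the callee-saved register `rbx` as scratch space, so it backs it up on the stack first and restores it before returning. It assumes the 64-bit Linux calling convention described above.

```nasm
section .text
global sum_and_double

; long sum_and_double(long a, long b): returns (a + b) * 2
sum_and_double:
    push rbx            ; rbx is callee-saved: back it up first
    mov  rbx, rdi       ; rbx = a (1st parameter)
    add  rbx, rsi       ; rbx = a + b (2nd parameter)
    mov  rax, rbx       ; the return value goes in rax
    add  rax, rax       ; rax = (a + b) * 2
    pop  rbx            ; restore the caller's rbx
    ret
```

A caller-saved register such as `rcx` could have been used without the `push`/`pop` pair; `rbx` is used here only to show the backup and restore steps.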
The next important concept in NASM is instructions, which are basically keywords that tell the computer what to do. This part lists the main ones.
| Instruction | Description |
|---|---|
| `mov dest, src` | Copy the value of src into dest |
| `syscall` | Call an operating system function (system call). See the code in the first part or the 'Run ASM Code and more Complex Files Structure' part for more details on how to use it. This page lists the code and arguments to use for common functions |
| `int x` | Send an interrupt signal. Can be another way to make a system call (`int 0x80` on 32-bit Linux). See the Linux example in Wikipedia. The function calls for 32-bit Linux are defined in … |
| `call label` | Call a defined label (i.e. a function, which might come from another file) |
| `push item` | Push an item onto the stack |
| `pop reg` | Pop the top item from the stack into a register |
Jumps and Conditions
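| Instruction | Description |
|---|---|
| `jmp location` | Jump to the location (can be a register or a label) |
| `test reg, const` | Bitwise AND of a register and a constant. Only the flags are updated; the result itself is discarded |
| `jnz label` | Jump to label if the result was not equal to 0 |
| `jz label` | Jump to label if the result was equal to 0 |
| `cmp x, y` | Compare x and y. Usually followed by one of the conditional jumps below |
| `je label` | Jump to label if x is equal to y |
| `jne label` | Jump to label if x is different from y |
| `jg label` | Jump to label if x is greater than y |
| `jl label` | Jump to label if x is smaller than y |
| `jge label` | Jump to label if x is greater than or equal to y |
| `jle label` | Jump to label if x is smaller than or equal to y |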
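To illustrate how a comparison and a conditional jump work together, here is a small sketch of a function returning the larger of two signed integers. The function name `max_of` is hypothetical, and the parameters are assumed to arrive in `rdi` and `rsi` as described in the registers part.

```nasm
section .text
global max_of

; long max_of(long a, long b): returns the larger of a and b (signed)
max_of:
    mov rax, rdi        ; assume a is the result
    cmp rdi, rsi        ; compare a with b
    jge .done           ; if a >= b, keep a
    mov rax, rsi        ; otherwise the result is b
.done:
    ret
```

`jg`, `jl`, `jge` and `jle` treat the operands as signed values; for unsigned values, x86 provides `ja`, `jb`, `jae` and `jbe` instead.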
For more examples, you can have a look at this cheat sheet which is very useful.
In the example at the beginning of this article, we had the following line of code: `msg: db "Hello, world!", 10`, and we explained that this defines a variable `msg` holding the string "Hello, world!" followed by a newline (character code 10).
`db` here is the type of the variable. Unlike a higher-level language, where you may have `int` to store numbers, `char` to store a single character, and so on, types in NASM only say how much space your data will take. The available types are listed in the following table.
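| Data Type | Suffix | Data Assignation | Size (bits) |
|---|---|---|---|
| Byte | db | resb | 8 |
| Word | dw | resw | 16 |
| Double word | dd | resd | 32 |
| Quad word | dq | resq | 64 |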
In our previous example, we used the `db` suffix, which works because a char is 8 bits wide, so each character of the string is stored in one byte.
One thing that we haven't seen before is the data assignation column. It is used in the `.bss` section to define how much space we want to reserve. For example:
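A minimal sketch, using the standard NASM `resb`, `resw`, `resd` and `resq` directives; the label names are hypothetical:

```nasm
section .bss
buffer:     resb 64     ; reserve 64 bytes (e.g. for a string)
counter:    resw 1      ; reserve one word (16 bits)
total:      resd 1      ; reserve one double word (32 bits)
values:     resq 10     ; reserve 10 quad words (10 x 64 bits)
```

The operand is a count of elements, not a size in bytes: `resq 10` reserves 80 bytes.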
Note that there are multiple ways to write numeric values in NASM. The following code demonstrates several of them. Notice that when writing in bases other than 10, the values have a suffix (e.g. `h` for hexadecimal, `b` for binary, ...) or a prefix (e.g. `0x` for hexadecimal, `0o` for octal, ...). See part 3.4.1 of the documentation for more details.
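As an illustration, every line in the sketch below loads the same value (200 in decimal) into `ax`, following the notations described in the NASM documentation:

```nasm
mov ax, 200             ; decimal
mov ax, 0200            ; still decimal: a leading 0 does not mean octal
mov ax, 0c8h            ; hexadecimal, h suffix (needs a leading digit)
mov ax, 0xc8            ; hexadecimal, 0x prefix
mov ax, 310q            ; octal, q suffix
mov ax, 0o310           ; octal, 0o prefix
mov ax, 11001000b       ; binary, b suffix
mov ax, 0b1100_1000     ; binary, 0b prefix; underscores are allowed
```
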
Another important concept in NASM is the effective address: an operand to an instruction that references memory. The syntax will be an expression contained in brackets. The following code snippet demonstrates some examples of how to use it:
    ; Accessing a variable
    msg                 ; The msg variable address
    byte [msg]          ; The value of the first byte of the msg variable
    byte [msg + 1]      ; The value of the second byte of the msg variable
    word [msg]          ; The value of the first two bytes of the msg variable

    ; Various operations
    cmp BYTE [rdi], 0h  ; Check if the byte at the address held in rdi is 0h
One final thing I want to discuss in this part is the `.len: equ $ - msg` portion of our first example:
- `equ` is used to define a symbol as a constant value. It is always attached to a label. The definition is absolute and can't be changed later
- `$` is the address of the current position. Since we defined `msg` just before, the length of `msg` is the distance in bytes obtained by `current address - address of msg`
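The same technique works for data other than strings. The sketch below, with hypothetical label names, computes the number of elements in an array of quad words by dividing the distance in bytes by the element size:

```nasm
section .data
values:         dq 10, 20, 30, 40       ; four quad words = 32 bytes
values_count:   equ ($ - values) / 8    ; 32 / 8 = 4 elements
```

Because `equ` defines a constant and not a memory location, `values_count` is used as an immediate value (`mov rcx, values_count`), not through brackets.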
Run ASM Code and more Complex Files Structure
To finish this article, we will write a short program in ASM. It will get the program's arguments and print them along with their size. For the sake of the example, we will write a `strlen` function in a different file from the main program.
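A minimal sketch of what such a function could look like, assuming 64-bit Linux and the file name `my_strlen.asm` (the file name is an assumption). The string pointer arrives in `rdi` and the length is returned in `rax`:

```nasm
; my_strlen.asm
global my_strlen            ; make the label callable from other files

section .text

; size_t my_strlen(const char *s)
my_strlen:
    xor rax, rax            ; length = 0
.next:
    cmp byte [rdi + rax], 0 ; is the current byte the terminating 0?
    je  .end                ; yes: the length is in rax
    inc rax                 ; no: count it and move to the next byte
    jmp .next
.end:
    ret
```
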
If you have read everything up to here, nothing in the `my_strlen` function should surprise you. One thing that wasn't mentioned before is that you need to declare your label as global (with the `global` directive) if you want to be able to call it from another file. Let's jump to the main part of the program.
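The main file could be sketched as follows, assuming 64-bit Linux, linking through `gcc`, and a `my_strlen` function living in its own object file. The file names, the output name `print_args`, and the format string are assumptions made for illustration.

```nasm
; main.asm: print each program argument followed by its length
; Assumed build commands:
;   nasm -f elf64 my_strlen.asm -o my_strlen.o
;   nasm -f elf64 main.asm -o main.o
;   gcc -no-pie main.o my_strlen.o -o print_args

global main
extern printf               ; provided by the C library
extern my_strlen            ; provided by my_strlen.o

section .text
main:
    push rbx                ; callee-saved registers we are going to use;
    push r12                ; the three pushes also leave the stack
    push r13                ; 16-byte aligned for the calls below
    mov  r12, rdi           ; r12 = argc (1st parameter of main)
    mov  r13, rsi           ; r13 = argv (2nd parameter of main)
    xor  rbx, rbx           ; rbx = index of the current argument
.loop:
    cmp  rbx, r12
    jge  .done              ; stop once index >= argc
    mov  rdi, [r13 + rbx*8] ; 1st parameter: argv[index]
    call my_strlen          ; rax = length of the argument
    mov  rdi, fmt           ; 1st parameter: format string
    mov  rsi, [r13 + rbx*8] ; 2nd parameter: the argument itself
    mov  rdx, rax           ; 3rd parameter: its length
    xor  eax, eax           ; variadic call: no vector registers used
    call printf
    inc  rbx
    jmp  .loop
.done:
    xor  eax, eax           ; return 0
    pop  r13                ; restore in reverse order
    pop  r12
    pop  rbx
    ret

section .data
fmt:    db "%s: %lu", 10, 0 ; printf format, newline, terminating 0
```

The loop counter and the saved `argc`/`argv` live in callee-saved registers so that they survive the calls to `my_strlen` and `printf`.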
We can then compile our program as follows, and see that the output is what is expected.
This time again, you shouldn't be too surprised by the content of the file, except for one thing. We mentioned before that NASM programs should start with `_start`, but our program here starts with `main`. The reason is that we use `gcc` for the linking, and it provides `_start` itself (through the C runtime), which then calls `main`.
If we wanted to use `_start`, we could have linked with `ld`. The problem is that it is less convenient when using external functions like `printf` (see this for more information). If we wanted to build our first example, however, it would be simple to do with `ld`.
One final thing you could be wondering about the compilation is why we are using `-no-pie` with GCC. The reason is that on Ubuntu, GCC generates Position Independent Executables (PIE) by default, which our current code is not compatible with. The `-no-pie` argument tells GCC not to generate a PIE, but we also have the option of writing `call printf wrt ..plt` in the code, which makes that call position independent.
- Netwide Assembler (Wikipedia)
- x86 calling conventions (Wikipedia)
- x64 Cheat Sheet (Doeppner - brown.edu)
- Notes on x86-64 programming (filliatr - lri.fr)
- The Netwide Assembler: NASM (NASM documentation)
- NASM Intel x86 Assembly Language Cheat Sheet (Bencode.net)
- Intel® 64 and IA-32 Architectures Software Developer’s Manual (Intel)
- NASM Assembly Language Tutorials (asmtutor.com)
- NASM Tutorial (lmu.edu - ray)
- Linux System Call Table for x86_64 (blog.rchapman.org)
- NASM Manual - Local labels (tortall.net)