Blog10: MISC — An overview of the ELF format

May 6, 2023

Definition of the ELF format

ELF (Executable and Linkable Format) is the main binary format in use on modern Linux systems.

What is ELF mainly used for?

Executables: These are binary files containing machine code that the operating system loads and runs directly.
Shared libraries: ELF is also used for shared libraries, which are collections of functions and data that can be used by multiple programs simultaneously.
Object files: These are intermediate files generated during the compilation of a program. Object files contain machine code, data, and other information needed for linking with other object files or shared libraries to create an executable or a shared library.

ELF files for executables

An ELF file for an executable program must always contain a program header table (shared libraries need one as well, while relocatable object files do not). The table sits near the start of the file, usually right after the ELF header, and each entry in it provides information that is needed to run the program.
In this article, we will focus on ELF executables.

Components of ELF executables:

The ELF Header:

The ELF header resides at the beginning of the file and holds a “road map” describing the file’s organization. It is a structure that contains the metadata of the file.

At the very beginning of the file is what’s called E_IDENT: 16 bytes that describe how the rest of the ELF file needs to be parsed. As an example, consider the E_IDENT bytes of an executable that simply calls exit(0).

The first four bytes contain a magic number: 0x7f followed by the ASCII letters “ELF”. These bytes are the same for all ELF files and let you quickly verify whether a file actually is an ELF file.
The next byte (EI_CLASS) specifies the class of the file: 1 for a 32-bit ELF file, 2 for a 64-bit one. In this case, it’s set to 2, so we are looking at a 64-bit ELF file.
The next byte (EI_DATA) specifies the data encoding used in the file: 01 means that the least significant byte comes first (little-endian) and 02 means that the most significant byte comes first (big-endian).
Next, we have the version byte (EI_VERSION). It has been set to 1 since the 80s, because ELF has always been at version 1.
The next byte (EI_OSABI) identifies the target operating system ABI and is not important to us, and the one following it (EI_ABIVERSION) is almost never used, so we just skip over both.
Finally, the last 7 bytes (EI_PAD) are simply padding and serve no other purpose.

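Let's jump back to the whole header. A table at the end of this article describes all the data types used, which should give you a good enough overview. Let’s continue with the other parts of the ELF header.
The first member after the E_IDENT bytes is the type (e_type), which specifies the type of the ELF file. The possible values are ET_NONE (0, no file type), ET_REL (1, relocatable file), ET_EXEC (2, executable file), ET_DYN (3, shared object file) and ET_CORE (4, core file). There are also ranges reserved for OS-specific values (0xfe00 to 0xfeff) and processor-specific values (0xff00 to 0xffff).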

The next member is the machine type (e_machine), which has a number of standard values: for example, 0x03 for x86, 0x28 for ARM, and 0x3E for amd64. The machine type is followed by the version member (e_version), which, just like the version byte in E_IDENT, is always set to 1. Next is the entry point (e_entry), which has an address type. It specifies the virtual address at which execution of our program starts, and it holds zero if the file has no associated entry point.

The next two members have an offset type. They specify where in the ELF file two tables begin: the program header table (e_phoff), which describes the segments, and the section header table (e_shoff), which describes the sections. The following 32-bit value (e_flags) contains flags for our file. These are architecture-specific and often specify things such as the execution mode of the processor.

This is followed by a member specifying the size of the ELF header itself (e_ehsize). After this come two members describing the program header table: the size of each entry (e_phentsize) and the number of entries (e_phnum). We will take a good look at this table later on. Then we have the same pair for the section header table: the entry size (e_shentsize) and the number of entries (e_shnum).

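Finally, we have what’s called the section header string table index (e_shstrndx). This is the index, within the section header table, of the section that holds the section names. You can use the “readelf” tool to display all the information contained in an ELF file; readelf -h prints the ELF header.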

Program Header

An executable or shared object file’s program header table is an array of structures, each describing a segment or other information the system needs to prepare the program for execution.

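Let's take a look inside a program header. Its first member specifies the type of the segment (p_type). The standard values are:
- PT_NULL (0), an unused entry
- PT_LOAD (1), a loadable segment
- PT_DYNAMIC (2), dynamic linking information
- PT_INTERP (3), the path of the program interpreter
- PT_NOTE (4), auxiliary information
- PT_SHLIB (5), reserved
- PT_PHDR (6), the program header table itself
- PT_TLS (7), the thread-local storage template

There are also ranges reserved for OS- and processor-specific types.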

For example, segments of the load type (PT_LOAD) will be loaded into memory. How exactly that happens is specified by the other members of the program header. The next member is the offset (p_offset), which specifies where in the ELF file the content of this segment (if it has any) is located. Then we have the virtual address (p_vaddr), which specifies where the first byte of the segment will be in memory. The next member (p_paddr) specifies the physical address at which the segment will be loaded. It is only used in contexts where physical memory is relevant, such as firmware files.

Next, the size of the segment in the file is specified (p_filesz). If this is 0, the segment is defined exclusively by the program header and has no further content in the file. Then we have the size of the segment in memory (p_memsz). If this is larger than the size in the file, the leftover bytes are initialized to zero. The flags (p_flags) specify the permissions of the segment: whether it’s readable, writable, and/or executable. For example, the code segment will in most cases be readable and executable but not writable. Finally, a field (p_align) specifies the alignment requirements of the segment, for example whether it needs to be aligned to a 4-byte or an 8-byte boundary. For loadable segments, this is typically the page size.

Note that this is the member order of the 32-bit structure. In the 64-bit Elf64_Phdr, p_flags comes right after p_type, so that the 64-bit fields stay naturally aligned.

Creating a personalized ELF file using libgolf

Libgolf is a small library that makes it very easy to generate a binary consisting of an ELF header, followed by a single program header, followed by a single loadable segment. By default, all the fields in the headers are set to sensible default values, but there’s a simple way to play with these defaults.
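Let’s take a look at how this works. The basic setup needs two files: a C source file and a shellcode.h header. For example, let’s take the bytes ‘b0 3c 48 31 ff 0f 05’, which disassemble to three instructions: mov al, 0x3c (60, the number of the exit system call on x86-64), xor rdi, rdi (the exit status, 0), and syscall.

In other words, it just calls exit(0). This is nice because we can easily check that these bytes were executed successfully by looking at the exit status with the shell expansion $?. Let’s create the two files.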

Compiling this and running the resulting executable will produce a .bin file. Make it executable with “chmod +x file.bin”, and that’s your personalized ELF file! Pretty simple, right?

First off, INIT_ELF() takes two arguments: the ISA and the architecture. Libgolf supports X86_64, ARM32, and AARCH64 as valid ISAs, and either 32 or 64 for the architecture. It first sets up some internal bookkeeping structs and decides whether to use the Elf32_* or Elf64_* structures for the headers.
It also automatically assigns pointers to the ELF and program headers, called ehdr and phdr respectively, and these are what we will use to modify the fields. Aside from that, it copies the shellcode buffer over and populates the ELF and program headers before calculating a sane entry point. Then comes GEN_ELF(), which prints some nice stats to stdout and then writes the appropriate structs to the .bin file. The name of the .bin file is determined by argv[0].
Suppose we wanted to modify the e_version field of the ELF header. All we need to do is add a single line between INIT_ELF() and GEN_ELF() that assigns a new value to e_version through the ehdr pointer.

Let’s do something more interesting!

Using dead bytes

At the start of any ELF file is, of course, the familiar 0x7f followed by the letters “ELF”. Modifying any of these four bytes results in the Linux loader rejecting the file. Our trusty specification tells us that the byte right after them is the EI_CLASS byte, which denotes whether the file targets a 32-bit or a 64-bit architecture. Acceptable values are 0x01 and 0x02, for 32-bit and 64-bit respectively. What if we set it to 0x41 (“A” in ASCII)? We can do that by adding a single line to the generator that sets that byte of e_ident through the ehdr pointer.

Why 0x41? It shows up clearly in the xxd output!

Once we’ve got our .bin to play with, and before trying to execute it, let’s try a couple of other familiar ELF parsing tools (e.g. gdb).

Now, let’s try to run the binary normally. It works perfectly!
So, how far can we go? Just how much of the ELF and program headers will the Linux loader ignore?
After checking every bit that can be modified and ignored by the Linux loader, the binary-generating C file now looks like this:

If you compile and run this program, you’ll get the following binary:

This file is 127 bytes in size, but we were able to replace a total of 50 bytes with ‘A’, meaning just under 40% of this binary is ignored by the Linux ELF loader! Who knows what you could do with 50 bytes? It turns out to be quite a lot!

In the following articles, we will see how to take advantage of these dead bytes!

If some of the concepts are unclear, or if you want to dig deeper, you can find more details in the links below.

Data types

Commands

readelf -h binary # show information about the ELF header
readelf --segments binary # list all the segments in the binary
readelf --sections binary # list all the sections in the binary
xxd binary # create a hex dump of the given file (binary)

References

https://www.youtube.com/watch?v=nC1U1LJQL8o&ab_channel=stacksmashing
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/elf.h
https://www.github.com/xcellerator/libgolf
https://tmpout.sh/1/1.html
https://lwn.net/Articles/631631/
https://www.sco.com/developers/gabi/latest/ch4.eheader.html#elfid

Links

Facebook: INSEC Ensias

Instagram: INSEC Ensias

Linkedin: INSEC Ensias

Youtube: INSEC Club

Don’t forget to drop us a follow on social media to stay up to date with everything the club is doing. We look forward to sharing more knowledge with all our readers, and we welcome your feedback at insecblog@gmail.com

Writer: akna

Editors: berradAtay, F3nn3C