What happens when you compile your C code?
The answer is simple: the compiler generates an executable. On a Linux/Unix system, the executable is named “a.out” by default.
What’s inside an executable file (a.out)?
Have you ever tried dissecting an a.out file? It is not just a plain binary file of machine code. It contains much more than that, including a lot of information that helps the operating system load it into memory. Executable files come in various formats, such as COFF and ELF.
Nowadays, most Unix-like operating systems (Linux, BSD, Solaris, IRIX, etc.) use ELF (Executable and Linkable Format) for their executables.
A typical ELF executable includes:
- ELF Header
- Program Headers
- Section Headers
- Data referred to by the program or section headers
Dissecting an ELF File
We will take a simple C program, compile it, and see what is inside the generated a.out (ELF) file.
/************************* test.c ************************/
#include <stdio.h>

int global1 = 100;   /* initialized global   -> .data */
int global2;         /* uninitialized global -> .bss  */

int main(void)
{
    global2 = 200;
    global1 = 300;
    printf("global1 = %d global2 = %d\n", global1, global2);
    return 0;
}
Compiling it on a Linux system produces an a.out file in ELF format.
You can determine the file format using the file command.
# file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
The file command determines this information by reading the ELF header, which lies at the start of the file.
ELF Header
The ELF header always lies at the start of the executable file and holds overall information about the entire ELF file. It describes the target architecture (Intel 80386 in this case), the ELF version, and the location and number of the program and section headers. It also contains the address of the first instruction to be executed (called the entry point).
Let's print the contents of the ELF header of our a.out executable. You can use the readelf tool to dissect an ELF executable.
ELF HEADER
-----------
#define EI_NIDENT 16
typedef struct {
    unsigned char e_ident[EI_NIDENT]; // ELF magic number, class, data encoding, etc.
    Elf32_Half    e_type;      // object file type (e.g. ET_EXEC)
    Elf32_Half    e_machine;   // target machine architecture
    Elf32_Word    e_version;   // ELF version
    Elf32_Addr    e_entry;     // entry point address
    Elf32_Off     e_phoff;     // program header table's file offset
    Elf32_Off     e_shoff;     // section header table's file offset
    Elf32_Word    e_flags;     // processor-specific flags
    Elf32_Half    e_ehsize;    // ELF header size in bytes
    Elf32_Half    e_phentsize; // size of one entry in the program header table
                               // in bytes; all entries are the same size
    Elf32_Half    e_phnum;     // number of entries in the program header table
    Elf32_Half    e_shentsize; // size of one section header in bytes
    Elf32_Half    e_shnum;     // number of entries in the section header table
    Elf32_Half    e_shstrndx;  // index of the .shstrtab section in the section header table
} Elf32_Ehdr;
# readelf -h a.out
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x80482b0
Start of program headers: 52 (bytes into file)
Start of section headers: 1980 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 7
Size of section headers: 40 (bytes)
Number of section headers: 28
Section header string table index: 25
The first four bytes hold a magic number identifying the file as an ELF file.
The first byte is 0x7f, and the second (0x45), third (0x4c) and fourth (0x46) bytes are the ASCII codes for ‘E’, ‘L’ and ‘F’. The file command reads this magic number to decide whether a file is an ELF file.
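A minimal sketch of that magic-number test, assuming the first bytes of the file have already been read into a memory buffer. It relies on ELFMAG, SELFMAG, EI_CLASS, ELFCLASS32 and ELFCLASS64 from the Linux <elf.h> header; the function name elf_class is hypothetical, and the real file command checks far more than this.

```c
#include <assert.h>
#include <elf.h>     /* ELFMAG, SELFMAG, EI_CLASS, EI_NIDENT (Linux/glibc header) */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns ELFCLASS32 or ELFCLASS64 if buf starts with
 * the ELF magic 0x7f 'E' 'L' 'F', or 0 if it does not look like ELF. */
int elf_class(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < EI_NIDENT || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;                     /* too short or wrong magic */
    if (buf[EI_CLASS] == ELFCLASS32 || buf[EI_CLASS] == ELFCLASS64)
        return buf[EI_CLASS];         /* byte 4 of e_ident: 32- or 64-bit */
    return 0;                         /* magic OK but unknown class */
}
```

For the a.out above, whose identification bytes start 7f 45 4c 46 01, this returns ELFCLASS32.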
Note the entry point address. This is the address of the first instruction to which control is transferred after the executable is loaded into memory.
The ELF header also contains the file offsets at which the program header table and the section header table are placed in the a.out file.
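Using these header fields, the position of any individual program or section header follows from simple arithmetic. The sketch below is hypothetical (read_ehdr32, shdr_offset and phdr_offset are names made up for illustration) and assumes a 32-bit ELF image already in memory whose byte order matches the host, as with this i386 a.out on an x86 machine.

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Ehdr, ELFMAG, SELFMAG, EI_CLASS, ELFCLASS32 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the ELF header out of an in-memory file image.
 * Assumes file byte order == host byte order. Returns 0 on success. */
int read_ehdr32(const unsigned char *buf, size_t len, Elf32_Ehdr *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof *out || memcmp(buf, ELFMAG, SELFMAG) != 0
        || buf[EI_CLASS] != ELFCLASS32)
        return -1;
    memcpy(out, buf, sizeof *out);    /* memcpy avoids unaligned access */
    return 0;
}

/* File offset of section header number idx: the table starts at e_shoff
 * and every entry is e_shentsize bytes long. */
uint32_t shdr_offset(const Elf32_Ehdr *eh, unsigned idx)
{
    return eh->e_shoff + (uint32_t)idx * eh->e_shentsize;
}

/* Same arithmetic for program header number idx. */
uint32_t phdr_offset(const Elf32_Ehdr *eh, unsigned idx)
{
    return eh->e_phoff + (uint32_t)idx * eh->e_phentsize;
}
```

With the readelf values shown earlier (e_shoff = 1980, e_shentsize = 40, e_shstrndx = 25), shdr_offset returns 1980 + 25 * 40 = 2980, the file offset of the .shstrtab section header.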
ELF Section Headers
An ELF executable contains various sections, and each section has a corresponding section header that holds the section's name, its type, the virtual address at which it should be loaded, the offset from the beginning of the file at which its first byte resides, its size, and so on.
A few important sections are:
- .text : This section holds the executable instructions of the program.
- .bss : This holds uninitialized global data. In our example code, the variable global2 goes into the .bss section. All data in this section is initialized to 0 when the program is loaded into memory. The section occupies no space in the ELF file, which contains only a section header for .bss: since every variable in .bss starts out as 0, there is no need to store its contents in a.out.
- .data : Initialized global data goes here.
- .strtab : It holds the names of various symbols.
- .symtab : It holds a symbol table entry for each symbol.
- .shstrtab : This section holds section names.
There are various other sections as well, but we will concentrate only on the sections above.
Let's print the section headers for these sections. Again, readelf can be used to print them.
ELF SECTION HEADER
------------------
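typedef struct {
    Elf32_Word sh_name;      // offset into .shstrtab section
    Elf32_Word sh_type;      // section type, e.g. PROGBITS, NOBITS, STRTAB
    Elf32_Word sh_flags;     // attribute flags (W, A, X, ...)
    Elf32_Addr sh_addr;      // virtual address in memory (0 if not loaded)
    Elf32_Off  sh_offset;    // file offset of the section's first byte
    Elf32_Word sh_size;      // section size in bytes
    Elf32_Word sh_link;      // index of a related section (meaning depends on type)
    Elf32_Word sh_info;      // extra information (meaning depends on type)
    Elf32_Word sh_addralign; // required alignment
    Elf32_Word sh_entsize;   // entry size for sections holding fixed-size entries
} Elf32_Shdr;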
# readelf -S a.out
(only important fields are shown below)
Section Headers:
[Nr] Name Type Addr Off Size Flg
[12] .text PROGBITS 080482b0 0002b0 0001d8 AX
[22] .data PROGBITS 080495c4 0005c4 000008 WA
[23] .bss NOBITS 080495cc 0005cc 00000c WA
[25] .shstrtab STRTAB 00000000 0006e0 0000db
[26] .symtab SYMTAB 00000000 000c1c 000460
[27] .strtab STRTAB 00000000 00107c 00026a
The section flags have the following meanings:
- A (ALLOC): memory must be allocated for this section when the program is loaded. Note that the symbol and string tables do not have this flag, so they are not loaded into memory.
- X (EXECINSTR): the section contains executable machine instructions. Note that the .text section has this flag set.
- W (WRITE): the section contains data that can be modified during program execution.
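readelf builds the Flg column by testing individual bits of sh_flags. A sketch of that decoding for the three flags above, assuming the SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR constants from <elf.h>; the helper section_flags_str is hypothetical and ignores every other flag bit.

```c
#include <assert.h>
#include <elf.h>     /* Elf32_Word, SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR */
#include <stddef.h>

/* Hypothetical helper: render W/A/X in the order readelf prints them.
 * out must hold at least 4 chars (3 letters + NUL). */
void section_flags_str(Elf32_Word flags, char out[4])
{
    size_t n = 0;
    assert(out != NULL);
    if (flags & SHF_WRITE)     out[n++] = 'W';   /* writable at run time */
    if (flags & SHF_ALLOC)     out[n++] = 'A';   /* occupies memory when loaded */
    if (flags & SHF_EXECINSTR) out[n++] = 'X';   /* contains machine code */
    out[n] = '\0';
}
```

For .text (ALLOC and EXECINSTR set) this produces "AX", and for .data and .bss it produces "WA", as in the listing above.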
Note the section type (NOBITS) of the .bss section. NOBITS indicates that the section does not occupy any space in the executable file.
Also note that the virtual address of the .shstrtab, .symtab and .strtab sections is 0, which means they are not loaded into memory. They are used by tools such as debuggers, not by the running program.
The offset specifies where the actual bytes of a section reside in the ELF file.
For example, the offset of the .text section is 0x2b0, which means the machine instructions of this program start 0x2b0 bytes from the beginning of the a.out file.
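Given an in-memory copy of the file, sh_offset and sh_size are all that is needed to locate a section's bytes, except for a SHT_NOBITS section such as .bss, which has a size but nothing stored in the file. A sketch of that rule with a bounds check against the file length; section_data is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <elf.h>     /* Elf32_Shdr, SHT_NOBITS */
#include <stddef.h>

/* Hypothetical helper: return a pointer to a section's bytes inside an
 * in-memory file image and store how many bytes it occupies there. */
const unsigned char *section_data(const unsigned char *file, size_t file_len,
                                  const Elf32_Shdr *sh, size_t *nbytes)
{
    assert(file != NULL && sh != NULL && nbytes != NULL);
    *nbytes = 0;
    if (sh->sh_type == SHT_NOBITS)
        return NULL;                          /* e.g. .bss: nothing in the file */
    if (sh->sh_offset > file_len || sh->sh_size > file_len - sh->sh_offset)
        return NULL;                          /* header points past end of file */
    *nbytes = sh->sh_size;
    return file + sh->sh_offset;              /* first byte of the section */
}
```

For the .text header above (offset 0x2b0, size 0x1d8) it returns file + 0x2b0 with 0x1d8 bytes, while for .bss it returns NULL with a byte count of 0.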
The name field (sh_name) does not hold the actual name of the section. We cannot store the name in the section header because we want all section headers to be the same size; sections are easier to parse when every header has the same size. So, instead of the name, an offset is stored. The offset is an index into the “.shstrtab” section, giving the location of a null-terminated string.
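The lookup itself is just pointer arithmetic into the string table. A hedged sketch, assuming the .shstrtab bytes are already in memory; the helper name strtab_lookup is hypothetical. It also refuses offsets whose string is not terminated inside the table, so a corrupt file cannot make it read past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* memchr, strcmp */

/* Hypothetical helper: return the NUL-terminated string that starts at
 * byte `offset` of a string table, or NULL if the offset is out of range
 * or the string is not terminated inside the table. */
const char *strtab_lookup(const char *table, size_t table_size, size_t offset)
{
    assert(table != NULL);
    if (offset >= table_size)
        return NULL;
    if (memchr(table + offset, '\0', table_size - offset) == NULL)
        return NULL;
    return table + offset;
}
```

Symbol names in .strtab use the same offset scheme, so the same function works for st_name values.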
You can also print the symbol table of the ELF file.
SYMBOL TABLE ENTRY
-------------------
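typedef struct {
    Elf32_Word    st_name;  // offset into .strtab section
    Elf32_Addr    st_value; // symbol value; for variables and functions, its address
    Elf32_Word    st_size;  // size associated with the symbol in bytes (e.g. 4 for an int)
    unsigned char st_info;  // symbol type (OBJECT, FUNC, ...) and binding (GLOBAL, LOCAL, ...)
    unsigned char st_other; // symbol visibility
    Elf32_Half    st_shndx; // index of the section the symbol is defined in
} Elf32_Sym;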
# readelf -s a.out
Symbol table '.symtab' contains 70 entries:
Num: Value Size Type Bind Vis Ndx Name
…
…
62: 080495d4 4 OBJECT GLOBAL DEFAULT 23 global2
63: 080495c8 4 OBJECT GLOBAL DEFAULT 22 global1
…
68: 08048384 82 FUNC GLOBAL DEFAULT 12 main
69: 08048250 0 FUNC GLOBAL DEFAULT 10 _init
…
The symbol table has one entry for each symbol, and every entry has the same fixed size. Because of that, the symbol's name cannot be stored in the entry itself. Here, too, an offset is stored: an index into the “.strtab” section giving the location of the null-terminated symbol name.
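Putting the two tables together, a symbol can be found by name with a linear scan. The sketch below is hypothetical (find_symbol is not a library function) and assumes .symtab and .strtab have already been read into memory in host byte order. The Type and Bind columns printed by readelf come from st_info, which the <elf.h> macros ELF32_ST_TYPE and ELF32_ST_BIND decode.

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Sym, ELF32_ST_TYPE, ELF32_ST_BIND */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: symtab holds `count` fixed-size Elf32_Sym entries;
 * strtab is the .strtab section that st_name indexes into.
 * Returns the first entry whose name equals `name`, or NULL. */
const Elf32_Sym *find_symbol(const Elf32_Sym *symtab, size_t count,
                             const char *strtab, size_t strtab_size,
                             const char *name)
{
    assert((symtab != NULL || count == 0) && strtab != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        size_t off = symtab[i].st_name;       /* byte offset into .strtab */
        if (off < strtab_size &&
            memchr(strtab + off, '\0', strtab_size - off) != NULL &&
            strcmp(strtab + off, name) == 0)
            return &symtab[i];
    }
    return NULL;
}
```

For global2, ELF32_ST_TYPE(st_info) is STT_OBJECT and ELF32_ST_BIND(st_info) is STB_GLOBAL, matching the OBJECT and GLOBAL columns above.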
Note the virtual address (0x080495c8) of the symbol “global1”. It is an initialized global variable, so it must go into the .data section. The .data section starts at 0x080495c4 and is 0x8 bytes long (it ends just before 0x080495cc), so “global1” indeed resides in .data. This agrees with the Ndx column, which shows section 22 (.data).
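The containment test used in this reasoning fits in one line. A hypothetical helper (addr_in_section is a name made up for illustration); the end of a section is exclusive, and subtracting before comparing avoids overflow near the top of the address space.

```c
#include <assert.h>
#include <elf.h>     /* Elf32_Addr, Elf32_Word */

/* Hypothetical helper: does virtual address addr fall inside a section
 * that starts at sh_addr and is sh_size bytes long? Valid bytes are
 * sh_addr .. sh_addr + sh_size - 1. */
int addr_in_section(Elf32_Addr sh_addr, Elf32_Word sh_size, Elf32_Addr addr)
{
    return addr >= sh_addr && addr - sh_addr < sh_size;
}
```

addr_in_section(0x080495c4, 0x8, 0x080495c8) is true for global1, while 0x080495cc, the first byte after .data, is rejected; that address is where .bss begins.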
# objdump -S a.out
..
..
08048384 <main>:
8048384: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048388: 83 e4 f0 and $0xfffffff0,%esp
804838b: ff 71 fc pushl 0xfffffffc(%ecx)
804838e: 55 push %ebp
804838f: 89 e5 mov %esp,%ebp
8048391: 51 push %ecx
8048392: 83 ec 14 sub $0x14,%esp
8048395: c7 05 d4 95 04 08 c8 movl $0xc8,0x80495d4
804839c: 00 00 00
804839f: c7 05 c8 95 04 08 2c movl $0x12c,0x80495c8
..
..
Note: we are storing 300 (0x12c) at address 0x80495c8, which is the address of the variable global1.
Similarly, 200 (0xc8) is stored at 0x80495d4, which is the address of the variable global2. Since global2 is an uninitialized global variable, it must reside in the .bss section. The start virtual address of .bss is 0x080495cc and its size is 0xc bytes, so global2 clearly lies inside .bss. The symbol table agrees: its Ndx column shows section 23 (.bss).
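The operands of these instructions can be read straight from the bytes. In the c7 05 form (MOV r/m32, imm32, with a ModRM byte selecting a 32-bit absolute address), the four bytes after c7 05 are the address and the next four are the immediate value, both little endian. The sketch below decodes only that one form and is not a general x86 disassembler; decode_movl_abs and le32 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value: i386 stores the least significant
 * byte first. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: decode "c7 05 <addr32> <imm32>", i.e.
 * movl $imm32, absolute_address. Returns 0 on success, -1 otherwise. */
int decode_movl_abs(const unsigned char *code, size_t len,
                    uint32_t *addr, uint32_t *imm)
{
    assert(code != NULL && addr != NULL && imm != NULL);
    if (len < 10 || code[0] != 0xc7 || code[1] != 0x05)
        return -1;                    /* not the encoding we understand */
    *addr = le32(code + 2);           /* bytes 2..5: destination address */
    *imm  = le32(code + 6);           /* bytes 6..9: value to store */
    return 0;
}
```

Fed the ten bytes c7 05 d4 95 04 08 c8 00 00 00 of the first movl above, it yields the address 0x080495d4 and the value 0xc8 (200).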
Program Header Table
A program header table is meaningful only for executable files and shared object files. In other words, any object file that needs to be loaded into memory for execution needs a program header table.
Each entry in the program header table describes a segment of the process address space and holds the information needed to create the process image in memory. The operating system loads each loadable (PT_LOAD) segment into memory according to its location and size information.
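Loading a PT_LOAD segment can be sketched as a copy plus a zero fill: p_filesz bytes come from the file at p_offset, and the rest of the segment up to p_memsz is cleared. This is a simplified user-space simulation under stated assumptions (mem is a buffer standing in for the address space, base is the virtual address of mem[0]); load_segment is hypothetical, and a real kernel maps file pages into memory rather than copying them.

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Phdr, Elf32_Addr, PT_LOAD */
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified loader step. Returns 0 on success, -1 if the
 * entry is not PT_LOAD or does not fit in the file or in mem. */
int load_segment(const unsigned char *file, size_t file_len,
                 const Elf32_Phdr *ph, unsigned char *mem, size_t mem_len,
                 Elf32_Addr base)
{
    assert(file != NULL && ph != NULL && mem != NULL);
    if (ph->p_type != PT_LOAD || ph->p_filesz > ph->p_memsz)
        return -1;
    if (ph->p_offset > file_len || ph->p_filesz > file_len - ph->p_offset)
        return -1;                              /* bytes not in the file */
    if (ph->p_vaddr < base || ph->p_vaddr - base > mem_len ||
        ph->p_memsz > mem_len - (ph->p_vaddr - base))
        return -1;                              /* does not fit in mem */
    unsigned char *dst = mem + (ph->p_vaddr - base);
    memcpy(dst, file + ph->p_offset, ph->p_filesz);              /* file part */
    memset(dst + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);   /* zero part */
    return 0;
}
```

In the program header listing for this a.out, the second LOAD segment has FileSiz 0x100 and MemSiz 0x10c; the 0xc-byte difference is exactly the size of .bss, which is why .bss needs no space in the file.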
So, sections with common attributes are combined to form a single segment.
Sections such as .text, .init, .fini and .plt all contain executable machine code and share the same attributes, so they can be combined into a single segment, i.e. a single entry in the program header table.
Similarly, sections such as .data, .bss and .got all hold data for variables that can be modified during program execution, so they are combined into a single segment.
Let's print the program header table of our a.out file.
PROGRAM HEADER
--------------
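typedef struct {
    Elf32_Word p_type;   // segment type, e.g. PT_LOAD
    Elf32_Off  p_offset; // file offset of the segment's first byte
    Elf32_Addr p_vaddr;  // virtual address of the segment in memory
    Elf32_Addr p_paddr;  // physical address, where relevant
    Elf32_Word p_filesz; // number of bytes in the file image
    Elf32_Word p_memsz;  // number of bytes in the memory image
    Elf32_Word p_flags;  // permissions: PF_R, PF_W, PF_X
    Elf32_Word p_align;  // alignment of the segment
} Elf32_Phdr;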
# readelf -l a.out
Elf file type is EXEC (Executable file)
Entry point 0x80482b0
There are 7 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
…
…
LOAD 0x000000 0x08048000 0x08048000 0x004cc 0x004cc R E 0x1000
LOAD 0x0004cc 0x080494cc 0x080494cc 0x00100 0x0010c RW 0x1000
…
…
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .gnu.hash .dynsym .dynstr
.gnu.version .gnu.version_r .rel.dyn .rel.plt .init
.plt .text .fini .rodata .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
…
06
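If the type of a segment is PT_LOAD (shown as LOAD), the operating system loads that segment into memory.
The GNU_STACK entry is different: its offset, virtual address and sizes are all 0 because it does not describe any bytes of the file. It is up to the operating system to decide the size and location of the stack; what GNU_STACK conveys is the permissions the stack needs. Here its flags are RW without E, so the program asks for a non-executable stack.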
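Checking the stack permissions comes down to finding the GNU_STACK entry and testing its PF_X bit. A hypothetical helper (stack_wants_exec), assuming the program headers are already in memory and using PT_GNU_STACK and the PF_* constants from <elf.h>:

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Phdr, PT_GNU_STACK, PF_X */
#include <stddef.h>

/* Hypothetical helper: returns 1 if the PT_GNU_STACK entry has PF_X set,
 * 0 if it does not, or -1 if there is no PT_GNU_STACK entry at all
 * (the kernel then falls back to its own default). */
int stack_wants_exec(const Elf32_Phdr *phdrs, size_t count)
{
    assert(phdrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (phdrs[i].p_type == PT_GNU_STACK)
            return (phdrs[i].p_flags & PF_X) != 0;
    return -1;
}
```

For the listing above, where GNU_STACK has flags RW, this returns 0.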
The command output also shows the section-to-segment mapping, which tells which sections are combined to form each segment.
That’s all for today. I hope you find it useful. If you have any suggestions or find any errors, please leave a comment below.
Till then, have fun!
Nice article. Easy to understand and written in simple words. Keep it up...
good work.
Well... references?
Neat and quite simple. Congratulations!
Please continue your work.
Can any text be added to the .note section? How is this done? Is it included in a C file or header file that gets compiled?
In response to the comment:
"Can any text be added to the .note section? How is this done? Is it included in a C file or header file that gets compiled?"
The note section can be used to keep vendor-specific information, which other programs may check for conformance and compatibility.
How do you add text to the .note section? It is all done by the utility, code or tool that creates the ELF file. The contents of the .note section are not taken from the C file or C header, but you may keep any arbitrary information in the note section depending on your needs.
Let me give you an example. Note sections are typically useful in core files. When a program crashes, the operating system creates a core file, and its note section is used to store various information about the process that crashed and the reason it crashed.
For example: process id, parent process id, CPU state (such as all general-purpose registers), process status, CPU usage, nice value, etc.
It all depends on the operating system and what information it wants to keep in the note section, which may help debuggers find out the exact cause of the crash.
So, nothing from your C code goes into the .note section. It is up to you what to keep there, and it may then be used by other tools, utilities or customers for any purpose.
You must alter the code that generates the ELF file to add a note section as per your needs.
Very well written and in easy words. Got the whole concept. Thank you
Hello, thanks a lot for the great info. I needed some help along the same lines. In my project I have a huge number of C files which are conditionally compiled using some macros (something like QA_ON or QA_OFF). I have a task to evaluate how many times a specific #define (another, functionality-related macro) is used when the code is compiled with the QA_OFF flag enabled. Can this information somehow be found using the ELF file? Please suggest.
Awesome work. Very well presented!