Saturday, April 17, 2010

Executable File Formats (ELF)

What happens when you compile your C code?
The answer is simple: the compiler generates an executable. On a Linux/Unix system, the executable is named “a.out” by default.

What’s inside an executable file (a.out)?
Have you ever tried dissecting an a.out file? It is not a plain binary file of machine code. It holds a lot of other information that helps the operating system load it into memory. Executable files come in various formats, such as COFF and ELF.
Nowadays, most Unix-like operating systems (Linux, BSD, Solaris, IRIX, etc.) use ELF (Executable and Linkable Format) for their executables.

Typically, an ELF executable includes:
  • ELF Header
  • Program Headers
  • Section Headers
  • Data referred by program or section headers

Dissecting an ELF File
We will take a simple C program, compile it, and see what is in the generated a.out (ELF) file.

/************************* test.c ************************/
#include <stdio.h>   /* needed for printf */

int global1 = 100;   /* initialized global */
int global2;         /* uninitialized global */
int main (void)
{
   global2 = 200;
   global1 = 300;
   printf("global1 = %d global2 = %d\n", global1, global2);
   return 0;
}

Compiling it on a Linux system generates a.out in the ELF file format.
You can determine the file format using the file command.

# file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

The “file” command determines this information by reading the ELF header, which lies at the start of the file.

ELF Header
The ELF header always lies at the start of the executable file. It holds overall information about the entire ELF file: the target architecture (Intel 80386 in this case), the ELF version, and the location and number of the program and section headers. It also contains the address of the first executable instruction (called the entry point).

Let's print the contents of the ELF header for our “a.out” executable. You can use the tool “readelf” to dissect an ELF executable.

ELF HEADER
-----------
#define EI_NIDENT 16
typedef struct {
   unsigned char e_ident[EI_NIDENT]; // elf magic
   Elf32_Half  e_type;
   Elf32_Half  e_machine;   // target machine architecture
   Elf32_Word  e_version;
   Elf32_Addr  e_entry;     // entry point address
   Elf32_Off   e_phoff;     // program hdr table’s file offset
   Elf32_Off   e_shoff;     // section hdr table’s file offset
   Elf32_Word  e_flags;
   Elf32_Half  e_ehsize;    // elf header size in bytes
   Elf32_Half  e_phentsize; // size of one entry in program
                            // header table in bytes. All
                            // entries are of equal size
   Elf32_Half  e_phnum;     // number of entries in program header table
   Elf32_Half  e_shentsize; // size of section header in bytes
   Elf32_Half  e_shnum;     // number of section headers in section header table
   Elf32_Half  e_shstrndx;  // index of .shstrtab section in section header table
} Elf32_Ehdr;

# readelf  -h a.out
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x80482b0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          1980 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         28
  Section header string table index: 25


The first four bytes hold a magic number identifying the file as an ELF file.

The first byte is 0x7f; the second (0x45), third (0x4c), and fourth (0x46) bytes are the ASCII values for ‘E’, ‘L’, and ‘F’. The “file” command reads this magic number to determine whether this is an ELF file.
Note the entry point address. This is the address of the first instruction, to which control is transferred once the executable has been loaded into memory.
The ELF header also contains the offsets at which the program header table and section header table are placed in the a.out file.
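
To make the magic number and entry point concrete, here is a minimal sketch in C that checks the magic bytes and pulls the entry point out of a 32-bit little-endian ELF header held in memory. The function names (is_elf, read_le32, elf32_entry) are made up for this example. The byte offset 24 for e_entry is derived from the Elf32_Ehdr layout shown above (16 bytes of e_ident, then 2 + 2 + 4 bytes), and the sketch assumes a little-endian ELF32 file like our a.out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf starts with the ELF magic: 0x7f 'E' 'L' 'F'. */
static int is_elf(const unsigned char *buf, size_t len)
{
    return len >= 4 && buf[0] == 0x7f && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}

/* Reads a 32-bit little-endian value starting at p. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns e_entry from an in-memory ELF32 header.
   Assumption: little-endian ELF32, so e_entry sits at byte offset 24. */
static uint32_t elf32_entry(const unsigned char *hdr, size_t len)
{
    assert(len >= 52);          /* "Size of this header" for ELF32 */
    assert(is_elf(hdr, len));
    return read_le32(hdr + 24);
}
```

Fed the first 52 bytes of our a.out, elf32_entry should return the same entry point that readelf reports.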

ELF Section Headers
An ELF executable contains various sections, and each section has a corresponding section header. The header holds the section name, the virtual address at which the section should be loaded, the section type, the offset from the beginning of the file at which the section's first byte resides, the size of the section, and so on.

A few important sections are:
  • .text : This section holds the executable instructions of the program.
  • .bss : This holds the uninitialized global data. In our example code, the variable global2 goes into the .bss section. All data in this section is initialized to 0 when the program is loaded into memory. The section occupies no space in the ELF file; the file contains only a header for it. No space needs to be allocated in a.out because the initial value of every variable in .bss is known to be 0.
  • .data : Initialized global data goes here.
  • .strtab : It holds the names of various symbols.
  • .symtab : It holds a symbol entry for each symbol.
  • .shstrtab : This section holds section names.
There are various other sections as well, but we will concentrate only on the ones above.
Let's print the section headers for these sections. Again, readelf can be used to print them.

ELF SECTION HEADER
------------------
typedef struct {
   Elf32_Word   sh_name;   // offset into .shstrtab section
   Elf32_Word   sh_type;
   Elf32_Word   sh_flags;
   Elf32_Addr   sh_addr;
   Elf32_Off    sh_offset;
   Elf32_Word   sh_size;
   Elf32_Word   sh_link;
   Elf32_Word   sh_info;
   Elf32_Word   sh_addralign;
   Elf32_Word   sh_entsize;
} Elf32_Shdr;

# readelf -S a.out
(only important fields are shown below)

Section Headers:
  [Nr] Name      Type     Addr       Off     Size       Flg

  [12] .text     PROGBITS 080482b0   0002b0  0001d8     AX
  [22] .data     PROGBITS 080495c4   0005c4  000008     WA
  [23] .bss      NOBITS   080495cc   0005cc  00000c     WA
  [25] .shstrtab STRTAB   00000000   0006e0  0000db
  [26] .symtab   SYMTAB   00000000   000c1c  000460
  [27] .strtab   STRTAB   00000000   00107c  00026a

The section flags have the following meanings:
  • A (ALLOC) Space should be allocated in memory to load this section. Note that the symbol and string tables are not loaded into memory.
  • X (EXEC INSTRUCTIONS) The section contains executable machine instructions. Note that the .text section has this flag set.
  • W (WRITE) The section has data that can be modified during program execution.
Note the section type (NOBITS) of the .bss section. NOBITS indicates that the section does not occupy any space in the executable file.
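
The flag letters in the Flg column come from bits in the sh_flags field. As a small sketch, the function below turns those bits back into letters. The bit values 0x1, 0x2 and 0x4 are the ones the ELF specification assigns to SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR; the enum names and the function name are invented for this example, and only these three flags are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local names for the three flag bits discussed in the text
   (same values as SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR). */
enum {
    SEC_WRITE = 0x1,
    SEC_ALLOC = 0x2,
    SEC_EXEC  = 0x4
};

/* Writes the flag letters for sh_flags into out in the order W, A, X.
   out must have room for at least 4 bytes (3 letters plus the NUL). */
static void section_flag_letters(uint32_t sh_flags, char out[4])
{
    size_t n = 0;

    assert(out != NULL);
    if (sh_flags & SEC_WRITE) out[n++] = 'W';
    if (sh_flags & SEC_ALLOC) out[n++] = 'A';
    if (sh_flags & SEC_EXEC)  out[n++] = 'X';
    out[n] = '\0';
}
```

With this, a flags value of 0x6 gives "AX" (as for .text) and 0x3 gives "WA" (as for .data and .bss).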

Also, note that the virtual address of the .symtab and .strtab sections is 0, which means that they are not loaded into memory. They are only used when debugging the program.

The offset specifies where the actual bytes for that section reside in the ELF file.
For example, the offset for the .text section is 0x2b0, which means that the machine instructions for this program start at an offset of 0x2b0 from the start of the a.out file.
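
Since a section header gives both a virtual address and a file offset, you can translate any address inside the section into a position in the file. The sketch below does that arithmetic. The function name is hypothetical, and it only makes sense for sections that actually occupy file space (so not for a NOBITS section such as .bss).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Translates a virtual address inside a section into a file offset,
   using that section's Addr, Off and Size columns.
   Returns 1 and stores the offset in *out on success,
   or returns 0 if vaddr lies outside the section. */
static int vaddr_to_file_offset(uint32_t vaddr, uint32_t sh_addr,
                                uint32_t sh_offset, uint32_t sh_size,
                                uint32_t *out)
{
    assert(out != NULL);
    if (vaddr < sh_addr || vaddr - sh_addr >= sh_size)
        return 0;
    *out = sh_offset + (vaddr - sh_addr);
    return 1;
}
```

For instance, the address 0x08048384 (which the symbol table further down lists as main) lies 0xd4 bytes into .text, so its bytes should sit at file offset 0x2b0 + 0xd4 = 0x384.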

The name field does not hold the actual name of the section. We cannot store the name in the section header because we want all section headers to be of equal size; the headers are easier to parse that way. So, instead of the name, an offset is stored. The offset is an index into the “.shstrtab” section, giving the location of a null-terminated string.
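
Here is a rough sketch of that lookup in C, assuming the contents of the string table section have already been read into memory. The function name strtab_name is made up for this example. The bounds checks are an addition of mine so that a bad offset cannot run off the end of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the null-terminated name stored at byte offset name_off
   inside a string table, or NULL if the offset is out of range or
   the string is not terminated within the table. */
static const char *strtab_name(const char *strtab, size_t strtab_size,
                               uint32_t name_off)
{
    assert(strtab != NULL);
    if (name_off >= strtab_size)
        return NULL;
    if (memchr(strtab + name_off, '\0', strtab_size - name_off) == NULL)
        return NULL;
    return strtab + name_off;
}
```

The same function works for symbol names: pass the contents of .strtab and the st_name value from a symbol table entry.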

You can also print the symbol table of the ELF file.

SYMBOL TABLE ENTRY
-------------------
typedef struct {
   Elf32_Word     st_name;   // offset into .strtab section
   Elf32_Addr     st_value;
   Elf32_Word     st_size;
   unsigned char  st_info;
   unsigned char  st_other;
   Elf32_Half     st_shndx;
} Elf32_Sym;


# readelf  -s a.out

Symbol table '.symtab' contains 70 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
    …
    …
    62: 080495d4     4 OBJECT  GLOBAL DEFAULT   23 global2
    63: 080495c8     4 OBJECT  GLOBAL DEFAULT   22 global1
    …
    68: 08048384    82 FUNC    GLOBAL DEFAULT   12 main
    69: 08048250     0 FUNC    GLOBAL DEFAULT   10 _init
    …


The symbol table has an entry for each symbol. Because each entry is of fixed size, we cannot keep the symbol name in the entry itself. Here too an offset is stored. The offset is an index into the “.strtab” section, giving the location of the null-terminated symbol name.


Note the virtual address (0x080495c8) of the symbol “global1”. It is an initialized global symbol, so it must go into the .data section. The .data section starts at 0x080495c4 and is 0x8 bytes long, so it covers addresses 0x080495c4 through 0x080495cb. Hence, we can see that “global1” resides in the .data section.

# objdump -S a.out
..
..
08048384 <main>:
 8048384:   8d 4c 24 04             lea    0x4(%esp),%ecx
 8048388:   83 e4 f0                and    $0xfffffff0,%esp
 804838b:   ff 71 fc                pushl  0xfffffffc(%ecx)
 804838e:   55                      push   %ebp
 804838f:   89 e5                   mov    %esp,%ebp
 8048391:   51                      push   %ecx
 8048392:   83 ec 14                sub    $0x14,%esp
 8048395:   c7 05 d4 95 04 08 c8    movl   $0xc8,0x80495d4
 804839c:   00 00 00
 804839f:   c7 05 c8 95 04 08 2c    movl   $0x12c,0x80495c8
..
..

Note: we are storing 300 (0x12c) at address 0x80495c8, which is the address of the variable global1.

Similarly, 200 (0xc8) is stored at 0x80495d4, which is the address of the variable global2. Since global2 is an uninitialized global variable, it must reside in the .bss section. The .bss section starts at virtual address 0x080495cc and is 0xc bytes long, so we can clearly see that global2 resides in the .bss section.
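
The two "resides in" checks above are plain range tests. A minimal sketch, using the addresses and sizes from the readelf output (the function names are invented for this example):

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr lies in the half-open range [start, start + size). */
static int in_section(uint32_t addr, uint32_t start, uint32_t size)
{
    return addr >= start && addr - start < size;
}

/* Re-checks the placements claimed in the text for our a.out. */
static void check_globals(void)
{
    /* global1 (0x080495c8) is inside .data (0x080495c4, size 0x8) */
    assert(in_section(0x080495c8u, 0x080495c4u, 0x8u));
    /* global2 (0x080495d4) is inside .bss (0x080495cc, size 0xc) */
    assert(in_section(0x080495d4u, 0x080495ccu, 0xcu));
    /* global2 is not inside .data */
    assert(!in_section(0x080495d4u, 0x080495c4u, 0x8u));
}
```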

Program Header Table
A program header table is meaningful only for executable files and shared object files. In other words, any object file that needs to be loaded into memory for execution needs a program header table.
Each entry in the program header table describes a segment in the process address space. It has the information needed to create an executable process image in memory. The operating system copies each loadable segment (PT_LOAD) into memory according to its location and size information.
Sections with common attributes/types are combined to form a single segment.
Sections like .text, .init, .fini, and .plt all contain machine-executable code and have the same attributes, so they can be combined into a single segment, that is, a single entry in the program header table.

Similarly, sections like .bss, .data, and .got all hold data for variables that can be modified during program execution, so these sections are combined into another single segment.

Let's print the program header table for our a.out file.

PROGRAM HEADER
--------------
typedef struct {
   Elf32_Word   p_type;
   Elf32_Off    p_offset;
   Elf32_Addr   p_vaddr;
   Elf32_Addr   p_paddr; 
   Elf32_Word   p_filesz;
   Elf32_Word   p_memsz;
   Elf32_Word   p_flags;
   Elf32_Word   p_align;
} Elf32_Phdr;

# readelf -l a.out
Elf file type is EXEC (Executable file)
Entry point 0x80482b0
There are 7 program headers, starting at offset 52

Program Headers:
 Type      Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
 …
 …
 LOAD      0x000000 0x08048000 0x08048000 0x004cc 0x004cc R E 0x1000
 LOAD      0x0004cc 0x080494cc 0x080494cc 0x00100 0x0010c RW  0x1000
 …
 …
 GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

Section to Segment mapping:
  Segment Sections...
   00    
   01     .interp
   02     .interp .note.ABI-tag .gnu.hash .dynsym .dynstr  
          .gnu.version .gnu.version_r .rel.dyn .rel.plt .init     
          .plt .text .fini .rodata .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   …
   06

If the type of a segment is PT_LOAD, the operating system should load that segment into memory.

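The GNU_STACK entry describes the program's stack. Its flags tell the operating system which permissions the stack needs (RW here, so the stack does not have to be executable). Its virtual address and size are 0: it is up to the operating system to decide the size of the stack and where to create it.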

The command output also displays the section to segment mapping, which tells you which sections are combined to form each segment.
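
One detail worth noticing in the PT_LOAD entries is the difference between FileSiz and MemSiz. As a sketch of how a loader might use them (the struct and function names are hypothetical, and a real loader also has to deal with page alignment), the bytes present in the file are copied and the remainder of the segment is zero-filled:

```c
#include <assert.h>
#include <stdint.h>

/* How one PT_LOAD segment is split: bytes copied from the file, and
   bytes that exist only in memory and start out as zero. */
struct load_plan {
    uint32_t copy_bytes;
    uint32_t zero_bytes;
};

static struct load_plan plan_load(uint32_t p_filesz, uint32_t p_memsz)
{
    struct load_plan plan;

    /* For a well-formed PT_LOAD entry the file size may not exceed
       the memory size. */
    assert(p_memsz >= p_filesz);
    plan.copy_bytes = p_filesz;
    plan.zero_bytes = p_memsz - p_filesz;
    return plan;
}
```

For the second LOAD entry above, FileSiz is 0x100 and MemSiz is 0x10c, so 0xc bytes are zero-filled. That matches the 0xc-byte size of .bss, the last section in that segment and the one that takes no space in the file.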

That’s all for today. Hope you find it useful. In case you have any suggestions or you find any errors please provide your comments below.

Till then, Have Fun !!!


9 comments:

  1. Nice article. Easy to understand and written in simple words. Keep it up...

  2. Neat and Quite simple. Congratulations!
    Please continue your work

  3. can any text be added to the .note section? How is this done? Is it included in a c file or header file that gets compiled?

  4. In response to the comment:

    "can any text be added to the .note section? How is this done? Is it included in a c file or header file that gets compiled?"

    The note section can be used to keep vendor-specific information which other programs may check for conformance and compatibility.

    How to add text to .note section ? Well, it is all done by the utility/code/tool that creates the elf file. The contents of .note section are not taken from the c file or c header. But, you may keep any arbitrary information in the note section depending on your needs.

    Let me give you an example. The .note sections are typically useful in core files. When a program crashes, a core file is created by your operating system. The .note section is used to store various information about the process that crashed and the reason why it crashed.
    For example: process id, parent process id, CPU status (like all general purpose registers), process status, CPU usage, nice value, etc.
    It all depends on the operating system and what information it wants to keep in the note section which may be useful for debuggers to find out the exact cause of the crash.

    So, there's nothing from your c code that goes into the .note section. It all depends on you what do you want to keep there which may be used by other tools/utilities/customers for any arbitrary purpose.
    You must alter the code that generated the elf file to add the note section as per your needs.

  5. Very well written and in easy words. Got the whole concept. Thank you

  6. hello.. thanks a lot for the great info.. i needed some help on the same lines.. in my project i have huge number of C files which are conditionally compiled using some macros (something like QA_ON or QA_OFF etc).. i have a task to evaluate how many times a specific #define (another macro which is functionality related) is called when the code is compiled with the QA_OFF flag enabled? can this info be somehow found using the elf file? Please suggest.

  7. Awesome work.. Very well presented !!
