What is a Process Address Space?
In simple words, an address space is the range of addresses that a process can access. If a process tries to access an address that does not lie in its address space, the operating system kills the process. The process dies with a message such as “Segmentation fault (core dumped)”.
Let's take a simple example.
test.c
int globalx = 100;   /* initialized global */
int globaly;         /* uninitialized global */

int main(void)
{
    if (globalx == 100) {
        globaly = 200;
    }
    return 0;
}
When you compile this program, you get an executable. Executable files come in various formats such as ELF and COFF. On a Linux system the generated executable is in the ELF format. Various tools can read an ELF executable; one such tool is readelf. It can display all the sections in the file and the virtual addresses assigned to them.
# gcc test.c
The executable a.out is created. Use readelf to read the various sections of a.out.
# readelf -e a.out
a.out has many other sections, but let's concentrate only on the following important sections and fields:
Section Headers:
  [Nr]  Name    Addr        Size
  [ 0]  .text   0x08048280  0x1b8
  [ 1]  .data   0x08049558  0x8
  [ 2]  .bss    0x08049560  0xc
All the machine instructions corresponding to your code go to the .text section. All initialized global variables go to the .data section; in this case, globalx goes to .data. All uninitialized global variables (here, globaly) go to the .bss section.
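The placement of the two globals can also be checked from inside a program. The following is only a sketch, assuming a Linux toolchain where the linker defines the symbols etext, edata and end (described in the end(3) manual page) at the end of the text, the initialized data and the bss areas. The helper names are made up for this illustration.

```c
#include <stdint.h>

/* Linker-defined symbols (see end(3)); only their addresses are meaningful. */
extern char etext, edata, end;

int globalx = 100;   /* initialized: expected in .data */
int globaly;         /* uninitialized: expected in .bss */

/* Hypothetical helper: 1 if p lies below the end of the initialized data. */
int below_end_of_data(const void *p)
{
    return (uintptr_t)p < (uintptr_t)&edata;
}

/* Hypothetical helper: 1 if p lies after the initialized data but before
 * the end of the uninitialized data (bss). */
int in_bss_area(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)&edata &&
           (uintptr_t)p <  (uintptr_t)&end;
}
```

On such a system, below_end_of_data(&globalx) and in_bss_area(&globaly) would both be expected to return 1.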
So, what is the address space for this process?
All the .text, .data and .bss addresses constitute the address space of this process.
.text address range: 0x08048280 to 0x08048437 (size 0x1b8)
.data address range: 0x08049558 to 0x0804955F (size 0x8)
.bss address range: 0x08049560 to 0x0804956B (size 0xc)
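Each range above is the start address plus the size, minus one. A small sketch of that arithmetic, where the struct and function names are assumptions chosen for this example:

```c
#include <stdint.h>

/* One row of the section table shown by readelf. */
struct section {
    const char *name;
    uint32_t    addr;   /* start virtual address */
    uint32_t    size;   /* size in bytes */
};

/* Last valid address of a section: start + size - 1. */
uint32_t section_last_addr(const struct section *s)
{
    return s->addr + s->size - 1;
}
```

For .text this gives 0x08048280 + 0x1b8 - 1 = 0x08048437, which matches the range listed above.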
Access to any address outside the above address ranges will cause a segmentation fault, and your process will be killed by the operating system.
Apart from these, there are other regions (such as the stack) as well, but we will not discuss them at this time.
Virtual to physical address translation
The above addresses are all virtual addresses and not the actual addresses of physical RAM. The MMU (memory management unit) translates these virtual addresses to physical addresses using page tables.
The page table for a process is created by the operating system when the process is created. The OS reads the executable file and finds the various sections and their addresses. The mappings in the page table are created on demand, i.e. when the process accesses a virtual address, the OS checks the page table to see whether a mapping already exists for this address. If no mapping exists, it finds a free page in memory and creates a mapping for that page.
The page size depends on the processor architecture. Typically it is 4 KB. On such processors, the address space is divided into 4 KB pages, and each entry in the page table maps 4 KB of virtual addresses to a physical page.
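With 4 KB pages, the low 12 bits of a virtual address are the offset inside the page and the remaining upper bits select the page. A minimal sketch for 32-bit addresses follows; the macro and function names are chosen for illustration only.

```c
#include <stdint.h>

#define PAGE_SHIFT   12u                  /* 4 KB = 2^12 bytes */
#define PAGE_SIZE    (1u << PAGE_SHIFT)
#define OFFSET_MASK  (PAGE_SIZE - 1u)     /* low 12 bits */

/* Virtual page number: which 4 KB page the address falls in. */
uint32_t page_number(uint32_t vaddr)
{
    return vaddr >> PAGE_SHIFT;
}

/* Byte offset of the address inside its page. */
uint32_t page_offset(uint32_t vaddr)
{
    return vaddr & OFFSET_MASK;
}
```

For the start of .data in the example (0x08049558), the page number is 0x08049 and the offset is 0x558. The start of .bss (0x08049560) falls in the same page.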
An entry (page table entry or PTE) in the page table typically has the following fields.
| Page Frame Address | User/Kernel | Read/Write | Present |
- Page Frame Address is the physical address of the page in memory.
- User bit (User = 0, Kernel = 1) specifies whether the page can be accessed in user mode. If this bit is 1, the page cannot be accessed by a user process; only the kernel can access it. If a user process tries to access such a page, the processor generates an exception. The OS finds that the process was trying to access a kernel page and kills it.
- Read/Write (Read = 0, Write = 1) indicates whether the process can only read this page or can also write to it. The text section pages are all mapped read-only.
- Present bit indicates if the mapping is valid or not. If this bit is not set, it indicates there is no mapping for this virtual address and OS should create it.
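The checks implied by these fields can be sketched in C. The bit positions and the polarity of the User bit below follow the description in this post and are assumptions for illustration; real processors define their own layouts (on x86, for instance, the corresponding bit has the opposite meaning).

```c
#include <stdint.h>

/* Hypothetical PTE layout for 4 KB pages, following the post's convention. */
#define PTE_PRESENT  (1u << 0)    /* 1 = mapping is valid */
#define PTE_WRITE    (1u << 1)    /* 0 = read-only, 1 = writable */
#define PTE_KERNEL   (1u << 2)    /* 0 = user may access, 1 = kernel only */
#define PTE_FRAME    0xFFFFF000u  /* physical address of the page frame */

enum access_result { ACCESS_OK, FAULT_NOT_PRESENT, FAULT_PROTECTION };

/* Decide what happens when an access hits this page table entry. */
enum access_result check_access(uint32_t pte, int is_write, int in_user_mode)
{
    if (!(pte & PTE_PRESENT))
        return FAULT_NOT_PRESENT;           /* no mapping yet */
    if (in_user_mode && (pte & PTE_KERNEL))
        return FAULT_PROTECTION;            /* user touching a kernel page */
    if (is_write && !(pte & PTE_WRITE))
        return FAULT_PROTECTION;            /* write to a read-only page */
    return ACCESS_OK;
}

/* Physical address = page frame address + offset within the page. */
uint32_t translate(uint32_t pte, uint32_t vaddr)
{
    return (pte & PTE_FRAME) | (vaddr & ~PTE_FRAME);
}
```

With a read-only text page mapped at frame 0x00123000, translate() turns the virtual address 0x08048280 into 0x00123280, and a user-mode write to that page yields FAULT_PROTECTION.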
Process Segments
Apart from page tables, the OS also keeps information about each of the process's sections. It creates a segment for each section and records in it the start virtual address of the section, its size, and the protections for the entire section (such as read-only for the text section). This information is used to determine whether the process is accessing an address outside its address space.
So, for our example, the OS will create three segments, for the .text, .bss and .data sections. There’s one more for the stack as well.
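A segment record and the lookup the kernel performs on it might look roughly like the following. This is a simplified sketch; the struct, the flag names and the table contents are assumptions based on the readelf output of the example, not actual kernel data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define SEG_READ   1u
#define SEG_WRITE  2u

struct segment {
    const char *name;
    uint32_t    start;   /* start virtual address */
    uint32_t    size;    /* size in bytes */
    unsigned    prot;    /* protections for the whole segment */
};

/* Segments of the example a.out, using the addresses shown by readelf. */
const struct segment example_segments[] = {
    { ".text", 0x08048280u, 0x1b8u, SEG_READ },
    { ".data", 0x08049558u, 0x008u, SEG_READ | SEG_WRITE },
    { ".bss",  0x08049560u, 0x00cu, SEG_READ | SEG_WRITE },
};

/* Return the segment containing addr, or NULL if addr is outside all of them. */
const struct segment *find_segment(const struct segment *segs, size_t n,
                                   uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= segs[i].start && addr - segs[i].start < segs[i].size)
            return &segs[i];
    }
    return NULL;
}
```

Looking up 0x0804956B returns the .bss segment, while 0x0804956C, one byte past its end, returns NULL.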
Accessing a valid address for the first time
The kernel creates the mapping on demand. When a virtual address is accessed, the MMU (hardware) first looks up its mapping in the page table. As this is the first time this address is accessed, the MMU will not find any mapping in the page table. So it generates an exception, and control is given to the exception handler code of the kernel. The handler first checks whether the virtual address is valid: it goes through the segments of the process one by one and checks whether the address lies in one of them. If the address lies in one of the segments, the handler finds a free page in memory and creates a mapping for that page in the page table. It also sets various bits in the PTE, e.g. Present = 1, User = 0, Read/Write = 1 or 0 (depending on the type of section).
Accessing an address outside the address space
Suppose that, due to some bug, your process tries to access some random address. The MMU first tries to find the physical address in the page table. In this case, it will not find any translation for this address, so the processor generates a page fault and control is transferred to the kernel. The page fault handler inspects all the segments of this process to determine whether this is a valid address. In this case, the address does not lie in any of the segments. So the kernel does not create any mapping for it and kills the process. You will see the message “Segmentation fault (core dumped)”.
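The two situations described above, the first access to a valid address and an access outside the address space, can be put together in a toy model of the page fault handler. Everything here (the struct layout, MAX_PAGES, the frame counter standing in for the kernel's page allocator) is an assumption made to keep the sketch short; a real kernel is considerably more involved.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12u
#define PTE_PRESENT  1u
#define MAX_PAGES    16u   /* size of the toy page table, an assumption */

struct segment { uint32_t start, size; };

struct process {
    const struct segment *segs;
    size_t   nsegs;
    uint32_t vpage[MAX_PAGES];   /* virtual page number of each entry */
    uint32_t pte[MAX_PAGES];     /* matching toy PTE */
    size_t   nptes;
    uint32_t next_free_frame;    /* stand-in for the page allocator */
};

enum fault_result { FAULT_MAPPED, FAULT_SIGSEGV };

/* Walk the segments one by one, as the handler in the text does. */
static int address_is_valid(const struct process *p, uint32_t addr)
{
    for (size_t i = 0; i < p->nsegs; i++)
        if (addr >= p->segs[i].start &&
            addr - p->segs[i].start < p->segs[i].size)
            return 1;
    return 0;
}

/* 1 if the toy page table already has a present entry for addr's page. */
int is_mapped(const struct process *p, uint32_t addr)
{
    for (size_t i = 0; i < p->nptes; i++)
        if (p->vpage[i] == (addr >> PAGE_SHIFT) && (p->pte[i] & PTE_PRESENT))
            return 1;
    return 0;
}

/* Called when the MMU finds no mapping for addr. */
enum fault_result handle_page_fault(struct process *p, uint32_t addr)
{
    /* Invalid address: no mapping is created and the process is killed.
     * A full table is also treated as fatal here, purely for simplicity. */
    if (!address_is_valid(p, addr) || p->nptes == MAX_PAGES)
        return FAULT_SIGSEGV;

    /* Valid address: take a free frame and create the mapping. */
    p->vpage[p->nptes] = addr >> PAGE_SHIFT;
    p->pte[p->nptes]   = (p->next_free_frame++ << PAGE_SHIFT) | PTE_PRESENT;
    p->nptes++;
    return FAULT_MAPPED;
}
```

In this model, a fault on 0x08049558 (inside .data) returns FAULT_MAPPED and the page is mapped afterwards, while a fault on a random address such as 0xDEADBEEF returns FAULT_SIGSEGV and leaves the page table unchanged.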
Writing to a read only section
If the process tries to write to a read-only page, there are two cases.
Case 1: The page table entry was already created earlier, when the address was accessed for a read. The PTE has Read/Write = 0, i.e. the page is read-only. In this case, the MMU finds the PTE in the page table. As the process is trying to write to this page, the MMU generates an access fault exception. The exception handler then kills the process.
Case 2: The page table doesn't have an entry for this address. A page fault is generated, and the exception handler checks whether the address is valid. As the address is valid, it finds the segment in which the address lies. Next it checks the permissions on that segment. Here the segment is read-only, so the handler kills the process.
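This behaviour can be observed from user space. The sketch below assumes a Linux/POSIX system: a child process maps one page with read permission only (using mmap as a stand-in for a read-only section), reads it once so that the mapping exists, then writes to it. The parent reports which signal ended the child. The function name is made up for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that writes to a page mapped read-only. Returns the signal
 * that terminated the child, 0 if it exited normally, or -1 on error.
 */
int signal_from_readonly_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: map one anonymous page with read permission only. */
        volatile char *page = mmap(NULL, 4096, PROT_READ,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            _exit(2);
        char c = page[0];          /* reading is allowed */
        page[0] = (char)(c + 1);   /* writing is expected to fault */
        _exit(0);                  /* not reached if the write faults */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system the function is expected to return SIGSEGV, which corresponds to Case 1 above because the read has already created the mapping.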
How to prevent any process from accessing/corrupting the pages of another process
For security reasons, one process should not be allowed to access the data of another process. How is this done?
Note that access to a physical page is controlled by the page table of the process. The page tables are created by the kernel, and the kernel will never create a mapping that allows one process to access another process's pages. For example, the data pages of a process A will be mapped only by its own page table. No other page table will have a mapping for the data pages of process A.
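A small experiment, assuming a POSIX system, illustrates the effect: after fork(), parent and child use the same virtual address for a global variable, but a write in the child lands in the child's own physical page, so the parent's value does not change. The variable and function names are chosen for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int shared_looking_value = 100;   /* same virtual address in both processes */

/*
 * The child overwrites the global; the parent then reads its own copy.
 * Returns the value the parent sees, or -1 on error.
 */
int parent_value_after_child_write(void)
{
    shared_looking_value = 100;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        shared_looking_value = 200;   /* changes only the child's page */
        _exit(shared_looking_value == 200 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return shared_looking_value;      /* still the parent's own value */
}
```

The parent is expected to still see 100 even though the child stored 200 at the same virtual address.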
How to prevent any process from accessing kernel pages
In Linux, user and kernel both share the same address space. On a 32-bit system, out of 4 GB, the first 3 GB is reserved for the user and the last 1 GB is used by the kernel. All kernel pages have User Bit = 1 in their PTEs (i.e. only the kernel can access these pages).
If the user tries to access any kernel page (any address greater than or equal to 3 GB), the MMU, while converting the virtual address to a physical address, finds that User Bit = 1 in the PTE. So it generates an exception, and the process is killed by the kernel.
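The same kind of experiment can be tried for kernel addresses. This sketch assumes a Linux system and assumes that the topmost page of the virtual address space is not accessible from user mode, which holds for the 3 GB / 1 GB split described above; the function name is hypothetical.

```c
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that reads one byte from the top page of the address space.
 * Returns the signal that terminated the child, 0 if it exited normally,
 * or -1 on error.
 */
int signal_from_kernel_read(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Assumed to be a kernel-only address on the systems discussed here. */
        volatile const char *kaddr =
            (volatile const char *)(UINTPTR_MAX - 4095u);
        char c = *kaddr;          /* expected to fault in user mode */
        _exit(c == 0 ? 0 : 1);    /* not reached if the read faults */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system the child is expected to be terminated by SIGSEGV.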
That’s all for today.
Have Fun !!!
Very nice article.
Great post.. I have 1 doubt. Suppose some data segment needs some 100 bytes. But the page size is 4K. So at first if you access a valid data within the 100 bytes. Next if you try to access some invalid data between the 100 and 4K, the entry is there in page table and will it be allowed to access the invalid data???
In reply to Question asked by Anonymous:
" I have 1 doubt. Suppose some data segment needs some 100 bytes. But the page size is 4K. So at first if you access a valid data within the 100 bytes. Next if you try to access some invalid data between the 100 and 4K, the entry is there in page table and will it be allowed to access the invalid data???"
Ans: It's a very valid question. I believe that if the translation is there in the TLB or page table, the user will be able to access the address even if it is outside the address space. If the translation is not there and a page fault is generated, the OS will find that the user is trying to access an address outside the address space, so it will post SIGSEGV to the application to kill it.
Your post is very detailed; it helped me completely understand the process address space. Thanks a lot!!!
Very nice post. I've 1 question though.
How does the MMU find out the type of access that the processor wants for a given address? Because if I understand it correctly, the processor just presents a linear address to the MMU and asks it to convert it into a physical address. Where does it specify whether it is trying to read/write/execute?
Thanks,
Very well explained !
Very good one most of the q's answered. Thanks.