Total Pageviews

Sunday, January 16, 2011

Log Structured File System : An Introduction

I am writing this post after a long gap. In this post I am trying to give a basic idea about the log structured file systems. I have oversimplified some of the stuff so as not to confuse you and keep the things as straight forward as possible.
In traditional file systems the data stored on the disk may not be contiguous and so the seek time is poor which degrades the performance. It is the seek time between various IO requests that dominates the performance. Also, buffer caches are sufficiently large these days and most of the reads can be served by the buffer cache. So, disk traffic is mostly dominated by writes and disk seek time is the major reason that impacts the performance.
Log structured try to solve this problem by writing all modifications contiguously on the disk in a log-like structure.
LFS first buffers all updates in memory into a segment (typically MB in size). When the segment is full, it is written to the disk in one long sequential write to the unused part of disk.
LFS never overwrites the data. The modified data blocks and inodes are written to the new segment. So, we have the old and new copies of the inodes and data blocks spread in different logs and we need to keep track of the latest inodes/data-blocks.
Lets take a simple example to see what happens when a block is added to a file
Here we need to allocate a new block and modify the inode to point to the newly allocated block. Initially the modification are done to the segment which is still in memory. Later, when the segment is full of such modifications it is written to the disk. Note, here there is no problem of finding free disk blocks. The free space in the log implies free blocks and can be allocated to a file. Both the modified inode and the new block are written on to the log.
image
Wait, we have a problem here. How do you find the location of new inode ?
How to find the location of new modified inode ?
On a traditions unix file system it is quite easy to find the location of an inode. The inode table lies at a fixed address on the disk. So, if we know the inode number and its size we can easily determine its location.
In LFS, the inodes are spread everywhere on the disk in various segments. Moreover there are multiple copies of the inodes (as we never overwrite in place). So, how to find which one is latest and its location in the segment ? The answer is inode map which keeps the disk address of all inodes.
inode map is an array that keeps the disk addresses of all the inodes. Imagine if there are one hundres thousand inodes what is the size of inode-map.
inode map size = 100000 * 4 = 400000 bytes = 390 KB.
So, inode maps are small enough and can be cached completely in memory. Every segment on disk has a copy of the inode map.
Now the question is where to put the inode map on disk?
inode map can be put on a fixed part of the disk. But, it is changed frequently and so whenever a segment is updated we would need to update the inode map. So there would be more disk seeks and performance may be impacted. Please note that we do not wait for the entire segment to be filled with modified inodes/data before the segment is synced to disk. In fact, the segments are updated partially and so for each partial write updating inode map may impact the performance. So, we keep the inode-map also on each segment.
image
Now how to find the inode-map ?
LFS keeps a fixed place on disk called checkpoint region (CR) that contains the address of the latest inode-map. The CR needs to be updated whenever we write a new segment. So, CR is required to be updated quite infrequently.
image
Note that each segment has pointer to the next segment and that is why the overall name of the file system is log structured file system.
By starting from the CR we can find the first segment written to the disk and traverse the chain to see what changes were made to the filesystem finally reaching the latest segment.
Reading files from a Log Structured File System
Lets assume that the system is just booted and there is nothing related to LFS in memory. The first thing that is read from the disk is CR (check point region) as this is the only region whose position is fixed. CR contains the pointer to the inode map. LFS then reads the entire inode map into the memory. To get the inode, LFS simply looks the cached copy of inode map and finds the inode number to inode disk address and thus reads the most recent version of inode. To find blocks of a file, rest of the operations are exactly the same like a traditional file system. Once LFS had read the latest copy of inode it can find out the disk addresses of the blocks that belong to this file and read them.
Garbage collection
Garbage collection is very important in LFS. As we have seen that LFS never overwrites any inode/disk-block. So, there are older versions of a file/blocks scattered all over the disk in various segments. LFS must periodically find out these old versions and clean/free them.
Periodically LFS reads the old segments and determine which blocks are live in this segment. Please note that any segment may contain few old and few latest versions of inodes/data-blocks. We will see how to determine if a particular inod/data-block is latest. Lets assume that we somehow know which inodes/blocks are latest. The cleaner process then writes only the latest/live blocks to a new segment and frees the old segment. So cleaner will read M old segments and compact them to N new segments (where M > N). This results in freeing of (M – N) segments and thereby creating more free space.
How to determine which inodes/data blocks in a segment are live/latest-version ?
How to find if a inode is live or not?
Fisrt determine the inode number. For this every segment has a segment summary that also keeps information about which inodes are stored in this segment. Looking at the segment summary we can find the inode number and its disk address (say D)
Now, we have the latest inode map cached in the memory. So, we can always find out the disk address of the latest version (L) of an inode. Then we can compare if the inode address in the segment (D) matches with the address of the latest version (L) in inode-map. If they both are equal that means the inode is live.
How to find if a data block is live ?
For this the segment summary block in the segment keeps some information for each data block. LFS includes the inode number and its offset in the file.
Given this information it is easy to determine if the block is live or not. LFS notes the inode number of block D and offset O from the segment summary and then looks in the latest inode map (cached in memory) to find the disk address of the corresponding inode and reads the inode from the disk. It then finds the disk address of the block at offset O in the file. If the disk address in the inode matches with the disk address of block in the segment, the block is live.
LFS also keeps the version number of a file. Whenever a new file is created LFS increases its version number. Thus looking at the version number LFS can save the longer checks to determine the liveliness of an inode/disk-block.
More on this later. Lots of things have been oversimplified for the sake of clarity. I will go into more details in later posts. Till then Have Fun !!!
And yes if you have any comments positive/negative please feel free to post.

Followers