Understanding Linux: Synchronization in Linux Kernel

In continuation to my previous post, lets discuss other concurrency
related problems and their solutions. Earlier, we saw how concurrent
access to a shared variable may result in loss of information or data
corruption. In this article, we will explore more such problems and
ways to solve them.

Consider a scenario where there is a need to update multiple shared
global variables atomically. No other thread should be allowed to
acess the global variables until the thread that is updating them is done.

For eg. take the case of multiple threads adding/removing/reading
elements from a linked list concurrently. This may result in linked list
corruption if done without using appropriate synchronization mechanism
as shown in following example.

Consider a linked list

Head -----> Element1 -----> Element2 -----> Element3 -----> Element4 -----> Null

To add a new element (say NewElement) in this list after a specific element (say Element3) following operations need to be performed.

Step 1) First update NewElement to point to Element4

Head ----> Element1 ----> Element2 -----> Element3 -----> Element4 -----> Null
|
NewElement -------+

Step 2) Update Element3 to point to NewElement

Head ----> Element1 ----> Element2 ----> Element3 Element4 ----> Null
| |

+--- NewElement --+

Now, suppose thread-1 needs to add NewElement1 after Element3 and

thread-2 wants to add NewElement2 after Element3.

thread-1 is running on cpu-1 and thread-2 is running on cpu-2 concurrently.

1) thread-1 executes Step-1 to make NewElement1 point to Element4. Just after that thread-2 executes and updates NewElement2 to point to Element4 as shown below.

NewElement1 -----+
|

Head ----> Element1 ----> Element2 -----> Element3 -----> Element4 ----> Null
|
NewElement2 -----+

2) thread-1 then updates Element3 to point to NewElement1.

+--- NewElement1 ---+
| |
Head ----> Element1 ----> Element2 -----> Element3 Element4 ----> Null

|
NewElement2---+

3) And then thread2 also updates Element3 to point to the NewElement2

NewElement1 ---+
|

Head ----> Element1 ----> Element2 ------> Element3 Element4 ----> Null
| |

+--- NewElement2 ---+

So, finally we have NewElement2 successfully inserted into the linked list. But, NewElement1 is not a part of the linked list, So we lost Element1.

How to solve the problem where multiple global shared variables need to be updated atomically ?

Answer is simple. We need to prevent all threads from accessing the linked list while one of the thread is manipulating the list.

To achieve this, use one global variable as a lock. Any thread that needs to access the list must acquire the lock by setting its value to 1. This thread will first test if the value of lock variable is 0, then set it to1 to acquire the lock. This test-and-set should be done atomically. If any thread is already holding a lock, all others threads should wait. The thread holding the lock will release the lock (by setting the lock to 0) after it is done with the list.

This locking variable is called Spin Lock.

A Spin Lock can be implemented as a single bit in an integer variable. The most important thing here is that the testing and setting of the lock bit should be done atomically. Performing read-modify-write operation atomically is one of the prime requirement of implementing a spin lock.

If test-and-set is not done atomically then multiple threads may erroneously come under the impression that they all hold the lock simultaneouly which will lead to data corruption. See the example below how this may happen.

Problem if test-and-set is not done atomically

thread-1 thread-2

Lock Variable = 0

1) Reads the value of lock variable (=0)

to its register.

2) Reads the value of lock variable (=0)
to its register.

(Both of them find the value of Lock variable to be 0, So they assume that the lock is free.)

3) Checks the value of local register.

Value is 0, So try to acquire the

lock by writing 1 to lock variable

   in memory.
   4) thread-2 also finds that the value of
    global variable read in its register
   = 0, which means lock is free. So,

writes 1 to the lock variable to
acquire the lock.

(Both of them now feel that have acquired the lock which is wrong)

So, test and set should be done atomically.

Lets first see a simple implementation of test_and_set()

int test_and_set (int *lock)
{

int temp = *lock; // save the original value of lock

*x = 1; // always set lock to 1

return temp; // return the original value

}

Note: this is not atomic.

test_and_set() saves the original value of lock, sets the lock to 1 and returns the old value of lock. Of course, this should be done atomically.

The return value indicates whether it was already locked or not.

If return value is 0, it means that lock was free and by writing 1, the caller has acquired the lock.

If return value is 1, it means that the lock was already acquired by someone else. So, the caller needs to call this function again. The caller will keep calling this in a loop until this function returns 0.

Why it always set lock to 1 ?

You may be the thinking why this function always sets the value of lock to 1 ? What if the value is already 1? Why don’t we first check the value of lock? We should set it to 1 only if the initial value of lock is 0.

Well, answer is that there is nothing wrong in testing the value before setting it to 1. But, to test we need an extra instruction which can be avoided by always setting it to 1.

How to make test_and_set atomic ?

As I discussed my previous post, we need support from processor to achieve this. For eg i386 provides an “xchg” instruction that can atomically exchange the value of memory location with its internal register.

int atomic_test_and_set (int *lock)

.text

.globl atomic_test_and_set

atomic_test_and_set:
pushl %ebx // Preserve ebx. We will save original

                        // value of lock in ebx.

   movl    8(%esp), %ebx // save original value of lock to

                          // ebx
   movl    $1, %eax       // 1 (true) to eax

   xchg    %eax, (%ebx)   // Atomically exchange eax with

// lock. Sets lock = 1.

popl %ebx // Restor ebx

ret // return old value of lock

// which is already in eax.

Lets see how PowerPC does it.

int atomic_test_and_set (int *lock)

atomic_test_and_set: loop:

lwarx r4, 0, r3 // Save the initial value of lock.

// r3 = address of lock variable

li r5, 1 // 1 (true) to r5

stwcx r5, 0, r3 // store 1 to lock

bne- loop // go back to loop if store is not

// done atomically

mr r3, r4 // move original value of lock saved

// in r4 to r3.

blr // return valus is in r3

Lets come back to our original discussion on Spin Locks. We will see how we can implement a Spin Lock on i386 and PowerPC shortly.

Why the word “spin” is used before lock ?

The reason is that while one of the thread has acquired the lock, all other threads spin in a loop repeatedly checking until the lock becomes available.

Spin Locks should be acquired for a short time
As the threads spinning in loop are not doing any useful work they are wasting costly cpu cycles. So, spinlocks should be held only for a short duration of time.

Why not deschedule the thread instead of spinning ?

Descheduling a thread means two context switches which are far more costlier than to spin in a loop for a short duration.

Thread should never sleep/descheduled after acquiring the spin lock

If a thread goes to sleep or descheduled after acquiring the spin lock all other threads that need the same lock will be blocked spinning in a loop wasting cpu cycles.

Disabling interrupts

If a shared variable is also accessed by an interrupt handler then local interrupts should also be disabled. Otherwise it is possible that the thread after acquiring the lock is preempted by the interrupt which tries to acquire the same lock as well. The interrupt handler will stuck in the busy-wait loop as the thread which is holding the lock never gets the chance to run to release the lock. So, both of them deadlock.

Uniprocessor systems don’t need spin locks.

Disabling interrupts is sufficient on a single cpu system.

Spin Lock implementation

atomic_test_and_set() discussed above is the buliding block of implementing a spin lock.

void SpinLock(boolean *lock)

{

while (atomic_test_and_set(lock) == 1) /* If already locked spin */

;

}

The caller obtains the lock if the old value of lock is 0, else it spins calling atomic_test_and_set() in a while loop, until it return 0.

Spin Lock is implementation on i386

SpinLock:

mov eax, 0x1 // Set the register eax = 1

loop:

xchg eax, [lock] // Atomically exchange the values of lock and register eax

   test eax, eax // Check the old value of lock which is now in eax
   jnz loop // If old value of lock is not 0 (lock not free), spin in
   // loop

ret

SpinUnlock:

mov eax, 0x0 // Set eax register = 0

xchg eax, [lock] // Exchange the values of lock and eax register.
// This will set the value of lock to 0.

ret

Spin Lock implmentation on PowerPC

SpinLock:

loop:

lwarx r4, 0, r3 // register r3 has the address of lock variable
// Load the initial value of lock variable to register r4

// and place a reservation on lock address

cmpwi r4, 0 // compare value of lock with 0

bne- loop // if value of lock is 1, go to loop

li r5, 1 // Set r5 = 1

stwcx r5, 0, r3 // Store 1 to lock, if reservation exists

bne- loop // If reservation lost, try again

// lock acquired successfully

blr // return

Total Pageviews

Understanding Linux

Blog Archive

Sunday, March 21, 2010

Synchronization in Linux Kernel - Part 2 (Spin Locks)

2 comments:

Followers