Total Pageviews

Sunday, March 21, 2010

Synchronization in Linux Kernel - Part 2 (Spin Locks)


In continuation to my previous post, lets discuss other concurrency
related problems and their solutions. Earlier, we saw how concurrent
access to a shared variable may result in loss of information or data
corruption. In this article, we will explore more such problems  and
ways to solve them.

Consider a scenario where there is a need to update multiple shared
global variables atomically. No other thread should be allowed to
acess the global variables until the thread that is updating them is done.
For eg.  take the case of multiple threads adding/removing/reading
elements from a linked list concurrently. This may result in linked list
corruption if done without using appropriate synchronization mechanism
as shown in following example.

Consider a linked list
Head -----> Element1 -----> Element2 -----> Element3 -----> Element4 -----> Null

To add a new element (say NewElement) in this list after a specific element (say Element3) following operations need to be performed.

Step 1) First update NewElement to point to Element4
Head ----> Element1 ----> Element2 -----> Element3 -----> Element4 -----> Null
                                                            |
                                          NewElement -------+              

Step 2) Update Element3 to point to NewElement
Head ----> Element1 ----> Element2 ----> Element3         Element4 ----> Null
                                            |                 |
                                            +--- NewElement --+                                                       

Now, suppose thread-1 needs to add NewElement1 after Element3 and
                        thread-2 wants to add NewElement2 after Element3.
thread-1 is running on cpu-1 and thread-2 is running on cpu-2 concurrently.

1)  thread-1 executes Step-1 to make NewElement1 point to Element4. Just after that thread-2 executes and updates NewElement2 to point to Element4 as shown below.
                                             NewElement1 -----+
                                                              |
Head ----> Element1 ----> Element2 -----> Element3 -----> Element4 ----> Null
                                                              |
                                             NewElement2 -----+

2) thread-1 then updates Element3 to point to NewElement1.
                                              +--- NewElement1 ---+                   
                                              |                   |
Head ----> Element1 ----> Element2 -----> Element3            Element4 ----> Null
                                                                  |
                                                    NewElement2---+

3) And then thread2 also updates Element3 to point to the NewElement2
                                                   NewElement1 ---+
                                                                  |
Head ----> Element1 ----> Element2 ------> Element3           Element4 ----> Null
                                              |                   |
                                              +--- NewElement2 ---+

So, finally we have NewElement2 successfully inserted into the linked list. But, NewElement1 is not a part of the linked list, So we lost Element1.

How to solve the problem where multiple global shared variables need to be updated atomically ?
Answer is simple. We need to prevent all threads from accessing the linked list while one of the thread is manipulating the list.
To achieve this, use one global variable as a lock. Any thread that needs to access the list must acquire the lock by setting its value to 1. This thread will first test if the value of lock variable is 0, then set it to1 to acquire the lock. This test-and-set should be done atomically. If any thread is already holding a lock, all others threads should wait. The thread holding the lock will release the lock (by setting the lock to 0) after it is done with the list.

This locking variable is called Spin Lock.
A Spin Lock can be implemented as a single bit in an integer variable. The most important thing here is that the testing and setting of the lock bit should be done atomically. Performing read-modify-write operation atomically is one of the prime requirement of implementing a spin lock.
If test-and-set is not done atomically then multiple threads may erroneously come under the impression that they all hold the lock simultaneouly which will lead to data corruption. See the example below how this may happen.

Problem if test-and-set is not done atomically
                                                    
       thread-1                              thread-2

                       Lock Variable = 0
     
1) Reads the value of lock variable (=0)
    to its  register.                    
                                         2) Reads the value of lock variable (=0)
                                            to its register.

(Both of them find the value of Lock variable to be 0, So they assume that the lock is free.)
3) Checks the value of local register.
   Value is 0, So try to acquire the
   lock by  writing 1 to lock variable
   in memory.                                                                            
                                         4) thread-2 also finds that the value of
                                            global variable read in its register
                                            = 0, which means lock is free. So,
                                            writes 1 to the lock variable to
                                            acquire the lock.   

(Both of  them now feel that have acquired the lock which is wrong)


So, test and set should be done atomically.


Lets first see a simple implementation of test_and_set()
int test_and_set (int *lock)
{
   int temp = *lock;          // save the original value of lock
   *x = 1;                    // always set lock to 1
   return temp;               // return the original value
}
Note: this is not atomic.


test_and_set() saves the original value of lock, sets the lock to 1 and returns the old value of lock. Of course, this should be done atomically.
The return value indicates whether it was already locked or not.
If return value is 0, it means that lock was free and by writing 1, the caller has acquired the lock.
If return value is 1, it means that the lock was already acquired by someone else. So, the caller needs to call this function again. The caller will keep calling this in a loop until this function returns 0.


Why it always set lock to 1 ?
You may be the thinking why this function always sets the value of lock to 1 ? What if the value is already 1? Why don’t we first check the value of lock? We should set it to 1 only if the initial value of lock is 0.
Well, answer is that there is nothing wrong in testing the value before setting it to 1. But, to test we need an extra instruction which can be avoided by always setting it to 1.

How to make test_and_set atomic ?
As I discussed my previous post, we need support from processor to achieve this.  For eg i386 provides an “xchg” instruction that can atomically exchange the value of memory location with its internal register.

int atomic_test_and_set (int *lock)
.text 
.globl atomic_test_and_set

atomic_test_and_set:
   pushl   %ebx           // Preserve ebx. We will save original
                          // value of lock in ebx.
  
   movl    8(%esp), %ebx  // save original value of lock to
                          // ebx
   movl    $1, %eax       // 1 (true) to eax
      
   xchg    %eax, (%ebx)   // Atomically exchange eax with 
                          // lock. Sets lock = 1.
   popl    %ebx           // Restor ebx

   ret                    // return old value of lock
                          // which is already in eax.


Lets see how PowerPC does it.


int atomic_test_and_set (int *lock)
atomic_test_and_set: loop:
   lwarx  r4, 0, r3        // Save the initial value of lock.
                           // r3 = address of lock variable
   li     r5, 1            // 1 (true) to r5
   stwcx  r5, 0, r3        // store 1 to lock
   bne-   loop             // go back to loop if store is not
                           // done atomically
   mr     r3, r4           // move original value of lock saved 
                           // in r4 to r3.
   blr                     // return valus is in r3


Lets come back to our original discussion on Spin Locks. We will see how we can implement a Spin Lock on i386 and PowerPC shortly.

Why the word “spin” is used before lock ?
The reason is that while one of the thread has acquired the lock, all other threads spin in a loop repeatedly checking until the lock becomes available. 


Spin Locks should be acquired for a short time
As the threads spinning in loop are not doing any useful work they are wasting costly cpu cycles. So, spinlocks should be held only for a short duration of time.

Why not deschedule the thread instead of spinning ?
Descheduling a thread means two context switches which are far more costlier than to spin in a loop for a short duration.

Thread should never sleep/descheduled after acquiring the spin lock
If a thread goes to sleep or descheduled after acquiring the spin lock all other threads that need the same lock will be blocked spinning in a loop wasting cpu cycles.

Disabling interrupts
If a shared variable is also accessed by an interrupt handler then local interrupts should also be disabled. Otherwise it is possible that the thread after acquiring the lock is preempted by the interrupt which tries to acquire the same lock as well. The interrupt handler will stuck in the busy-wait loop as the thread which is holding the lock never gets the chance to run to release the lock. So, both of them deadlock.

Uniprocessor systems don’t need spin locks.
 Disabling interrupts is sufficient on a single cpu system.

Spin Lock implementation
atomic_test_and_set() discussed above is the buliding block of implementing a spin lock.

void SpinLock(boolean *lock)
{
   while (atomic_test_and_set(lock) == 1)       /* If already locked spin */
       ;
}


The caller obtains the lock if the old value of lock is 0, else it spins calling atomic_test_and_set() in a while loop, until it return 0.
.
Spin Lock is implementation on i386
SpinLock:
    mov   eax, 0x1    // Set the register eax = 1
loop:
    xchg  eax, [lock] // Atomically exchange the values of lock and register eax
    test  eax, eax    // Check the old value of lock which is now in eax
    jnz   loop        // If old value of lock is not 0 (lock not free), spin in
                      // loop
    ret

SpinUnlock:
    mov   eax, 0x0    // Set eax register = 0
    xchg  eax, [lock] // Exchange the values of lock and eax register.
                      // This will set the value of lock to 0.
    ret

Spin Lock implmentation on PowerPC
SpinLock:
loop:
    lwarx   r4, 0, r3   // register r3 has the address of lock variable
                        // Load the initial value of lock variable to register r4
                        // and place a reservation on lock address
    cmpwi   r4, 0       // compare value of lock with 0
    bne-    loop        // if value of lock is 1, go to loop
    li      r5, 1       // Set r5 = 1               
    stwcx   r5, 0, r3   // Store 1 to lock,  if reservation exists
    bne-    loop        // If reservation lost, try again
                        // lock acquired successfully
    blr                 // return             


2 comments:

  1. great post. The examples are simply superb.

    ReplyDelete
  2. In the first test_and_set function, there is not variable named 'x' as mentioned.

    ReplyDelete

Followers