
Wednesday, June 2, 2010

Exceptions and Interrupts on x86 family of processors (Part 1)

Interrupts and exceptions are special kinds of control transfer; they alter the normal program flow to handle external events or to report errors or exceptional conditions.
The difference between them is that interrupts handle events external to the processor, while exceptions handle conditions detected by the processor itself while executing instructions (such as divide by zero, invalid opcode, or a floating-point error).

Identifying an interrupt/exception
Intel processors associate an identifying number (ranging from 0 to 255) with each different type of interrupt or exception. Intel calls this identifying number a vector.

Interrupt and Exception numbers on x86 processors


Vector Number            Description


0                        Divide by Zero
1                        Debug exceptions
2                        Nonmaskable interrupt
3                        Breakpoint (one-byte INT 3 instruction)
4                        Overflow (INTO instruction)
5                        Bounds check (BOUND instruction)
6                        Invalid opcode
7                        Coprocessor not available
8                        Double fault
9                        Reserved
10                       Invalid TSS
11                       Segment not present
12                       Stack exception
13                       General protection
14                       Page fault
15                       Reserved
16                       Coprocessor error
17-31                    Reserved
32-255                   Available for external interrupts via
                         INTR pin

The vectors of the nonmaskable interrupt and of exceptions are fixed.

IRQs and external interrupts
External devices like SCSI disks, sound cards, network cards, etc. may be assigned any vector in the range 32 – 238.

Linux uses vector 128 (0x80) to implement system calls.

The IBM-compatible PC architecture requires that certain devices be assigned fixed vector numbers, as shown in the following table.

IRQ     INT     Hardware device

0       32      Timer
1       33      Keyboard
2       34      PIC cascading
3       35      Second serial port
4       36      First serial port
6       38      Floppy disk
8       40      System clock
10      42      Network card
11      43      USB port, sound card
12      44      PS/2 mouse
13      45      Mathematical coprocessor
14      46      EIDE disk controller's first chain
15      47      EIDE disk controller's second chain

Interrupt vectors as used by Linux on an x86 processor

Vector Range     Use

0 – 16           Nonmaskable interrupts and exceptions
17 – 31          Intel-reserved
32 – 127         Maskable external interrupts (IRQs)
128 (0x80)       Software interrupt for system calls (int 0x80)
129 – 238        External interrupts (IRQs)
239              Local APIC timer interrupt
240              Local APIC thermal interrupt
241 – 250        Reserved by Linux for future use
251 – 253        Interprocessor interrupts
254 (0xfe)       Local APIC error interrupt (generated when the local APIC detects an erroneous condition)
255 (0xff)       Local APIC spurious interrupt (generated if the CPU masks an interrupt while the hardware device raises it)

Enabling and Disabling Interrupts on Intel x86 processors
External interrupts signaled via the INTR pin of the processor can be enabled or disabled through the IF bit of the flags register.
When IF=0, INTR interrupts are inhibited.
When IF=1, INTR interrupts are enabled.
The instructions CLI and STI alter the setting of IF.

CLI (Clear Interrupt-Enable Flag) and STI (Set Interrupt-Enable Flag) explicitly alter IF (bit 9 in the flag register).

These instructions may be executed only by Linux kernel code, which runs at a higher privilege level than user programs. A user process cannot enable or disable interrupts.


Programmable Interrupt Controller (8259A PIC)
Each hardware device (keyboard, sound card, hard disk, etc.) capable of issuing an interrupt request has an output line known as an IRQ line. These IRQ lines are connected to the input pins of a Programmable Interrupt Controller (the Intel 8259A).
The IRQ lines are numbered sequentially from 0, while the vector numbers for external interrupts start at 32. The PIC must therefore be programmed so that it generates vector number 32 when the hardware device connected to IRQ0 raises an interrupt.
So, IRQ0 is associated with vector number 32.
      IRQN is associated with vector number 32 + N
Each IRQ can be selectively disabled by programming the PIC. Disabled interrupts are not lost; they are issued to the CPU as soon as they are enabled again.
Typically on a PC, the PICs are connected in cascade, as shown in the figure below. Each 8259A chip can handle up to 8 IRQ input lines.
The INT output of the slave PIC is connected to IRQ2 of the master PIC, which is used for cascading. So, only 15 IRQ lines are available to external devices.

[Figure: two 8259A PICs connected in cascade; the slave's INT output is wired to IRQ2 of the master]
The 8 IRQ lines first pass through the IMR (Interrupt Mask Register) to check whether they are masked. A masked interrupt is not processed further. An unmasked interrupt registers its request in the IRR (Interrupt Request Register) by setting the corresponding bit. The priority resolver then selects the highest-priority interrupt and sets the corresponding bit in the ISR (In-Service Register).

Sequence of events in handling an interrupt request
1) One or more IRQ lines (IRQ0 to IRQ15) are raised high by the external devices connected to them. This sets the corresponding bits in the IRR (Interrupt Request Register).

2) The 8259A sends INT to the CPU (i.e. the INTR line on the processor is asserted).

3) The CPU acknowledges the INT and responds with an INTA pulse.

4) Upon receiving INTA from the CPU, the highest-priority bit in the IRR is selected and the corresponding bit in the ISR (In-Service Register) is set. The selected IRR bit is cleared.
The ISR bit shows which IRQ is currently being served.
The corresponding IRR bit is cleared because the IRQ is no longer requesting service but actually receiving it.

5) The CPU then initiates a second INTA pulse, telling the PIC to place the 8-bit vector number of the IRQ being serviced on the data bus.

6) The CPU then reads the data bus to find the interrupt vector number and calls the associated interrupt handler code.

7) Once the interrupt handler is done, it sends an EOI (End of Interrupt) command to the PIC. The PIC then determines the next highest-priority interrupt and repeats the same process.

This is all for today. In the next article we will see how interrupts are handled by the Linux kernel and discuss them in more detail.

Till then, Have Fun !!!


Monday, May 31, 2010

Data Structure Alignment and Padding

You may have heard that fundamental data types must be aligned to specific byte boundaries, and you may have seen programs killed by an alignment exception generated by the processor. For any C programmer it is important to understand the reasons behind these alignment restrictions.
In this article we will see why the compiler aligns certain data types to certain boundaries.

Why alignment restrictions ?
Typically, a 32-bit processor reads or writes memory in chunks of 4 bytes (32 bits), and these reads and writes can be performed only at addresses divisible by 4.
Even if you want to read a single byte, 4 bytes are read from memory and the processor's hardware circuitry copies the requested byte to the appropriate position in the register.
For example, on the m68k processor, a multiplexer sits between an internal register and the external bus. The multiplexer selects the specified byte and routes it to the appropriate byte position in the register.

Let's see why the processor restricts data to a specific byte boundary. Suppose the processor needs to read an integer (4 bytes in size).

Case 1: Integer is stored at address that is 4 byte aligned (say 0x100)

As the processor can read/write only at addresses that are divisible by 4, all four bytes can be read in a single bus cycle. Note that the address 0x100 is divisible by 4.

Address    Byte 0    Byte 1    Byte 2    Byte 3
0x100      X0        X1        X2        X3

Case 2: Integer is stored at an unaligned address (say 0x101)
This read cannot be performed in a single 32-bit bus cycle. The processor has to issue two separate reads, at addresses 0x100 and 0x104.
Thus, it takes twice as long to read misaligned data.

Address    Byte 0    Byte 1    Byte 2    Byte 3
0x100      -         X0        X1        X2
0x104      X3        -         -         -

It is important to note that:

·    Some processors allow unaligned access, but at a performance penalty, as we saw above: reading misaligned data takes up to twice as long as reading aligned data. Intel processors allow unaligned reads and writes at this cost of reduced performance.

·    Some processors generate an alignment exception when misaligned data is accessed. It is then up to the exception handler either to report an error (kill the user process) or to perform two read cycles to fetch the misaligned data (the user process then runs normally, but at reduced performance).

The C compiler assigns addresses so that data types are properly aligned. This speeds up reads and writes without causing any exception.
Typically, on a 32-bit processor (such as 32-bit x86, m68k, PowerPC, or ARM):

·         A char (one byte) will be 1-byte aligned.
·         A short (two bytes) will be 2-byte aligned.
·         An int (four bytes) will be 4-byte aligned.
·         Any pointer (four bytes) will be 4-byte aligned  (e.g.: char *, int *)

Structure alignment and padding
The compiler adds pad bytes to user-defined structures so that the various fields of a structure are properly aligned.
A few rules help in understanding the padding in a structure.

1) No padding before the first element

The first element of a struct must come first, and must be preceded by no padding. This allows a struct pointer to be converted to a pointer to the struct's first element and vice versa, which is a useful property.

2) Pad bytes at the end of structure

It is obvious why pad bytes are added between the elements of a structure: to keep each element aligned. But why does the compiler put pad bytes at the end of a structure?
This is because, if an array of structures is declared, the start of each structure must be properly aligned.
For an array of structures, struct A array[10]:
⇒  array[1] should be equivalent to *(array + 1).
⇒  the difference between array + 1 and array + 0 should be exactly the size of struct A.

To satisfy these constraints, the compiler may have to put pad bytes at the end of the structure. The rule is that the structure is padded at the end so that its total size is a multiple of the alignment of its largest member.

Lets see few examples for structure alignment and padding

struct exampleStruct {
   char data1;
   short data2;
   int data3;
   char data4;
};

On a typical 32 bit processor (say m68k or PowerPC), the compiler will insert padding in between various members so that they are properly aligned.

struct exampleStruct {
   char data1;
   char padding1[1];   // 1 pad byte so that the short is 2-byte
                       // aligned
   short data2;
   int data3;          // already aligned to a 4-byte boundary,
                       // so no padding before this
   char data4;
   char padding2[3];   // padding at the end of the structure so
                       // that if an array of such structures is
                       // declared, the start of each structure
                       // in the array is properly aligned.
                       // The structure size must be a multiple
                       // of the alignment of its largest member.
                       // Here the largest member (int data3) is
                       // 4 bytes, so 3 pad bytes are added.
};

How to arrange members so as to minimize padding?
By changing the order of members in a structure, it is possible to minimize the amount of padding. If the members are sorted by ascending or descending alignment requirements, a minimal amount of padding is needed.
So, if you have a structure with several int, short and char members, keep all the int members at the beginning, the short members after them, and all the char members at the end.

How to prevent compiler from putting pad bytes in between or at the end of structure?

Compilers provide #pragma directives to specify the alignment of members inside a structure.

#pragma pack(push)          // save original alignment on stack
#pragma pack(1)             // set alignment to 1 byte boundary
struct packedStruct {
   char data1;
   int data2;               // no padding is inserted,
                            // as the specified alignment is 1 byte
   char data3;
};
#pragma pack(pop)           // restore original alignment

Here, we tell the compiler that each element should be aligned to a 1-byte boundary, so no padding is inserted between elements. The size of this structure after compilation will be 6 bytes.

That's all for today.
Have Fun !!!
