INTEL 80386 PROGRAMMER'S REFERENCE MANUAL 1986

Table of Contents

PART II  SYSTEMS PROGRAMMING

Chapter 4  Systems Architecture
   4.1 Systems Registers
      4.1.1 Systems Flags
      4.1.2 Memory-Management Registers
      4.1.3 Control Registers
      4.1.4 Debug Register
      4.1.5 Test Registers
   4.2 Systems Instructions

Chapter 5  Memory Management
   5.1 Segment Translation
      5.1.1 Descriptors
      5.1.2 Descriptor Tables
      5.1.3 Selectors
      5.1.4 Segment Registers
   5.2 Page Translation
      5.2.1 Page Frame
      5.2.2 Linear Address
      5.2.3 Page Tables
      5.2.4 Page-Table Entries
         5.2.4.1 Page Frame Address
         5.2.4.2 Present Bit
         5.2.4.3 Accessed and Dirty Bits
         5.2.4.4 Read/Write and User/Supervisor Bits
      5.2.5 Page Translation Cache
   5.3 Combining Segment and Page Translation
      5.3.1 "Flat" Architecture
      5.3.2 Segments Spanning Several Pages
      5.3.3 Pages Spanning Several Segments
      5.3.4 Non-Aligned Page and Segment Boundaries
      5.3.5 Aligned Page and Segment Boundaries
      5.3.6 Page-Table per Segment

Chapter 6  Protection
   6.1 Why Protection?
   6.2 Overview of 80386 Protection Mechanisms
   6.3 Segment-Level Protection
      6.3.1 Descriptors Store Protection Parameters
         6.3.1.1 Type Checking
         6.3.1.2 Limit Checking
         6.3.1.3 Privilege Levels
      6.3.2 Restricting Access to Data
         6.3.2.1 Accessing Data in Code Segments
      6.3.3 Restricting Control Transfers
      6.3.4 Gate Descriptors Guard Procedure Entry Points
         6.3.4.1 Stack Switching
         6.3.4.2 Returning from a Procedure
      6.3.5 Some Instructions are Reserved for Operating System
         6.3.5.1 Privileged Instructions
         6.3.5.2 Sensitive Instructions
      6.3.6 Instructions for Pointer Validation
         6.3.6.1 Descriptor Validation
         6.3.6.2 Pointer Integrity and RPL
   6.4 Page-Level Protection
      6.4.1 Page-Table Entries Hold Protection Parameters
         6.4.1.1 Restricting Addressable Domain
         6.4.1.2 Type Checking
      6.4.2 Combining Protection of Both Levels of Page Tables
      6.4.3 Overrides to Page Protection
   6.5 Combining Page and Segment Protection

Chapter 7  Multitasking
   7.1 Task State Segment
   7.2 TSS Descriptor
   7.3 Task Register
   7.4 Task Gate Descriptor
   7.5 Task Switching
   7.6 Task Linking
      7.6.1 Busy Bit Prevents Loops
      7.6.2 Modifying Task Linkages
   7.7 Task Address Space
      7.7.1 Task Linear-to-Physical Space Mapping
      7.7.2 Task Logical Address Space

Chapter 8  Input/Output
   8.1 I/O Addressing
      8.1.1 I/O Address Space
      8.1.2 Memory-Mapped I/O
   8.2 I/O Instructions
      8.2.1 Register I/O Instructions
      8.2.2 Block I/O Instructions
   8.3 Protection and I/O
      8.3.1 I/O Privilege Level
      8.3.2 I/O Permission Bit Map

Chapter 9  Exceptions and Interrupts
   9.1 Identifying Interrupts
   9.2 Enabling and Disabling Interrupts
      9.2.1 NMI Masks Further NMIs
      9.2.2 IF Masks INTR
      9.2.3 RF Masks Debug Faults
      9.2.4 MOV or POP to SS Masks Some Interrupts and Exceptions
   9.3 Priority Among Simultaneous Interrupts and Exceptions
   9.4 Interrupt Descriptor Table
   9.5 IDT Descriptors
   9.6 Interrupt Tasks and Interrupt Procedures
      9.6.1 Interrupt Procedures
         9.6.1.1 Stack of Interrupt Procedure
         9.6.1.2 Returning from an Interrupt Procedure
         9.6.1.3 Flags Usage by Interrupt Procedure
         9.6.1.4 Protection in Interrupt Procedures
      9.6.2 Interrupt Tasks
   9.7 Error Code
   9.8 Exception Conditions
      9.8.1 Interrupt 0 -- Divide Error
      9.8.2 Interrupt 1 -- Debug Exceptions
      9.8.3 Interrupt 3 -- Breakpoint
      9.8.4 Interrupt 4 -- Overflow
      9.8.5 Interrupt 5 -- Bounds Check
      9.8.6 Interrupt 6 -- Invalid Opcode
      9.8.7 Interrupt 7 -- Coprocessor Not Available
      9.8.8 Interrupt 8 -- Double Fault
      9.8.9 Interrupt 9 -- Coprocessor Segment Overrun
      9.8.10 Interrupt 10 -- Invalid TSS
      9.8.11 Interrupt 11 -- Segment Not Present
      9.8.12 Interrupt 12 -- Stack Exception
      9.8.13 Interrupt 13 -- General Protection Exception
      9.8.14 Interrupt 14 -- Page Fault
         9.8.14.1 Page Fault during Task Switch
         9.8.14.2 Page Fault with Inconsistent Stack Pointer
      9.8.15 Interrupt 16 -- Coprocessor Error
   9.9 Exception Summary
   9.10 Error Code Summary

Chapter 10  Initialization
   10.1 Processor State after Reset
   10.2 Software Initialization for Real-Address Mode
      10.2.1 Stack
      10.2.2 Interrupt Table
      10.2.3 First Instructions
   10.3 Switching to Protected Mode
   10.4 Software Initialization for Protected Mode
      10.4.1 Interrupt Descriptor Table
      10.4.2 Stack
      10.4.3 Global Descriptor Table
      10.4.4 Page Tables
      10.4.5 First Task
   10.5 Initialization Example
   10.6 TLB Testing
      10.6.1 Structure of the TLB
      10.6.2 Test Registers
      10.6.3 Test Operations

Chapter 11  Coprocessing and Multiprocessing
   11.1 Coprocessing
      11.1.1 Coprocessor Identification
      11.1.2 ESC and WAIT Instructions
      11.1.3 EM and MP Flags
      11.1.4 The Task-Switched Flag
      11.1.5 Coprocessor Exceptions
         11.1.5.1 Interrupt 7 -- Coprocessor Not Available
         11.1.5.2 Interrupt 9 -- Coprocessor Segment Overrun
         11.1.5.3 Interrupt 16 -- Coprocessor Error
   11.2 General Multiprocessing
      11.2.1 LOCK and the LOCK# Signal
      11.2.2 Automatic Locking
      11.2.3 Cache Considerations

Chapter 12  Debugging
   12.1 Debugging Features of the Architecture
   12.2 Debug Registers
      12.2.1 Debug Address Registers (DR0-DR3)
      12.2.2 Debug Control Register (DR7)
      12.2.3 Debug Status Register (DR6)
      12.2.4 Breakpoint Field Recognition
   12.3 Debug Exceptions
      12.3.1 Interrupt 1 -- Debug Exceptions
         12.3.1.1 Instruction Address Breakpoint
         12.3.1.2 Data Address Breakpoint
         12.3.1.3 General Detect Fault
         12.3.1.4 Single-Step Trap
         12.3.1.5 Task Switch Breakpoint
      12.3.2 Interrupt 3 -- Breakpoint Exception

----------------------------------------------------------------------------
PART II  SYSTEMS PROGRAMMING
----------------------------------------------------------------------------

Chapter 4  Systems Architecture
----------------------------------------------------------------------------

The systems-level features of the 80386 architecture include:

   Memory Management
   Protection
   Multitasking
   Input/Output
   Exceptions and Interrupts
   Initialization
   Coprocessing and Multiprocessing
   Debugging

These features are implemented by registers and instructions, all of which
are introduced in the following sections. The purpose of this chapter is
not to explain each feature in detail, but rather to place the remaining
chapters of Part II in perspective. Each mention in this chapter of a
register or instruction is accompanied either by an explanation or by a
reference to a following chapter where detailed information can be
obtained.

4.1  Systems Registers

The registers designed for use by systems programmers fall into these
classes:

   EFLAGS
   Memory-Management Registers
   Control Registers
   Debug Registers
   Test Registers

4.1.1  Systems Flags

The systems flags of the EFLAGS register control I/O, maskable interrupts,
debugging, task switching, and enabling of virtual 8086 execution in a
protected, multitasking environment. These flags are highlighted in
Figure 4-1.

IF (Interrupt-Enable Flag, bit 9)

   Setting IF allows the CPU to recognize external (maskable) interrupt
   requests. Clearing IF disables these interrupts. IF has no effect on
   either exceptions or nonmaskable external interrupts. Refer to
   Chapter 9 for more details about interrupts.

NT (Nested Task, bit 14)

   The processor uses the nested task flag to control chaining of
   interrupted and called tasks. NT influences the operation of the IRET
   instruction. Refer to Chapter 7 and Chapter 9 for more information on
   nested tasks.

RF (Resume Flag, bit 16)

   The RF flag temporarily disables debug exceptions so that an
   instruction can be restarted after a debug exception without
   immediately causing another debug exception. Refer to Chapter 12 for
   details.

TF (Trap Flag, bit 8)

   Setting TF puts the processor into single-step mode for debugging. In
   this mode, the CPU automatically generates an exception after each
   instruction, allowing a program to be inspected as it executes each
   instruction. Single-stepping is just one of several debugging features
   of the 80386.
   Refer to Chapter 12 for additional information.

VM (Virtual 8086 Mode, bit 17)

   When set, the VM flag indicates that the task is executing an 8086
   program. Refer to Chapter 14 for a detailed discussion of how the 80386
   executes 8086 tasks in a protected, multitasking environment.

Figure 4-1.  System Flags of EFLAGS Register

   Bit(s)    Field   Figure label
   31..18    0       (reserved)
   17        VM      VIRTUAL 8086 MODE
   16        RF      RESUME FLAG
   15        0       (reserved)
   14        NT      NESTED TASK FLAG
   13..12    IOPL    I/O PRIVILEGE LEVEL
   11        OF
   10        DF
   9         IF      INTERRUPT ENABLE
   8         TF
   7         SF
   6         ZF
   5         0       (reserved)
   4         AF
   3         0       (reserved)
   2         PF
   1         1       (reserved)
   0         CF

4.1.2  Memory-Management Registers

Four registers of the 80386 locate the data structures that control
segmented memory management:

GDTR    Global Descriptor Table Register
LDTR    Local Descriptor Table Register

   These registers point to the segment descriptor tables GDT and LDT.
   Refer to Chapter 5 for an explanation of addressing via descriptor
   tables.

IDTR    Interrupt Descriptor Table Register

   This register points to a table of entry points for interrupt handlers
   (the IDT). Refer to Chapter 9 for details of the interrupt mechanism.

TR      Task Register

   This register points to the information needed by the processor to
   define the current task. Refer to Chapter 7 for a description of the
   multitasking features of the 80386.

4.1.3  Control Registers

Figure 4-2 shows the format of the 80386 control registers CR0, CR2, and
CR3. These registers are accessible to systems programmers only via
variants of the MOV instruction, which allow them to be loaded from or
stored in general registers; for example:

   MOV EAX, CR0
   MOV CR3, EBX

CR0 contains system control flags, which control or indicate conditions
that apply to the system as a whole, not to an individual task.

EM (Emulation, bit 2)

   EM indicates whether coprocessor functions are to be emulated. Refer
   to Chapter 11 for details.

ET (Extension Type, bit 4)

   ET indicates the type of coprocessor present in the system (80287 or
   80387). Refer to Chapter 11 and Chapter 10 for details.

MP (Math Present, bit 1)

   MP controls the function of the WAIT instruction, which is used to
   coordinate a coprocessor. Refer to Chapter 11 for details.

PE (Protection Enable, bit 0)

   Setting PE causes the processor to begin executing in protected mode.
   Clearing PE returns the processor to real-address mode. Refer to
   Chapter 14 and Chapter 10 for more information on changing processor
   modes.

PG (Paging, bit 31)

   PG indicates whether the processor uses page tables to translate
   linear addresses into physical addresses. Refer to Chapter 5 for a
   description of page translation; refer to Chapter 10 for a discussion
   of how to set PG.

TS (Task Switched, bit 3)

   The processor sets TS with every task switch and tests TS when
   interpreting coprocessor instructions. Refer to Chapter 11 for
   details.

CR2 is used for handling page faults when PG is set. The processor stores
in CR2 the linear address that triggers the fault. Refer to Chapter 9 for
a description of page-fault handling.

CR3 is used when PG is set. CR3 enables the processor to locate the page
table directory for the current task. Refer to Chapter 5 for a description
of page tables and page translation.

Figure 4-2.  Control Registers

   Register   Bit(s)    Contents
   CR3        31..12    PAGE DIRECTORY BASE REGISTER (PDBR)
              11..0     RESERVED
   CR2        31..0     PAGE FAULT LINEAR ADDRESS
   CR1        31..0     RESERVED
   CR0        31        PG
              30..5     RESERVED
              4         ET
              3         TS
              2         EM
              1         MP
              0         PE

4.1.4  Debug Register

The debug registers bring advanced debugging abilities to the 80386,
including data breakpoints and the ability to set instruction breakpoints
without modifying code segments. Refer to Chapter 12 for a complete
description of formats and usage.

4.1.5  Test Registers

The test registers are not a standard part of the 80386 architecture. They
are provided solely to enable confidence testing of the translation
lookaside buffer (TLB), the cache used for storing information from page
tables. Chapter 10 explains how to use these registers.

4.2  Systems Instructions

Systems instructions deal with such functions as:

1. Verification of pointer parameters (refer to Chapter 6):

      ARPL   -- Adjust RPL
      LAR    -- Load Access Rights
      LSL    -- Load Segment Limit
      VERR   -- Verify for Reading
      VERW   -- Verify for Writing

2. Addressing descriptor tables (refer to Chapter 5):

      LLDT   -- Load LDT Register
      SLDT   -- Store LDT Register
      LGDT   -- Load GDT Register
      SGDT   -- Store GDT Register

3. Multitasking (refer to Chapter 7):

      LTR    -- Load Task Register
      STR    -- Store Task Register

4. Coprocessing and Multiprocessing (refer to Chapter 11):

      CLTS   -- Clear Task-Switched Flag
      ESC    -- Escape instructions
      WAIT   -- Wait until Coprocessor not Busy
      LOCK   -- Assert Bus-Lock Signal

5. Input and Output (refer to Chapter 8):

      IN     -- Input
      OUT    -- Output
      INS    -- Input String
      OUTS   -- Output String

6. Interrupt control (refer to Chapter 9):

      CLI    -- Clear Interrupt-Enable Flag
      STI    -- Set Interrupt-Enable Flag
      LIDT   -- Load IDT Register
      SIDT   -- Store IDT Register

7. Debugging (refer to Chapter 12):

      MOV    -- Move to and from debug registers

8. TLB testing (refer to Chapter 10):

      MOV    -- Move to and from test registers

9. System Control:

      SMSW   -- Store MSW
      LMSW   -- Load MSW
      HLT    -- Halt Processor
      MOV    -- Move to and from control registers

The instructions SMSW and LMSW are provided for compatibility with the
80286 processor. 80386 programs access the MSW in CR0 via variants of the
MOV instruction. HLT stops the processor until receipt of an INTR or RESET
signal.

In addition to the chapters cited above, detailed information about each
of these instructions can be found in the instruction reference chapter,
Chapter 17.

Chapter 5  Memory Management
----------------------------------------------------------------------------

The 80386 transforms logical addresses (i.e., addresses as viewed by
programmers) into physical addresses (i.e., actual addresses in physical
memory) in two steps:

   - Segment translation, in which a logical address (consisting of a
     segment selector and segment offset) is converted to a linear
     address.

   - Page translation, in which a linear address is converted to a
     physical address. This step is optional, at the discretion of
     systems-software designers.

These translations are performed in a way that is not visible to
applications programmers. Figure 5-1 illustrates the two translations at a
high level of abstraction.

Figure 5-1 and the remainder of this chapter present a simplified view of
the 80386 addressing mechanism. In reality, the addressing mechanism also
includes memory protection features. For the sake of simplicity, however,
the subject of protection is taken up in another chapter, Chapter 6.

Figure 5-1.  Address Translation Overview

   [Diagram: a LOGICAL ADDRESS, made up of a SELECTOR (bits 15..0) and an
   OFFSET (bits 31..0), passes through SEGMENT TRANSLATION to produce a
   LINEAR ADDRESS (bits 31..0, fields DIR, PAGE, OFFSET). A "PG ?" test
   follows: with PAGING ENABLED the linear address passes through PAGE
   TRANSLATION to produce the PHYSICAL ADDRESS (bits 31..0); with PAGING
   DISABLED the linear address is used directly as the physical address.]

5.1  Segment Translation

Figure 5-2 shows in more detail how the processor converts a logical
address into a linear address. To perform this translation, the processor
uses the following data structures:

   - Descriptors
   - Descriptor tables
   - Selectors
   - Segment Registers

5.1.1  Descriptors

The segment descriptor provides the processor with the data it needs to
map a logical address into a linear address. Descriptors are created by
compilers, linkers, loaders, or the operating system, not by applications
programmers. Figure 5-3 illustrates the two general descriptor formats.
All types of segment descriptors take one of these formats.
Segment-descriptor fields are:

BASE: Defines the location of the segment within the 4-gigabyte linear
address space. The processor concatenates the three fragments of the base
address to form a single 32-bit value.

LIMIT: Defines the size of the segment. When the processor concatenates
the two parts of the limit field, a 20-bit value results. The processor
interprets the limit field in one of two ways, depending on the setting of
the granularity bit:

   1. In units of one byte, to define a limit of up to 1 megabyte.

   2. In units of 4 Kilobytes, to define a limit of up to 4 gigabytes. The
      limit is shifted left by 12 bits when loaded, and low-order one-bits
      are inserted.

Granularity bit: Specifies the units with which the LIMIT field is
interpreted. When the bit is clear, the limit is interpreted in units of
one byte; when set, the limit is interpreted in units of 4 Kilobytes.

TYPE: Distinguishes between various kinds of descriptors.

DPL (Descriptor Privilege Level): Used by the protection mechanism (refer
to Chapter 6).

Segment-Present bit: If this bit is zero, the descriptor is not valid for
use in address transformation; the processor will signal an exception when
a selector for the descriptor is loaded into a segment register. Figure
5-4 shows the format of a descriptor when the present bit is zero. The
operating system is free to use the locations marked AVAILABLE. Operating
systems that implement segment-based virtual memory clear the present bit
in either of these cases:

   - When the linear space spanned by the segment is not mapped by the
     paging mechanism.

   - When the segment is not present in memory.

Accessed bit: The processor sets this bit when the segment is accessed;
i.e., a selector for the descriptor is loaded into a segment register or
used by a selector test instruction. Operating systems that implement
virtual memory at the segment level may, by periodically testing and
clearing this bit, monitor frequency of segment usage.

Creation and maintenance of descriptors is the responsibility of systems
software, usually requiring the cooperation of compilers, program loaders
or system builders, and the operating system.

Figure 5-2.  Segment Translation

   [Diagram: the SELECTOR (bits 15..0) of the LOGICAL ADDRESS selects a
   SEGMENT DESCRIPTOR in a DESCRIPTOR TABLE. The BASE ADDRESS taken from
   that descriptor is added to the OFFSET (bits 31..0) of the logical
   address to form the LINEAR ADDRESS (fields DIR, PAGE, OFFSET).]

Figure 5-3.  General Segment-Descriptor Format

   DESCRIPTORS USED FOR APPLICATIONS CODE AND DATA SEGMENTS

   Doubleword at offset 4:
      Bit(s)    Field
      31..24    BASE 31..24
      23        G
      22        X
      21        0
      20        AVL
      19..16    LIMIT 19..16
      15        P
      14..13    DPL
      12        1
      11..9     TYPE
      8         A
      7..0      BASE 23..16

   Doubleword at offset 0:
      Bit(s)    Field
      31..16    SEGMENT BASE 15..0
      15..0     SEGMENT LIMIT 15..0

   DESCRIPTORS USED FOR SPECIAL SYSTEM SEGMENTS

   Doubleword at offset 4:
      Bit(s)    Field
      31..24    BASE 31..24
      23        G
      22        X
      21        0
      20        AVL
      19..16    LIMIT 19..16
      15        P
      14..13    DPL
      12        0
      11..8     TYPE
      7..0      BASE 23..16

   Doubleword at offset 0:
      Bit(s)    Field
      31..16    SEGMENT BASE 15..0
      15..0     SEGMENT LIMIT 15..0

   A   - ACCESSED
   AVL - AVAILABLE FOR USE BY SYSTEMS PROGRAMMERS
   DPL - DESCRIPTOR PRIVILEGE LEVEL
   G   - GRANULARITY
   P   - SEGMENT PRESENT

5.1.2  Descriptor Tables

Segment descriptors are stored in either of two kinds of descriptor table:

   - The global descriptor table (GDT)
   - A local descriptor table (LDT)

A descriptor table is simply a memory array of 8-byte entries that contain
descriptors, as Figure 5-5 shows.
A descriptor table is variable in length and may contain up to 8192
(2^(13)) descriptors. The first entry of the GDT (INDEX=0) is not used by
the processor, however.

The processor locates the GDT and the current LDT in memory by means of
the GDTR and LDTR registers. These registers store the base addresses of
the tables in the linear address space and store the segment limits. The
instructions LGDT and SGDT give access to the GDTR; the instructions LLDT
and SLDT give access to the LDTR.

Figure 5-4.  Format of Not-Present Descriptor

   Doubleword at offset 4:
      Bit(s)    Field
      31..16    AVAILABLE
      15        0 (present bit)
      14..13    DPL
      12        S
      11..8     TYPE
      7..0      AVAILABLE

   Doubleword at offset 0:
      Bit(s)    Field
      31..0     AVAILABLE

Figure 5-5.  Descriptor Tables

   [Diagram: the GLOBAL DESCRIPTOR TABLE and a LOCAL DESCRIPTOR TABLE,
   each drawn as an array of 8-byte descriptors with entries labeled
   N, N + 1, N + 2, N + 3, ..., M. The first entry of the GDT is marked
   (UNUSED). The GDTR points to the GDT and the LDTR points to the LDT.]

5.1.3  Selectors

The selector portion of a logical address identifies a descriptor by
specifying a descriptor table and indexing a descriptor within that table.
Selectors may be visible to applications programs as a field within a
pointer variable, but the values of selectors are usually assigned (fixed
up) by linkers or linking loaders. Figure 5-6 shows the format of a
selector.

Index: Selects one of 8192 descriptors in a descriptor table. The
processor simply multiplies this index value by 8 (the length of a
descriptor), and adds the result to the base address of the descriptor
table in order to access the appropriate segment descriptor in the table.

Table Indicator: Specifies to which descriptor table the selector refers.
A zero indicates the GDT; a one indicates the current LDT.

Requested Privilege Level: Used by the protection mechanism. (Refer to
Chapter 6.)

Because the first entry of the GDT is not used by the processor, a
selector that has an index of zero and a table indicator of zero (i.e., a
selector that points to the first entry of the GDT) can be used as a null
selector. The processor does not cause an exception when a segment
register (other than CS or SS) is loaded with a null selector. It will,
however, cause an exception when the segment register is used to access
memory. This feature is useful for initializing unused segment registers
so as to trap accidental references.

Figure 5-6.  Format of a Selector

   Bit(s)    Field
   15..3     INDEX
   2         TI
   1..0      RPL

   TI  - TABLE INDICATOR
   RPL - REQUESTOR'S PRIVILEGE LEVEL

Figure 5-7.  Segment Registers

   Register   16-BIT VISIBLE SELECTOR   HIDDEN DESCRIPTOR
   CS         selector                  descriptor
   SS         selector                  descriptor
   DS         selector                  descriptor
   ES         selector                  descriptor
   FS         selector                  descriptor
   GS         selector                  descriptor

5.1.4  Segment Registers

The 80386 stores information from descriptors in segment registers,
thereby avoiding the need to consult a descriptor table every time it
accesses memory.

Every segment register has a "visible" portion and an "invisible" portion,
as Figure 5-7 illustrates. The visible portions of these segment address
registers are manipulated by programs as if they were simply 16-bit
registers. The invisible portions are manipulated by the processor.

The operations that load these registers are normal program instructions
(previously described in Chapter 3). These instructions are of two
classes:

   1. Direct load instructions; for example, MOV, POP, LDS, LSS, LGS, LFS.
      These instructions explicitly reference the segment registers.

   2. Implied load instructions; for example, far CALL and JMP. These
      instructions implicitly reference the CS register, and load it with
      a new value.

Using these instructions, a program loads the visible part of the segment
register with a 16-bit selector. The processor automatically fetches the
base address, limit, type, and other information from a descriptor table
and loads them into the invisible part of the segment register.

Because most instructions refer to data in segments whose selectors have
already been loaded into segment registers, the processor can add the
segment-relative offset supplied by the instruction to the segment base
address with no additional overhead.
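
The selector and descriptor layouts of Sections 5.1.1 through 5.1.3 can be
modeled in a few lines of software. The following is only an illustrative
sketch in Python, not a description of how the processor is implemented;
the function names (decode_selector, descriptor_offset, decode_descriptor,
linear_address) are hypothetical and do not come from this manual. It
assumes the descriptor is supplied as its two doublewords (offset 0 and
offset 4), and it performs no type, limit, or privilege checking.

```python
def decode_selector(selector):
    """Split a 16-bit selector into (index, table_indicator, rpl)."""
    index = (selector >> 3) & 0x1FFF   # bits 15..3: one of 8192 descriptors
    ti = (selector >> 2) & 1           # bit 2: 0 = GDT, 1 = current LDT
    rpl = selector & 3                 # bits 1..0: requested privilege level
    return index, ti, rpl


def descriptor_offset(selector):
    """Byte offset of the descriptor within its table (index times 8)."""
    index, _, _ = decode_selector(selector)
    return index * 8


def decode_descriptor(low, high):
    """Unpack a segment descriptor.

    low  is the doubleword at offset 0 (base 15..0, limit 15..0).
    high is the doubleword at offset 4 (base 31..24, G, limit 19..16,
    P, DPL, type, base 23..16).
    """
    base = ((low >> 16) & 0xFFFF) | ((high & 0xFF) << 16) | (high & 0xFF000000)
    limit = (low & 0xFFFF) | (high & 0x000F0000)   # 20-bit raw limit
    granularity = (high >> 23) & 1
    if granularity:
        # 4-Kilobyte units: shift left by 12 and insert low-order one-bits.
        limit = (limit << 12) | 0xFFF
    return {
        "base": base,
        "limit": limit,
        "present": (high >> 15) & 1,
        "dpl": (high >> 13) & 3,
    }


def linear_address(base, offset):
    """Segment translation: base plus offset, truncated to 32 bits."""
    return (base + offset) & 0xFFFFFFFF
```

For example, a descriptor whose doublewords are 0000FFFFH (offset 0) and
00CF9200H (offset 4) decodes to a base of 0 and a page-granular limit of
FFFFFFFFH, that is, a segment covering the whole linear address space.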

5.2  Page Translation

In the second phase of address transformation, the 80386 transforms a
linear address into a physical address. This phase of address
transformation implements the basic features needed for page-oriented
virtual-memory systems and page-level protection.

The page-translation step is optional. Page translation is in effect only
when the PG bit of CR0 is set. This bit is typically set by the operating
system during software initialization. The PG bit must be set if the
operating system is to implement multiple virtual 8086 tasks,
page-oriented protection, or page-oriented virtual memory.

5.2.1  Page Frame

A page frame is a 4K-byte unit of contiguous addresses of physical memory.
Pages begin on 4K-byte boundaries and are fixed in size.

5.2.2  Linear Address

A linear address refers indirectly to a physical address by specifying a
page table, a page within that table, and an offset within that page.
Figure 5-8 shows the format of a linear address.

Figure 5-9 shows how the processor converts the DIR, PAGE, and OFFSET
fields of a linear address into the physical address by consulting two
levels of page tables. The addressing mechanism uses the DIR field as an
index into a page directory, uses the PAGE field as an index into the page
table determined by the page directory, and uses the OFFSET field to
address a byte within the page determined by the page table.

Figure 5-8.  Format of a Linear Address

   Bit(s)    Field
   31..22    DIR
   21..12    PAGE
   11..0     OFFSET

Figure 5-9.  Page Translation

   [Diagram: CR3 locates the PAGE DIRECTORY. The DIR field of the linear
   address selects a DIR ENTRY, which locates a PAGE TABLE. The PAGE field
   selects a PG TBL ENTRY, which locates a PAGE FRAME. The OFFSET field
   selects the PHYSICAL ADDRESS within that page frame.]

5.2.3  Page Tables

A page table is simply an array of 32-bit page specifiers. A page table is
itself a page, and therefore contains 4 Kilobytes of memory or at most 1K
32-bit entries.

Two levels of tables are used to address a page of memory. At the higher
level is a page directory. The page directory addresses up to 1K page
tables of the second level. A page table of the second level addresses up
to 1K pages. All the tables addressed by one page directory, therefore,
can address 1M pages (2^(20)). Because each page contains 4K bytes
(2^(12) bytes), the tables of one page directory can span the entire
physical address space of the 80386 (2^(20) times 2^(12) = 2^(32)).

The physical address of the current page directory is stored in the CPU
register CR3, also called the page directory base register (PDBR). Memory
management software has the option of using one page directory for all
tasks, one page directory for each task, or some combination of the two.
Refer to Chapter 10 for information on initialization of CR3. Refer to
Chapter 7 to see how CR3 can change for each task.

5.2.4  Page-Table Entries

Entries in either level of page tables have the same format. Figure 5-10
illustrates this format.

5.2.4.1  Page Frame Address

The page frame address specifies the physical starting address of a page.
Because pages are located on 4K boundaries, the low-order 12 bits are
always zero.
In a page directory, the page frame address is the address of a page table. In a second-level page table, the page frame address is the address of the page frame that contains the desired memory operand. 5.2.4.2 Present Bit The Present bit indicates whether a page table entry can be used in address translation. P=1 indicates that the entry can be used. When P=0 in either level of page tables, the entry is not valid for address translation, and the rest of the entry is available for software use; none of the other bits in the entry is tested by the hardware. Figure 5-11 illustrates the format of a page-table entry when P=0. If P=0 in either level of page tables when an attempt is made to use a page-table entry for address translation, the processor signals a page exception. In software systems that support paged virtual memory, the page-not-present exception handler can bring the required page into physical memory. The instruction that caused the exception can then be reexecuted. Refer to Chapter 9 for more information on exception handlers. Note that there is no present bit for the page directory itself. The page directory may be not-present while the associated task is suspended, but the operating system must ensure that the page directory indicated by the CR3 image in the TSS is present in physical memory before the task is dispatched. Refer to Chapter 7 for an explanation of the TSS and task dispatching. Figure 5-10. Format of a Page Table Entry 31 12 11 0 ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÑÍÍÍÍÍÍÍÑÍÍÍÑÍÑÍÑÍÍÍÑÍÑÍÑÍ» º ³ ³ ³ ³ ³ ³U³R³ º º PAGE FRAME ADDRESS 31..12 ³ AVAIL ³0 0³D³A³0 0³/³/³Pº º ³ ³ ³ ³ ³ ³S³W³ º ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÏÍÍÍÍÍÍÍÏÍÍÍÏÍÏÍÏÍÍÍÏÍÏÍÏͼ P - PRESENT R/W - READ/WRITE U/S - USER/SUPERVISOR D - DIRTY AVAIL - AVAILABLE FOR SYSTEMS PROGRAMMER USE NOTE: 0 INDICATES INTEL RESERVED. DO NOT DEFINE. Figure 5-11. 
Invalid Page Table Entry 31 1 0 ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÑÍ» º ³ º º AVAILABLE ³0º º ³ º ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÏͼ 5.2.4.3 Accessed and Dirty Bits These bits provide data about page usage in both levels of the page tables. With the exception of the dirty bit in a page directory entry, these bits are set by the hardware; however, the processor does not clear any of these bits. The processor sets the corresponding accessed bits in both levels of page tables to one before a read or write operation to a page. The processor sets the dirty bit in the second-level page table to one before a write to an address covered by that page table entry. The dirty bit in directory entries is undefined. An operating system that supports paged virtual memory can use these bits to determine what pages to eliminate from physical memory when the demand for memory exceeds the physical memory available. The operating system is responsible for testing and clearing these bits. Refer to Chapter 11 for how the 80386 coordinates updates to the accessed and dirty bits in multiprocessor systems. 5.2.4.4 Read/Write and User/Supervisor Bits These bits are not used for address translation, but are used for page-level protection, which the processor performs at the same time as address translation. Refer to Chapter 6 where protection is discussed in detail. 5.2.5 Page Translation Cache For greatest efficiency in address translation, the processor stores the most recently used page-table data in an on-chip cache. Only if the necessary paging information is not in the cache must both levels of page tables be referenced. The existence of the page-translation cache is invisible to applications programmers but not to systems programmers; operating-system programmers must flush the cache whenever the page tables are changed. The page-translation cache can be flushed by either of two methods: 1. 
   By reloading CR3 with a MOV instruction; for example:

      MOV CR3, EAX

2. By performing a task switch to a TSS that has a different CR3 image than the current TSS. (Refer to Chapter 7 for more information on task switching.)

5.3 Combining Segment and Page Translation

Figure 5-12 combines Figure 5-2 and Figure 5-9 to summarize both phases of the transformation from a logical address to a physical address when paging is enabled. By appropriate choice of options and parameters to both phases, memory-management software can implement several different styles of memory management.

5.3.1 "Flat" Architecture

When the 80386 is used to execute software designed for architectures that don't have segments, it may be expedient to effectively "turn off" the segmentation features of the 80386. The 80386 does not have a mode that disables segmentation, but the same effect can be achieved by initially loading the segment registers with selectors for descriptors that encompass the entire 32-bit linear address space. Once loaded, the segment registers don't need to be changed. The 32-bit offsets used by 80386 instructions are adequate to address the entire linear-address space.

5.3.2 Segments Spanning Several Pages

The architecture of the 80386 permits segments to be larger or smaller than the size of a page (4 Kilobytes). For example, suppose a segment is used to address and protect a large data structure that spans 132 Kilobytes. In a software system that supports paged virtual memory, it is not necessary for the entire structure to be in physical memory at once. The structure is divided into 33 pages, any number of which may not be present. The applications programmer does not need to be aware that the virtual memory subsystem is paging the structure in this manner.

Figure 5-12.
80386 Addressing Mechanism

                 15            0    31                            0
                +---------------+  +-------------------------------+
    LOGICAL     |   SELECTOR    |  |            OFFSET             |
    ADDRESS     +-------+-------+  +---------------+---------------+
                        |                          |
                        v                          |
                DESCRIPTOR TABLE                   |
                +---------------+                  |
                |               |                  |
                +---------------+                  v
                |    SEGMENT    |                +---+
                |  DESCRIPTOR   |--------------->| + |
                +---------------+                +-+-+
                |               |                  |
                +---------------+                  v
                +-----------+-----------+-----------+
    LINEAR      |    DIR    |   PAGE    |  OFFSET   |
    ADDRESS     +-----+-----+-----+-----+-----+-----+
                      |           |           |
                      |           |           +---------------------+
                      |           +---------+                       |
                      v                     v                       v
               PAGE DIRECTORY          PAGE TABLE              PAGE FRAME
              +---------------+     +---------------+     +-------------------+
              |               |     |               |     |                   |
              +---------------+     +---------------+     |     PHYSICAL      |
              |   DIR ENTRY   |---->| PG TBL ENTRY  |---->|      ADDRESS      |
              +---------------+     +---------------+     |                   |
              |               |     |               |     +-------------------+
              +---------------+     +---------------+
                      ^
                      |
                  +---+---+
                  |  CR3  |
                  +-------+

5.3.3 Pages Spanning Several Segments

On the other hand, segments may be smaller than the size of a page. For example, consider a small data structure such as a semaphore. Because of the protection and sharing provided by segments (refer to Chapter 6), it may be useful to create a separate segment for each semaphore. But, because a system may need many semaphores, it is not efficient to allocate a page for each. Therefore, it may be useful to cluster many related segments within a page.

5.3.4 Non-Aligned Page and Segment Boundaries

The architecture of the 80386 does not enforce any correspondence between the boundaries of pages and segments. It is perfectly permissible for a page to contain the end of one segment and the beginning of another. Likewise, a segment may contain the end of one page and the beginning of another.
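The independence of page and segment boundaries can be illustrated with a short sketch. The Python below is not part of the manual: the names `split_linear` and `pages_spanned` are hypothetical, and the code assumes the 10-bit DIR, 10-bit PAGE, and 12-bit OFFSET fields of a linear address together with the 4-Kilobyte page size used in this chapter.

```python
PAGE_SIZE = 4096  # assumed page size: 4 Kilobytes


def split_linear(linear):
    """Split a 32-bit linear address into its (DIR, PAGE, OFFSET) fields."""
    directory = (linear >> 22) & 0x3FF   # bits 31..22 index the page directory
    page = (linear >> 12) & 0x3FF        # bits 21..12 index the page table
    offset = linear & 0xFFF              # bits 11..0 select a byte in the frame
    return directory, page, offset


def pages_spanned(base, size):
    """Count the pages touched by a segment of `size` bytes at linear `base`."""
    if size == 0:
        return 0
    first = base // PAGE_SIZE
    last = (base + size - 1) // PAGE_SIZE
    return last - first + 1


# A page-aligned 132-Kilobyte segment occupies exactly 33 pages.
print(pages_spanned(0x00400000, 132 * 1024))   # 33
# The same segment starting in the middle of a page touches one more page.
print(pages_spanned(0x00400800, 132 * 1024))   # 34
# A 16-byte segment may still straddle a page boundary.
print(pages_spanned(0x00000FF8, 16))           # 2
print(split_linear(0x00403025))                # (1, 3, 37)
```

The base addresses are arbitrary illustrative values. The second and third calls show why memory-management software that allows non-aligned boundaries must account for partially used pages.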
5.3.5 Aligned Page and Segment Boundaries

Memory-management software may be simpler, however, if it enforces some correspondence between page and segment boundaries. For example, if segments are allocated only in units of one page, the logic for segment and page allocation can be combined. There is no need for logic to account for partially used pages.

5.3.6 Page-Table per Segment

An approach to space management that provides even further simplification of space-management software is to maintain a one-to-one correspondence between segment descriptors and page-directory entries, as Figure 5-13 illustrates. Each descriptor has a base address in which the low-order 22 bits are zero; in other words, the base address is mapped by the first entry of a page table. A segment may have any limit from 1 to 4 megabytes. Depending on the limit, the segment is contained in from 1 to 1K page frames. A task is thus limited to 1K segments (a sufficient number for many applications), each containing up to 4 Mbytes. The descriptor, the corresponding page-directory entry, and the corresponding page table can be allocated and deallocated simultaneously.

Figure 5-13.
Descriptor per Page Table

        LDT            PAGE DIRECTORY         PAGE TABLES         PAGE FRAMES

                                             +----------+        +----------+
                                        +--->|   PTE    |------->|          |
                                        |    +----------+        +----------+
                                        |    |   PTE    |------->|          |
    +----------+        +----------+    |    +----------+        +----------+
    |DESCRIPTOR|------->|   PDE    |----+    |   PTE    |------->|          |
    +----------+        +----------+         +----------+        +----------+
    |DESCRIPTOR|------->|   PDE    |----+
    +----------+        +----------+    |    +----------+        +----------+
                                        +--->|   PTE    |------->|          |
                                             +----------+        +----------+
                                             |   PTE    |------->|          |
                                             +----------+        +----------+

Chapter 6  Protection

----------------------------------------------------------------------------

6.1 Why Protection?

The purpose of the protection features of the 80386 is to help detect and identify bugs. The 80386 supports sophisticated applications that may consist of hundreds or thousands of program modules. In such applications, the question is how bugs can be found and eliminated as quickly as possible and how their damage can be tightly confined. To help debug applications faster and make them more robust in production, the 80386 contains mechanisms to verify memory accesses and instruction execution for conformance to protection criteria. These mechanisms may be used or ignored, according to system design objectives.

6.2 Overview of 80386 Protection Mechanisms

Protection in the 80386 has five aspects:

1. Type checking
2. Limit checking
3. Restriction of addressable domain
4. Restriction of procedure entry points
5. Restriction of instruction set

The protection hardware of the 80386 is an integral part of the memory management hardware.
Protection applies both to segment translation and to page translation.

Each reference to memory is checked by the hardware to verify that it satisfies the protection criteria. All these checks are made before the memory cycle is started; any violation prevents that cycle from starting and results in an exception. Since the checks are performed concurrently with address formation, there is no performance penalty.

Invalid attempts to access memory result in an exception. Refer to Chapter 9 for an explanation of the exception mechanism. The present chapter defines the protection violations that lead to exceptions.

The concept of "privilege" is central to several aspects of protection (numbers 3, 4, and 5 in the preceding list). Applied to procedures, privilege is the degree to which the procedure can be trusted not to make a mistake that might affect other procedures or data. Applied to data, privilege is the degree of protection that a data structure should have from less trusted procedures.

The concept of privilege applies both to segment protection and to page protection.

6.3 Segment-Level Protection

All five aspects of protection apply to segment translation:

1. Type checking
2. Limit checking
3. Restriction of addressable domain
4. Restriction of procedure entry points
5. Restriction of instruction set

The segment is the unit of protection, and segment descriptors store protection parameters. Protection checks are performed automatically by the CPU when the selector of a segment descriptor is loaded into a segment register and with every segment access. Segment registers hold the protection parameters of the currently addressable segments.

6.3.1 Descriptors Store Protection Parameters

Figure 6-1 highlights the protection-related fields of segment descriptors.

The protection parameters are placed in the descriptor by systems software at the time a descriptor is created. In general, applications programmers do not need to be concerned about protection parameters.
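As an illustration of how systems software might inspect these parameters, the following Python sketch unpacks the two doublewords of a descriptor according to the field positions in Figure 6-1. It is not taken from the manual: the function name `decode_descriptor`, the dictionary keys, and the sample descriptor values are all hypothetical.

```python
def decode_descriptor(low, high):
    """Unpack a segment descriptor given its doublewords at offsets 0 and 4."""
    return {
        # BASE 31..24 and BASE 23..16 come from the high doubleword,
        # BASE 15..0 from the upper half of the low doubleword.
        "base": (high & 0xFF000000) | ((high & 0xFF) << 16) | (low >> 16),
        # LIMIT 19..16 already sits at bits 19..16 of the high doubleword.
        "limit": (high & 0x000F0000) | (low & 0xFFFF),
        "g": (high >> 23) & 1,       # granularity
        "b_d": (high >> 22) & 1,     # B (data) or D (executable)
        "avl": (high >> 20) & 1,     # available for programmer's use
        "p": (high >> 15) & 1,       # segment present
        "dpl": (high >> 13) & 3,     # descriptor privilege level
        "s": (high >> 12) & 1,       # 1 = data/executable, 0 = system or gate
        "type": (high >> 8) & 0xF,   # remaining four type bits
    }


# Assumed sample: a readable executable segment covering 4 gigabytes at DPL 0.
code = decode_descriptor(0x0000FFFF, 0x00CF9A00)
print(hex(code["limit"]), code["g"], code["dpl"], hex(code["type"]))

# Assumed sample: a writable data segment at DPL 3 with base 1234H.
data = decode_descriptor(0x12340FFF, 0x0040F200)
print(hex(data["base"]), hex(data["limit"]), data["dpl"], hex(data["type"]))
```

In this sketch bit 3 of `type` distinguishes executable segments (1) from data segments (0); the low three bits are then C, R, A or E, W, A respectively.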
When a program loads a selector into a segment register, the processor loads not only the base address of the segment but also protection information. Each segment register has bits in the invisible portion for storing base, limit, type, and privilege level; therefore, subsequent protection checks on the same segment do not consume additional clock cycles.

Figure 6-1. Protection Fields of Segment Descriptors

    DATA SEGMENT DESCRIPTOR

     31               23                15                7                0
    +-----------------+-+-+-+-+---------+-+-----+---------+-----------------+
    |:::::::::::::::::|:|:|:|A| LIMIT   |:|     |  TYPE   |:::::::::::::::::|
    |:::BASE 31..24:::|G|B|0|V| 19..16  |P| DPL |         |:::BASE 23..16:::| 4
    |:::::::::::::::::|:|:|:|L|         |:|     |1|0|E|W|A|:::::::::::::::::|
    +-----------------+-+-+-+-+---------+-+-----+-+-+-+-+-+-----------------+
    |:::::::::::::::::::::::::::::::::::|                                   |
    |::::::::SEGMENT BASE 15..0:::::::::|        SEGMENT LIMIT 15..0        | 0
    |:::::::::::::::::::::::::::::::::::|                                   |
    +-----------------------------------+-----------------------------------+

    EXECUTABLE SEGMENT DESCRIPTOR

     31               23                15                7                0
    +-----------------+-+-+-+-+---------+-+-----+---------+-----------------+
    |:::::::::::::::::|:|:|:|A| LIMIT   |:|     |  TYPE   |:::::::::::::::::|
    |:::BASE 31..24:::|G|D|0|V| 19..16  |P| DPL |         |:::BASE 23..16:::| 4
    |:::::::::::::::::|:|:|:|L|         |:|     |1|1|C|R|A|:::::::::::::::::|
    +-----------------+-+-+-+-+---------+-+-----+-+-+-+-+-+-----------------+
    |:::::::::::::::::::::::::::::::::::|                                   |
    |::::::::SEGMENT BASE 15..0:::::::::|        SEGMENT LIMIT 15..0        | 0
    |:::::::::::::::::::::::::::::::::::|                                   |
    +-----------------------------------+-----------------------------------+

    SYSTEM SEGMENT DESCRIPTOR

     31               23                15                7                0
    +-----------------+-+-+-+-+---------+-+-----+-+-------+-----------------+
    |:::::::::::::::::|:|:|:|A| LIMIT   |:|     | |       |:::::::::::::::::|
    |:::BASE 31..24:::|G|X|0|V| 19..16  |P| DPL |0| TYPE  |:::BASE 23..16:::| 4
    |:::::::::::::::::|:|:|:|L|         |:|     | |       |:::::::::::::::::|
    +-----------------+-+-+-+-+---------+-+-----+-+-------+-----------------+
    |:::::::::::::::::::::::::::::::::::|                                   |
    |::::::::SEGMENT BASE 15..0:::::::::|        SEGMENT LIMIT 15..0        | 0
    |:::::::::::::::::::::::::::::::::::|                                   |
    +-----------------------------------+-----------------------------------+

    A   - ACCESSED                        E   - EXPAND-DOWN
    AVL - AVAILABLE FOR PROGRAMMERS USE   G   - GRANULARITY
    B   - BIG                             P   - SEGMENT PRESENT
    C   - CONFORMING                      R   - READABLE
    D   - DEFAULT                         W   - WRITABLE
    DPL - DESCRIPTOR PRIVILEGE LEVEL

6.3.1.1 Type Checking

The TYPE field of a descriptor has two functions:

1. It distinguishes among different descriptor formats.
2. It specifies the intended usage of a segment.

Besides the descriptors for data and executable segments commonly used by applications programs, the 80386 has descriptors for special segments used by the operating system and for gates. Table 6-1 lists all the types defined for system segments and gates. Note that not all descriptors define segments; gate descriptors have a different purpose that is discussed later in this chapter.

The type fields of data and executable segment descriptors include bits which further define the purpose of the segment (refer to Figure 6-1):

■ The writable bit in a data-segment descriptor specifies whether instructions can write into the segment.

■ The readable bit in an executable-segment descriptor specifies whether instructions are allowed to read from the segment (for example, to access constants that are stored with instructions). A readable, executable segment may be read in two ways:

   1. Via the CS register, by using a CS override prefix.
   2. By loading a selector of the descriptor into a data-segment register (DS, ES, FS, or GS).

Type checking can be used to detect programming errors that would attempt to use segments in ways not intended by the programmer. The processor examines type information on two kinds of occasions:

1. When a selector of a descriptor is loaded into a segment register.
   Certain segment registers can contain only certain descriptor types; for example:

   ■ The CS register can be loaded only with a selector of an executable segment.
   ■ Selectors of executable segments that are not readable cannot be loaded into data-segment registers.
   ■ Only selectors of writable data segments can be loaded into SS.

2. When an instruction refers (implicitly or explicitly) to a segment register. Certain segments can be used by instructions only in certain predefined ways; for example:

   ■ No instruction may write into an executable segment.
   ■ No instruction may write into a data segment if the writable bit is not set.
   ■ No instruction may read an executable segment unless the readable bit is set.

Table 6-1. System and Gate Descriptor Types

    Code      Type of Segment or Gate

      0       -reserved
      1       Available 286 TSS
      2       LDT
      3       Busy 286 TSS
      4       Call Gate
      5       Task Gate
      6       286 Interrupt Gate
      7       286 Trap Gate
      8       -reserved
      9       Available 386 TSS
      A       -reserved
      B       Busy 386 TSS
      C       386 Call Gate
      D       -reserved
      E       386 Interrupt Gate
      F       386 Trap Gate

6.3.1.2 Limit Checking

The limit field of a segment descriptor is used by the processor to prevent programs from addressing outside the segment. The processor's interpretation of the limit depends on the setting of the G (granularity) bit. For data segments, the processor's interpretation of the limit depends also on the E-bit (expansion-direction bit) and the B-bit (big bit) (refer to Table 6-2).

When G=0, the actual limit is the value of the 20-bit limit field as it appears in the descriptor. In this case, the limit may range from 0 to 0FFFFFH (2^(20) - 1 or 1 megabyte). When G=1, the processor appends 12 low-order one-bits to the value in the limit field. In this case the actual limit may range from 0FFFH (2^(12) - 1 or 4 kilobytes) to 0FFFFFFFFH (2^(32) - 1 or 4 gigabytes).

For all types of segments except expand-down data segments, the value of the limit is one less than the size (expressed in bytes) of the segment.
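The effect of the G bit on the limit can be sketched in a few lines of Python. This is an illustrative model rather than anything defined by the manual; the names `effective_limit`, `segment_size`, and `fits` are hypothetical, and the code covers only segments that are not expand-down.

```python
def effective_limit(limit_field, g):
    """Return the actual limit for a 20-bit limit field and a G bit."""
    if not 0 <= limit_field <= 0xFFFFF:
        raise ValueError("the limit field is 20 bits wide")
    if g:
        # G=1: append 12 low-order one-bits to the limit field.
        return (limit_field << 12) | 0xFFF
    return limit_field


def segment_size(limit_field, g):
    """Size in bytes of a segment that is not expand-down: limit + 1."""
    return effective_limit(limit_field, g) + 1


def fits(offset, length, limit):
    """True if an access of `length` bytes at `offset` stays within `limit`."""
    return offset + length - 1 <= limit


print(hex(effective_limit(0xFFFFF, 0)))   # 0xfffff
print(hex(effective_limit(0x00000, 1)))   # 0xfff
print(hex(effective_limit(0xFFFFF, 1)))   # 0xffffffff
print(segment_size(0xFFFFF, 1) == 2**32)  # True
print(fits(0xFFF, 1, 0xFFF), fits(0xFFF, 2, 0xFFF))  # True False
```

The last line shows that the final byte of a segment can be read as a byte but not as the first byte of a word, because the word's second byte would lie beyond the limit.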
The processor causes a general-protection exception in any of these cases:

■ Attempt to access a memory byte at an address > limit.
■ Attempt to access a memory word at an address ≥ limit.
■ Attempt to access a memory doubleword at an address ≥ (limit-2).

For expand-down data segments, the limit has the same function but is interpreted differently. In these cases the range of valid addresses is from limit + 1 to either 64K or 2^(32) - 1 (4 Gbytes) depending on the B-bit. An expand-down segment has maximum size when the limit is zero.

The expand-down feature makes it possible to expand the size of a stack by copying it to a larger segment without needing also to update intrastack pointers.

The limit field of descriptors for descriptor tables is used by the processor to prevent programs from selecting a table entry outside the descriptor table. The limit of a descriptor table identifies the last valid byte of the last descriptor in the table. Since each descriptor is eight bytes long, the limit value is N * 8 - 1 for a table that can contain up to N descriptors.

Limit checking catches programming errors such as runaway subscripts and invalid pointer calculations. Such errors are detected when they occur, so that identification of the cause is easier. Without limit checking, such errors could corrupt other modules; the existence of such errors would not be discovered until later, when the corrupted module behaves incorrectly, and when identification of the cause is difficult.

Table 6-2.
Useful Combinations of E, G, and B Bits

    Case:                    1         2         3         4

    Expansion Direction      U         U         D         D
    G-bit                    0         1         0         1
    B-bit                    X         X         0         1

    Lower bound is:
    0                        X         X
    LIMIT+1                                      X
    shl(LIMIT,12,1)+1                                      X

    Upper bound is:
    LIMIT                    X
    shl(LIMIT,12,1)                    X
    64K-1                                        X
    4G-1                                                   X

    Max seg size is:
    64K                      X
    64K-1                                        X
    4G-4K                                                  X
    4G                                 X

    Min seg size is:
    0                        X                   X
    4K                                 X                   X

    shl (X, 12, 1) = shift X left by 12 bits inserting one-bits on the right

6.3.1.3 Privilege Levels

The concept of privilege is implemented by assigning a value from zero to three to key objects recognized by the processor. This value is called the privilege level. The value zero represents the greatest privilege, the value three represents the least privilege. The following processor-recognized objects contain privilege levels:

■ Descriptors contain a field called the descriptor privilege level (DPL).

■ Selectors contain a field called the requestor's privilege level (RPL). The RPL is intended to represent the privilege level of the procedure that originates a selector.

■ An internal processor register records the current privilege level (CPL). Normally the CPL is equal to the DPL of the segment that the processor is currently executing. CPL changes as control is transferred to segments with differing DPLs.

The processor automatically evaluates the right of a procedure to access another segment by comparing the CPL to one or more other privilege levels. The evaluation is performed at the time the selector of a descriptor is loaded into a segment register. The criteria used for evaluating access to data differ from those for evaluating transfers of control to executable segments; therefore, the two types of access are considered separately in the following sections.

Figure 6-2 shows how these levels of privilege can be interpreted as rings of protection. The center is for the segments containing the most critical software, usually the kernel of the operating system. Outer rings are for the segments of less critical software.

It is not necessary to use all four privilege levels.
Existing software that was designed to use only one or two levels of privilege can simply ignore the other levels offered by the 80386. A one-level system should use privilege level zero; a two-level system should use privilege levels zero and three.

Figure 6-2. Levels of Privilege

                              TASK C
    +---------------------------------------------------+
    |                   APPLICATIONS                    |
    |     +---------------------------------------+     |
    |     |           CUSTOM EXTENSIONS           |     |
    |     |     +---------------------------+     |     |
    |     |     |      SYSTEM SERVICES      |     |     |
    |     |     |     +---------------+     |     |     |
    |     |     |     |    KERNEL     |     |     |     |
    |     |     |     |    LEVEL 0    |LEVEL|LEVEL|LEVEL|
    |     |     |     |               |  1  |  2  |  3  |
    |     |     |     +---------------+     |     |     |
    |     |     |                           |     |     |
    |     |     +---------------------------+     |     |
    |     |                                       |     |
    |     +---------------------------------------+     |
    |                                                   |
    +---------------------------------------------------+
            TASK B                      TASK A

6.3.2 Restricting Access to Data

To address operands in memory, an 80386 program must load the selector of a data segment into a data-segment register (DS, ES, FS, GS, SS). The processor automatically evaluates access to a data segment by comparing privilege levels. The evaluation is performed at the time a selector for the descriptor of the target segment is loaded into the data-segment register. As Figure 6-3 shows, three different privilege levels enter into this type of privilege check:

1. The CPL (current privilege level).
2. The RPL (requestor's privilege level) of the selector used to specify the target segment.
3. The DPL of the descriptor of the target segment.

Instructions may load a data-segment register (and subsequently use the target segment) only if the DPL of the target segment is numerically greater than or equal to the maximum of the CPL and the selector's RPL.
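The rule just stated can be written as a one-line predicate. The Python sketch below is illustrative only; the names `rpl` and `may_load_data_segment` and the sample selector values are hypothetical, and the code assumes that the RPL occupies the two low-order bits of a selector.

```python
def rpl(selector):
    """Requestor's privilege level: assumed to be bits 1..0 of the selector."""
    return selector & 0b11


def may_load_data_segment(cpl, selector, dpl):
    """True if DPL is numerically >= the maximum of CPL and the selector's RPL."""
    return dpl >= max(cpl, rpl(selector))


# A level-0 procedure may load a level-3 data segment.
print(may_load_data_segment(cpl=0, selector=0x0010, dpl=3))   # True
# A level-3 procedure may not load a level-0 data segment.
print(may_load_data_segment(cpl=3, selector=0x0013, dpl=0))   # False
# A level-0 procedure using a selector whose RPL is 3 is also refused.
print(may_load_data_segment(cpl=0, selector=0x0013, dpl=0))   # False
```

The third case shows the RPL weakening an otherwise sufficient CPL, which is the basis of the pointer-validation techniques discussed later in this chapter.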
In other words, a procedure can only access data that is at the same or less privileged level.

The addressable domain of a task varies as CPL changes. When CPL is zero, data segments at all privilege levels are accessible; when CPL is one, only data segments at privilege levels one through three are accessible; when CPL is three, only data segments at privilege level three are accessible. This property of the 80386 can be used, for example, to prevent applications procedures from reading or changing tables of the operating system.

Figure 6-3. Privilege Check for Data Access

          16-BIT VISIBLE
             SELECTOR           INVISIBLE DESCRIPTOR
        +---------------+------------------+---+-----------+
    CS  |               |                  |CPL|           |
        +---------------+------------------+-+-+-----------+
                                             |
        TARGET SEGMENT SELECTOR              |       +-----------+
        +-----------------------+-+---+      +------>| PRIVILEGE |
        |         INDEX         | |RPL|------------->|   CHECK   |
        +-----------------------+-+---+      +------>|  BY CPU   |
                                             |       +-----------+
        DATA SEGMENT DESCRIPTOR              |
                                             |
     31               23                15   |            7                0
    +-----------------+-+-+-+-+---------+-+--+--+---------+-----------------+
    |                 | | | |A| LIMIT   | |     |  TYPE   |                 |
    |   BASE 31..24   |G|B|0|V|         |P| DPL |         |   BASE 23..16   | 4
    |                 | | | |L| 19..16  | |     |1|0|E|W|A|                 |
    +-----------------+-+-+-+-+---------+-+-----+-+-+-+-+-+-----------------+
    |                                   |                                   |
    |        SEGMENT BASE 15..0         |        SEGMENT LIMIT 15..0        | 0
    |                                   |                                   |
    +-----------------------------------+-----------------------------------+

    CPL - CURRENT PRIVILEGE LEVEL
    RPL - REQUESTOR'S PRIVILEGE LEVEL
    DPL - DESCRIPTOR PRIVILEGE LEVEL

6.3.2.1 Accessing Data in Code Segments

Less common than the use of data segments is the use of code segments to store data. Code segments may legitimately hold constants; it is not possible to write to a segment described as a code segment. The following methods of accessing data in code segments are possible:

1. Load a data-segment register with a selector of a nonconforming, readable, executable segment.

2.
   Load a data-segment register with a selector of a conforming, readable, executable segment.

3. Use a CS override prefix to read a readable, executable segment whose selector is already loaded in the CS register.

The same rules as for access to data segments apply to case 1. Case 2 is always valid because the privilege level of a segment whose conforming bit is set is effectively the same as CPL regardless of its DPL. Case 3 is always valid because the DPL of the code segment in CS is, by definition, equal to CPL.

6.3.3 Restricting Control Transfers

With the 80386, control transfers are accomplished by the instructions JMP, CALL, RET, INT, and IRET, as well as by the exception and interrupt mechanisms. Exceptions and interrupts are special cases that Chapter 9 covers. This chapter discusses only JMP, CALL, and RET instructions.

The "near" forms of JMP, CALL, and RET transfer within the current code segment, and therefore are subject only to limit checking. The processor ensures that the destination of the JMP, CALL, or RET instruction does not exceed the limit of the current executable segment. This limit is cached in the CS register; therefore, protection checks for near transfers require no extra clock cycles.

The operands of the "far" forms of JMP and CALL refer to other segments; therefore, the processor performs privilege checking. There are two ways a JMP or CALL can refer to another segment:

1. The operand selects the descriptor of another executable segment.

2. The operand selects a call gate descriptor. This gated form of transfer is discussed in a later section on call gates.

As Figure 6-4 shows, two different privilege levels enter into a privilege check for a control transfer that does not use a call gate:

1. The CPL (current privilege level).
2. The DPL of the descriptor of the target segment.

Normally the CPL is equal to the DPL of the segment that the processor is currently executing.
CPL may, however, be greater than DPL if the conforming bit is set in the descriptor of the current executable segment. The processor keeps a record of the CPL cached in the CS register; this value can be different from the DPL in the descriptor of the code segment.

The processor permits a JMP or CALL directly to another segment only if one of the following privilege rules is satisfied:

■ DPL of the target is equal to CPL.

■ The conforming bit of the target code-segment descriptor is set, and the DPL of the target is less than or equal to CPL.

An executable segment whose descriptor has the conforming bit set is called a conforming segment. The conforming-segment mechanism permits sharing of procedures that may be called from various privilege levels but should execute at the privilege level of the calling procedure. Examples of such procedures include math libraries and some exception handlers. When control is transferred to a conforming segment, the CPL does not change. This is the only case when CPL may be unequal to the DPL of the current executable segment.

Most code segments are not conforming. The basic rules of privilege above mean that, for nonconforming segments, control can be transferred without a gate only to executable segments at the same level of privilege. There is a need, however, to transfer control to (numerically) smaller privilege levels; this need is met by the CALL instruction when used with call-gate descriptors, which are explained in the next section. The JMP instruction may never transfer control to a nonconforming segment whose DPL does not equal CPL.

Figure 6-4.
Privilege Check for Control Transfer without Gate

          16-BIT VISIBLE
             SELECTOR             INVISIBLE PART
        +---------------+------------------+---+-----------+
    CS  |               |                  |CPL|           |
        +---------------+------------------+-+-+-----------+
                                             |              +-----------+
                                             +------------->| PRIVILEGE |
                                             +------------->|   CHECK   |
                                             |       +----->|  BY CPU   |
    CODE-SEGMENT DESCRIPTOR                  |       |      +-----------+
                                             |       |
     31               23                15   |       |    7                0
    +-----------------+-+-+-+-+---------+-+--+--+----+----+-----------------+
    |                 | | | |A| LIMIT   | |     |         |                 |
    |   BASE 31..24   |G|D|0|V|         |P| DPL |         |   BASE 23..16   | 4
    |                 | | | |L| 19..16  | |     |1|1|C|R|A|                 |
    +-----------------+-+-+-+-+---------+-+-----+-+-+-+-+-+-----------------+
    |                                   |                                   |
    |        SEGMENT BASE 15..0         |        SEGMENT LIMIT 15..0        | 0
    |                                   |                                   |
    +-----------------------------------+-----------------------------------+

    CPL - CURRENT PRIVILEGE LEVEL
    DPL - DESCRIPTOR PRIVILEGE LEVEL
    C   - CONFORMING BIT

6.3.4 Gate Descriptors Guard Procedure Entry Points

To provide protection for control transfers among executable segments at different privilege levels, the 80386 uses gate descriptors. There are four kinds of gate descriptors:

■ Call gates
■ Trap gates
■ Interrupt gates
■ Task gates

This chapter is concerned only with call gates. Task gates are used for task switching, and therefore are discussed in Chapter 7. Chapter 9 explains how trap gates and interrupt gates are used by exceptions and interrupts.

Figure 6-5 illustrates the format of a call gate. A call gate descriptor may reside in the GDT or in an LDT, but not in the IDT.

A call gate has two primary functions:

1. To define an entry point of a procedure.
2. To specify the privilege level of the entry point.

Call gate descriptors are used by call and jump instructions in the same manner as code segment descriptors. When the hardware recognizes that the destination selector refers to a gate descriptor, the operation of the instruction is expanded as determined by the contents of the call gate.

The selector and offset fields of a gate form a pointer to the entry point of a procedure.
A call gate guarantees that all transitions to another segment go to a valid entry point, rather than possibly into the middle of a procedure (or worse, into the middle of an instruction). The far pointer operand of the control transfer instruction does not point to the segment and offset of the target instruction; rather, the selector part of the pointer selects a gate, and the offset is not used. Figure 6-6 illustrates this style of addressing.

As Figure 6-7 shows, four different privilege levels are used to check the validity of a control transfer via a call gate:

1. The CPL (current privilege level).
2. The RPL (requestor's privilege level) of the selector used to specify the call gate.
3. The DPL of the gate descriptor.
4. The DPL of the descriptor of the target executable segment.

The DPL field of the gate descriptor determines what privilege levels can use the gate. One code segment can have several procedures that are intended for use by different privilege levels. For example, an operating system may have some services that are intended to be used by applications, whereas others may be intended only for use by other systems software.

Gates can be used for control transfers to numerically smaller privilege levels or to the same privilege level (though they are not necessary for transfers to the same level). Only CALL instructions can use gates to transfer to smaller privilege levels. A gate may be used by a JMP instruction only to transfer to an executable segment with the same privilege level or to a conforming segment.

For a JMP instruction to a nonconforming segment, both of the following privilege rules must be satisfied; otherwise, a general protection exception results.

   MAX (CPL,RPL) ≤ gate DPL
   target segment DPL = CPL

For a CALL instruction (or for a JMP instruction to a conforming segment), both of the following privilege rules must be satisfied; otherwise, a general protection exception results.
   MAX (CPL,RPL) ≤ gate DPL
   target segment DPL ≤ CPL

Figure 6-5. Format of 80386 Call Gate

     31               23                15                7              0
    +-----------------------------------+-+-----+---------+-----+---------+
    |                                   | |     |  TYPE   |     |  DWORD  |
    |           OFFSET 31..16           |P| DPL |         |0 0 0|         | 4
    |                                   | |     |0 1 1 0 0|     |  COUNT  |
    +-----------------------------------+-+-----+---------+-----+---------+
    |                                   |                                 |
    |             SELECTOR              |          OFFSET 15..0           | 0
    |                                   |                                 |
    +-----------------------------------+---------------------------------+

Figure 6-6. Indirect Transfer via Call Gate

      OPCODE               OFFSET                      SELECTOR
    +---------+----------------------------------+---------+-+---+
    |  CALL   |            (NOT USED)            |  INDEX  | |RPL|
    +---------+----------------------------------+----+----+-+---+
                                                      |
                    DESCRIPTOR TABLE                  |
                    +------------+-----+-------+      |
    GATE            |   OFFSET   | DPL | COUNT |<-----+
    DESCRIPTOR      +------------+-----+-------+
               +----|  SELECTOR  |   OFFSET    |----+
               |    +------------+-------------+    |
               |    |                          |    |
               |    +------+-----+-----+-------+    |         EXECUTABLE SEGMENT
    EXECUTABLE +--->| BASE |     | DPL | BASE  |    |        +------------------+
    SEGMENT         +------+-----+-----+-------+    |        |                  |
    DESCRIPTOR      |    BASE    |             |    +------->|    PROCEDURE     |
                    +------------+-------------+             |                  |
                           |                                 |                  |
                           +-------------------------------->+------------------+

Figure 6-7.
Privilege Check via Call Gate

          16-BIT VISIBLE
             SELECTOR           INVISIBLE DESCRIPTOR
        +---------------+------------------+---+-----------+
    CS  |               |                  |CPL|           |
        +---------------+------------------+-+-+-----------+
        TARGET SELECTOR                      |                 +-----------+
        +-------------------+-+---+          +---------------->| PRIVILEGE |
        |       INDEX       | |RPL|--------------------------->|   CHECK   |
        +-------------------+-+---+  +------------------------>|    BY     |
                                     |                     +-->|    CPU    |
        GATE DESCRIPTOR              |                     |   +-----------+
        +-----------------------+----+-----+-----------+   |
        |        OFFSET         |   DPL    |   COUNT   |   |
        +-----------------------+----------+-----------+   |
        |       SELECTOR        |        OFFSET        |   |
        +-----------------------+----------------------+   |
                                                           |
    EXECUTABLE                +-----------+-----------+----+-----+-----------+
    SEGMENT                   |   BASE    |   LIMIT   |   DPL    |   BASE    |
    DESCRIPTOR                +-----------+-----------+----------+-----------+
                              |         BASE          |        LIMIT         |
                              +-----------------------+----------------------+

    CPL - CURRENT PRIVILEGE LEVEL
    RPL - REQUESTOR'S PRIVILEGE LEVEL
    DPL - DESCRIPTOR PRIVILEGE LEVEL

6.3.4.1 Stack Switching

If the destination code segment of the call gate is at a different privilege level than the CPL, an interlevel transfer is being requested.

To maintain system integrity, each privilege level has a separate stack. These stacks assure sufficient stack space to process calls from less privileged levels. Without them, a trusted procedure would not work correctly if the calling procedure did not provide sufficient space on the caller's stack.

The processor locates these stacks via the task state segment (see Figure 6-8). Each task has a separate TSS, thereby permitting tasks to have separate stacks. Systems software is responsible for creating TSSs and placing correct stack pointers in them. The initial stack pointers in the TSS are strictly read-only values. The processor never changes them during the course of execution.

When a call gate is used to change privilege levels, a new stack is selected by loading a pointer value from the Task State Segment (TSS).
The processor uses the DPL of the target code segment (the new CPL) to index the initial stack pointer for PL 0, PL 1, or PL 2.

The DPL of the new stack data segment must equal the new CPL; if it does not, a stack exception occurs. It is the responsibility of systems software to create stacks and stack-segment descriptors for all privilege levels that are used. Each stack must contain enough space to hold the old SS:ESP, the return address, and all parameters and local variables that may be required to process a call.

As with intralevel calls, parameters for the subroutine are placed on the stack. To make privilege transitions transparent to the called procedure, the processor copies the parameters to the new stack. The count field of a call gate tells the processor how many doublewords (up to 31) to copy from the caller's stack to the new stack. If the count is zero, no parameters are copied.

The processor performs the following stack-related steps in executing an interlevel CALL:

1. The new stack is checked to assure that it is large enough to hold the parameters and linkages; if it is not, a stack fault occurs with an error code of 0.

2. The old value of the stack registers SS:ESP is pushed onto the new stack as two doublewords.

3. The parameters are copied.

4. A pointer to the instruction after the CALL instruction (the former value of CS:EIP) is pushed onto the new stack. The final value of SS:ESP points to this return pointer on the new stack.

Figure 6-9 illustrates the stack contents after a successful interlevel call.

The TSS does not have a stack pointer for a privilege level 3 stack, because privilege level 3 cannot be called by any procedure at any other privilege level.

Procedures that may be called from another privilege level and that require more than the 31 doublewords for parameters must use the saved SS:ESP link to access all parameters beyond the last doubleword copied.
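The four stack-related steps can be pictured with a small model. The following Python sketch is illustrative only: the function name, the StackFault class, and the representation of a stack as a list whose last element is the top are assumptions made for this example, not part of the 80386 architecture. The check that the new stack segment's DPL equals the new CPL is not modelled.

```python
MAX_COUNT = 31  # the count field of a call gate allows at most 31 doublewords


class StackFault(Exception):
    """Hypothetical stand-in for the processor's stack fault (error code 0)."""


def interlevel_call(tss_stacks, new_cpl, old_ss, old_esp,
                    caller_stack, count, ret_cs, ret_eip):
    """Return (new_ss, new_stack) after a modelled interlevel CALL.

    tss_stacks maps privilege level 0..2 to (ss_selector, free_doublewords).
    Stacks are Python lists whose LAST element is the top of the stack.
    """
    if new_cpl not in (0, 1, 2):
        raise ValueError("the TSS holds stack pointers only for levels 0-2")
    if not 0 <= count <= MAX_COUNT or count > len(caller_stack):
        raise ValueError("bad parameter count")

    # The new CPL indexes the initial stack pointer in the TSS.
    new_ss, free_doublewords = tss_stacks[new_cpl]

    # Step 1: room is needed for old SS:ESP (2), the parameters, and CS:EIP (2).
    if 2 + count + 2 > free_doublewords:
        raise StackFault(0)

    # Step 2: push the old SS:ESP as two doublewords.
    new_stack = [old_ss, old_esp]
    # Step 3: copy `count` parameters, preserving their order.
    new_stack.extend(caller_stack[len(caller_stack) - count:])
    # Step 4: push the return pointer; the new SS:ESP points at the EIP entry.
    new_stack.extend([ret_cs, ret_eip])
    return new_ss, new_stack
```

With three parameters on the caller's stack, a call such as interlevel_call(tss, 0, 0x23, 0x1000, [3, 2, 1], 3, 0x1B, 0x400) yields a new stack holding, from bottom to top, the old SS, the old ESP, the three parameters, the old CS, and the old EIP, which is the layout of Figure 6-9.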
A call via a call gate does not check the values of the words copied onto the new stack. The called procedure should check each parameter for validity. A later section discusses how the ARPL, VERR, VERW, LSL, and LAR instructions can be used to check pointer values.

Figure 6-8.  Initial Stack Pointers of TSS

   Offset   Bits 31..16           Bits 15..0
   (hex)
   64       (fields between offsets 24 and 64 are not shown)
   ...
   24       EFLAGS
   20       INSTRUCTION POINTER (EIP)
   1C       CR3 (PDBR)
   18       00000000 00000000     SS2  (RPL bits = 10)    --+
   14       ESP2                                            |
   10       00000000 00000000     SS1  (RPL bits = 01)      |  INITIAL
   0C       ESP1                                            |  STACK
   8        00000000 00000000     SS0  (RPL bits = 00)      |  POINTERS
   4        ESP0                                          --+
   0        00000000 00000000     TSS BACK LINK

Figure 6-9.  Stack Contents after an Interlevel Call

   Each entry is 32 bits wide. Both stacks expand downward (toward the bottom of the listing). The new stack starts at the SS:ESP taken from the TSS.

   OLD STACK                          NEW STACK

                                      OLD SS   (low word; high word unused)
                                      OLD ESP
   PARM 3                             PARM 3
   PARM 2                             PARM 2
   PARM 1   (old SS:ESP)              PARM 1
                                      OLD CS   (low word; high word unused)
                                      OLD EIP  (new SS:ESP)

6.3.4.2  Returning from a Procedure

The "near" forms of the RET instruction transfer control within the current code segment and therefore are subject only to limit checking. The offset of the instruction following the corresponding CALL is popped from the stack.
The processor ensures that this offset does not exceed the limit of the current executable segment.

The "far" form of the RET instruction pops the return pointer that was pushed onto the stack by a prior far CALL instruction. Under normal conditions, the return pointer is valid, because of its relation to the prior CALL or INT. Nevertheless, the processor performs privilege checking because of the possibility that the current procedure altered the pointer or failed to properly maintain the stack. The RPL of the CS selector popped off the stack by the return instruction identifies the privilege level of the calling procedure.

An intersegment return instruction can change privilege levels, but only toward procedures of lesser privilege. When the RET instruction encounters a saved CS value whose RPL is numerically greater than the CPL, an interlevel return occurs. Such a return follows these steps:

1. The checks shown in Table 6-3 are made, and CS:EIP and SS:ESP are loaded with their former values that were saved on the stack.

2. The old SS:ESP (from the top of the current stack) value is adjusted by the number of bytes indicated in the RET instruction. The resulting ESP value is not compared to the limit of the stack segment. If ESP is beyond the limit, that fact is not recognized until the next stack operation. (The SS:ESP value of the returning procedure is not preserved; normally, this value is the same as that contained in the TSS.)

3. The contents of the DS, ES, FS, and GS segment registers are checked. If any of these registers refer to segments whose DPL is greater than the new CPL (excluding conforming code segments), the segment register is loaded with the null selector (INDEX = 0, TI = 0). The RET instruction itself does not signal exceptions in these cases; however, any subsequent memory reference that attempts to use a segment register that contains the null selector will cause a general protection exception.
This prevents less privileged code from accessing more privileged segments using selectors left in the segment registers by the more privileged procedure.

6.3.5  Some Instructions are Reserved for Operating System

Instructions that have the power to affect the protection mechanism or to influence general system performance can only be executed by trusted procedures. The 80386 has two classes of such instructions:

1. Privileged instructions -- those used for system control.

2. Sensitive instructions -- those used for I/O and I/O related activities.

Table 6-3.  Interlevel Return Checks

   Type of Check                                   Exception   Error Code

   ESP is within current SS segment                   SF       0
   ESP + 7 is within current SS segment               SF       0
   RPL of return CS is greater than CPL               GP       Return CS
   Return CS selector is not null                     GP       Return CS
   Return CS segment is within descriptor
     table limit                                      GP       Return CS
   Return CS descriptor is a code segment             GP       Return CS
   Return CS segment is present                       NP       Return CS
   DPL of return nonconforming code segment
     = RPL of return CS, or DPL of return
     conforming code segment ≤ RPL of return CS       GP       Return CS
   ESP + N + 15 is within SS segment
     (N = immediate operand of RET N instruction)     SF       Return SS
   SS selector at ESP + N + 12 is not null            GP       Return SS
   SS selector at ESP + N + 12 is within
     descriptor table limit                           GP       Return SS
   SS descriptor is writable data segment             GP       Return SS
   SS segment is present                              SF       Return SS
   Saved SS segment DPL = RPL of saved CS             GP       Return SS
   Saved SS selector RPL = Saved SS segment DPL       GP       Return SS

   SF   Stack Fault
   GP   General Protection Exception
   NP   Segment-Not-Present Exception

6.3.5.1  Privileged Instructions

The instructions that affect system data structures can only be executed when CPL is zero. If the CPU encounters one of these instructions when CPL is greater than zero, it signals a general protection exception. These instructions include:

   CLTS              -- Clear Task-Switched Flag
   HLT               -- Halt Processor
   LGDT              -- Load GDT Register
   LIDT              -- Load IDT Register
   LLDT              -- Load LDT Register
   LMSW              -- Load Machine Status Word
   LTR               -- Load Task Register
   MOV to/from CRn   -- Move to/from Control Register n
   MOV to/from DRn   -- Move to/from Debug Register n
   MOV to/from TRn   -- Move to/from Test Register n

6.3.5.2  Sensitive Instructions

Instructions that deal with I/O need to be restricted but also need to be executed by procedures executing at privilege levels other than zero. The mechanisms for restriction of I/O operations are covered in detail in Chapter 8, "Input/Output".

6.3.6  Instructions for Pointer Validation

Pointer validation is an important part of locating programming errors. Pointer validation is necessary for maintaining isolation between the privilege levels. Pointer validation consists of the following steps:

1. Check if the supplier of the pointer is entitled to access the segment.

2. Check if the segment type is appropriate to its intended use.

3. Check if the pointer violates the segment limit.

Although the 80386 processor automatically performs checks 2 and 3 during instruction execution, software must assist in performing the first check. The unprivileged instruction ARPL is provided for this purpose. Software can also explicitly perform steps 2 and 3 to check for potential violations (rather than waiting for an exception). The unprivileged instructions LAR, LSL, VERR, and VERW are provided for this purpose.

LAR (Load Access Rights) is used to verify that a pointer refers to a segment of the proper privilege level and type. LAR has one operand -- a selector for a descriptor whose access rights are to be examined. The descriptor must be visible at the privilege level which is the maximum of the CPL and the selector's RPL.
If the descriptor is visible, LAR obtains the second doubleword of the descriptor, masks this value with 00FxFF00H, stores the result into the specified 32-bit destination register, and sets the zero flag. (The x indicates that the corresponding four bits of the stored value are undefined.) Once loaded, the access-rights bits can be tested. All valid descriptor types can be tested by the LAR instruction. If the RPL or CPL is greater than DPL, or if the selector is outside the table limit, no access-rights value is returned, and the zero flag is cleared. Conforming code segments may be accessed from any privilege level.

LSL (Load Segment Limit) allows software to test the limit of a descriptor. If the descriptor denoted by the given selector (in memory or a register) is visible at the CPL, LSL loads the specified 32-bit register with a 32-bit, byte granular, unscrambled limit that is calculated from fragmented limit fields and the G-bit of that descriptor. This can only be done for segments (data, code, task state, and local descriptor tables); gate descriptors are inaccessible. (Table 6-4 lists in detail which types are valid and which are not.) Interpreting the limit is a function of the segment type. For example, downward expandable data segments treat the limit differently than code segments do.

For both LAR and LSL, the zero flag (ZF) is set if the loading was performed; otherwise, the ZF is cleared.

Table 6-4.  Valid Descriptor Types for LSL

   Type Code   Descriptor Type        Valid?

   0           (invalid)              NO
   1           Available 286 TSS      YES
   2           LDT                    YES
   3           Busy 286 TSS           YES
   4           286 Call Gate          NO
   5           Task Gate              NO
   6           286 Interrupt Gate     NO
   7           286 Trap Gate          NO
   8           (invalid)              NO
   9           Available 386 TSS      YES
   A           (invalid)              NO
   B           Busy 386 TSS           YES
   C           386 Call Gate          NO
   D           (invalid)              NO
   E           386 Interrupt Gate     NO
   F           386 Trap Gate          NO

6.3.6.1  Descriptor Validation

The 80386 has two instructions, VERR and VERW, which determine whether a selector points to a segment that can be read or written at the current privilege level. Neither instruction causes a protection fault if the result is negative.

VERR (Verify for Reading) verifies a segment for reading and loads ZF with 1 if that segment is readable from the current privilege level. VERR checks that:

   ■ The selector points to a descriptor within the bounds of the GDT or LDT.

   ■ It denotes a code or data segment descriptor.

   ■ The segment is readable and of appropriate privilege level.

The privilege check for data segments and nonconforming code segments is that the DPL must be numerically greater than or equal to both the CPL and the selector's RPL. Conforming segments are not checked for privilege level.

VERW (Verify for Writing) provides the same capability as VERR for verifying writability. Like the VERR instruction, VERW sets ZF if the result of the writability check is positive. The instruction checks that the descriptor is within bounds, is a segment descriptor, is writable, and that its DPL is numerically greater than or equal to both the CPL and the selector's RPL. Code segments are never writable, conforming or not.

6.3.6.2  Pointer Integrity and RPL

The Requestor's Privilege Level (RPL) feature can prevent inappropriate use of pointers that could corrupt the operation of more privileged code or data from a less privileged level.

A common example is a file system procedure, FREAD (file_id, n_bytes, buffer_ptr). This hypothetical procedure reads data from a file into a buffer, overwriting whatever is there.
Normally, FREAD would be available at the user level, supplying only pointers to the file system procedures and data located and operating at a privileged level. Normally, such a procedure prevents user-level procedures from directly changing the file tables. However, in the absence of a standard protocol for checking pointer validity, a user-level procedure could supply a pointer into the file tables in place of its buffer pointer, causing the FREAD procedure to corrupt them unwittingly. Use of RPL can avoid such problems.

The RPL field allows a privilege attribute to be assigned to a selector. This privilege attribute would normally indicate the privilege level of the code which generated the selector. The 80386 processor automatically checks the RPL of any selector loaded into a segment register to determine whether the RPL allows access.

To take advantage of the processor's checking of RPL, the called procedure need only ensure that all selectors passed to it have an RPL at least as high (numerically) as the original caller's CPL. This action guarantees that selectors are not more trusted than their supplier. If one of the selectors is used to access a segment that the caller would not be able to access directly, i.e., the RPL is numerically greater than the DPL, then a protection fault will result when that selector is loaded into a segment register.

ARPL (Adjust Requestor's Privilege Level) adjusts the RPL field of a selector to become the larger of its original value and the value of the RPL field in a specified register. The latter is normally loaded from the image of the caller's CS register which is on the stack. If the adjustment changes the selector's RPL, ZF (the zero flag) is set; otherwise, ZF is cleared.

6.4  Page-Level Protection

Two kinds of protection are related to pages:

1. Restriction of addressable domain.

2. Type checking.
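As a rough preview of how the two kinds of page protection interact, the following Python sketch applies the U/S and R/W bit meanings defined in sections 6.4.1.1 and 6.4.1.2 and the combination rule tabulated in Table 6-5. The function names are hypothetical, chosen for this illustration, and the overrides of section 6.4.3 are not modelled.

```python
def is_supervisor(cpl):
    """CPL 0, 1, or 2 executes at supervisor level; CPL 3 at user level."""
    return cpl in (0, 1, 2)


def combine(pde_us, pde_rw, pte_us, pte_rw):
    """Effective (U/S, R/W) bits for a page, following Table 6-5.

    A page is user (1) only if both entries say user, and read/write (1)
    only if both entries say read/write. When the combined U/S is 0
    (supervisor), the R/W result is not checked ("x" in the table).
    """
    return pde_us & pte_us, pde_rw & pte_rw


def access_allowed(cpl, us, rw, is_write):
    """Apply addressable-domain restriction, then type checking."""
    if is_supervisor(cpl):
        return True               # all pages are readable and writable
    if us == 0:
        return False              # supervisor page: not addressable by users
    return rw == 1 or not is_write  # read-only user pages reject writes
```

For example, a directory entry marked user read/write combined with a page-table entry marked user read-only gives combine(1, 1, 1, 0) == (1, 0): user-level code may read the page but not write it.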
6.4.1  Page-Table Entries Hold Protection Parameters

Figure 6-10 highlights the fields of PDEs and PTEs that control access to pages.

Figure 6-10.  Protection Fields of Page Table Entries

   bits 31..12   PAGE FRAME ADDRESS 31..12
   bits 11..9    AVAIL
   bits 8..7     0 0
   bit  6        D
   bit  5        A
   bits 4..3     0 0
   bit  2        U/S   (highlighted)
   bit  1        R/W   (highlighted)
   bit  0        P

   R/W - READ/WRITE
   U/S - USER/SUPERVISOR

   (The figure shades every field except U/S and R/W.)

6.4.1.1  Restricting Addressable Domain

The concept of privilege for pages is implemented by assigning each page to one of two levels:

1. Supervisor level (U/S=0) -- for the operating system and other systems software and related data.

2. User level (U/S=1) -- for applications procedures and data.

The current level (U or S) is related to CPL. If CPL is 0, 1, or 2, the processor is executing at supervisor level. If CPL is 3, the processor is executing at user level.

When the processor is executing at supervisor level, all pages are addressable, but, when the processor is executing at user level, only pages that belong to the user level are addressable.

6.4.1.2  Type Checking

At the level of page addressing, two types are defined:

1. Read-only access (R/W=0)

2. Read/write access (R/W=1)

When the processor is executing at supervisor level, all pages are both readable and writable. When the processor is executing at user level, only pages that belong to user level and are marked for read/write access are writable; pages that belong to supervisor level are neither readable nor writable from user level.

6.4.2  Combining Protection of Both Levels of Page Tables

For any one page, the protection attributes of its page directory entry may differ from those of its page table entry.
The 80386 computes the effective protection attributes for a page by examining the protection attributes in both the directory and the page table. Table 6-5 shows the effective protection provided by the possible combinations of protection attributes.

6.4.3  Overrides to Page Protection

Certain accesses are checked as if they are privilege-level 0 references, even if CPL = 3:

   ■ LDT, GDT, TSS, IDT references.

   ■ Access to inner stack during ring-crossing CALL/INT.

6.5  Combining Page and Segment Protection

When paging is enabled, the 80386 first evaluates segment protection, then evaluates page protection. If the processor detects a protection violation at either the segment or the page level, the requested operation cannot proceed; a protection exception occurs instead.

For example, it is possible to define a large data segment which has some subunits that are read-only and other subunits that are read-write. In this case, the page directory (or page table) entries for the read-only subunits would have the U/S and R/W bits set to x0, indicating no write rights for all the pages described by that directory entry (or for individual pages). This technique might be used, for example, in a UNIX-like system to define a large data segment, part of which is read only (for shared data or ROMmed constants). This enables UNIX-like systems to define a "flat" data space as one large segment, use "flat" pointers to address within this "flat" space, yet be able to protect shared data, shared files mapped into the virtual space, and supervisor areas.

Table 6-5.  Combining Directory and Page Protection

   Page Directory Entry     Page Table Entry     Combined Protection
     U/S       R/W            U/S      R/W          U/S      R/W

     S-0       R-0            S-0      R-0           S        x
     S-0       R-0            S-0      W-1           S        x
     S-0       R-0            U-1      R-0           S        x
     S-0       R-0            U-1      W-1           S        x

     S-0       W-1            S-0      R-0           S        x
     S-0       W-1            S-0      W-1           S        x
     S-0       W-1            U-1      R-0           S        x
     S-0       W-1            U-1      W-1           S        x

     U-1       R-0            S-0      R-0           S        x
     U-1       R-0            S-0      W-1           S        x
     U-1       R-0            U-1      R-0           U        R
     U-1       R-0            U-1      W-1           U        R

     U-1       W-1            S-0      R-0           S        x
     U-1       W-1            S-0      W-1           S        x
     U-1       W-1            U-1      R-0           U        R
     U-1       W-1            U-1      W-1           U        W

   NOTE
      S -- Supervisor
      R -- Read only
      U -- User
      W -- Read and Write
      x indicates that when the combined U/S attribute is S, the R/W attribute is not checked.

Chapter 7  Multitasking
----------------------------------------------------------------------------

To provide efficient, protected multitasking, the 80386 employs several special data structures. It does not, however, use special instructions to control multitasking; instead, it interprets ordinary control-transfer instructions differently when they refer to the special data structures. The registers and data structures that support multitasking are:

   ■ Task state segment
   ■ Task state segment descriptor
   ■ Task register
   ■ Task gate descriptor

With these structures the 80386 can rapidly switch execution from one task to another, saving the context of the original task so that the task can be restarted later. In addition to the simple task switch, the 80386 offers two other task-management features:

1. Interrupts and exceptions can cause task switches (if needed in the system design). The processor not only switches automatically to the task that handles the interrupt or exception, but it automatically switches back to the interrupted task when the interrupt or exception has been serviced. Interrupt tasks may interrupt lower-priority interrupt tasks to any depth.

2. With each switch to another task, the 80386 can also switch to another LDT and to another page directory.
Thus each task can have a different logical-to-linear mapping and a different linear-to-physical mapping. This is yet another protection feature, because tasks can be isolated and prevented from interfering with one another.

7.1  Task State Segment

All the information the processor needs in order to manage a task is stored in a special type of segment, a task state segment (TSS). Figure 7-1 shows the format of a TSS for executing 80386 tasks. (Another format is used for executing 80286 tasks; refer to Chapter 13.)

The fields of a TSS belong to two classes:

1. A dynamic set that the processor updates with each switch from the task. This set includes the fields that store:

   ■ The general registers (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI).

   ■ The segment registers (ES, CS, SS, DS, FS, GS).

   ■ The flags register (EFLAGS).

   ■ The instruction pointer (EIP).

   ■ The selector of the TSS of the previously executing task (updated only when a return is expected).

2. A static set that the processor reads but does not change. This set includes the fields that store:

   ■ The selector of the task's LDT.

   ■ The register (PDBR) that contains the base address of the task's page directory (read only when paging is enabled).

   ■ Pointers to the stacks for privilege levels 0-2.

   ■ The T-bit (debug trap bit) which causes the processor to raise a debug exception when a task switch occurs. (Refer to Chapter 12 for more information on debugging.)

   ■ The I/O map base (refer to Chapter 8 for more information on the use of the I/O map).

Task state segments may reside anywhere in the linear space. The only case that requires caution is when the TSS spans a page boundary and the higher-addressed page is not present. In this case, the processor raises an exception if it encounters the not-present page while reading the TSS during a task switch. Such an exception can be avoided by either of two strategies:

1. By allocating the TSS so that it does not cross a page boundary.

2. By ensuring that both pages are either both present or both not-present at the time of a task switch. If both pages are not-present, then the page-fault handler must make both pages present before restarting the instruction that caused the task switch.

Figure 7-1.  80386 32-Bit Task State Segment

   Offset   Bits 31..16         Bits 15..0
   (hex)
   64       I/O MAP BASE        0 (reserved), with T in bit 0
   60       0                   LDT
   5C       0                   GS
   58       0                   FS
   54       0                   DS
   50       0                   SS
   4C       0                   CS
   48       0                   ES
   44       EDI
   40       ESI
   3C       EBP
   38       ESP
   34       EBX
   30       EDX
   2C       ECX
   28       EAX
   24       EFLAGS
   20       INSTRUCTION POINTER (EIP)
   1C       CR3 (PDBR)
   18       0                   SS2
   14       ESP2
   10       0                   SS1
   0C       ESP1
   8        0                   SS0
   4        ESP0
   0        0                   BACK LINK TO PREVIOUS TSS

   NOTE
      0 MEANS INTEL RESERVED. DO NOT DEFINE.

7.2  TSS Descriptor

The task state segment, like all other segments, is defined by a descriptor. Figure 7-2 shows the format of a TSS descriptor.

The B-bit in the type field indicates whether the task is busy. A type code of 9 indicates a non-busy task; a type code of 11 indicates a busy task. Tasks are not reentrant. The B-bit allows the processor to detect an attempt to switch to a task that is already busy.

The BASE, LIMIT, and DPL fields and the G-bit and P-bit have functions similar to their counterparts in data-segment descriptors. The LIMIT field, however, must have a value equal to or greater than 103. An attempt to switch to a task whose TSS descriptor has a limit less than 103 causes an exception. A larger limit is permissible, and a larger limit is required if an I/O permission map is present. A larger limit may also be convenient for systems software if additional data is stored in the same segment as the TSS.
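To make the descriptor fields concrete, the following Python sketch decodes the two doublewords of a 32-bit TSS descriptor laid out as in Figure 7-2 and applies the minimum-limit rule. The function names and the dictionary keys are hypothetical. The sketch also assumes that the limit comparison is made on the byte-granular limit after the G-bit has been applied; the text above does not say which form is compared.

```python
MIN_TSS_LIMIT = 103            # smallest limit that covers offsets 0..67H
TYPE_AVAILABLE_386_TSS = 0x9   # type code 9: task not busy
TYPE_BUSY_386_TSS = 0xB        # type code 11: task busy


def decode_tss_descriptor(low, high):
    """Decode a TSS descriptor from its doublewords at offsets 0 and 4."""
    type_field = (high >> 8) & 0x1F          # bits 12..8 read 0 1 0 B 1
    if type_field not in (TYPE_AVAILABLE_386_TSS, TYPE_BUSY_386_TSS):
        raise ValueError("not a 32-bit TSS descriptor")

    raw_limit = (low & 0xFFFF) | (high & 0x000F0000)   # LIMIT 15..0, 19..16
    granular = bool(high & (1 << 23))                  # G-bit
    return {
        "base": ((low >> 16) & 0xFFFF)                 # BASE 15..0
                | ((high & 0xFF) << 16)                # BASE 23..16
                | (high & 0xFF000000),                 # BASE 31..24
        # Assumption: with G set, the limit counts 4-Kbyte units.
        "limit": (raw_limit << 12) | 0xFFF if granular else raw_limit,
        "dpl": (high >> 13) & 0x3,
        "present": bool(high & (1 << 15)),
        "busy": type_field == TYPE_BUSY_386_TSS,
    }


def limit_is_valid(descriptor):
    """True if the decoded limit is large enough for a 32-bit TSS."""
    return descriptor["limit"] >= MIN_TSS_LIMIT
```

For instance, the doublewords 34560067H (offset 0) and 00008912H (offset 4) decode to a present, non-busy TSS with base 123456H, limit 103, and DPL 0.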
A procedure that has access to a TSS descriptor can cause a task switch. In most systems the DPL fields of TSS descriptors should be set to zero, so that only trusted software has the right to perform task switching.

Having access to a TSS descriptor does not give a procedure the right to read or modify a TSS. Reading and modification can be accomplished only with another descriptor that redefines the TSS as a data segment. An attempt to load a TSS descriptor into any of the segment registers (CS, SS, DS, ES, FS, GS) causes an exception.

TSS descriptors may reside only in the GDT. An attempt to identify a TSS with a selector that has TI=1 (indicating the current LDT) results in an exception.

Figure 7-2.  TSS Descriptor for 32-bit TSS

   Doubleword at offset 4:
      bits 31..24   BASE 31..24
      bit  23       G
      bits 22..21   0 0
      bit  20       AVL
      bits 19..16   LIMIT 19..16
      bit  15       P
      bits 14..13   DPL
      bits 12..8    TYPE (0 1 0 B 1)
      bits 7..0     BASE 23..16

   Doubleword at offset 0:
      bits 31..16   BASE 15..0
      bits 15..0    LIMIT 15..0

7.3  Task Register

The task register (TR) identifies the currently executing task by pointing to the TSS. Figure 7-3 shows the path by which the processor accesses the current TSS.

The task register has both a "visible" portion (i.e., can be read and changed by instructions) and an "invisible" portion (maintained by the processor to correspond to the visible portion; cannot be read by any instruction). The selector in the visible portion selects a TSS descriptor in the GDT. The processor uses the invisible portion to cache the base and limit values from the TSS descriptor. Holding the base and limit in a register makes execution of the task more efficient, because the processor does not need to repeatedly fetch these values from memory when it references the TSS of the current task.
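The relationship between the two portions of TR can be pictured as a small record that pairs a selector with values copied from the descriptor it selects. The Python sketch below is a simplified model under stated assumptions: the names TaskRegister, split_selector, and point_tr_at are hypothetical, the GDT is reduced to a dictionary from index to (base, limit), and no privilege or type checks are modelled.

```python
from dataclasses import dataclass


def split_selector(selector):
    """Split a 16-bit selector into (index, TI, RPL)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


@dataclass
class TaskRegister:
    selector: int   # "visible" portion
    base: int       # "invisible" portion: cached from the TSS descriptor
    limit: int      # "invisible" portion: cached from the TSS descriptor


def point_tr_at(selector, gdt):
    """Build a TR image for `selector`, caching base and limit from the GDT."""
    index, ti, _rpl = split_selector(selector)
    if ti != 0:
        # A selector with TI=1 names the LDT, where TSS descriptors may not live.
        raise ValueError("TSS descriptors may reside only in the GDT")
    base, limit = gdt[index]
    return TaskRegister(selector, base, limit)
```

With a GDT whose entry 5 describes a TSS at base 123000H with limit 103, point_tr_at(0x28, gdt) yields a register image whose cached base and limit can be used without consulting the GDT again.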
The instructions LTR and STR are used to modify and read the visible portion of the task register. Both instructions take one operand, a 16-bit selector located in memory or in a general register.

LTR (Load task register) loads the visible portion of the task register with the selector operand, which must select a TSS descriptor in the GDT. LTR also loads the invisible portion with information from the TSS descriptor selected by the operand. LTR is a privileged instruction; it may be executed only when CPL is zero. LTR is generally used during system initialization to give an initial value to the task register; thereafter, the contents of TR are changed by task switch operations.

STR (Store task register) stores the visible portion of the task register in a general register or memory word. STR is not privileged.

Figure 7-3.  Task Register

   [Diagram. TR consists of a 16-bit visible register holding the SELECTOR and a hidden register holding (BASE) and (LIMIT). The selector picks the TSS descriptor in the global descriptor table; the base and limit taken from that descriptor fill the hidden register and locate the task state segment.]

7.4  Task Gate Descriptor

A task gate descriptor provides an indirect, protected reference to a TSS. Figure 7-4 illustrates the format of a task gate.

The SELECTOR field of a task gate must refer to a TSS descriptor. The value of the RPL in this selector is not used by the processor.

The DPL field of a task gate controls the right to use the descriptor to cause a task switch.
A procedure may not select a task gate descriptor unless the maximum of the selector's RPL and the CPL of the procedure is numerically less than or equal to the DPL of the descriptor. This constraint prevents untrusted procedures from causing a task switch. (Note that when a task gate is used, the DPL of the target TSS descriptor is not used for privilege checking.)

A procedure that has access to a task gate has the power to cause a task switch, just as does a procedure that has access to a TSS descriptor. The 80386 has task gates in addition to TSS descriptors to satisfy three needs:

1. The need for a task to have a single busy bit. Because the busy-bit is stored in the TSS descriptor, each task should have only one such descriptor. There may, however, be several task gates that select the single TSS descriptor.

2. The need to provide selective access to tasks. Task gates fulfill this need, because they can reside in LDTs and can have a DPL that is different from the TSS descriptor's DPL. A procedure that does not have sufficient privilege to use the TSS descriptor in the GDT (which usually has a DPL of 0) can still switch to another task if it has access to a task gate for that task in its LDT. With task gates, systems software can limit the right to cause task switches to specific tasks.

3. The need for an interrupt or exception to cause a task switch. Task gates may also reside in the IDT, making it possible for interrupts and exceptions to cause task switching. When an interrupt or exception vectors to an IDT entry that contains a task gate, the 80386 switches to the indicated task. Thus, all tasks in the system can benefit from the protection afforded by isolation from interrupt tasks.

Figure 7-5 illustrates how both a task gate in an LDT and a task gate in the IDT can identify the same task.

Figure 7-4.  Task Gate Descriptor

   Doubleword at offset 4:
      bits 31..16   (NOT USED)
      bit  15       P
      bits 14..13   DPL
      bits 12..8    TYPE (0 0 1 0 1)
      bits 7..0     (NOT USED)

   Doubleword at offset 0:
      bits 31..16   SELECTOR
      bits 15..0    (NOT USED)

Figure 7-5.  Task Gate Indirectly Identifies Task

   [Diagram. A task gate in the local descriptor table and a task gate in the interrupt descriptor table both select the same task descriptor (TSS descriptor) in the global descriptor table, which in turn locates the task state segment.]

7.5  Task Switching

The 80386 switches execution to another task in any of four cases:

1. The current task executes a JMP or CALL that refers to a TSS descriptor.

2. The current task executes a JMP or CALL that refers to a task gate.

3. An interrupt or exception vectors to a task gate in the IDT.

4. The current task executes an IRET when the NT flag is set.

JMP, CALL, IRET, interrupts, and exceptions are all ordinary mechanisms of the 80386 that can be used in circumstances that do not require a task switch.
Either the type of descriptor referenced or the NT (nested task) bit in the flag word distinguishes between the standard mechanism and the variant that causes a task switch.

To cause a task switch, a JMP or CALL instruction can refer either to a TSS descriptor or to a task gate. The effect is the same in either case: the 80386 switches to the indicated task.

An exception or interrupt causes a task switch when it vectors to a task gate in the IDT. If it vectors to an interrupt or trap gate in the IDT, a task switch does not occur. Refer to Chapter 9 for more information on the interrupt mechanism.

Whether invoked as a task or as a procedure of the interrupted task, an interrupt handler always returns control to the interrupted procedure in the interrupted task. If the NT flag is set, however, the handler is an interrupt task, and the IRET switches back to the interrupted task.

A task switching operation involves these steps:

1. Checking that the current task is allowed to switch to the designated task. Data-access privilege rules apply in the case of JMP or CALL instructions. The DPL of the TSS descriptor or task gate must be numerically greater than or equal to the maximum of CPL and the RPL of the selector. Exceptions, interrupts, and IRETs are permitted to switch tasks regardless of the DPL of the target task gate or TSS descriptor.

2. Checking that the TSS descriptor of the new task is marked present and has a valid limit. Any errors up to this point occur in the context of the outgoing task. Errors are restartable and can be handled in a way that is transparent to applications procedures.

3. Saving the state of the current task. The processor finds the base address of the current TSS cached in the task register. It copies the registers into the current TSS (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, ES, CS, SS, DS, FS, GS, and the flag register). The EIP field of the TSS points to the instruction after the one that caused the task switch.

4.
Loading the task register with the selector of the incoming task's TSS descriptor, marking the incoming task's TSS descriptor as busy, and setting the TS (task switched) bit of the MSW. The selector is either the operand of a control transfer instruction or is taken from a task gate. 5. Loading the incoming task's state from its TSS and resuming execution. The registers loaded are the LDT register; the flag register; the general registers EIP, EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI; the segment registers ES, CS, SS, DS, FS, and GS; and PDBR. Any errors detected in this step occur in the context of the incoming task. To an exception handler, it appears that the first instruction of the new task has not yet executed. Note that the state of the outgoing task is always saved when a task switch occurs. If execution of that task is resumed, it starts after the instruction that caused the task switch. The registers are restored to the values they held when the task stopped executing. Every task switch sets the TS (task switched) bit in the MSW (machine status word). The TS flag is useful to systems software when a coprocessor (such as a numerics coprocessor) is present. The TS bit signals that the context of the coprocessor may not correspond to the current 80386 task. Chapter 11 discusses the TS bit and coprocessors in more detail. Exception handlers that field task-switch exceptions in the incoming task (exceptions due to tests 4 thru 16 of Table 7-1) should be cautious about taking any action that might load the selector that caused the exception. Such an action will probably cause another exception, unless the exception handler first examines the selector and fixes any potential problem. The privilege level at which execution resumes in the incoming task is neither restricted nor affected by the privilege level at which the outgoing task was executing. 
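The privilege test of step 1 can be sketched in a few lines of Python. This is only an illustrative model, not a description of the processor's internal logic: the function name `task_switch_permitted` and the string labels for the cause of the switch are hypothetical. The comparison follows the rule stated for task gates in section 7.4, namely that the maximum of CPL and RPL must be numerically less than or equal to the DPL.

```python
def task_switch_permitted(cause, cpl, rpl, dpl):
    """Model the privilege check made before a task switch.

    cause is a hypothetical label: "JMP", "CALL", "INTERRUPT",
    "EXCEPTION", or "IRET".  cpl, rpl, and dpl are privilege levels
    0..3, where 0 is the most privileged.
    """
    if cause in ("JMP", "CALL"):
        # Data-access rule: the descriptor (TSS descriptor or task gate)
        # must be no more privileged than the less privileged of the
        # current procedure and the selector.
        return max(cpl, rpl) <= dpl
    if cause in ("INTERRUPT", "EXCEPTION", "IRET"):
        # These switch tasks regardless of the DPL of the target.
        return True
    raise ValueError("unknown cause of task switch: %r" % (cause,))
```

For instance, under this model a procedure at CPL 3 cannot JMP through a descriptor whose DPL is 0, while an interrupt that vectors to a task gate with the same DPL still switches tasks.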
Because the tasks are isolated by their separate address spaces and TSSs and because privilege rules can be used to prevent improper access to a TSS, no privilege rules are needed to constrain the relation between the CPLs of the tasks. The new task begins executing at the privilege level indicated by the RPL of the CS selector value that is loaded from the TSS.

Table 7-1. Checks Made during a Task Switch

Test  Test Description                       Exception(1)  Error Code Selects

 1    Incoming TSS descriptor is present     NP            Incoming TSS
 2    Incoming TSS descriptor is marked      GP            Incoming TSS
      not-busy
 3    Limit of incoming TSS is greater       TS            Incoming TSS
      than or equal to 103

      -- All register and selector values are loaded --

 4    LDT selector of incoming task is       TS            Incoming TSS
      valid
 5    LDT of incoming task is present        TS            Incoming TSS
 6    CS selector is valid(2)                TS            Code segment
 7    Code segment is present                NP            Code segment
 8    Code segment DPL matches CS RPL        TS            Code segment
 9    Stack segment is valid(2)              GP            Stack segment
10    Stack segment is present               SF            Stack segment
11    Stack segment DPL = CPL                SF            Stack segment
12    Stack-selector RPL = CPL               GP            Stack segment
13    DS, ES, FS, GS selectors are valid(2)  GP            Segment
14    DS, ES, FS, GS segments are readable   GP            Segment
15    DS, ES, FS, GS segments are present    NP            Segment
16    DS, ES, FS, GS segment DPL ≥ CPL       GP            Segment
      (unless these are conforming
      segments)

(1) NP = Segment-not-present exception, GP = General protection fault, TS = Invalid TSS, SF = Stack fault.

(2) Validity tests of a selector check that the selector is in the proper table (e.g., the LDT selector refers to the GDT), lies within the bounds of the table, and refers to the proper type of descriptor (e.g., the LDT selector refers to an LDT descriptor).

7.6 Task Linking

The back-link field of the TSS and the NT (nested task) bit of the flag word together allow the 80386 to automatically return to a task that CALLed another task or was interrupted by another task. When a CALL instruction, an interrupt instruction, an external interrupt, or an exception causes a switch to a new task, the 80386 automatically fills the back-link of the new TSS with the selector of the outgoing task's TSS and, at the same time, sets the NT bit in the new task's flag register. The NT flag indicates whether the back-link field is valid. The new task releases control by executing an IRET instruction. When interpreting an IRET, the 80386 examines the NT flag. If NT is set, the 80386 switches back to the task selected by the back-link field. Table 7-2 summarizes the uses of these fields.

Table 7-2.
Effect of Task Switch on BUSY, NT, and Back-Link

Affected Field      Effect of JMP        Effect of CALL        Effect of IRET
                    Instruction          Instruction           Instruction

Busy bit of         Set, must be 0       Set, must be 0        Unchanged,
incoming task       before               before                must be set

Busy bit of         Cleared              Unchanged             Cleared
outgoing task                            (already set)

NT bit of           Cleared              Set                   Unchanged
incoming task

NT bit of           Unchanged            Unchanged             Cleared
outgoing task

Back-link of        Unchanged            Set to outgoing       Unchanged
incoming task                            TSS selector

Back-link of        Unchanged            Unchanged             Unchanged
outgoing task

7.6.1 Busy Bit Prevents Loops

The B-bit (busy bit) of the TSS descriptor ensures the integrity of the back-link. A chain of back-links may grow to any length as interrupt tasks interrupt other interrupt tasks or as called tasks call other tasks. The busy bit ensures that the CPU can detect any attempt to create a loop. A loop would indicate an attempt to reenter a task that is already busy; however, the TSS is not a reentrable resource.

The processor uses the busy bit as follows:

1. When switching to a task, the processor automatically sets the busy bit of the new task.

2. When switching from a task, the processor automatically clears the busy bit of the old task if that task is not to be placed on the back-link chain (i.e., the instruction causing the task switch is JMP or IRET). If the task is placed on the back-link chain, its busy bit remains set.

3. When switching to a task, the processor signals an exception if the busy bit of the new task is already set.

By these actions, the processor prevents a task from switching to itself or to any task that is on a back-link chain, thereby preventing invalid reentry into a task.

The busy bit is effective even in multiprocessor configurations, because the processor automatically asserts a bus lock when it sets or clears the busy bit. This action ensures that two processors do not invoke the same task at the same time. (Refer to Chapter 11 for more on multiprocessing.)
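The bookkeeping summarized in Table 7-2 and the busy-bit rules above can be modeled as a small state machine. The following Python sketch is illustrative only: the `Task` class, the `TaskSwitchFault` exception, and the selector values used in the test of the model are hypothetical stand-ins, and interrupts and exceptions that switch tasks are assumed to behave like CALL, as the text of section 7.6 describes.

```python
class TaskSwitchFault(Exception):
    """Stand-in for the exception the processor signals."""


class Task:
    def __init__(self, selector):
        self.selector = selector   # selector of this task's TSS descriptor
        self.busy = False          # busy bit in the TSS descriptor
        self.nt = False            # NT bit in the task's flag register
        self.back_link = None      # back-link field of the TSS


def switch_task(outgoing, incoming, kind):
    """Apply the effects of Table 7-2 for kind "JMP", "CALL", or "IRET"."""
    if kind in ("JMP", "CALL"):
        # The incoming task must not already be busy; this is what
        # prevents a loop in the back-link chain.
        if incoming.busy:
            raise TaskSwitchFault("incoming task is already busy")
        incoming.busy = True
        if kind == "JMP":
            # The outgoing task is not placed on the back-link chain.
            outgoing.busy = False
            incoming.nt = False
        else:
            # The outgoing task stays busy and is linked from the new task.
            incoming.nt = True
            incoming.back_link = outgoing.selector
    elif kind == "IRET":
        # The task being returned to is on the chain, so it must be busy.
        if not incoming.busy:
            raise TaskSwitchFault("task selected by back-link is not busy")
        outgoing.busy = False
        outgoing.nt = False
    else:
        raise ValueError("unknown kind of task switch: %r" % (kind,))
    return incoming
```

In this model, a task that has CALLed another task remains busy, so an attempt by the called task to CALL it back raises `TaskSwitchFault`, whereas an IRET returns to it normally.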
7.6.2 Modifying Task Linkages Any modification of the linkage order of tasks should be accomplished only by software that can be trusted to correctly update the back-link and the busy-bit. Such changes may be needed to resume an interrupted task before the task that interrupted it. Trusted software that removes a task from the back-link chain must follow one of the following policies: 1. First change the back-link field in the TSS of the interrupting task, then clear the busy-bit in the TSS descriptor of the task removed from the list. 2. Ensure that no interrupts occur between updating the back-link chain and the busy bit. 7.7 Task Address Space The LDT selector and PDBR fields of the TSS give software systems designers flexibility in utilization of segment and page mapping features of the 80386. By appropriate choice of the segment and page mappings for each task, tasks may share address spaces, may have address spaces that are largely distinct from one another, or may have any degree of sharing between these two extremes. The ability for tasks to have distinct address spaces is an important aspect of 80386 protection. A module in one task cannot interfere with a module in another task if the modules do not have access to the same address spaces. The flexible memory management features of the 80386 allow systems designers to assign areas of shared address space to those modules of different tasks that are designed to cooperate with each other. 7.7.1 Task Linear-to-Physical Space Mapping The choices for arranging the linear-to-physical mappings of tasks fall into two general classes: 1. One linear-to-physical mapping shared among all tasks. When paging is not enabled, this is the only possibility. Without page tables, all linear addresses map to the same physical addresses. When paging is enabled, this style of linear-to-physical mapping results from using one page directory for all tasks. 
The linear space utilized may exceed the physical space available if the operating system also implements page-level virtual memory. 2. Several partially overlapping linear-to-physical mappings. This style is implemented by using a different page directory for each task. Because the PDBR (page directory base register) is loaded from the TSS with each task switch, each task may have a different page directory. In theory, the linear address spaces of different tasks may map to completely distinct physical addresses. If the entries of different page directories point to different page tables and the page tables point to different pages of physical memory, then the tasks do not share any physical addresses. In practice, some portion of the linear address spaces of all tasks must map to the same physical addresses. The task state segments must lie in a common space so that the mapping of TSS addresses does not change while the processor is reading and updating the TSSs during a task switch. The linear space mapped by the GDT should also be mapped to a common physical space; otherwise, the purpose of the GDT is defeated. Figure 7-6 shows how the linear spaces of two tasks can overlap in the physical space by sharing page tables. 7.7.2 Task Logical Address Space By itself, a common linear-to-physical space mapping does not enable sharing of data among tasks. To share data, tasks must also have a common logical-to-linear space mapping; i.e., they must also have access to descriptors that point into a shared linear address space. There are three ways to create common logical-to-physical address-space mappings: 1. Via the GDT. All tasks have access to the descriptors in the GDT. If those descriptors point into a linear-address space that is mapped to a common physical-address space for all tasks, then the tasks can share data and instructions. 2. By sharing LDTs. Two or more tasks can use the same LDT if the LDT selectors in their TSSs select the same LDT segment. 
Those LDT-resident descriptors that point into a linear space that is mapped to a common physical space permit the tasks to share physical memory. This method of sharing is more selective than sharing by the GDT; the sharing can be limited to specific tasks. Other tasks in the system may have different LDTs that do not give them access to the shared areas.

3. By descriptor aliases in LDTs. It is possible for certain descriptors of different LDTs to point to the same linear address space. If that linear address space is mapped to the same physical space by the page mapping of the tasks involved, these descriptors permit the tasks to share the common space. Such descriptors are commonly called "aliases". This method of sharing is even more selective than the prior two; other descriptors in the LDTs may point to distinct linear addresses or to linear addresses that are not shared.

Figure 7-6. Partially-Overlapping Linear Spaces

[Figure: the PDBR field in the TSS of Task A and the PDBR field in the TSS of Task B point to separate page directories. In each directory, one PDE selects a page table whose PTEs map page frames private to that task, and another PDE selects a shared page table whose PTEs map page frames common to both tasks.]

Chapter 8  Input/Output

----------------------------------------------------------------------------

This chapter presents the I/O features of the 80386 from the following perspectives:

■ Methods of addressing I/O ports

■ Instructions that cause I/O operations

■ Protection as it applies to the use of I/O instructions and I/O port addresses.

8.1 I/O Addressing

The 80386 allows input/output to be performed in either of two ways:

■ By means of a separate I/O address space (using specific I/O instructions)

■ By means of memory-mapped I/O (using general-purpose operand manipulation instructions).

8.1.1 I/O Address Space

The 80386 provides a separate I/O address space, distinct from physical memory, that can be used to address the input/output ports that are used for external devices. The I/O address space consists of 2^16 (64K) individually addressable 8-bit ports; any two consecutive 8-bit ports can be treated as a 16-bit port; and four consecutive 8-bit ports can be treated as a 32-bit port. Thus, the I/O address space can accommodate up to 64K 8-bit ports, up to 32K 16-bit ports, or up to 16K 32-bit ports.

The program can specify the address of the port in two ways. Using an immediate byte constant, the program can specify:

■ 256 8-bit ports numbered 0 through 255.

■ 128 16-bit ports numbered 0, 2, 4, . . . , 252, 254.

■ 64 32-bit ports numbered 0, 4, 8, . . . , 248, 252.

Using a value in DX, the program can specify:

■ 8-bit ports numbered 0 through 65535

■ 16-bit ports numbered 0, 2, 4, . . . , 65532, 65534

■ 32-bit ports numbered 0, 4, 8, . . . , 65528, 65532

The 80386 can transfer 32, 16, or 8 bits at a time to a device located in the I/O space. Like doublewords in memory, 32-bit ports should be aligned at addresses evenly divisible by four so that the 32 bits can be transferred in a single bus access.
Like words in memory, 16-bit ports should be aligned at even-numbered addresses so that the 16 bits can be transferred in a single bus access. An 8-bit port may be located at either an even or odd address. The instructions IN and OUT move data between a register and a port in the I/O address space. The instructions INS and OUTS move strings of data between the memory address space and ports in the I/O address space. 8.1.2 Memory-Mapped I/O I/O devices also may be placed in the 80386 memory address space. As long as the devices respond like memory components, they are indistinguishable to the processor. Memory-mapped I/O provides additional programming flexibility. Any instruction that references memory may be used to access an I/O port located in the memory space. For example, the MOV instruction can transfer data between any register and a port; and the AND, OR, and TEST instructions may be used to manipulate bits in the internal registers of a device (see Figure 8-1). Memory-mapped I/O performed via the full instruction set maintains the full complement of addressing modes for selecting the desired I/O device (e.g., direct address, indirect address, base register, index register, scaling). Memory-mapped I/O, like any other memory reference, is subject to access protection and control when executing in protected mode. Refer to Chapter 6 for a discussion of memory protection. 8.2 I/O Instructions The I/O instructions of the 80386 provide access to the processor's I/O ports for the transfer of data to and from peripheral devices. These instructions have as one operand the address of a port in the I/O address space. There are two classes of I/O instruction: 1. Those that transfer a single item (byte, word, or doubleword) located in a register. 2. Those that transfer strings of items (strings of bytes, words, or doublewords) located in memory. These are known as "string I/O instructions" or "block I/O instructions". 
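The port-numbering and alignment rules of section 8.1.1 reduce to simple arithmetic, which the following Python sketch makes explicit. The names `is_aligned_port` and `aligned_port_count` are hypothetical helpers introduced for illustration; the constants restate the sizes given in the text (a 2^16-byte I/O space, and 256 addresses reachable with an immediate byte).

```python
IO_SPACE_BYTES = 1 << 16         # 2^16 individually addressable 8-bit ports
IMMEDIATE_RANGE_BYTES = 1 << 8   # addresses reachable with an immediate byte


def is_aligned_port(address, width):
    """Return True if a port `width` bytes wide at `address` is aligned.

    A 16-bit port is aligned at an even address and a 32-bit port at an
    address divisible by four; an 8-bit port is aligned anywhere.
    """
    if width not in (1, 2, 4):
        raise ValueError("port width must be 1, 2, or 4 bytes")
    if not 0 <= address <= IO_SPACE_BYTES - width:
        raise ValueError("port does not fit in the I/O address space")
    return address % width == 0


def aligned_port_count(width, space=IO_SPACE_BYTES):
    """Count the aligned ports of `width` bytes within `space` bytes."""
    if width not in (1, 2, 4):
        raise ValueError("port width must be 1, 2, or 4 bytes")
    return space // width
```

Calling `aligned_port_count(2)` reproduces the 32K 16-bit ports of the full I/O space, and `aligned_port_count(4, IMMEDIATE_RANGE_BYTES)` reproduces the 64 32-bit ports that an immediate byte can name.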
8.2.1 Register I/O Instructions

The I/O instructions IN and OUT are provided to move data between I/O ports and the EAX (32-bit I/O), the AX (16-bit I/O), or AL (8-bit I/O) general registers. IN and OUT instructions address I/O ports either directly, with the address of one of up to 256 port addresses coded in the instruction, or indirectly via the DX register to one of up to 64K port addresses.

IN (Input from Port) transfers a byte, word, or doubleword from an input port to AL, AX, or EAX. If a program specifies AL with the IN instruction, the processor transfers 8 bits from the selected port to AL. If a program specifies AX with the IN instruction, the processor transfers 16 bits from the port to AX. If a program specifies EAX with the IN instruction, the processor transfers 32 bits from the port to EAX.

OUT (Output to Port) transfers a byte, word, or doubleword to an output port from AL, AX, or EAX. The program can specify the number of the port using the same methods as the IN instruction.

Figure 8-1. Memory-Mapped I/O

[Figure: two I/O devices, I/O Device 1 and I/O Device 2, each with an internal register that occupies a location in the memory address space.]

8.2.2 Block I/O Instructions

The block (or string) I/O instructions INS and OUTS move blocks of data between I/O ports and memory space. Block I/O instructions use the DX register to specify the address of a port in the I/O address space. INS and OUTS use DX to specify:

■ 8-bit ports numbered 0 through 65535

■ 16-bit ports numbered 0, 2, 4, . . . , 65532, 65534

■ 32-bit ports numbered 0, 4, 8, . . . , 65528, 65532

Block I/O instructions use either SI or DI to designate the source or destination memory address. For each transfer, SI or DI is automatically either incremented or decremented as specified by the direction bit in the flags register.

INS and OUTS, when used with repeat prefixes, cause block input or output operations. REP, the repeat prefix, modifies INS and OUTS to provide a means of transferring blocks of data between an I/O port and memory. These block I/O instructions are string primitives (refer also to Chapter 3 for more on string primitives). They simplify programming and increase the speed of data transfer by eliminating the need to use a separate LOOP instruction or an intermediate register to hold the data.

The string I/O primitives can operate on byte strings, word strings, or doubleword strings. After each transfer, the memory address in ESI or EDI is updated by 1 for byte operands, by 2 for word operands, or by 4 for doubleword operands. The value in the direction flag (DF) determines whether the processor automatically increments ESI or EDI (DF=0) or whether it automatically decrements these registers (DF=1).

INS (Input String from Port) transfers a byte, word, or doubleword string element from an input port to memory. The mnemonics INSB, INSW, and INSD are variants that explicitly specify the size of the operand. If a program specifies INSB, the processor transfers 8 bits from the selected port to the memory location indicated by ES:EDI. If a program specifies INSW, the processor transfers 16 bits from the port to the memory location indicated by ES:EDI. If a program specifies INSD, the processor transfers 32 bits from the port to the memory location indicated by ES:EDI. The destination segment register choice (ES) cannot be changed for the INS instruction. Combined with the REP prefix, INS moves a block of information from an input port to a series of consecutive memory locations.
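The effect of a repeated INS can be illustrated with a short Python model. This is a sketch under simplifying assumptions, not a description of the hardware: memory is a flat `bytearray` (segmentation through ES is omitted), the repeat count is passed as an argument rather than held in a register, and the port is represented by a hypothetical callback `read_port(width)` that returns the next value from the device.

```python
def rep_ins(read_port, memory, edi, count, width, df=0):
    """Model REP INSB/INSW/INSD and return the final value of EDI.

    read_port: hypothetical callable returning the next value from the port
    memory:    bytearray standing in for the destination segment
    edi:       starting offset of the destination
    count:     number of elements to transfer
    width:     1, 2, or 4 bytes (INSB, INSW, INSD)
    df:        direction flag; 0 increments EDI, 1 decrements it
    """
    if width not in (1, 2, 4):
        raise ValueError("operand width must be 1, 2, or 4 bytes")
    step = -width if df else width
    for _ in range(count):
        if edi + width > len(memory):
            raise IndexError("destination lies outside the modeled memory")
        value = read_port(width)
        # Store the element in little-endian byte order at the offset in EDI.
        memory[edi:edi + width] = value.to_bytes(width, "little")
        # Update EDI by the operand size in the direction selected by DF.
        edi = (edi + step) & 0xFFFFFFFF
    return edi
```

With DF=0 the elements land in ascending addresses; with DF=1 the same call fills memory downward from the starting offset.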
OUTS (Output String to Port) transfers a byte, word, or doubleword string element to an output port from memory. The mnemonics OUTSB, OUTSW, and OUTSD are variants that explicitly specify the size of the operand. If a program specifies OUTSB, the processor transfers 8 bits from the memory location indicated by DS:ESI to the selected port. If a program specifies OUTSW, the processor transfers 16 bits from the memory location indicated by DS:ESI to the selected port. If a program specifies OUTSD, the processor transfers 32 bits from the memory location indicated by DS:ESI to the selected port. Combined with the REP prefix, OUTS moves a block of information from a series of consecutive memory locations indicated by DS:ESI to an output port.

8.3 Protection and I/O

Two mechanisms provide protection for I/O functions:

1. The IOPL field in the EFLAGS register defines the right to use I/O-related instructions.

2. The I/O permission bit map of an 80386 TSS segment defines the right to use ports in the I/O address space.

These mechanisms operate only in protected mode, including virtual 8086 mode; they do not operate in real mode. In real mode, there is no protection of the I/O space; any procedure can execute I/O instructions, and any I/O port can be addressed by the I/O instructions.

8.3.1 I/O Privilege Level

Instructions that deal with I/O need to be restricted but also need to be executed by procedures executing at privilege levels other than zero. For this reason, the processor uses two bits of the flags register to store the I/O privilege level (IOPL). The IOPL defines the privilege level needed to execute I/O-related instructions.

The following instructions can be executed only if CPL ≤ IOPL:

IN      Input
INS     Input String
OUT     Output
OUTS    Output String
CLI     Clear Interrupt-Enable Flag
STI     Set Interrupt-Enable Flag

These instructions are called "sensitive" instructions, because they are sensitive to IOPL.
To use sensitive instructions, a procedure must execute at a privilege level at least as privileged as that specified by the IOPL (CPL ≤ IOPL). Any attempt by a less privileged procedure to use a sensitive instruction results in a general protection exception.

Because each task has its own unique copy of the flags register, each task can have a different IOPL. A task whose primary function is to perform I/O (a device driver) can benefit from having an IOPL of three, thereby permitting all procedures of the task to perform I/O. Other tasks typically have IOPL set to zero or one, reserving the right to perform I/O instructions for the most privileged procedures.

A task can change IOPL only with the POPF instruction; however, such changes are privileged. No procedure may alter IOPL (the I/O privilege level in the flag register) unless the procedure is executing at privilege level 0. An attempt by a less privileged procedure to alter IOPL does not result in an exception; IOPL simply remains unaltered.

The POPF instruction may be used in addition to CLI and STI to alter the interrupt-enable flag (IF); however, changes to IF by POPF are IOPL-sensitive. A procedure may alter IF with a POPF instruction only when executing at a level that is at least as privileged as IOPL. An attempt by a less privileged procedure to alter IF in this manner does not result in an exception; IF simply remains unaltered.

8.3.2 I/O Permission Bit Map

The I/O instructions that directly refer to addresses in the processor's I/O space are IN, INS, OUT, and OUTS. The 80386 has the ability to selectively trap references to specific I/O addresses. The structure that enables selective trapping is the I/O Permission Bit Map in the TSS segment (see Figure 8-2). The I/O permission map is a bit vector. The size of the map and its location in the TSS segment are variable. The processor locates the I/O permission map by means of the I/O map base field in the fixed portion of the TSS.
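The two protection mechanisms combine into a single decision, which the remainder of this section describes in prose and which can be sketched in Python as follows. The sketch is an illustration under stated assumptions: the names `map_position` and `io_permitted` are hypothetical, the TSS is modeled as a `bytearray`, and a map byte counts as present when its offset does not exceed the TSS limit (matching the "I/O map base + 31" example). It does not attempt to model the boundary case in which the I/O map base equals the TSS limit.

```python
def map_position(port):
    """Return (byte index, bit offset) of a port's bit within the map."""
    return port // 8, port % 8


def io_permitted(cpl, iopl, tss, io_map_base, tss_limit, port, width,
                 v86=False):
    """Return True if the I/O operation may proceed, False if it faults.

    tss:         bytearray standing in for the TSS segment
    io_map_base: offset of the I/O permission map within the TSS
    tss_limit:   limit of the TSS segment (offset of its last byte)
    port, width: first byte address and size (1, 2, or 4) of the access
    v86:         True to consult the map without regard for IOPL
    """
    if not v86 and cpl <= iopl:
        return True                 # privileged enough; map is not consulted
    for address in range(port, port + width):
        byte_index, bit = map_position(address)
        offset = io_map_base + byte_index
        if offset > tss_limit:
            return False            # not spanned by the map: treated as a 1
        if (tss[offset] >> bit) & 1:
            return False            # a set bit denies access
    return True


print(map_position(41))  # (5, 1)
```

A hypothetical TSS with a 104-byte fixed portion and a 32-byte map would use `io_map_base=104` and `tss_limit=135`; setting bit 1 of the byte at offset 109 then denies port 41 to any procedure whose CPL exceeds IOPL, and denies any word or doubleword access that spans port 41.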
The I/O map base field is 16 bits wide and contains the offset of the beginning of the I/O permission map. The upper limit of the I/O permission map is the same as the limit of the TSS segment.

In protected mode, when it encounters an I/O instruction (IN, INS, OUT, or OUTS), the processor first checks whether CPL ≤ IOPL. If this condition is true, the I/O operation may proceed. If not true, the processor checks the I/O permission map. (In virtual 8086 mode, the processor consults the map without regard for IOPL. Refer to Chapter 15.)

Each bit in the map corresponds to an I/O port byte address; for example, the bit for port 41 is found at I/O map base + 5, bit offset 1. The processor tests all the bits that correspond to the I/O addresses spanned by an I/O operation; for example, a doubleword operation tests four bits corresponding to four adjacent byte addresses. If any tested bit is set, the processor signals a general protection exception. If all the tested bits are zero, the I/O operation may proceed.

It is not necessary for the I/O permission map to represent all the I/O addresses. I/O addresses not spanned by the map are treated as if they had one bits in the map. For example, if TSS limit is equal to I/O map base + 31, the first 256 I/O ports are mapped; I/O operations on any port greater than 255 cause an exception.

If I/O map base is greater than or equal to TSS limit, the TSS segment has no I/O permission map, and all I/O instructions in the 80386 program cause exceptions when CPL > IOPL.

Because the I/O permission map is in the TSS segment, different tasks can have different maps. Thus, the operating system can allocate ports to a task by changing the I/O permission map in the task's TSS.

Figure 8-2.
I/O Address Bit Map

                        TSS SEGMENT

          31       23       15       7        0
          +--------+--------+--------+--------+
LIMIT --->|                                   |
          |                                   |
          ~      I/O PERMISSION BIT MAP       ~
          |                                   |
   +----->|                                   |
   |      +--------+--------+--------+--------+
   |      ~                                   ~
   |      +--------+--------+--------+--------+
   +------|  I/O MAP BASE   |uuuuuuuu uuuuuuuT| 64
          +--------+--------+--------+--------+
          |00000000 00000000|       LDT       | 60
          +--------+--------+--------+--------+
          |00000000 00000000|       GS        | 5C
          +--------+--------+--------+--------+
          |                                   | 58
          ~                                   ~
          |                                   | 4
          +--------+--------+--------+--------+
          |00000000 00000000|  TSS BACK LINK  | 0
          +--------+--------+--------+--------+

Chapter 9  Exceptions and Interrupts

----------------------------------------------------------------------------

Interrupts and exceptions are special kinds of control transfer; they work somewhat like unprogrammed CALLs. They alter the normal program flow to handle external events or to report errors or exceptional conditions. The difference between interrupts and exceptions is that interrupts are used to handle asynchronous events external to the processor, but exceptions handle conditions detected by the processor itself in the course of executing instructions.

There are two sources for external interrupts and two sources for exceptions:

1. Interrupts

   ■ Maskable interrupts, which are signalled via the INTR pin.

   ■ Nonmaskable interrupts, which are signalled via the NMI (Non-Maskable Interrupt) pin.

2. Exceptions

   ■ Processor detected. These are further classified as faults, traps, and aborts.

   ■ Programmed. The instructions INTO, INT 3, INT n, and BOUND can trigger exceptions. These instructions are often called "software interrupts", but the processor handles them as exceptions.

This chapter explains the features that the 80386 offers for controlling and responding to interrupts when it is executing in protected mode.

9.1 Identifying Interrupts

The processor associates an identifying number with each different type of interrupt or exception.
The NMI and the exceptions recognized by the processor are assigned predetermined identifiers in the range 0 through 31. Not all of these numbers are currently used by the 80386; unassigned identifiers in this range are reserved by Intel for possible future expansion.

The identifiers of the maskable interrupts are determined by external interrupt controllers (such as Intel's 8259A Programmable Interrupt Controller) and communicated to the processor during the processor's interrupt-acknowledge sequence. The numbers assigned by an 8259A PIC can be specified by software. Any numbers in the range 32 through 255 can be used. Table 9-1 shows the assignment of interrupt and exception identifiers.

Exceptions are classified as faults, traps, or aborts depending on the way they are reported and whether restart of the instruction that caused the exception is supported.

Faults   Faults are exceptions that are reported "before" the instruction causing the exception. Faults are either detected before the instruction begins to execute, or during execution of the instruction. If detected during the instruction, the fault is reported with the machine restored to a state that permits the instruction to be restarted.

Traps    A trap is an exception that is reported at the instruction boundary immediately after the instruction in which the exception was detected.

Aborts   An abort is an exception that permits neither precise location of the instruction causing the exception nor restart of the program that caused the exception. Aborts are used to report severe errors, such as hardware errors and inconsistent or illegal values in system tables.

Table 9-1.
Interrupt and Exception ID Assignments Identifier Description 0 Divide error 1 Debug exceptions 2 Nonmaskable interrupt 3 Breakpoint (one-byte INT 3 instruction) 4 Overflow (INTO instruction) 5 Bounds check (BOUND instruction) 6 Invalid opcode 7 Coprocessor not available 8 Double fault 9 (reserved) 10 Invalid TSS 11 Segment not present 12 Stack exception 13 General protection 14 Page fault 15 (reserved) 16 Coprecessor error 17-31 (reserved) 32-255 Available for external interrupts via INTR pin 9.2 Enabling and Disabling Interrupts The processor services interrupts and exceptions only between the end of one instruction and the beginning of the next. When the repeat prefix is used to repeat a string instruction, interrupts and exceptions may occur between repetitions. Thus, operations on long strings do not delay interrupt response. Certain conditions and flag settings cause the processor to inhibit certain interrupts and exceptions at instruction boundaries. 9.2.1 NMI Masks Further NMIs While an NMI handler is executing, the processor ignores further interrupt signals at the NMI pin until the next IRET instruction is executed. 9.2.2 IF Masks INTR The IF (interrupt-enable flag) controls the acceptance of external interrupts signalled via the INTR pin. When IF=0, INTR interrupts are inhibited; when IF=1, INTR interrupts are enabled. As with the other flag bits, the processor clears IF in response to a RESET signal. The instructions CLI and STI alter the setting of IF. CLI (Clear Interrupt-Enable Flag) and STI (Set Interrupt-Enable Flag) explicitly alter IF (bit 9 in the flag register). These instructions may be executed only if CPL ó IOPL. A protection exception occurs if they are executed when CPL > IOPL. The IF is also affected implicitly by the following operations: þ The instruction PUSHF stores all flags, including IF, in the stack where they can be examined. 
* Task switches and the instructions POPF and IRET load the flags register; therefore, they can be used to modify IF.

* Interrupts through interrupt gates automatically reset IF, disabling interrupts. (Interrupt gates are explained later in this chapter.)

9.2.3 RF Masks Debug Faults

The RF bit in EFLAGS controls the recognition of debug faults. This permits debug faults to be raised for a given instruction at most once, no matter how many times the instruction is restarted. (Refer to Chapter 12 for more information on debugging.)

9.2.4 MOV or POP to SS Masks Some Interrupts and Exceptions

Software that needs to change stack segments often uses a pair of instructions; for example:

   MOV SS, AX
   MOV ESP, StackTop

If an interrupt or exception is processed after SS has been changed but before ESP has received the corresponding change, the two parts of the stack pointer SS:ESP are inconsistent for the duration of the interrupt handler or exception handler.

To prevent this situation, the 80386, after both a MOV to SS and a POP to SS instruction, inhibits NMI, INTR, debug exceptions, and single-step traps at the instruction boundary following the instruction that changes SS. Some exceptions may still occur; namely, page fault and general protection fault. Always use the 80386 LSS instruction, and the problem will not occur.

9.3 Priority Among Simultaneous Interrupts and Exceptions

If more than one interrupt or exception is pending at an instruction boundary, the processor services one of them at a time. The priority among classes of interrupt and exception sources is shown in Table 9-2. The processor first services a pending interrupt or exception from the class that has the highest priority, transferring control to the first instruction of the interrupt handler. Lower priority exceptions are discarded; lower priority interrupts are held pending. Discarded exceptions will be rediscovered when the interrupt handler returns control to the point of interruption.
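The selection rule described in section 9.3 can be modeled roughly as follows. This is only an illustrative sketch, not a description of the processor's internal logic: the class names are shortened forms of the rows of Table 9-2, and the function name service_one is hypothetical.

```python
# Priority classes in the order given by Table 9-2, highest first.
# The strings are shortened labels chosen for this sketch.
PRIORITY = [
    "fault (non-debug)",
    "trap instruction (INTO, INT n, INT 3)",
    "debug trap for this instruction",
    "debug fault for next instruction",
    "NMI",
    "INTR",
]

# Only these two classes are interrupts; the rest are exceptions.
INTERRUPTS = {"NMI", "INTR"}


def service_one(pending):
    """Return (serviced, held, discarded) for the events pending at one
    instruction boundary: the highest-priority event is serviced,
    lower-priority interrupts are held pending, and lower-priority
    exceptions are discarded (to be rediscovered after the handler returns)."""
    ordered = sorted(pending, key=PRIORITY.index)
    serviced, rest = ordered[0], ordered[1:]
    held = [event for event in rest if event in INTERRUPTS]
    discarded = [event for event in rest if event not in INTERRUPTS]
    return serviced, held, discarded
```

For example, with a non-debug fault, a debug trap, and an INTR request all pending, the sketch services the fault, holds INTR, and discards the debug trap.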
9.4 Interrupt Descriptor Table

The interrupt descriptor table (IDT) associates each interrupt or exception identifier with a descriptor for the instructions that service the associated event. Like the GDT and LDTs, the IDT is an array of 8-byte descriptors. Unlike the GDT and LDTs, the first entry of the IDT may contain a descriptor. To form an index into the IDT, the processor multiplies the interrupt or exception identifier by eight. Because there are only 256 identifiers, the IDT need not contain more than 256 descriptors. It can contain fewer than 256 entries; entries are required only for interrupt identifiers that are actually used.

The IDT may reside anywhere in physical memory. As Figure 9-1 shows, the processor locates the IDT by means of the IDT register (IDTR). The instructions LIDT and SIDT operate on the IDTR. Both instructions have one explicit operand: the address in memory of a 6-byte area. Figure 9-2 shows the format of this area.

LIDT (Load IDT register) loads the IDT register with the linear base address and limit values contained in the memory operand. This instruction can be executed only when the CPL is zero. It is normally used by the initialization logic of an operating system when creating an IDT. An operating system may also use it to change from one IDT to another.

SIDT (Store IDT register) copies the base and limit value stored in IDTR to a memory location. This instruction can be executed at any privilege level.

Table 9-2. Priority Among Simultaneous Interrupts and Exceptions

Priority   Class of Interrupt or Exception

HIGHEST    Faults except debug faults
           Trap instructions INTO, INT n, INT 3
           Debug traps for this instruction
           Debug faults for next instruction
           NMI interrupt
LOWEST     INTR interrupt

Figure 9-1. IDT Register and Table

                                             INTERRUPT DESCRIPTOR TABLE
                                             +------------------------+
                                       +---->| GATE FOR INTERRUPT #N  |
                                       |     +------------------------+
                                       |     |           .            |
                                       |     |           .            |
                                       |     +------------------------+
                                       |     | GATE FOR INTERRUPT #2  |
           IDT REGISTER                |     +------------------------+
                                       |     | GATE FOR INTERRUPT #1  |
                   15               0  |     +------------------------+
                  +------------------+ |     | GATE FOR INTERRUPT #0  |
                  |    IDT LIMIT     |-+  +->+------------------------+
+-----------------+------------------+    |
|              IDT BASE              |----+
+------------------------------------+
 31                                 0

Figure 9-2. Pseudo-Descriptor Format for LIDT and SIDT

 31                              0
+--------------------------------+
|              BASE              | 2
+----------------+---------------+
                 |     LIMIT     | 0
                 +---------------+
                  15            0

9.5 IDT Descriptors

The IDT may contain any of three kinds of descriptor:

* Task gates
* Interrupt gates
* Trap gates

Figure 9-3 illustrates the format of task gates and 80386 interrupt gates and trap gates. (The task gate in an IDT is the same as the task gate already discussed in Chapter 7.)

Figure 9-3. 80386 IDT Gate Descriptors

                           80386 TASK GATE
 31                               15                               0
+--------------------------------+---+---+---------+----------------+
|           (NOT USED)           | P |DPL|0 0 1 0 1|   (NOT USED)   | 4
+--------------------------------+---+---+---------+----------------+
|            SELECTOR            |            (NOT USED)            | 0
+--------------------------------+----------------------------------+

                         80386 INTERRUPT GATE
 31                               15                               0
+--------------------------------+---+---+---------+-----+----------+
|         OFFSET 31..16          | P |DPL|0 1 1 1 0|0 0 0|(NOT USED)| 4
+--------------------------------+---+---+---------+-----+----------+
|            SELECTOR            |           OFFSET 15..0           | 0
+--------------------------------+----------------------------------+

                           80386 TRAP GATE
 31                               15                               0
+--------------------------------+---+---+---------+-----+----------+
|         OFFSET 31..16          | P |DPL|0 1 1 1 1|0 0 0|(NOT USED)| 4
+--------------------------------+---+---+---------+-----+----------+
|            SELECTOR            |           OFFSET 15..0           | 0
+--------------------------------+----------------------------------+

9.6 Interrupt Tasks and Interrupt Procedures

Just as a CALL instruction can call either a procedure or a task, so an interrupt or exception can "call" an interrupt handler that is either a procedure or a task. When responding to an interrupt or exception, the processor uses the interrupt or exception identifier to index a descriptor in the IDT. If the processor indexes to an interrupt gate or trap gate, it invokes the handler in a manner similar to a CALL to a call gate. If the processor finds a task gate, it causes a task switch in a manner similar to a CALL to a task gate.

9.6.1 Interrupt Procedures

An interrupt gate or trap gate points indirectly to a procedure which will execute in the context of the currently executing task as illustrated by Figure 9-4. The selector of the gate points to an executable-segment descriptor in either the GDT or the current LDT.
The offset field of the gate points to the beginning of the interrupt or exception handling procedure.

The 80386 invokes an interrupt or exception handling procedure in much the same manner as it CALLs a procedure; the differences are explained in the following sections.

Figure 9-4. Interrupt Vectoring for Procedures

          IDT                                     EXECUTABLE SEGMENT
    +--------------+                               +--------------+
    |              |                        OFFSET |              |
    +--------------+     +------------------------>| ENTRY POINT  |
    |              |     |     LDT OR GDT          |              |
    +--------------+     |   +--------------+      |              |
    | TRAP GATE OR |-----+   |              |      |              |
 -->|INTERRUPT GATE|--+      +--------------+      |              |
 |  +--------------+  |      |   SEGMENT    |      |              |
 |  |              |  +----->|  DESCRIPTOR  |--+   |              |
 |  +--------------+         +--------------+  |   |              |
 |                           |              |  |   |              |
 INTERRUPT ID                +--------------+  +-->+--------------+
                                               BASE

9.6.1.1 Stack of Interrupt Procedure

Just as with a control transfer due to a CALL instruction, a control transfer to an interrupt or exception handling procedure uses the stack to store the information needed for returning to the original procedure. As Figure 9-5 shows, an interrupt pushes the EFLAGS register onto the stack before the pointer to the interrupted instruction.

Certain types of exceptions also cause an error code to be pushed on the stack. An exception handler can use the error code to help diagnose the exception.

9.6.1.2 Returning from an Interrupt Procedure

An interrupt procedure also differs from a normal procedure in the method of leaving the procedure. The IRET instruction is used to exit from an interrupt procedure. IRET is similar to RET except that IRET increments ESP by an extra four bytes (because of the flags on the stack) and moves the saved flags into the EFLAGS register. The IOPL field of EFLAGS is changed only if the CPL is zero. The IF flag is changed only if CPL <= IOPL.

Figure 9-5. Stack Layout after Exception or Interrupt

(DIRECTION OF EXPANSION: toward the bottom of each diagram. Shaded cells, shown as ///////, are unused.)

                     WITHOUT PRIVILEGE TRANSITION

       31            0                         31            0
      |///////|///////|   OLD                 |///////|///////|   OLD
      +-------+-------+   SS:ESP              +-------+-------+   SS:ESP
      |///////|///////|<--+                   |///////|///////|<--+
      +-------+-------+                       +-------+-------+
      |  OLD EFLAGS   |                       |  OLD EFLAGS   |
      +-------+-------+                       +-------+-------+
      |///////|OLD CS |   NEW                 |///////|OLD CS |
      +-------+-------+   SS:ESP              +-------+-------+   NEW
      |    OLD EIP    |<--+                   |    OLD EIP    |   SS:ESP
      +---------------+                       +---------------+
      |               |                       |  ERROR CODE   |<--+
                                              +---------------+
                                              |               |
      WITHOUT ERROR CODE                       WITH ERROR CODE

                      WITH PRIVILEGE TRANSITION

       31            0                         31            0
      +-------+-------+<--+                   +-------+-------+<--+
      |///////|OLD SS |   SS:ESP              |///////|OLD SS |   SS:ESP
      +-------+-------+   FROM TSS            +-------+-------+   FROM TSS
      |    OLD ESP    |                       |    OLD ESP    |
      +---------------+                       +---------------+
      |  OLD EFLAGS   |                       |  OLD EFLAGS   |
      +-------+-------+                       +-------+-------+
      |///////|OLD CS |   NEW                 |///////|OLD CS |
      +-------+-------+   SS:ESP              +-------+-------+   NEW
      |    OLD EIP    |<--+                   |    OLD EIP    |   SS:ESP
      +---------------+                       +---------------+
      |               |                       |  ERROR CODE   |<--+
                                              +---------------+
                                              |               |
      WITHOUT ERROR CODE                       WITH ERROR CODE

9.6.1.3 Flags Usage by Interrupt Procedure

Interrupts that vector through either interrupt gates or trap gates cause TF (the trap flag) to be reset after the current value of TF is saved on the stack as part of EFLAGS. By this action the processor prevents debugging activity that uses single-stepping from affecting interrupt response. A subsequent IRET instruction restores TF to the value in the EFLAGS image on the stack.

The difference between an interrupt gate and a trap gate is in the effect on IF (the interrupt-enable flag). An interrupt that vectors through an interrupt gate resets IF, thereby preventing other interrupts from interfering with the current interrupt handler. A subsequent IRET instruction restores IF to the value in the EFLAGS image on the stack. An interrupt through a trap gate does not change IF.

9.6.1.4 Protection in Interrupt Procedures

The privilege rule that governs interrupt procedures is similar to that for procedure calls: the CPU does not permit an interrupt to transfer control to a procedure in a segment of lesser privilege (numerically greater privilege level) than the current privilege level. An attempt to violate this rule results in a general protection exception.

Because occurrence of interrupts is not generally predictable, this privilege rule effectively imposes restrictions on the privilege levels at which interrupt and exception handling procedures can execute. Either of the following strategies can be employed to ensure that the privilege rule is never violated.

* Place the handler in a conforming segment. This strategy suits the handlers for certain exceptions (divide error, for example). Such a handler must use only the data available to it from the stack. If it needed data from a data segment, the data segment would have to have privilege level three, thereby making it unprotected.

* Place the handler procedure in a privilege level zero segment.

9.6.2 Interrupt Tasks

A task gate in the IDT points indirectly to a task, as Figure 9-6 illustrates. The selector of the gate points to a TSS descriptor in the GDT.

When an interrupt or exception vectors to a task gate in the IDT, a task switch results. Handling an interrupt with a separate task offers two advantages:

* The entire context is saved automatically.

* The interrupt handler can be isolated from other tasks by giving it a separate address space, either via its LDT or via its page directory.

The actions that the processor takes to perform a task switch are discussed in Chapter 7. The interrupt task returns to the interrupted task by executing an IRET instruction.
If the task switch is caused by an exception that has an error code, the processor automatically pushes the error code onto the stack that corresponds to the privilege level of the first instruction to be executed in the interrupt task.

When interrupt tasks are used in an operating system for the 80386, there are actually two schedulers: the software scheduler (part of the operating system) and the hardware scheduler (part of the processor's interrupt mechanism). The design of the software scheduler should account for the fact that the hardware scheduler may dispatch an interrupt task whenever interrupts are enabled.

Figure 9-6. Interrupt Vectoring for Tasks

          IDT                       GDT                        TSS
   +---------------+         +----------------+         +----------------+
   |               |         |                |         |                |
   +---------------+         +----------------+         |                |
+->|   TASK GATE   |---+     |                |         |                |
|  +---------------+   |     +----------------+         |                |
|  |               |   +---->| TSS DESCRIPTOR |---+     |                |
|  +---------------+         +----------------+   |     |                |
|                            |                |   +---->+----------------+
INTERRUPT ID                 +----------------+

9.7 Error Code

With exceptions that relate to a specific segment, the processor pushes an error code onto the stack of the exception handler (whether procedure or task). The error code has the format shown in Figure 9-7. The format of the error code resembles that of a selector; however, instead of an RPL field, the error code contains two one-bit items:

1. The processor sets the EXT bit if an event external to the program caused the exception.

2. The processor sets the I-bit (IDT-bit) if the index portion of the error code refers to a gate descriptor in the IDT.

If the I-bit is not set, the TI bit indicates whether the error code refers to the GDT (value 0) or to the LDT (value 1). The remaining 14 bits are the upper 14 bits of the segment selector involved.
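As an illustration of the layout just described (bit 0 is EXT, bit 1 is the I-bit, bit 2 is TI, and bits 3 through 15 hold the index, per Figure 9-7), a handler's decoding step might be sketched as follows. The function name and the dictionary keys are hypothetical, chosen only for this example.

```python
def decode_error_code(code):
    """Split a segment-related exception error code into its fields.

    Only the low-order 16 bits are defined; the upper half is ignored.
    """
    low = code & 0xFFFF
    ext = bool(low & 0b001)   # event external to the program
    idt = bool(low & 0b010)   # index refers to a gate in the IDT
    ti = bool(low & 0b100)    # meaningful only when the I-bit is clear
    if idt:
        table = "IDT"
    elif ti:
        table = "LDT"
    else:
        table = "GDT"
    return {
        "ext": ext,
        "table": table,
        "index": low >> 3,    # descriptor index within the table
        "null": low == 0,     # all bits of the low-order word are zero
    }
```

For instance, an error code whose low word is 0x14 decodes to index 2 in the LDT with EXT clear.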
In some cases the error code on the stack is null, i.e., all bits in the low-order word are zero.

Figure 9-7. Error Code Format

 31               15                          2  1  0
+----------------+--------------------------+--+--+--+
|   UNDEFINED    |      SELECTOR INDEX      |TI| I|EX|
+----------------+--------------------------+--+--+--+

9.8 Exception Conditions

The following sections describe each of the possible exception conditions in detail. Each description classifies the exception as a fault, trap, or abort. This classification provides information needed by systems programmers for restarting the procedure in which the exception occurred:

Faults   The CS and EIP values saved when a fault is reported point to the instruction causing the fault.

Traps    The CS and EIP values stored when the trap is reported point to the instruction dynamically after the instruction causing the trap. If a trap is detected during an instruction that alters program flow, the reported values of CS and EIP reflect the alteration of program flow. For example, if a trap is detected in a JMP instruction, the CS and EIP values pushed onto the stack point to the target of the JMP, not to the instruction after the JMP.

Aborts   An abort is an exception that permits neither precise location of the instruction causing the exception nor restart of the program that caused the exception. Aborts are used to report severe errors, such as hardware errors and inconsistent or illegal values in system tables.

9.8.1 Interrupt 0 -- Divide Error

The divide-error fault occurs during a DIV or an IDIV instruction when the divisor is zero.

9.8.2 Interrupt 1 -- Debug Exceptions

The processor triggers this interrupt for any of a number of conditions; whether the exception is a fault or a trap depends on the condition:

* Instruction address breakpoint fault.
* Data address breakpoint trap.
* General detect fault.
* Single-step trap.
* Task-switch breakpoint trap.

The processor does not push an error code for this exception. An exception handler can examine the debug registers to determine which condition caused the exception. Refer to Chapter 12 for more detailed information about debugging and the debug registers.

9.8.3 Interrupt 3 -- Breakpoint

The INT 3 instruction causes this trap. The INT 3 instruction is one byte long, which makes it easy to replace an opcode in an executable segment with the breakpoint opcode. The operating system or a debugging subsystem can use a data-segment alias for an executable segment to place an INT 3 anywhere it is convenient to arrest normal execution so that some sort of special processing can be performed. Debuggers typically use breakpoints as a way of displaying registers, variables, etc., at crucial points in a task.

The saved CS:EIP value points to the byte following the breakpoint. If a debugger replaces a planted breakpoint with a valid opcode, it must subtract one from the saved EIP value before returning. Refer also to Chapter 12 for more information on debugging.

9.8.4 Interrupt 4 -- Overflow

This trap occurs when the processor encounters an INTO instruction and the OF (overflow) flag is set. Since signed arithmetic and unsigned arithmetic both use the same arithmetic instructions, the processor cannot determine which is intended and therefore does not cause overflow exceptions automatically. Instead it merely sets OF when the results, if interpreted as signed numbers, would be out of range. When doing arithmetic on signed operands, careful programmers and compilers either test OF directly or use the INTO instruction.

9.8.5 Interrupt 5 -- Bounds Check

This fault occurs when the processor, while executing a BOUND instruction, finds that the operand exceeds the specified limits. A program can use the BOUND instruction to check a signed array index against signed limits defined in a block of memory.
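The check that BOUND performs can be sketched as below. This is a rough model assuming the 32-bit operand size, in which the memory block holds the signed lower limit followed by the signed upper limit, both little-endian, and both limits are inclusive. The function name bound_faults is hypothetical.

```python
import struct


def bound_faults(index, limits_block):
    """Return True if a BOUND check of `index` against the two signed
    32-bit limits stored in `limits_block` would raise interrupt 5."""
    lower, upper = struct.unpack("<ii", limits_block)
    return index < lower or index > upper


# A limits block for an array whose valid indexes are 0 through 9.
limits = struct.pack("<ii", 0, 9)
```

With this block, indexes 0 and 9 pass the check, while -1 and 10 would fault.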
9.8.6 Interrupt 6 -- Invalid Opcode

This fault occurs when an invalid opcode is detected by the execution unit. (The exception is not detected until an attempt is made to execute the invalid opcode; i.e., prefetching an invalid opcode does not cause this exception.) No error code is pushed on the stack. The exception can be handled within the same task.

This exception also occurs when the type of operand is invalid for the given opcode. Examples include an intersegment JMP referencing a register operand, or an LES instruction with a register source operand.

9.8.7 Interrupt 7 -- Coprocessor Not Available

This exception occurs in either of two conditions:

* The processor encounters an ESC (escape) instruction, and the EM (emulate) bit of CR0 (control register zero) is set.

* The processor encounters either the WAIT instruction or an ESC instruction, and both the MP (monitor coprocessor) and TS (task switched) bits of CR0 are set.

Refer to Chapter 11 for information about the coprocessor interface.

9.8.8 Interrupt 8 -- Double Fault

Normally, when the processor detects an exception while trying to invoke the handler for a prior exception, the two exceptions can be handled serially. If, however, the processor cannot handle them serially, it signals the double-fault exception instead. To determine when two faults are to be signalled as a double fault, the 80386 divides the exceptions into three classes: benign exceptions, contributory exceptions, and page faults. Table 9-3 shows this classification.

Table 9-4 shows which combinations of exceptions cause a double fault and which do not.

The processor always pushes an error code onto the stack of the double-fault handler; however, the error code is always zero. The faulting instruction may not be restarted. If any other exception occurs while attempting to invoke the double-fault handler, the processor shuts down.

Table 9-3. Double-Fault Detection Classes

Class            ID   Description

                 1    Debug exceptions
                 2    NMI
                 3    Breakpoint
Benign           4    Overflow
Exceptions       5    Bounds check
                 6    Invalid opcode
                 7    Coprocessor not available
                 16   Coprocessor error

                 0    Divide error
                 9    Coprocessor Segment Overrun
Contributory     10   Invalid TSS
Exceptions       11   Segment not present
                 12   Stack exception
                 13   General protection

Page Faults      14   Page fault

Table 9-4. Double-Fault Definition

                                   SECOND EXCEPTION

                            Benign       Contributory   Page
                            Exception    Exception      Fault

            Benign          OK           OK             OK
            Exception

FIRST       Contributory    OK           DOUBLE         OK
EXCEPTION   Exception

            Page Fault      OK           DOUBLE         DOUBLE

9.8.9 Interrupt 9 -- Coprocessor Segment Overrun

This exception is raised in protected mode if the 80386 detects a page or segment violation while transferring the middle portion of a coprocessor operand to the NPX. This exception is avoidable. Refer to Chapter 11 for more information about the coprocessor interface.

9.8.10 Interrupt 10 -- Invalid TSS

Interrupt 10 occurs if during a task switch the new TSS is invalid. A TSS is considered invalid in the cases shown in Table 9-5. An error code is pushed onto the stack to help identify the cause of the fault. The EXT bit indicates whether the exception was caused by a condition outside the control of the program; e.g., an external interrupt via a task gate triggered a switch to an invalid TSS.

This fault can occur either in the context of the original task or in the context of the new task. Until the processor has completely verified the presence of the new TSS, the exception occurs in the context of the original task. Once the existence of the new TSS is verified, the task switch is considered complete; i.e., TR is updated and, if the switch is due to a CALL or interrupt, the backlink of the new TSS is set to the old TSS. Any errors discovered by the processor after this point are handled in the context of the new task.

To ensure a proper TSS to process it, the handler for exception 10 must be a task invoked via a task gate.

Table 9-5. Conditions That Invalidate the TSS

Error Code              Condition

TSS id + EXT            The limit in the TSS descriptor is less than 103
LDT id + EXT            Invalid LDT selector or LDT not present
SS id + EXT             Stack segment selector is outside table limit
SS id + EXT             Stack segment is not a writable segment
SS id + EXT             Stack segment DPL does not match new CPL
SS id + EXT             Stack segment selector RPL <> CPL
CS id + EXT             Code segment selector is outside table limit
CS id + EXT             Code segment selector does not refer to code segment
CS id + EXT             DPL of non-conforming code segment <> new CPL
CS id + EXT             DPL of conforming code segment > new CPL
DS/ES/FS/GS id + EXT    DS, ES, FS, or GS segment selector is outside table limits
DS/ES/FS/GS id + EXT    DS, ES, FS, or GS is not readable segment

9.8.11 Interrupt 11 -- Segment Not Present

Exception 11 occurs when the processor detects that the present bit of a descriptor is zero. The processor can trigger this fault in any of these cases:

* While attempting to load the CS, DS, ES, FS, or GS registers; loading the SS register, however, causes a stack fault.

* While attempting to load the LDT register with an LLDT instruction; loading the LDT register during a task switch operation, however, causes the "invalid TSS" exception.

* While attempting to use a gate descriptor that is marked not-present.

This fault is restartable. If the exception handler makes the segment present and returns, the interrupted program will resume execution.

If a not-present exception occurs during a task switch, not all the steps of the task switch are complete. During a task switch, the processor first loads all the segment registers, then checks their contents for validity. If a not-present exception is discovered, the remaining segment registers have not been checked and therefore may not be usable for referencing memory. The not-present handler should not rely on being able to use the values found in CS, SS, DS, ES, FS, and GS without causing another exception.
The exception handler should check all segment registers before trying to resume the new task; otherwise, general protection faults may result later under conditions that make diagnosis more difficult. There are three ways to handle this case:

1. Handle the not-present fault with a task. The task switch back to the interrupted task will cause the processor to check the registers as it loads them from the TSS.

2. PUSH and POP all segment registers. Each POP causes the processor to check the new contents of the segment register.

3. Scrutinize the contents of each segment-register image in the TSS, simulating the test that the processor makes when it loads a segment register.

This exception pushes an error code onto the stack. The EXT bit of the error code is set if an event external to the program caused an interrupt that subsequently referenced a not-present segment. The I-bit is set if the error code refers to an IDT entry, e.g., an INT instruction referencing a not-present gate.

An operating system typically uses the "segment not present" exception to implement virtual memory at the segment level. A not-present indication in a gate descriptor, however, usually does not indicate that a segment is not present (because gates do not necessarily correspond to segments). Not-present gates may be used by an operating system to trigger exceptions of special significance to the operating system.

9.8.12 Interrupt 12 -- Stack Exception

A stack fault occurs in either of two general conditions:

* As a result of a limit violation in any operation that refers to the SS register. This includes stack-oriented instructions such as POP, PUSH, ENTER, and LEAVE, as well as other memory references that implicitly use SS (for example, MOV AX, [BP+6]). ENTER causes this exception when the stack is too small for the indicated local-variable space.

* When attempting to load the SS register with a descriptor that is marked not-present but is otherwise valid. This can occur in a task switch, an interlevel CALL, an interlevel return, an LSS instruction, or a MOV or POP instruction to SS.

When the processor detects a stack exception, it pushes an error code onto the stack of the exception handler. If the exception is due to a not-present stack segment or to overflow of the new stack during an interlevel CALL, the error code contains a selector to the segment in question (the exception handler can test the present bit in the descriptor to determine which exception occurred); otherwise the error code is zero.

An instruction that causes this fault is restartable in all cases. The return pointer pushed onto the exception handler's stack points to the instruction that needs to be restarted. This instruction is usually the one that caused the exception; however, in the case of a stack exception due to loading of a not-present stack-segment descriptor during a task switch, the indicated instruction is the first instruction of the new task.

When a stack fault occurs during a task switch, the segment registers may not be usable for referencing memory. During a task switch, the selector values are loaded before the descriptors are checked. If a stack fault is discovered, the remaining segment registers have not been checked and therefore may not be usable for referencing memory. The stack fault handler should not rely on being able to use the values found in CS, SS, DS, ES, FS, and GS without causing another exception. The exception handler should check all segment registers before trying to resume the new task; otherwise, general protection faults may result later under conditions that make diagnosis more difficult.

9.8.13 Interrupt 13 -- General Protection Exception

All protection violations that do not cause another exception cause a general protection exception. This includes (but is not limited to):

1. Exceeding segment limit when using CS, DS, ES, FS, or GS

2. Exceeding segment limit when referencing a descriptor table

3. Transferring control to a segment that is not executable

4. Writing into a read-only data segment or into a code segment

5. Reading from an execute-only segment

6. Loading the SS register with a read-only descriptor (unless the selector comes from the TSS during a task switch, in which case a TSS exception occurs)

7. Loading SS, DS, ES, FS, or GS with the descriptor of a system segment

8. Loading DS, ES, FS, or GS with the descriptor of an executable segment that is not also readable

9. Loading SS with the descriptor of an executable segment

10. Accessing memory via DS, ES, FS, or GS when the segment register contains a null selector

11. Switching to a busy task

12. Violating privilege rules

13. Loading CR0 with PG=1 and PE=0

14. Interrupt or exception via trap or interrupt gate from V86 mode to privilege level other than zero

15. Exceeding the instruction length limit of 15 bytes (this can occur only if redundant prefixes are placed before an instruction)

The general protection exception is a fault. In response to a general protection exception, the processor pushes an error code onto the exception handler's stack. If loading a descriptor causes the exception, the error code contains a selector to the descriptor; otherwise, the error code is null. The source of the selector in an error code may be any of the following:

1. An operand of the instruction.

2. A selector from a gate that is the operand of the instruction.

3. A selector from a TSS involved in a task switch.

9.8.14 Interrupt 14 -- Page Fault

This exception occurs when paging is enabled (PG=1) and the processor detects one of the following conditions while translating a linear address to a physical address:

* The page-directory or page-table entry needed for the address translation has zero in its present bit.

* The current procedure does not have sufficient privilege to access the indicated page.
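The two conditions above can be illustrated with a small model of the check against a single paging entry. This is a simplified sketch under stated assumptions: the bit positions (P in bit 0, R/W in bit 1, U/S in bit 2) and the rule that supervisor-level accesses are not restricted by R/W or U/S are taken from the page-table entry format in Chapter 5 and the page-level protection rules in Chapter 6, and the processor applies the check to both the page-directory entry and the page-table entry, which this sketch does not combine. The names are hypothetical.

```python
# Assumed bit positions within a page-directory or page-table entry.
PTE_P = 0x1    # present
PTE_RW = 0x2   # 1 = writable from user level
PTE_US = 0x4   # 1 = accessible from user level


def page_fault_reason(entry, user_mode, is_write):
    """Return None if the access is allowed by this entry, otherwise
    "not present" or "protection" to name the page-fault condition."""
    if not entry & PTE_P:
        return "not present"
    if user_mode:
        if not entry & PTE_US:
            return "protection"      # supervisor page touched from user level
        if is_write and not entry & PTE_RW:
            return "protection"      # write to a read-only page
    return None
```

In this model a user-level write to a present, user-accessible, read-only page is reported as a protection violation, while the same write at supervisor level is allowed.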
The processor makes available to the page fault handler two items of information that aid in diagnosing the exception and recovering from it:

- An error code on the stack. The error code for a page fault has a format different from that for other exceptions (see Figure 9-8). The error code tells the exception handler three things:

  1. Whether the exception was due to a not-present page or to an access rights violation.
  2. Whether the processor was executing at user or supervisor level at the time of the exception.
  3. Whether the memory access that caused the exception was a read or write.

- CR2 (control register two). The processor stores in CR2 the linear address used in the access that caused the exception (see Figure 9-9). The exception handler can use this address to locate the corresponding page directory and page table entries. If another page fault can occur during execution of the page fault handler, the handler should push CR2 onto the stack.

Figure 9-8.  Page-Fault Error Code Format

Field  Value  Description
-----  -----  -----------------------------------------------------------
U/S    0      The access causing the fault originated when the processor
              was executing in supervisor mode.
       1      The access causing the fault originated when the processor
              was executing in user mode.
W/R    0      The access causing the fault was a read.
       1      The access causing the fault was a write.
P      0      The fault was caused by a not-present page.
       1      The fault was caused by a page-level protection violation.

 31                                         3   2   1   0
+---------------------------------------------+---+---+---+
|                  UNDEFINED                  |U/S|W/R| P |
+---------------------------------------------+---+---+---+

9.8.14.1 Page Fault During Task Switch

The processor may access any of four segments during a task switch:

1. Writes the state of the original task in the TSS of that task.
2. Reads the GDT to locate the TSS descriptor of the new task.
3. Reads the TSS of the new task to check the types of segment descriptors from the TSS.
4. May read the LDT of the new task in order to verify the segment registers stored in the new TSS.

A page fault can result from accessing any of these segments. In the latter two cases the exception occurs in the context of the new task. The instruction pointer refers to the next instruction of the new task, not to the instruction that caused the task switch. If the design of the operating system permits page faults to occur during task switches, the page-fault handler should be invoked via a task gate.

Figure 9-9.  CR2 Format

 31                                                       0
+----------------------------------------------------------+
|                PAGE FAULT LINEAR ADDRESS                 |
+----------------------------------------------------------+

9.8.14.2 Page Fault with Inconsistent Stack Pointer

Special care should be taken to ensure that a page fault does not cause the processor to use an invalid stack pointer (SS:ESP). Software written for earlier processors in the 8086 family often uses a pair of instructions to change to a new stack; for example:

   MOV SS, AX
   MOV SP, StackTop

With the 80386, because the second instruction accesses memory, it is possible to get a page fault after SS has been changed but before SP has received the corresponding change.
At this point, the two parts of the stack pointer SS:SP (or, for 32-bit programs, SS:ESP) are inconsistent.

The processor does not use the inconsistent stack pointer if the handling of the page fault causes a stack switch to a well defined stack (i.e., the handler is a task or a more privileged procedure). However, if the page fault handler is invoked by a trap or interrupt gate and the page fault occurs at the same privilege level as the page fault handler, the processor will attempt to use the stack indicated by the current (invalid) stack pointer.

In systems that implement paging and that handle page faults within the faulting task (with trap or interrupt gates), software that executes at the same privilege level as the page fault handler should initialize a new stack by using the new LSS instruction rather than the instruction pair shown above. When the page fault handler executes at privilege level zero (the normal case), the scope of the problem is limited to privilege-level zero code, typically the kernel of the operating system.

9.8.15 Interrupt 16 -- Coprocessor Error

The 80386 reports this exception when it detects a signal from the 80287 or 80387 on the 80386's ERROR# input pin. The 80386 tests this pin only at the beginning of certain ESC instructions and when it encounters a WAIT instruction while the EM bit of the MSW is zero (no emulation). Refer to Chapter 11 for more information on the coprocessor interface.

9.9 Exception Summary

Table 9-6 summarizes the exceptions recognized by the 80386.

Table 9-6.  Exception Summary

                                        Return Address
                                        Points to
                             Interrupt  Faulting        Exception        Function That Can Generate
Description                  Number     Instruction     Type             the Exception

Divide error                 0          YES             FAULT            DIV, IDIV
Debug exceptions             1          (Note 1)        (Note 1)         Any instruction
Breakpoint                   3          NO              TRAP             One-byte INT 3
Overflow                     4          NO              TRAP             INTO
Bounds check                 5          YES             FAULT            BOUND
Invalid opcode               6          YES             FAULT            Any illegal instruction
Coprocessor not available    7          YES             FAULT            ESC, WAIT
Double fault                 8          YES             ABORT            Any instruction that can generate an exception
Coprocessor Segment Overrun  9          NO              ABORT            Any operand of an ESC instruction that wraps around the end of a segment
Invalid TSS                  10         YES             FAULT (Note 2)   JMP, CALL, IRET, any interrupt
Segment not present          11         YES             FAULT            Any segment-register modifier
Stack exception              12         YES             FAULT            Any memory reference thru SS
General Protection           13         YES             FAULT/ABORT      Any memory reference or code fetch
                                                        (Note 3)
Page fault                   14         YES             FAULT            Any memory reference or code fetch
Coprocessor error            16         YES             FAULT (Note 4)   ESC, WAIT
Two-byte SW Interrupt        0-255      NO              TRAP             INT n

Notes:
1. Some debug exceptions are traps and some are faults. The exception handler can determine which has occurred by examining DR6. (Refer to Chapter 12.)
2. An invalid-TSS fault is not restartable if it occurs during the processing of an external interrupt.
3. All GP faults are restartable. If the fault occurs while attempting to vector to the handler for an external interrupt, the interrupted program is restartable, but the interrupt may be lost.
4. Coprocessor errors are reported as a fault on the first ESC or WAIT instruction executed after the ESC instruction that caused the error.

9.10 Error Code Summary

Table 9-7 summarizes the error information that is available with each exception.

Table 9-7.  Error-Code Summary

Description                    Interrupt   Error Code
                               Number

Divide error                   0           No
Debug exceptions               1           No
Breakpoint                     3           No
Overflow                       4           No
Bounds check                   5           No
Invalid opcode                 6           No
Coprocessor not available      7           No
System error                   8           Yes (always 0)
Coprocessor Segment Overrun    9           No
Invalid TSS                    10          Yes
Segment not present            11          Yes
Stack exception                12          Yes
General protection fault       13          Yes
Page fault                     14          Yes
Coprocessor error              16          No
Two-byte SW interrupt          0-255       No


Chapter 10  Initialization

----------------------------------------------------------------------------

After a signal on the RESET pin, certain registers of the 80386 are set to predefined values. These values are adequate to enable execution of a bootstrap program, but additional initialization must be performed by software before all the features of the processor can be utilized.

10.1 Processor State After Reset

The contents of EAX depend upon the results of the power-up self test. The self-test may be requested externally by assertion of BUSY# at the end of RESET. The EAX register holds zero if the 80386 passed the test. A nonzero value in EAX after self-test indicates that the particular 80386 unit is faulty. If the self-test is not requested, the contents of EAX after RESET are undefined.

DX holds a component identifier and revision number after RESET as Figure 10-1 illustrates. DH contains 3, which indicates an 80386 component. DL contains a unique identifier of the revision level.

Control register zero (CR0) contains the values shown in Figure 10-2. The ET bit of CR0 is set if an 80387 is present in the configuration (according to the state of the ERROR# pin after RESET). If ET is reset, the configuration either contains an 80287 or does not contain a coprocessor. A software test is required to distinguish between these latter two possibilities.
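
The register contents described in this section can be interpreted mechanically. The Python sketch below is an illustration only, not code from the manual: the function name describe_reset_state and the dictionary keys are hypothetical, and the register values are passed in as plain integers that bootstrap code is assumed to have saved immediately after RESET. The ET bit is taken to be bit 4 of CR0, as drawn in Figure 10-2.

```python
CR0_ET = 1 << 4   # extension-type bit of CR0 (Figure 10-2)


def describe_reset_state(eax, edx, cr0, self_test_requested):
    """Interpret EAX, EDX, and CR0 as saved immediately after RESET.

    Hypothetical helper for illustration; all arguments are integers.
    """
    dh = (edx >> 8) & 0xFF   # component identifier
    dl = edx & 0xFF          # revision (stepping) identifier
    return {
        # EAX is meaningful only when the self-test was requested.
        "self_test_passed": (eax == 0) if self_test_requested else None,
        "is_80386": dh == 3,
        "stepping": dl,
        # ET clear means an 80287 or no coprocessor; software must test further.
        "has_80387": bool(cr0 & CR0_ET),
    }


if __name__ == "__main__":
    print(describe_reset_state(eax=0, edx=0x0303, cr0=0x10, self_test_requested=True))
```

The upper 16 bits of EDX are undefined after RESET, which is why the sketch masks DH and DL out of the low word rather than comparing the whole register.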
The remaining registers and flags are set as follows:

   EFLAGS             =00000002H
   IP                 =0000FFF0H
   CS selector        =F000H
   DS selector        =0000H
   ES selector        =0000H
   SS selector        =0000H
   FS selector        =0000H
   GS selector        =0000H
   IDTR:
              base    =0
              limit   =03FFH

All registers not mentioned above are undefined.

These settings imply that the processor begins in real-address mode with interrupts disabled.

Figure 10-1.  Contents of EDX after RESET

                  EDX REGISTER

 31                  16 15          8 7           0
+----------------------+-------------+-------------+
|                      |     DH      |     DL      |
|      UNDEFINED       |  DEVICE ID  | STEPPING ID |
|                      |      3      |  (UNIQUE)   |
+----------------------+-------------+-------------+

Figure 10-2.  Initial Contents of CR0

              CONTROL REGISTER ZERO

 31 30                            5  4  3  2  1  0
+--+-------------------------------+--+--+--+--+--+
|PG|           UNDEFINED           |ET|TS|EM|MP|PE|
+--+-------------------------------+--+--+--+--+--+

   PG   0 - PAGING DISABLED
   ET   * - INDICATES PRESENCE OF 80387
   TS   0 - NO TASK SWITCH
   EM   0 - DO NOT MONITOR COPROCESSOR
   MP   0 - COPROCESSOR NOT PRESENT
   PE   0 - PROTECTION NOT ENABLED (REAL ADDRESS MODE)

10.2 Software Initialization for Real-Address Mode

In real-address mode a few structures must be initialized before a program can take advantage of all the features available in this mode.

10.2.1 Stack

No instructions that use the stack can be used until the stack-segment register (SS) has been loaded. SS must point to an area in RAM.

10.2.2 Interrupt Table

The initial state of the 80386 leaves interrupts disabled; however, the processor will still attempt to access the interrupt table if an exception or nonmaskable interrupt (NMI) occurs.
Initialization software should take one of the following actions:

- Change the limit value in the IDTR to zero. This will cause a shutdown if an exception or nonmaskable interrupt occurs. (Refer to the 80386 Hardware Reference Manual to see how shutdown is signalled externally.)

- Put pointers to valid interrupt handlers in all positions of the interrupt table that might be used by exceptions or interrupts.

- Change the IDTR to point to a valid interrupt table.

10.2.3 First Instructions

After RESET, address lines A{31-20} are automatically asserted for instruction fetches. This fact, together with the initial values of CS:IP, causes instruction execution to begin at physical address FFFFFFF0H. Near (intrasegment) forms of control transfer instructions may be used to pass control to other addresses in the upper 64K bytes of the address space. The first far (intersegment) JMP or CALL instruction causes A{31-20} to drop low, and the 80386 continues executing instructions in the lower one megabyte of physical memory. This automatic assertion of address lines A{31-20} allows systems designers to use a ROM at the high end of the address space to initialize the system.

10.3 Switching to Protected Mode

Setting the PE bit of the MSW in CR0 causes the 80386 to begin executing in protected mode. The current privilege level (CPL) starts at zero. The segment registers continue to point to the same linear addresses as in real-address mode (in real-address mode, linear addresses are the same as physical addresses).

Immediately after setting the PE flag, the initialization code must flush the processor's instruction prefetch queue by executing a JMP instruction. The 80386 fetches and decodes instructions and addresses before they are used; however, after a change into protected mode, the prefetched instruction information (which pertains to real-address mode) is no longer valid. A JMP forces the processor to discard the invalid information.
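
The reset fetch address given in section 10.2.3 can be checked with a little arithmetic. The Python sketch below is an illustration under stated assumptions, not part of the manual: the function names are hypothetical, real-address-mode addresses are formed as selector times 16 plus offset, the CS selector at RESET is assumed to be F000H (the value that makes CS:IP resolve to FFFF0H), and the forced address lines A{31-20} are modeled as a simple OR mask.

```python
A31_20 = 0xFFF00000   # address lines A{31-20}, asserted until the first far transfer


def real_mode_linear(selector, offset):
    """Real-address-mode address formation: selector * 16 + offset."""
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)


def fetch_address(selector, ip, far_transfer_done):
    """Physical address of an instruction fetch before or after the first far JMP/CALL.

    Hypothetical model: before the first far transfer, A{31-20} are forced high.
    """
    addr = real_mode_linear(selector, ip)
    return addr if far_transfer_done else addr | A31_20


if __name__ == "__main__":
    # Assumed reset values: CS selector F000H, IP FFF0H.
    print(hex(fetch_address(0xF000, 0xFFF0, far_transfer_done=False)))
```

Near jumps change only IP, so in this model they stay within the upper 64K bytes; once a far transfer has occurred, the same selector and offset resolve below one megabyte.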
10.4 Software Initialization for Protected Mode

Most of the initialization needed for protected mode can be done either before or after switching to protected mode. If done in protected mode, however, the initialization procedures must not use protected-mode features that are not yet initialized.

10.4.1 Interrupt Descriptor Table

The IDTR may be loaded in either real-address or protected mode. However, the format of the interrupt table for protected mode is different from that for real-address mode. It is not possible to change to protected mode and change interrupt table formats at the same time; therefore, it is inevitable that, if IDTR selects an interrupt table, it will have the wrong format at some time. An interrupt or exception that occurs at this time will have unpredictable results. To avoid this unpredictability, interrupts should remain disabled until interrupt handlers are in place and a valid IDT has been created in protected mode.

10.4.2 Stack

The SS register may be loaded in either real-address mode or protected mode. If loaded in real-address mode, SS continues to point to the same linear base address after the switch to protected mode.

10.4.3 Global Descriptor Table

Before any segment register is changed in protected mode, the GDT register must point to a valid GDT. Initialization of the GDT and GDTR may be done in real-address mode. The GDT (as well as LDTs) should reside in RAM, because the processor modifies the accessed bit of descriptors.

10.4.4 Page Tables

Page tables and the PDBR in CR3 can be initialized in either real-address mode or in protected mode; however, the paging enabled (PG) bit of CR0 cannot be set until the processor is in protected mode. PG may be set simultaneously with PE, or later. When PG is set, the PDBR in CR3 should already be initialized with a physical address that points to a valid page directory.
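
The ordering constraint on PE and PG can be expressed as a simple check on a value about to be loaded into CR0. The Python sketch below is illustrative only: the function names are hypothetical, PE and PG are taken to be bits 0 and 31 of CR0, and the page-directory base is assumed to occupy the high-order 20 bits of CR3. The rejected combination is the one listed among the causes of the general protection exception in section 9.8.13 (PG=1 with PE=0).

```python
CR0_PE = 1 << 0    # protection enable
CR0_PG = 1 << 31   # paging enable


def cr0_load_faults(new_cr0):
    """True if the value requests paging without protected mode (PG=1, PE=0)."""
    return bool(new_cr0 & CR0_PG) and not (new_cr0 & CR0_PE)


def page_directory_base(cr3):
    """Physical base of the page directory: the high-order 20 bits of CR3."""
    return cr3 & 0xFFFFF000


if __name__ == "__main__":
    print(cr0_load_faults(CR0_PG))            # paging alone is rejected
    print(cr0_load_faults(CR0_PG | CR0_PE))   # PG set together with PE is allowed
```

Initialization code could apply such a check while assembling the CR0 value, after CR3 has been given a page-aligned directory address.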
The initialization procedure should adopt one of the following strategies to ensure consistent addressing before and after paging is enabled:

- The page that is currently being executed should map to the same physical addresses both before and after PG is set.

- A JMP instruction should immediately follow the setting of PG.

10.4.5 First Task

The initialization procedure can run for a while in protected mode without initializing the task register; however, before the first task switch, the following conditions must prevail:

- There must be a valid task state segment (TSS) for the new task. The stack pointers in the TSS for privilege levels numerically less than or equal to the initial CPL must point to valid stack segments.

- The task register must point to an area in which to save the current task state. After the first task switch, the information dumped in this area is not needed, and the area can be used for other purposes.

10.5 Initialization Example

$TITLE ('Initial Task')

            NAME    INIT

init_stack  SEGMENT RW
            DW  20  DUP(?)
tos         LABEL   WORD
init_stack  ENDS

init_data   SEGMENT RW PUBLIC
            DW  20  DUP(?)
init_data   ENDS

init_code   SEGMENT ER PUBLIC

ASSUME      DS:init_data

            nop
            nop
            nop
init_start:
            ; set up stack
            mov ax, init_stack
            mov ss, ax
            mov esp, offset tos

            mov al,1
blink:
            xor al,1
            out 0e4h,al
            mov cx,3FFFh
here:
            dec cx
            jnz here

            jmp SHORT blink

            hlt
init_code   ends

            END init_start, SS:init_stack, DS:init_data


$TITLE('Protected Mode Transition -- 386 initialization')

            NAME  RESET

;*****************************************************************
; Upon reset the 386 starts executing at address 0FFFFFFF0H. The
; upper 12 address bits remain high until a FAR call or jump is
; executed.
;
; Assume the following:
;
; - a short jump at address 0FFFFFFF0H (placed there by the
;   system builder) causes execution to begin at START in segment
;   RESET_CODE.
;
; - segment RESET_CODE is based at physical address 0FFFF0000H,
;   i.e. at the start of the last 64K in the 4G address space.
;   Note that this is the base of the CS register at reset. If
;   you locate ROMcode above this address, you will need to
;   figure out an adjustment factor to address things within this
;   segment.
;
;*****************************************************************
$EJECT
;
; Define addresses to locate GDT and IDT in RAM.
; These addresses are also used in the BLD386 file that defines
; the GDT and IDT. If you change these addresses, make sure you
; change the base addresses specified in the build file.

GDTbase         EQU    00001000H   ; physical address for GDT base
IDTbase         EQU    00000400H   ; physical address for IDT base

PUBLIC     GDT_EPROM
PUBLIC     IDT_EPROM
PUBLIC     START

DUMMY      segment rw      ; ONLY for ASM386 main module stack init
           DW 0
DUMMY      ends

;*****************************************************************
;
; Note: RESET CODE must be USE16 because the 386 initially executes
;       in real mode.
;

RESET_CODE segment er PUBLIC    USE16

ASSUME DS:nothing, ES:nothing

;
; 386 Descriptor template

DESC       STRUC
    lim_0_15    DW  0              ; limit bits (0..15)
    bas_0_15    DW  0              ; base bits (0..15)
    bas_16_23   DB  0              ; base bits (16..23)
    access      DB  0              ; access byte
    gran        DB  0              ; granularity byte
    bas_24_31   DB  0              ; base bits (24..31)
DESC       ENDS

; The following is the layout of the real GDT created by BLD386.
; It is located in EPROM and will be copied to RAM.
;
; GDT[0]      ...  NULL
; GDT[1]      ...  Alias for RAM GDT
; GDT[2]      ...  Alias for RAM IDT
; GDT[3]      ...  initial task TSS
; GDT[4]      ...  initial task TSS alias
; GDT[5]      ...  initial task LDT
; GDT[6]      ...  initial task LDT alias

;
; define entries in GDT and IDT.
GDT_ENTRIES    EQU    8
IDT_ENTRIES    EQU    32

; define some constants to index into the real GDT

GDT_ALIAS      EQU    1*SIZE DESC
IDT_ALIAS      EQU    2*SIZE DESC
INIT_TSS       EQU    3*SIZE DESC
INIT_TSS_A     EQU    4*SIZE DESC
INIT_LDT       EQU    5*SIZE DESC
INIT_LDT_A     EQU    6*SIZE DESC

;
; location of alias in INIT_LDT

INIT_LDT_ALIAS    EQU    1*SIZE DESC

;
; access rights byte for DATA and TSS descriptors

DS_ACCESS   EQU   010010010B
TSS_ACCESS  EQU   010001001B

;
; This temporary GDT will be used to set up the real GDT in RAM.

Temp_GDT    LABEL   BYTE        ; tag for begin of scratch GDT

NULL_DES    DESC <>             ; NULL descriptor

                                ; 4-Gigabyte data segment based at 0
FLAT_DES    DESC <0FFFFH,0,0,92h,0CFh,0>

GDT_eprom     DP    ?           ; Builder places GDT address and limit
                                ; in this 6 byte area.

IDT_eprom     DP    ?           ; Builder places IDT address and limit
                                ; in this 6 byte area.

;
; Prepare operands for loading GDTR and IDTR.

TGDT_pword     LABEL  PWORD                 ; for temp GDT
               DW     end_Temp_GDT-Temp_GDT -1
               DD     0

GDT_pword      LABEL  PWORD                 ; for GDT in RAM
               DW     GDT_ENTRIES * SIZE DESC -1
               DD     GDTbase

IDT_pword      LABEL  PWORD                 ; for IDT in RAM
               DW     IDT_ENTRIES * SIZE DESC -1
               DD     IDTbase

end_Temp_GDT   LABEL  BYTE

;
; Define equates for addressing convenience.

GDT_DES_FLAT        EQU DS:GDT_ALIAS +GDTbase
IDT_DES_FLAT        EQU DS:IDT_ALIAS +GDTbase

INIT_TSS_A_OFFSET   EQU DS:INIT_TSS_A
INIT_TSS_OFFSET     EQU DS:INIT_TSS

INIT_LDT_A_OFFSET   EQU DS:INIT_LDT_A
INIT_LDT_OFFSET     EQU DS:INIT_LDT

; define pointer for first task switch

ENTRY_POINTER  LABEL DWORD
               DW 0, INIT_TSS

;******************************************************************
;
;   Jump from reset vector to here.

START:

    CLI                ;disable interrupts
    CLD                ;clear direction flag

    LIDT    NULL_des   ;force shutdown on errors

;
;   move scratch GDT to RAM at physical 0

    XOR DI,DI
    MOV ES,DI           ;point ES:DI to physical location 0

    MOV SI,OFFSET Temp_GDT
    MOV CX,end_Temp_GDT-Temp_GDT        ;set byte count
    INC CX
;
;   move table

    REP MOVS BYTE PTR ES:[DI],BYTE PTR CS:[SI]

    LGDT    tGDT_pword                ;load GDTR for Temp. GDT
                                      ;(located at 0)

;   switch to protected mode

    MOV EAX,CR0                       ;get current CR0
    OR  EAX,1                         ;set PE bit, keep the other bits
    MOV CR0,EAX                       ;begin protected mode
;
;   clear prefetch queue

    JMP SHORT flush
flush:

; set DS,ES,SS to address flat linear space (0 ... 4GB)

    MOV BX,FLAT_DES-Temp_GDT
    MOV DS,BX
    MOV ES,BX
    MOV SS,BX
;
; initialize stack pointer to some (arbitrary) RAM location

    MOV ESP, OFFSET end_Temp_GDT

;
; copy eprom GDT to RAM

    MOV ESI,DWORD PTR GDT_eprom +2 ; get base of eprom GDT
                                   ; (put here by builder).

    MOV EDI,GDTbase                ; point ES:EDI to GDT base in RAM.

    MOV CX,WORD PTR gdt_eprom +0   ; limit of eprom GDT
    INC CX
    SHR CX,1                       ; easier to move words
    CLD
    REP MOVS   WORD PTR ES:[EDI],WORD PTR DS:[ESI]

;
; copy eprom IDT to RAM
;
    MOV ESI,DWORD PTR IDT_eprom +2 ; get base of eprom IDT
                                   ; (put here by builder)

    MOV EDI,IDTbase                ; point ES:EDI to IDT base in RAM.

    MOV CX,WORD PTR idt_eprom +0   ; limit of eprom IDT
    INC CX
    SHR CX,1
    CLD
    REP MOVS   WORD PTR ES:[EDI],WORD PTR DS:[ESI]

; switch to RAM GDT and IDT
;
    LIDT IDT_pword
    LGDT GDT_pword

;
    MOV BX,GDT_ALIAS               ; point DS to GDT alias
    MOV DS,BX
;
; copy eprom TSS to RAM
;
    MOV BX,INIT_TSS_A              ; INIT TSS A descriptor base
                                   ; has RAM location of INIT TSS.

    MOV ES,BX                      ; ES points to TSS in RAM

    MOV BX,INIT_TSS                ; get initial task selector
    LAR DX,BX                      ; save access byte
    MOV [BX].access,DS_ACCESS      ; set access as data segment
    MOV FS,BX                      ; FS points to eprom TSS

    XOR si,si                      ; FS:si points to eprom TSS
    XOR di,di                      ; ES:di points to RAM TSS

    MOV CX,[BX].lim_0_15           ; get count to move
    INC CX

;
; move INIT_TSS to RAM.

    REP MOVS BYTE PTR ES:[di],BYTE PTR FS:[si]

    MOV [BX].access,DH             ; restore access byte

;
; change base of INIT TSS descriptor to point to RAM.

    MOV AX,INIT_TSS_A_OFFSET.bas_0_15
    MOV INIT_TSS_OFFSET.bas_0_15,AX
    MOV AL,INIT_TSS_A_OFFSET.bas_16_23
    MOV INIT_TSS_OFFSET.bas_16_23,AL
    MOV AL,INIT_TSS_A_OFFSET.bas_24_31
    MOV INIT_TSS_OFFSET.bas_24_31,AL

;
; change INIT TSS A to form a save area for TSS on first task
; switch. Use RAM at location 0.
    MOV BX,INIT_TSS_A
    MOV WORD PTR [BX].bas_0_15,0
    MOV [BX].bas_16_23,0
    MOV [BX].bas_24_31,0
    MOV [BX].access,TSS_ACCESS
    MOV [BX].gran,0
    LTR BX                         ; defines save area for TSS

;
; copy eprom LDT to RAM

    MOV BX,INIT_LDT_A              ; INIT_LDT_A descriptor has
                                   ; base address in RAM for INIT_LDT.

    MOV ES,BX                      ; ES points to LDT location in RAM.

    MOV AH,[BX].bas_24_31
    MOV AL,[BX].bas_16_23
    SHL EAX,16
    MOV AX,[BX].bas_0_15           ; save INIT_LDT base (ram) in EAX

    MOV BX,INIT_LDT                ; get initial LDT selector
    LAR DX,BX                      ; save access rights
    MOV [BX].access,DS_ACCESS      ; set access as data segment
    MOV FS,BX                      ; FS points to eprom LDT

    XOR si,si                      ; FS:SI points to eprom LDT
    XOR di,di                      ; ES:DI points to RAM LDT

    MOV CX,[BX].lim_0_15           ; get count to move
    INC CX
;
; move initial LDT to RAM

    REP MOVS BYTE PTR ES:[di],BYTE PTR FS:[si]

    MOV [BX].access,DH             ; restore access rights in
                                   ; INIT_LDT descriptor

;
; change base of alias (of INIT_LDT) to point to location in RAM.

    MOV ES:[INIT_LDT_ALIAS].bas_0_15,AX
    SHR EAX,16
    MOV ES:[INIT_LDT_ALIAS].bas_16_23,AL
    MOV ES:[INIT_LDT_ALIAS].bas_24_31,AH

;
; now set the base value in INIT_LDT descriptor

    MOV AX,INIT_LDT_A_OFFSET.bas_0_15
    MOV INIT_LDT_OFFSET.bas_0_15,AX
    MOV AL,INIT_LDT_A_OFFSET.bas_16_23
    MOV INIT_LDT_OFFSET.bas_16_23,AL
    MOV AL,INIT_LDT_A_OFFSET.bas_24_31
    MOV INIT_LDT_OFFSET.bas_24_31,AL

;
; Now GDT, IDT, initial TSS and initial LDT are all set up.
;
; Start the first task!

    JMP ENTRY_POINTER

RESET_CODE ends
    END START, SS:DUMMY,DS:DUMMY

10.6 TLB Testing

The 80386 provides a mechanism for testing the Translation Lookaside Buffer (TLB), the cache used for translating linear addresses to physical addresses. Although failure of the TLB hardware is extremely unlikely, users may wish to include TLB confidence tests among other power-up confidence tests for the 80386.

---------------------------------------------------------------------------
NOTE
This TLB testing mechanism is unique to the 80386 and may not be continued in the same way in future processors.
Software that uses this mechanism may be incompatible with future processors.
---------------------------------------------------------------------------

When testing the TLB it is recommended that paging be turned off (PG=0 in CR0) to avoid interference with the test data being written to the TLB.

10.6.1 Structure of the TLB

The TLB is a four-way set-associative memory. Figure 10-3 illustrates the structure of the TLB. There are four sets of eight entries each. Each entry consists of a tag and data. Tags are 24 bits wide. They contain the high-order 20 bits of the linear address, the valid bit, and three attribute bits. The data portion of each entry contains the high-order 20 bits of the physical address.

10.6.2 Test Registers

Two test registers, shown in Figure 10-4, are provided for the purpose of testing. TR6 is the test command register, and TR7 is the test data register. These registers are accessed by variants of the MOV instruction. A test register may be either the source operand or destination operand. The MOV instructions are defined in both real-address mode and protected mode. The test registers are privileged resources; in protected mode, the MOV instructions that access them can only be executed at privilege level 0. An attempt to read or write the test registers when executing at any other privilege level causes a general protection exception.

The test command register (TR6) contains a command and an address tag to use in performing the command:

C          This is the command bit. There are two TLB testing commands: write entries into the TLB, and perform TLB lookups. To cause an immediate write into the TLB entry, move a doubleword into TR6 that contains a 0 in this bit. To cause an immediate TLB lookup, move a doubleword into TR6 that contains a 1 in this bit.

Linear     On a TLB write, a TLB entry is allocated to this linear address;
Address    the rest of that TLB entry is set per the value of TR7 and the value just written into TR6. On a TLB lookup, the TLB is interrogated per this value; if one and only one TLB entry matches, the rest of the fields of TR6 and TR7 are set from the matching TLB entry.

V          The valid bit for this TLB entry. The TLB uses the valid bit to identify entries that contain valid data. Entries of the TLB that have not been assigned values have zero in the valid bit. All valid bits can be cleared by writing to CR3.

D, D#      The dirty bit (and its complement) for/from the TLB entry.

U, U#      The U/S bit (and its complement) for/from the TLB entry.

W, W#      The R/W bit (and its complement) for/from the TLB entry.

           The meaning of these pairs of bits is given by Table 10-1, where X represents D, U, or W.

The test data register (TR7) holds data read from or data to be written to the TLB.

Physical   This is the data field of the TLB. On a write to the TLB, the
Address    TLB entry allocated to the linear address in TR6 is set to this value. On a TLB lookup, if HT is set, the data field (physical address) from the TLB is read out to this field. If HT is not set, this field is undefined.

HT         For a TLB lookup, the HT bit indicates whether the lookup was a hit (HT = 1) or a miss (HT = 0). For a TLB write, HT must be set to 1.

REP        For a TLB write, selects which of four associative blocks of the TLB is to be written. For a TLB read, if HT is set, REP reports in which of the four associative blocks the tag was found; if HT is not set, REP is undefined.

Table 10-1.  Meaning of D, U, and W Bit Pairs

X     X#    Effect during       Value of bit X
            TLB Lookup          after TLB Write

0     0     (undefined)         (undefined)
0     1     Match if X=0        Bit X becomes 0
1     0     Match if X=1        Bit X becomes 1
1     1     (undefined)         (undefined)

Figure 10-3.  TLB Structure

              +-----------+-----------+
            7 |    TAG    |   DATA    |
              +-----------+-----------+
   SET 11     :           :           :
              +-----------+-----------+
            1 |    TAG    |   DATA    |
              +-----------+-----------+
            0 |    TAG    |   DATA    |
              +-----------+-----------+

   SET 10, SET 01, and SET 00 have the same structure (entries 7 through 0, each a TAG and DATA pair). All four sets connect to the DATA BUS.

Figure 10-4.  Test Registers

 31                  12 11 10 9  8  7  6  5  4  3  2  1  0
+----------------------+--+--+--+--+--+--+--+--+--+--+--+--+
|   PHYSICAL ADDRESS   |0 |0 |0 |0 |0 |0 |0 |HT| REP |0 |0 |  TR7
+----------------------+--+--+--+--+--+--+--+--+--+--+--+--+
|    LINEAR ADDRESS    |V |D |D#|U |U#|W |W#|0 |0 |0 |0 |C |  TR6
+----------------------+--+--+--+--+--+--+--+--+--+--+--+--+

NOTE: 0 INDICATES INTEL RESERVED. DO NOT DEFINE.

10.6.3 Test Operations

To write a TLB entry:

1. Move a doubleword to TR7 that contains the desired physical address, HT, and REP values. HT must contain 1. REP must point to the associative block in which to place the entry.

2. Move a doubleword to TR6 that contains the appropriate linear address, and values for V, D, U, and W.
   Be sure C=0 for "write" command.

Be careful not to write duplicate tags; the results of doing so are undefined.

To look up (read) a TLB entry:

1. Move a doubleword to TR6 that contains the appropriate linear address and attributes. Be sure C=1 for "lookup" command.

2. Store TR7. If the HT bit in TR7 indicates a hit, then the other values reveal the TLB contents. If HT indicates a miss, then the other values in TR7 are indeterminate.

For the purposes of testing, the V bit functions as another bit of address. The V bit for a lookup request should usually be set, so that uninitialized tags do not match. Lookups with V=0 are unpredictable if any tags are uninitialized.


Chapter 11  Coprocessing and Multiprocessing

----------------------------------------------------------------------------

The 80386 has two levels of support for multiple parallel processing units:

- A highly specialized interface for very closely coupled processors of a type known as coprocessors.

- A more general interface for more loosely coupled processors of unspecified type.

11.1 Coprocessing

The components of the coprocessor interface include:

- The ET bit of control register zero (CR0)
- The EM and MP bits of CR0
- The ESC instructions
- The WAIT instruction
- The TS bit of CR0
- Exceptions

11.1.1 Coprocessor Identification

The 80386 is designed to operate with either an 80287 or 80387 math coprocessor. The ET bit of CR0 indicates which type of coprocessor is present. ET is set automatically by the 80386 after RESET according to the level detected on the ERROR# input. If desired, ET may also be set or reset by loading CR0 with a MOV instruction. If ET is set, the 80386 uses the 32-bit protocol of the 80387; if reset, the 80386 uses the 16-bit protocol of the 80287.

11.1.2 ESC and WAIT Instructions

The 80386 interprets the pattern 11011B in the first five bits of an instruction as an opcode intended for a coprocessor. Instructions thus marked are called ESCAPE or ESC instructions.
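
The bit pattern test can be written in one line. The Python sketch below is an illustration, not part of the manual; the function name is_esc_opcode is hypothetical. It takes the "first five bits" to be the five high-order bits of the first opcode byte, which selects the byte values D8H through DFH.

```python
def is_esc_opcode(first_byte):
    """True if the five high-order bits of the opcode byte are 11011B."""
    return (first_byte & 0xFF) >> 3 == 0b11011


if __name__ == "__main__":
    escapes = [b for b in range(256) if is_esc_opcode(b)]
    print([hex(b) for b in escapes])
```

The remaining three bits of the byte are not examined by this test, so eight opcode byte values qualify.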
The CPU performs the following functions upon encountering an ESC instruction before sending the instruction to the coprocessor:

- Tests the emulation mode (EM) flag to determine whether coprocessor functions are being emulated by software.

- Tests the TS flag to determine whether there has been a context change since the last ESC instruction.

- For some ESC instructions, tests the ERROR# pin to determine whether the coprocessor detected an error in the previous ESC instruction.

The WAIT instruction is not an ESC instruction, but WAIT causes the CPU to perform some of the same tests that it performs upon encountering an ESC instruction. The processor performs the following actions for a WAIT instruction:

- Waits until the coprocessor no longer asserts the BUSY# pin.

- Tests the ERROR# pin (after BUSY# goes inactive). If ERROR# is active, the 80386 signals exception 16, which indicates that the coprocessor encountered an error in the previous ESC instruction.

WAIT can therefore be used to cause exception 16 if an error is pending from a previous ESC instruction. Note that, if no coprocessor is present, the ERROR# and BUSY# pins should be tied inactive to prevent WAIT from waiting forever or causing spurious exceptions.

11.1.3 EM and MP Flags

The EM and MP flags of CR0 control how the processor reacts to coprocessor instructions.

The EM bit indicates whether coprocessor functions are to be emulated. If the processor finds EM set when executing an ESC instruction, it signals exception 7, giving the exception handler an opportunity to emulate the ESC instruction.

The MP (monitor coprocessor) bit indicates whether a coprocessor is actually attached. The MP flag controls the function of the WAIT instruction. If, when executing a WAIT instruction, the CPU finds MP set, then it tests the TS flag; it does not otherwise test TS during a WAIT instruction. If it finds TS set under these conditions, the CPU signals exception 7.
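
The interaction of EM, MP, and TS can be summarized as a small decision function. The Python sketch below is an illustration only: the function name raises_exception_7 is hypothetical, and the flags are taken to be bits 1 (MP), 2 (EM), and 3 (TS) of CR0. It follows the rules stated in this section for EM and MP, together with the TS test for ESC instructions listed in section 11.1.2 and detailed in section 11.1.4.

```python
CR0_MP = 1 << 1   # monitor coprocessor
CR0_EM = 1 << 2   # emulate coprocessor
CR0_TS = 1 << 3   # task switched


def raises_exception_7(cr0, instruction):
    """Return True if `instruction` ("ESC" or "WAIT") would signal exception 7.

    Hypothetical model of the flag tests only; ERROR# and BUSY# are ignored.
    """
    em = bool(cr0 & CR0_EM)
    mp = bool(cr0 & CR0_MP)
    ts = bool(cr0 & CR0_TS)
    if instruction == "ESC":
        return em or ts      # emulation requested, or context may be stale
    if instruction == "WAIT":
        return mp and ts     # TS is tested for WAIT only when MP is set
    raise ValueError("instruction must be 'ESC' or 'WAIT'")


if __name__ == "__main__":
    for flags in (0, CR0_EM, CR0_TS, CR0_MP | CR0_TS):
        print(flags, raises_exception_7(flags, "ESC"), raises_exception_7(flags, "WAIT"))
```

Under this model, EM by itself never affects WAIT, and TS affects WAIT only in combination with MP.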
The EM and MP flags can be changed with the aid of a MOV instruction using CR0 as the destination operand and read with the aid of a MOV instruction with CR0 as the source operand. These forms of the MOV instruction can be executed only at privilege level zero. 11.1.4 The Task-Switched Flag The TS bit of CR0 helps to determine when the context of the coprocessor does not match that of the task being executed by the 80386 CPU. The 80386 sets TS each time it performs a task switch (whether triggered by software or by hardware interrupt). If, when interpreting one of the ESC instructions, the CPU finds TS already set, it causes exception 7. The WAIT instruction also causes exception 7 if both TS and MP are set. Operating systems can use this exception to switch the context of the coprocessor to correspond to the current task. Refer to the 80386 System Software Writer's Guide for an example. The CLTS instruction (legal only at privilege level zero) resets the TS flag. 11.1.5 Coprocessor Exceptions Three exceptions aid in interfacing to a coprocessor: interrupt 7 (coprocessor not available), interrupt 9 (coprocessor segment overrun), and interrupt 16 (coprocessor error). 11.1.5.1 Interrupt 7 ÄÄ Coprocessor Not Available This exception occurs in either of two conditions: 1. The CPU encounters an ESC instruction and EM is set. In this case, the exception handler should emulate the instruction that caused the exception. TS may also be set. 2. The CPU encounters either the WAIT instruction or an ESC instruction when both MP and TS are set. In this case, the exception handler should update the state of the coprocessor, if necessary. 11.1.5.2 Interrupt 9 ÄÄ Coprocessor Segment Overrun This exception occurs in protected mode under the following conditions: þ An operand of a coprocessor instruction wraps around an addressing limit (0FFFFH for small segments, 0FFFFFFFFH for big segments, zero for expand-down segments). 
  An operand may wrap around an addressing limit when the segment limit is near an addressing limit and the operand is near the largest valid address in the segment. Because of the wrap-around, the beginning and ending addresses of such an operand will be near opposite ends of the segment.
- Both the first byte and the last byte of the operand (considering wrap-around) are at addresses located in the segment and in present and accessible pages.
- The operand spans inaccessible addresses. There are two ways that such an operand may also span inaccessible addresses:

  1. The segment limit is not equal to the addressing limit (e.g., addressing limit is FFFFH and segment limit is FFFDH); therefore, the operand will span addresses that are not within the segment (e.g., an 8-byte operand that starts at valid offset FFFC will span addresses FFFC-FFFF and 0000-0003; however, addresses FFFE and FFFF are not valid, because they exceed the limit).
  2. The operand begins and ends in present and accessible pages but intermediate bytes of the operand fall either in a not-present page or in a page to which the current procedure does not have access rights.

The address of the failing numerics instruction and data operand may be lost; an FSTENV does not return reliable addresses. As with the 80286/80287, the segment overrun exception should be handled by executing an FNINIT instruction (i.e., an FINIT without a preceding WAIT). The return address on the stack does not necessarily point to the failing instruction nor to the following instruction. The failing numerics instruction is not restartable.

Case 2 can be avoided by either aligning all segments on page boundaries or by not starting them within 108 bytes of the start or end of a page. (The maximum size of a coprocessor operand is 108 bytes.)
Case 1 can be avoided by making sure that the gap between the last valid offset and the first valid offset of a segment is either no less than 108 bytes or is zero (i.e., the segment is of full size). If neither software system design constraint is acceptable, the exception handler should execute FNINIT and should probably terminate the task.

11.1.5.3 Interrupt 16 -- Coprocessor Error

The numerics coprocessors can detect six different exception conditions during instruction execution. If the detected exception is not masked by a bit in the control word, the coprocessor communicates the fact that an error occurred to the CPU by a signal at the ERROR# pin. The CPU causes interrupt 16 the next time it checks the ERROR# pin, which is only at the beginning of a subsequent WAIT or certain ESC instructions. If the exception is masked, the numerics coprocessor handles the exception according to on-board logic; it does not assert the ERROR# pin in this case.

11.2 General Multiprocessing

The components of the general multiprocessing interface include:

- The LOCK# signal.
- The LOCK instruction prefix, which gives programmed control of the LOCK# signal.
- Automatic assertion of the LOCK# signal with implicit memory updates by the processor.

11.2.1 LOCK and the LOCK# Signal

The LOCK instruction prefix and its corresponding output signal LOCK# can be used to prevent other bus masters from interrupting a data movement operation. LOCK may only be used with the following 80386 instructions when they modify memory. An undefined-opcode exception results from using LOCK before any instruction other than:

- Bit test and change: BTS, BTR, BTC.
- Exchange: XCHG.
- Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.
- One-operand arithmetic and logical: INC, DEC, NOT, and NEG.

A locked instruction is only guaranteed to lock the area of memory defined by the destination operand, but it may lock a larger memory area.
For example, typical 8086 and 80286 configurations lock the entire physical memory space. The area of memory defined by the destination operand is guaranteed to be locked against access by a processor executing a locked instruction on exactly the same memory area, i.e., an operand with identical starting address and identical length.

The integrity of the lock is not affected by the alignment of the memory field. The LOCK# signal is asserted for as many bus cycles as necessary to update the entire operand.

11.2.2 Automatic Locking

In several instances, the processor itself initiates activity on the data bus. To help ensure that such activities function correctly in multiprocessor configurations, the processor automatically asserts the LOCK# signal. These instances include:

- Acknowledging interrupts. After an interrupt request, the interrupt controller uses the data bus to send the interrupt ID of the interrupt source to the CPU. The CPU asserts LOCK# to ensure that no other data appears on the data bus during this time.
- Setting busy bit of TSS descriptor. The processor tests and sets the busy-bit in the type field of the TSS descriptor when switching to a task. To ensure that two different processors cannot simultaneously switch to the same task, the processor asserts LOCK# while testing and setting this bit.
- Loading of descriptors. While copying the contents of a descriptor from a descriptor table into a segment register, the processor asserts LOCK# so that the descriptor cannot be modified by another processor while it is being loaded. For this action to be effective, operating-system procedures that update descriptors should adhere to the following steps:

  -- Use a locked update to the access-rights byte to mark the descriptor not-present.
  -- Update the fields of the descriptor. (This may require several memory accesses; therefore, LOCK cannot be used.)
  -- Use a locked update to the access-rights byte to mark the descriptor present again.
- Updating page-table A and D bits. The processor asserts LOCK# while updating the A (accessed) and D (dirty) bits of page-table entries. Also, the processor bypasses the page-table cache and directly updates these bits in memory.
- Executing XCHG instruction. The 80386 always asserts LOCK# during an XCHG instruction that references memory (even if the LOCK prefix is not used).

11.2.3 Cache Considerations

Systems programmers must take care when updating shared data that may also be stored in on-chip registers and caches. With the 80386, such shared data includes:

- Descriptors, which may be held in segment registers. A change to a descriptor that is shared among processors should be broadcast to all processors. Segment registers are effectively "descriptor caches". A change to a descriptor will not be utilized by another processor if that processor already has a copy of the old version of the descriptor in a segment register.
- Page tables, which may be held in the page-table cache. A change to a page table that is shared among processors should be broadcast to all processors, so that others can flush their page-table caches and reload them with up-to-date page tables from memory.

Systems designers can employ an interprocessor interrupt to handle the above cases. When one processor changes data that may be cached by other processors, it can send an interrupt signal to all other processors that may be affected by the change. If the interrupt is serviced by an interrupt task, the task switch automatically flushes the segment registers. The task switch also flushes the page-table cache if the PDBR (the contents of CR3) of the interrupt task is different from the PDBR of every other task.

In multiprocessor systems that need a cacheability signal from the CPU, it is recommended that physical address pin A31 be used to indicate cacheability. Such a system can then possess up to 2 Gbytes of physical memory.
The virtual address range available to the programmer is not affected by this convention.

Chapter 12 Debugging

The 80386 brings to Intel's line of microprocessors significant advances in debugging power. The single-step exception and breakpoint exception of previous processors are still available in the 80386, but the principal debugging support takes the form of debug registers. The debug registers support both instruction breakpoints and data breakpoints. Data breakpoints are an important innovation that can save hours of debugging time by pinpointing, for example, exactly when a data structure is being overwritten. The breakpoint registers also eliminate the complexities associated with writing a breakpoint instruction into a code segment (requires a data-segment alias for the code segment) or a code segment shared by multiple tasks (the breakpoint exception can occur in the context of any of the tasks). Breakpoints can even be set in code contained in ROM.

12.1 Debugging Features of the Architecture

The features of the 80386 architecture that support debugging include:

Reserved debug interrupt vector
    Permits processor to automatically invoke a debugger task or procedure when an event occurs that is of interest to the debugger.

Four debug address registers
    Permit programmers to specify up to four addresses that the CPU will automatically monitor.

Debug control register
    Allows programmers to selectively enable various debug conditions associated with the four debug addresses.

Debug status register
    Helps debugger identify condition that caused debug exception.

Trap bit of TSS (T-bit)
    Permits monitoring of task switches.

Resume flag (RF) of flags register
    Allows an instruction to be restarted after a debug exception without immediately causing another debug exception due to the same condition.
Single-step flag (TF)
    Allows complete monitoring of program flow by specifying whether the CPU should cause a debug exception with the execution of every instruction.

Breakpoint instruction
    Permits debugger intervention at any point in program execution and aids debugging of debugger programs.

Reserved interrupt vector for breakpoint exception
    Permits processor to automatically invoke a handler task or procedure upon encountering a breakpoint instruction.

These features make it possible to invoke a debugger that is either a separate task or a procedure in the context of the current task. The debugger can be invoked under any of the following kinds of conditions:

- Task switch to a specific task.
- Execution of the breakpoint instruction.
- Execution of every instruction.
- Execution of any instruction at a given address.
- Read or write of a byte, word, or doubleword at any specified address.
- Write to a byte, word, or doubleword at any specified address.
- Attempt to change a debug register.

12.2 Debug Registers

Six 80386 registers are used to control debug features. These registers are accessed by variants of the MOV instruction. A debug register may be either the source operand or destination operand. The debug registers are privileged resources; the MOV instructions that access them can only be executed at privilege level zero. An attempt to read or write the debug registers when executing at any other privilege level causes a general protection exception. Figure 12-1 shows the format of the debug registers.

Figure 12-1. Debug Registers

    Register   Bits     Contents
    DR7        31-30    LEN3
               29-28    R/W3
               27-26    LEN2
               25-24    R/W2
               23-22    LEN1
               21-20    R/W1
               19-18    LEN0
               17-16    R/W0
               15-10    0
               9        GE
               8        LE
               7        G3
               6        L3
               5        G2
               4        L2
               3        G1
               2        L1
               1        G0
               0        L0
    DR6        31-16    0
               15       BT
               14       BS
               13       BD
               12-4     0
               3        B3
               2        B2
               1        B1
               0        B0
    DR5        31-0     RESERVED
    DR4        31-0     RESERVED
    DR3        31-0     BREAKPOINT 3 LINEAR ADDRESS
    DR2        31-0     BREAKPOINT 2 LINEAR ADDRESS
    DR1        31-0     BREAKPOINT 1 LINEAR ADDRESS
    DR0        31-0     BREAKPOINT 0 LINEAR ADDRESS

NOTE: 0 MEANS INTEL RESERVED. DO NOT DEFINE.

12.2.1 Debug Address Registers (DR0-DR3)

Each of these registers contains the linear address associated with one of four breakpoint conditions. Each breakpoint condition is further defined by bits in DR7.

The debug address registers are effective whether or not paging is enabled. The addresses in these registers are linear addresses. If paging is enabled, the linear addresses are translated into physical addresses by the processor's paging mechanism (as explained in Chapter 5). If paging is not enabled, these linear addresses are the same as physical addresses.

Note that when paging is enabled, different tasks may have different linear-to-physical address mappings.
When this is the case, an address in a debug address register may be relevant to one task but not to another. For this reason the 80386 has both global and local enable bits in DR7. These bits indicate whether a given debug address has a global (all tasks) or local (current task only) relevance.

12.2.2 Debug Control Register (DR7)

The debug control register shown in Figure 12-1 both helps to define the debug conditions and selectively enables and disables those conditions.

For each address in registers DR0-DR3, the corresponding fields R/W0 through R/W3 specify the type of action that should cause a breakpoint. The processor interprets these bits as follows:

    00 -- Break on instruction execution only
    01 -- Break on data writes only
    10 -- undefined
    11 -- Break on data reads or writes but not instruction fetches

Fields LEN0 through LEN3 specify the length of data item to be monitored. A length of 1, 2, or 4 bytes may be specified. The values of the length fields are interpreted as follows:

    00 -- one-byte length
    01 -- two-byte length
    10 -- undefined
    11 -- four-byte length

If R/Wn is 00 (instruction execution), then LENn should also be 00. Any other length is undefined.

The low-order eight bits of DR7 (L0 through L3 and G0 through G3) selectively enable the four address breakpoint conditions. There are two levels of enabling: the local (L0 through L3) and global (G0 through G3) levels. The local enable bits are automatically reset by the processor at every task switch to avoid unwanted breakpoint conditions in the new task. The global enable bits are not reset by a task switch; therefore, they can be used for conditions that are global to all tasks.

The LE and GE bits control the "exact data breakpoint match" feature of the processor. If either LE or GE is set, the processor slows execution so that data breakpoints are reported on the instruction that causes them. It is recommended that one of these bits be set whenever data breakpoints are armed.
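The field encodings above can be combined into a DR7 value with a few shifts. The following Python sketch is a hedged illustration, not a definitive routine: the helper `dr7_enable` and the constant names are hypothetical, the bit positions are those shown in Figure 12-1 (R/Wn and LENn in the four bits starting at bit 16 + 4n, Ln at bit 2n, Gn at bit 2n + 1, LE at bit 8, GE at bit 9), and the choice to set LE or GE automatically for data breakpoints simply follows the recommendation above.

```python
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11
LEN_BITS = {1: 0b00, 2: 0b01, 4: 0b11}  # length in bytes -> LENn encoding
DR7_LE = 1 << 8
DR7_GE = 1 << 9


def dr7_enable(dr7, n, rw, length, global_=False):
    """Return dr7 with breakpoint n (0..3) configured and enabled."""
    if n not in range(4):
        raise ValueError("breakpoint number must be 0..3")
    if rw not in (RW_EXECUTE, RW_WRITE, RW_READ_WRITE):
        raise ValueError("R/W encoding 10 is undefined")
    if length not in LEN_BITS:
        raise ValueError("length must be 1, 2, or 4 bytes")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("instruction breakpoints require LEN = 00")
    shift = 16 + 4 * n                   # R/Wn in the low two bits, LENn above
    dr7 &= ~(0xF << shift)               # clear any previous R/Wn and LENn
    dr7 |= (rw | (LEN_BITS[length] << 2)) << shift
    dr7 |= 1 << (2 * n + (1 if global_ else 0))   # Gn or Ln
    if rw != RW_EXECUTE:
        dr7 |= DR7_GE if global_ else DR7_LE      # exact data breakpoint match
    return dr7 & 0xFFFFFFFF
```

For example, `hex(dr7_enable(0, 0, RW_WRITE, 4))` yields `'0xd0101'`: LEN0 = 11 and R/W0 = 01 in bits 19-16, plus L0 and LE.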
The processor clears LE at a task switch but does not clear GE.

12.2.3 Debug Status Register (DR6)

The debug status register shown in Figure 12-1 permits the debugger to determine which debug conditions have occurred.

When the processor detects an enabled debug exception, it sets the low-order bits of this register (B0 through B3) before entering the debug exception handler. Bn is set if the condition described by DRn, LENn, and R/Wn occurs. (Note that the processor sets Bn regardless of whether Gn or Ln is set. If more than one breakpoint condition occurs at one time and if the breakpoint trap occurs due to an enabled condition other than n, Bn may be set, even though neither Gn nor Ln is set.)

The BT bit is associated with the T-bit (debug trap bit) of the TSS (refer to Chapter 7 for the location of the T-bit). The processor sets the BT bit before entering the debug handler if a task switch has occurred and the T-bit of the new TSS is set. There is no corresponding bit in DR7 that enables and disables this trap; the T-bit of the TSS is the sole enabling bit.

The BS bit is associated with the TF (trap flag) bit of the EFLAGS register. The BS bit is set if the debug handler is entered due to the occurrence of a single-step exception. The single-step trap is the highest-priority debug exception; therefore, when BS is set, any of the other debug status bits may also be set.

The BD bit is set if the next instruction will read or write one of the eight debug registers and ICE-386 is also using the debug registers at the same time.

Note that the bits of DR6 are never cleared by the processor. To avoid any confusion in identifying the next debug exception, the debug handler should move zeros to DR6 immediately before returning.

12.2.4 Breakpoint Field Recognition

The linear address and LEN field for each of the four breakpoint conditions define a range of sequential byte addresses for a data breakpoint. The LEN field permits specification of a one-, two-, or four-byte field.
Two-byte fields must be aligned on word boundaries (addresses that are multiples of two) and four-byte fields must be aligned on doubleword boundaries (addresses that are multiples of four). These requirements are enforced by the processor; it uses the LEN bits to mask the low-order bits of the addresses in the debug address registers. Improperly aligned code or data breakpoint addresses will not yield the expected results.

A data read or write breakpoint is triggered if any of the bytes participating in a memory access is within the field defined by a breakpoint address register and the corresponding LEN field. Table 12-1 gives some examples of breakpoint fields with memory references that both do and do not cause traps.

To set a data breakpoint for a misaligned field longer than one byte, it may be desirable to put two sets of entries in the breakpoint register such that each entry is properly aligned and the two entries together span the length of the field.

Instruction breakpoint addresses must have a length specification of one byte (LEN = 00); other values are undefined. The processor recognizes an instruction breakpoint address only when it points to the first byte of an instruction. If the instruction has any prefixes, the breakpoint address must point to the first prefix.

Table 12-1. Breakpoint Field Recognition Examples

                                             Address (hex)   Length
    Register Contents                  DR0   0A0001          1 (LEN0 = 00)
                                       DR1   0A0002          1 (LEN1 = 00)
                                       DR2   0B0002          2 (LEN2 = 01)
                                       DR3   0C0000          4 (LEN3 = 11)

    Some Examples of Memory                  0A0001          1
    References That Cause Traps              0A0002          1
                                             0A0001          2
                                             0A0002          2
                                             0B0002          2
                                             0B0001          4
                                             0C0000          4
                                             0C0001          2
                                             0C0003          1

    Some Examples of Memory                  0A0000          1
    References That Don't Cause Traps        0A0003          4
                                             0B0000          2
                                             0C0004          4

12.3 Debug Exceptions

Two of the interrupt vectors of the 80386 are reserved for exceptions that relate to debugging.
Interrupt 1 is the primary means of invoking debuggers designed expressly for the 80386; interrupt 3 is intended for debugging debuggers and for compatibility with prior processors in Intel's 8086 processor family.

12.3.1 Interrupt 1 -- Debug Exceptions

The handler for this exception is usually a debugger or part of a debugging system. The processor causes interrupt 1 for any of several conditions. The debugger can check flags in DR6 and DR7 to determine what condition caused the exception and what other conditions might be in effect at the same time. Table 12-2 associates with each breakpoint condition the combination of bits that indicate when that condition has caused the debug exception.

Instruction address breakpoint conditions are faults, while other debug conditions are traps. The debug exception may report either or both at one time. The following paragraphs present details for each class of debug exception.

Table 12-2. Debug Exception Conditions

    Flags to Test                Condition
    BS=1                         Single-step trap
    B0=1 AND (G0=1 OR L0=1)      Breakpoint DR0, LEN0, R/W0
    B1=1 AND (G1=1 OR L1=1)      Breakpoint DR1, LEN1, R/W1
    B2=1 AND (G2=1 OR L2=1)      Breakpoint DR2, LEN2, R/W2
    B3=1 AND (G3=1 OR L3=1)      Breakpoint DR3, LEN3, R/W3
    BD=1                         Debug registers not available; in use by ICE-386.
    BT=1                         Task switch

12.3.1.1 Instruction Address Breakpoint

The processor reports an instruction-address breakpoint before it executes the instruction that begins at the given address; i.e., an instruction-address breakpoint exception is a fault.

The RF (resume flag) permits the debug handler to retry instructions that cause other kinds of faults in addition to debug faults. When it detects a fault, the processor automatically sets RF in the flags image that it pushes onto the stack. (It does not, however, set RF for traps and aborts.)

When RF is set, it causes any debug fault to be ignored during the next instruction.
(Note, however, that RF does not cause breakpoint traps to be ignored, nor other kinds of faults.)

The processor automatically clears RF at the successful completion of every instruction except after the IRET instruction, after the POPF instruction, and after a JMP, CALL, or INT instruction that causes a task switch. These instructions set RF to the value specified by the memory image of the EFLAGS register.

The processor automatically sets RF in the EFLAGS image on the stack before entry into any fault handler. Upon entry into the fault handler for instruction address breakpoints, for example, RF is set in the EFLAGS image on the stack; therefore, the IRET instruction at the end of the handler will set RF in the EFLAGS register, and execution will resume at the breakpoint address without generating another breakpoint fault at the same address.

If, after a debug fault, RF is set and the debug handler retries the faulting instruction, it is possible that retrying the instruction will raise other faults. The retry of the instruction after these faults will also be done with RF=1, with the result that debug faults continue to be ignored. The processor clears RF only after successful completion of the instruction.

Real-mode debuggers can control the RF flag by using a 32-bit IRET. A 16-bit IRET instruction does not affect the RF bit (which is in the high-order 16 bits of EFLAGS). To use a 32-bit IRET, the debugger must rearrange the stack so that it holds appropriate values for the 32-bit EIP, CS, and EFLAGS (with RF set in the EFLAGS image). Then executing an IRET with an operand size prefix causes a 32-bit return, popping the RF flag into EFLAGS.

12.3.1.2 Data Address Breakpoint

A data-address breakpoint exception is a trap; i.e., the processor reports a data-address breakpoint after executing the instruction that accesses the given memory item.

When using data breakpoints it is recommended that either the LE or GE bit of DR7 be set also.
If either LE or GE is set, any data breakpoint trap is reported exactly after completion of the instruction that accessed the specified memory item. This exact reporting is accomplished by forcing the 80386 execution unit to wait for completion of data operand transfers before beginning execution of the next instruction. If neither GE nor LE is set, data breakpoints may not be reported until one instruction after the data is accessed or may not be reported at all. This is due to the fact that, normally, instruction execution is overlapped with memory transfers to such a degree that execution of the next instruction may begin before memory transfers for the prior instruction are completed.

If a debugger needs to preserve the contents of a write breakpoint location, it should save the original contents before setting a write breakpoint. Because data breakpoints are traps, a write into a breakpoint location will complete before the trap condition is reported. The handler can report the saved value after the breakpoint is triggered. The data in the debug registers can be used to address the new value stored by the instruction that triggered the breakpoint.

12.3.1.3 General Detect Fault

This exception occurs when an attempt is made to use the debug registers at the same time that ICE-386 is using them. This additional protection feature is provided to guarantee that ICE-386 can have full control over the debug-register resources when required. ICE-386 uses the debug registers; therefore, a software debugger that also uses these registers cannot run while ICE-386 is in use. The exception handler can detect this condition by examining the BD bit of DR6.

12.3.1.4 Single-Step Trap

This debug condition occurs at the end of an instruction if the trap flag (TF) of the flags register held the value one at the beginning of that instruction. Note that the exception does not occur at the end of an instruction that sets TF.
For example, if POPF is used to set TF, a single-step trap does not occur until after the instruction that follows POPF.

The processor clears the TF bit before invoking the handler. If TF=1 in the flags image of a TSS at the time of a task switch, the exception occurs after the first instruction is executed in the new task.

The single-step flag is normally not cleared by privilege changes inside a task. INT instructions, however, do clear TF. Therefore, software debuggers that single-step code must recognize and emulate INT n or INTO rather than executing them directly.

To maintain protection, system software should check the current execution privilege level after any single-step interrupt to see whether single stepping should continue at the current privilege level.

The interrupt priorities in hardware guarantee that if an external interrupt occurs, single stepping stops. When both an external interrupt and a single-step interrupt occur together, the single-step interrupt is processed first. This clears the TF bit. After saving the return address or switching tasks, the external interrupt input is examined before the first instruction of the single-step handler executes. If the external interrupt is still pending, it is then serviced. The external interrupt handler is not single-stepped. To single step an interrupt handler, just single step an INT n instruction that refers to the interrupt handler.

12.3.1.5 Task Switch Breakpoint

The debug exception also occurs after a switch to an 80386 task if the T-bit of the new TSS is set. The exception occurs after control has passed to the new task, but before the first instruction of that task is executed. The exception handler can detect this condition by examining the BT bit of the debug status register DR6.

Note that if the debug exception handler is a task, the T-bit of its TSS should not be set. Failure to observe this rule will cause the processor to enter an infinite loop.
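The tests of Table 12-2 that a handler for interrupt 1 applies to DR6 and DR7 can be sketched in Python as follows. This is an illustrative model under stated assumptions rather than a definitive handler: the function `debug_conditions` and the constant names are hypothetical, the register values are passed in as plain integers, and the bit positions are those of Figure 12-1 (B0-B3 in bits 0-3, BD, BS, and BT in bits 13, 14, and 15 of DR6; Ln and Gn in bits 2n and 2n + 1 of DR7).

```python
DR6_BD = 1 << 13
DR6_BS = 1 << 14
DR6_BT = 1 << 15


def debug_conditions(dr6, dr7):
    """List the debug conditions reported in dr6, following Table 12-2."""
    causes = []
    if dr6 & DR6_BS:
        causes.append("single-step trap")
    for n in range(4):
        hit = dr6 & (1 << n)               # Bn
        enabled = dr7 & (0b11 << (2 * n))  # Gn or Ln
        # Bn alone is not sufficient: it may be set for a disabled condition.
        if hit and enabled:
            causes.append(f"breakpoint {n}")
    if dr6 & DR6_BD:
        causes.append("debug registers in use by ICE-386")
    if dr6 & DR6_BT:
        causes.append("task switch")
    return causes
```

Because the processor never clears DR6, a real handler would also write zero to DR6 before returning, as section 12.2.3 recommends; the sketch leaves that step to the caller.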
12.3.2 Interrupt 3 -- Breakpoint Exception

This exception is caused by execution of the breakpoint instruction INT 3. Typically, a debugger prepares a breakpoint by substituting the opcode of the one-byte breakpoint instruction in place of the first opcode byte of the instruction to be trapped. When execution of the INT 3 instruction causes the exception handler to be invoked, the saved value of CS:EIP points to the byte following the INT 3 instruction.

With prior generations of processors, this feature is used extensively for trapping execution of specific instructions. With the 80386, the needs formerly filled by this feature are more conveniently solved via the debug registers and interrupt 1. However, the breakpoint exception is still useful for debugging debuggers, because the breakpoint exception can vector to a different exception handler than that used by the debugger. The breakpoint exception can also be useful when it is necessary to set a greater number of breakpoints than permitted by the debug registers.
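The opcode substitution described above can be modelled on a byte array. The following Python sketch is only an illustration of the bookkeeping a debugger might do: the class `SoftwareBreakpoints` and its methods are hypothetical, memory is simulated with a `bytearray` rather than a writable alias of a code segment, and 0CCH is the one-byte encoding of INT 3.

```python
INT3_OPCODE = 0xCC  # one-byte encoding of INT 3


class SoftwareBreakpoints:
    """Track original opcode bytes replaced by INT 3 (simulation only)."""

    def __init__(self, memory):
        self.memory = memory   # bytearray standing in for code memory
        self.saved = {}        # address -> original first opcode byte

    def set(self, address):
        """Replace the first opcode byte at address with INT 3."""
        if address not in self.saved:
            self.saved[address] = self.memory[address]
            self.memory[address] = INT3_OPCODE

    def clear(self, address):
        """Restore the original byte saved by set()."""
        self.memory[address] = self.saved.pop(address)

    def breakpoint_address(self, saved_eip):
        """Map the EIP saved by interrupt 3 back to the breakpoint address.

        The saved EIP points to the byte following the INT 3 instruction,
        so the breakpoint itself lies one byte earlier.
        """
        address = saved_eip - 1
        if address not in self.saved:
            raise KeyError("no breakpoint was set at this address")
        return address
```

To resume the trapped instruction, a debugger using this scheme would restore the original byte with `clear` and return with the saved EIP set back to the value given by `breakpoint_address`.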