Re: [perf-discuss] Project Proposal: x86 APIC Scalability

Wen Lei Wed, 06 May 2009 16:19:21 -0700

Attached is the proposal text, previously linked to a Sun internal site.
-- 
This message posted from opensolaris.org

X86 MP Interrupt Support
========================



1. Introduction
---------------

  The current Solaris x86 interrupt implementation was designed primarily
  for uniprocessor systems. On x86 multiprocessor (MP) platforms, it has the
  following issues.

        1. An x86 MP system can in theory support up to #LocalAPIC * 256
           unique interrupts. For historical reasons, the current Solaris x86
           design limits the number to 256 vectors shared system-wide.

        2. The current Solaris x86 low-level interrupt design couples the
           hardware interrupt priorities with the OS interrupt priorities.
           This further limits the number of unique interrupts per priority
           level to 32, 16, or fewer.

        3. The current Solaris x86 low-level interrupt implementation does
           not allow multiple interrupts sharing the same IRQ to have
           different priorities. Instead, the highest priority among all
           sharing interrupts is used. If interrupts above and below
           LOCK_LEVEL share the same IRQ, the system may hang or panic.


2. Project Scope
----------------

  The scope of this project is as follows.

        1. Modify the low-level Solaris x86 interrupt code to support as
           many interrupt vectors as the MP platform provides, to support
           multiple interrupt priority levels on the same IRQ, and to remove
           the fixed limit on the number of interrupts at each priority level.

        2. Implement interfaces for the Interrupt Resource Management (IRM)
           project.

        3. Preserve the current DDI interrupt interfaces. No DDI interrupt
           interface is expected to change, which also limits this project
           to x86 platforms.


3. Proposed X86 MP Interrupt Design
-----------------------------------


3.1 Vector Number Allocation
----------------------------

  On x86 platforms, the allowable range of vector numbers is 0 to 255.
  Vectors 0 through 31 are reserved for architecture-defined exceptions and
  interrupts, so the usable user-defined vectors are 32 through 255.

  Interrupt vectors are allocated to low-level (below LOCK_LEVEL) interrupts
  from low to high, starting at 32, and to high-level (above LOCK_LEVEL)
  interrupts from high to low, starting at 255.

                INTERRUPT VECTOR ALLOCATION

                    255  ----- -----
                        |     |  |   High level interrupts
                    254 |-----|  v
                        |     |
                    253 |-----|
                        |     |
                        | ... |
                        |     |
                     34 |-----|
                        |     |
                     33 |-----|
                        |     |
                     32 |-----|  ^ 
                        |     |  |  Low level interrupts
                         ----- -----

        - All interrupts below LOCK_LEVEL will be handled via software
          interrupts. The hardware interrupt handler's only job is to
          trigger the appropriate (and possibly multiple) software
          interrupts, which are handled by interrupt threads (one for each
          of the lower 9 priority levels).

        - A number of vectors must be reserved at the top for interrupts
          above LOCK_LEVEL. Because the Local APIC derives an interrupt's
          priority class from the upper four bits of its vector, placing
          high-level interrupts in the top vectors lets them block all
          lower-level interrupts. The number of reserved vectors could be a
          tunable, defaulting to 16.

        - All vectors from 32 through 239 (255 - 16) can be used for fixed
          interrupts, MSIs, and MSI-Xs. This raises the total number of
          interrupts to #LocalAPIC * 208.
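  The allocation policy above can be sketched as a per-CPU allocator, one
  instance per Local APIC. This is a minimal illustration under stated
  assumptions, not the proposed kernel code: vec_state_t, vec_alloc(),
  vec_free(), vec_low_capacity() and VEC_RESERVED_HIGH are hypothetical
  names, the reserved count is fixed at the proposed default of 16 instead
  of being read from a tunable, and high-level interrupts are assumed never
  to spill below the reserved range.

```c
#include <assert.h>
#include <stdbool.h>

#define VEC_MIN_USER      32   /* 0..31: architecture-defined exceptions */
#define VEC_MAX           255
#define VEC_RESERVED_HIGH 16   /* hypothetical tunable, proposed default */
#define VEC_LOW_MAX       (VEC_MAX - VEC_RESERVED_HIGH)  /* 239 */

/* Hypothetical per-CPU allocation state: used[v] is true if v is taken. */
typedef struct {
	bool used[VEC_MAX + 1];
} vec_state_t;

/* Vectors available per CPU to fixed interrupts, MSI and MSI-X. */
int vec_low_capacity(void)
{
	return VEC_LOW_MAX - VEC_MIN_USER + 1;  /* 239 - 32 + 1 = 208 */
}

/*
 * Allocate one vector on this CPU.
 * Below LOCK_LEVEL: search upward from 32 through 239.
 * Above LOCK_LEVEL: search downward from 255 through the reserved
 * range 240..255 only. Returns -1 when no vector is free.
 */
int vec_alloc(vec_state_t *vs, bool high_level)
{
	if (high_level) {
		for (int v = VEC_MAX; v > VEC_LOW_MAX; v--) {
			if (!vs->used[v]) {
				vs->used[v] = true;
				return v;
			}
		}
	} else {
		for (int v = VEC_MIN_USER; v <= VEC_LOW_MAX; v++) {
			if (!vs->used[v]) {
				vs->used[v] = true;
				return v;
			}
		}
	}
	return -1;
}

/* Release a vector previously returned by vec_alloc(). */
void vec_free(vec_state_t *vs, int v)
{
	assert(v >= VEC_MIN_USER && v <= VEC_MAX && vs->used[v]);
	vs->used[v] = false;
}
```

  With 16 reserved vectors, the high range 240..255 is exactly one Local
  APIC priority class (vector bits 7:4 equal to 15), so the two allocation
  directions never meet inside a single class.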


3.2 Data Structure Impact
-------------------------

  Replace the system-wide autovec[] with a per-CPU intr_vect[]. In struct
  autovec, replace av_link with av_vec_link (the per-vector handler chain)
  and add av_ipl_link (the per-IPL pending queue link).

        /*
         * usr/src/uts/common/sys/avintr.h
         */
        struct autovec {
  -             struct autovec  *av_link; /* pointer to next one in chain */
                uint_t  (*av_vector)();
                caddr_t av_intarg1;
                caddr_t av_intarg2;
                uint64_t *av_ticksp;
                uint_t  av_prilevel;    /* priority level */

                void    *av_intr_id;
                dev_info_t *av_dip;

  +             struct autovec *av_vec_link;    /* per vector list */
  +             struct autovec *av_ipl_link;    /* per ipl list */
        };

        /*
         * usr/src/uts/i86pc/sys/machcpuvar.h
         */
        struct machcpu {
                ....
  +             struct autovec *intr_vect[MAX_VECT];
  +             struct autovec *intr_head[PIL_MAX + 1];
  +             struct autovec *intr_tail[PIL_MAX + 1];
        };


                EACH CPU'S INTERRUPT VECTOR TABLE

        intr_vect[0..31] -> NULL

                           ---- av_vec_link   ----            ----
        intr_vect[32]  -> | av |-----------> | av |-> ... -> | av |
                           ----               ----            ----
                        .........................................
                           ---- 
        intr_vect[255] -> | av | ...
                           ---- 


                EACH CPU'S IPL BASED INTERRUPT PENDING QUEUE

                          ---- av_ipl_link   ----         ----
        intr_head[1]  -> | av |------------>| av |->...->| av |<- intr_tail[1]
                          ----               ----         ----
                          ---- av_ipl_link   ----         ----
        intr_head[2]  -> | av |------------>| av |->...->| av |<- intr_tail[2]
                          ----               ----         ----
                        .........................................
                          ---- av_ipl_link   ----         ----
        intr_head[15] -> | av |------------>| av |->...->| av |<- intr_tail[15]
                          ----               ----         ----
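  The two per-CPU lists above can be modeled in user space to show how the
  links are used: av_vec_link chains every handler sharing a vector, and
  av_ipl_link threads pending handlers into a FIFO per IPL whose tail
  pointer makes appends O(1). This is a sketch, not Solaris code: struct
  cpu_intr, intr_vect_add(), ipl_enqueue(), ipl_dequeue() and the av_pending
  flag are hypothetical. The flag is an added assumption, because a single
  av_ipl_link lets a handler sit on at most one pending queue at a time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VECT 256  /* assumption: one slot per x86 vector number */
#define PIL_MAX  15   /* highest Solaris x86 priority level */

typedef unsigned int uint_t;

/* Reduced struct autovec: only the fields the two lists need. */
struct autovec {
	uint_t          av_prilevel;  /* original priority level (PIL) */
	int             av_pending;   /* hypothetical: 1 while on an IPL queue */
	struct autovec *av_vec_link;  /* next handler sharing this vector */
	struct autovec *av_ipl_link;  /* next pending handler at this PIL */
};

/* Reduced per-CPU state modeled on the proposed machcpu additions. */
struct cpu_intr {
	struct autovec *intr_vect[MAX_VECT];
	struct autovec *intr_head[PIL_MAX + 1];
	struct autovec *intr_tail[PIL_MAX + 1];
};

/* Append a handler to the chain of handlers sharing vector vec. */
void intr_vect_add(struct cpu_intr *cpu, int vec, struct autovec *av)
{
	struct autovec **pp;

	assert(vec >= 32 && vec < MAX_VECT);
	pp = &cpu->intr_vect[vec];
	while (*pp != NULL)
		pp = &(*pp)->av_vec_link;
	av->av_vec_link = NULL;
	*pp = av;
}

/* Append av to the tail of its PIL's pending queue in O(1). */
void ipl_enqueue(struct cpu_intr *cpu, struct autovec *av)
{
	uint_t pil = av->av_prilevel;

	assert(pil >= 1 && pil <= PIL_MAX);
	if (av->av_pending)
		return;  /* already queued: av_ipl_link is in use */
	av->av_pending = 1;
	av->av_ipl_link = NULL;
	if (cpu->intr_tail[pil] == NULL)
		cpu->intr_head[pil] = av;
	else
		cpu->intr_tail[pil]->av_ipl_link = av;
	cpu->intr_tail[pil] = av;
}

/* Remove and return the first pending handler at pil, or NULL if none. */
struct autovec *ipl_dequeue(struct cpu_intr *cpu, uint_t pil)
{
	struct autovec *av = cpu->intr_head[pil];

	if (av == NULL)
		return NULL;
	cpu->intr_head[pil] = av->av_ipl_link;
	if (cpu->intr_head[pil] == NULL)
		cpu->intr_tail[pil] = NULL;
	av->av_ipl_link = NULL;
	av->av_pending = 0;
	return av;
}
```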


3.3 Interrupt Handling
----------------------

  When a low-level interrupt is triggered, the following happens.

  do_interrupt()
        - Use the vector number to index into intr_vect[] and retrieve the
          linked list of interrupt handlers.
        - For each handler, use its original priority level (av_prilevel)
          to find the pending queue delimited by the intr_head[av_prilevel]
          and intr_tail[av_prilevel] pair, and append the handler to the
          end of that queue.
        - Trigger a software interrupt at each priority level that
          received a handler.

  do_softint()
        - Walk the executing CPU's pending queue for the current IPL,
          pointed to by intr_head[ipl], and dequeue its first element.
        - Dispatch an interrupt thread to handle the interrupt.
        - Clear the software interrupt when the queue is empty.
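  The low-level path can be sketched as two functions over a reduced model
  of the per-CPU state. Names not in the proposal are hypothetical:
  softint_pending (bit n set means a soft interrupt is posted at PIL n)
  stands in for the real soft-interrupt trigger, incrementing av_count
  stands in for dispatching an interrupt thread, and av_pending keeps a
  handler from being linked twice. The sketch treats PILs 1 through 9 as
  low level, matching the nine interrupt threads in section 3.1, with
  LOCK_LEVEL taken as 10, its Solaris x86 value.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VECT   256
#define PIL_MAX    15
#define LOCK_LEVEL 10

struct autovec {
	unsigned int    av_prilevel;  /* original priority level */
	int             av_pending;   /* hypothetical: 1 while queued */
	unsigned int    av_count;     /* hypothetical: times dispatched */
	struct autovec *av_vec_link;  /* next handler on the same vector */
	struct autovec *av_ipl_link;  /* next pending handler, same PIL */
};

struct cpu_intr {
	struct autovec *intr_vect[MAX_VECT];
	struct autovec *intr_head[PIL_MAX + 1];
	struct autovec *intr_tail[PIL_MAX + 1];
	unsigned int    softint_pending;  /* hypothetical softint bitmask */
};

/* Hardware handler for a low-level vector: queue handlers, run none. */
void do_interrupt(struct cpu_intr *cpu, int vec)
{
	for (struct autovec *av = cpu->intr_vect[vec]; av != NULL;
	    av = av->av_vec_link) {
		unsigned int pil = av->av_prilevel;

		/* High-level handlers and already-queued ones are skipped. */
		if (pil >= LOCK_LEVEL || av->av_pending)
			continue;
		av->av_pending = 1;
		av->av_ipl_link = NULL;
		if (cpu->intr_tail[pil] == NULL)
			cpu->intr_head[pil] = av;
		else
			cpu->intr_tail[pil]->av_ipl_link = av;
		cpu->intr_tail[pil] = av;
		cpu->softint_pending |= 1u << pil;  /* post softint at pil */
	}
}

/* Soft interrupt at pil: drain the queue, then clear the softint. */
void do_softint(struct cpu_intr *cpu, unsigned int pil)
{
	struct autovec *av;

	assert(pil >= 1 && pil < LOCK_LEVEL);
	while ((av = cpu->intr_head[pil]) != NULL) {
		cpu->intr_head[pil] = av->av_ipl_link;
		if (cpu->intr_head[pil] == NULL)
			cpu->intr_tail[pil] = NULL;
		av->av_ipl_link = NULL;
		av->av_pending = 0;
		av->av_count++;  /* stands in for an interrupt thread */
	}
	cpu->softint_pending &= ~(1u << pil);
}
```

  In this sketch a vector that fires twice before its soft interrupt runs
  is dispatched once. That suits handlers that drain all outstanding device
  work per call, but it is a design choice the proposal does not specify.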

  High-level interrupts are handled as follows.

  do_interrupt()
        - Use the vector number to index into intr_vect[] and retrieve the
          handler(s).
        - For each handler, use its original priority level to find the
          corresponding struct autovec queue, and add the handler to it.
        - For each triggered ipl above LOCK_LEVEL from high to low
                IF (no high-level interrupt is running)
                        switch to the executing CPU's interrupt stack and run;
                ELSE {
                        IF (new_ipl <= old_ipl)
                                return;
                        ELSE
                                run on the executing CPU;
                }
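  The decision in the pseudocode above reduces to a comparison of PILs. In
  this sketch hi_action_t, hilevel_action() and highest_pending_hilevel()
  are hypothetical names; "run on the executing CPU" is read as running the
  new handler nested on the current stack, and a deferred PIL is assumed to
  stay queued until the running handler finishes. PILs above LOCK_LEVEL
  (taken as 10) through PIL_MAX (15) are treated as high level.

```c
#include <assert.h>

#define LOCK_LEVEL 10  /* Solaris x86 value */
#define PIL_MAX    15

typedef enum {
	HI_SWITCH_STACK,  /* none running: switch to the CPU's interrupt stack */
	HI_NEST,          /* higher than the running PIL: run now, nested */
	HI_DEFER          /* not higher: leave it queued and return */
} hi_action_t;

/*
 * Decide what to do with a newly triggered PIL above LOCK_LEVEL.
 * running_pil is the PIL of the high-level handler in progress on this
 * CPU, or any value <= LOCK_LEVEL (e.g. 0) if none is running.
 */
hi_action_t hilevel_action(int running_pil, int new_pil)
{
	assert(new_pil > LOCK_LEVEL && new_pil <= PIL_MAX);
	if (running_pil <= LOCK_LEVEL)
		return HI_SWITCH_STACK;
	if (new_pil <= running_pil)
		return HI_DEFER;
	return HI_NEST;
}

/*
 * Scan pending high-level PILs from high to low (bit n set means PIL n
 * has queued handlers); return the first found, or 0 if none is pending.
 */
int highest_pending_hilevel(unsigned int pending)
{
	for (int pil = PIL_MAX; pil > LOCK_LEVEL; pil--) {
		if (pending & (1u << pil))
			return pil;
	}
	return 0;
}
```

  Because the scan runs from high to low, once one PIL is deferred every
  lower pending PIL would be deferred as well, which is why the pseudocode
  can simply return at that point.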

_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org