This is an automated email from the ASF dual-hosted git repository.

xiaoxiang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nuttx.git

commit 2c2a794240ff7ecc57179bdebf551f6c2ad2e8af
Author: Ludovic Vanasse <[email protected]>
AuthorDate: Sun Oct 27 17:51:25 2024 -0400

    Doc: Migrate Smaller Vector Tables
    
    Migrate
    https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables
    to official wiki
    
    Signed-off-by: Ludovic Vanasse <[email protected]>
---
 Documentation/guides/index.rst                 |   3 +-
 Documentation/guides/smaller_vector_tables.rst | 472 +++++++++++++++++++++++++
 2 files changed, 474 insertions(+), 1 deletion(-)

diff --git a/Documentation/guides/index.rst b/Documentation/guides/index.rst
index b88788b225..64e388916e 100644
--- a/Documentation/guides/index.rst
+++ b/Documentation/guides/index.rst
@@ -53,4 +53,5 @@ Guides
   semihosting.rst
   renode.rst
   signal_events_interrupt_handlers.rst  
-  signaling_sem_priority_inheritance.rst
\ No newline at end of file
+  signaling_sem_priority_inheritance.rst
+  smaller_vector_tables.rst
\ No newline at end of file
diff --git a/Documentation/guides/smaller_vector_tables.rst 
b/Documentation/guides/smaller_vector_tables.rst
new file mode 100644
index 0000000000..6a13f082cc
--- /dev/null
+++ b/Documentation/guides/smaller_vector_tables.rst
@@ -0,0 +1,472 @@
+=====================
+Smaller Vector Tables
+=====================
+
+.. warning:: 
+    Migrated from: 
+    https://cwiki.apache.org/confluence/display/NUTTX/Smaller+Vector+Tables 
+
+
+One of the largest OS data structures is the vector table, 
+``g_irqvector[]``. This is the table that holds the vector 
+information when ``irq_attach()`` is called and used to 
+dispatch interrupts by ``irq_dispatch()``. Recent changes 
+have made that table even larger, for 32-bit arm the 
+size of that table is given by:
+
+.. code-block:: c
+
+    nbytes = number_of_interrupts * (2 * sizeof(void *))
+
+We will focus on the STM32 for this discussion to keep 
+things simple. However, this discussion applies to all 
+architectures.
+
+The number of (physical) interrupt vectors supported by 
+the MCU hardwared given by the definition ``NR_IRQ`` which 
+is provided in a header file in ``arch/arm/include/stm32``. 
+This is, by default, the value of ``number_of_interrupts`` 
+in the above equation.
+
+For a 32-bit ARM like the STM32 with, say, 100 interrupt 
+vectors, this size would be 800 bytes of memory. That is 
+not a lot for high-end MCUs with a lot of RAM memory, 
+but could be a show stopper for MCUs with minimal RAM.
+
+Two approaches for reducing the size of the vector tables 
+are described below. Both depend on the fact that not all 
+interrupts are used on a given MCU. Most of the time, 
+the majority of entries in ``g_irqvector[]`` are zero because 
+only a small number of interrupts are actually attached 
+and enabled by the application. If you know that certain 
+IRQ numbers are not going to be used, then it is possible 
+to filter those out and reduce the size to the number of 
+supported interrupts.
+
+For example, if the actual number of interrupts used were 
+20, the the above requirement would go from 800 bytes to 
+160 bytes.
+
+Software IRQ Remapping
+======================
+
+`[On March 3, 2017, support for this "Software IRQ Remapping" 
+as included in the NuttX repository.]`
+
+One of the simplest way of reducing the size of 
+``g_irqvector[]`` would be to remap the large set of physical 
+interrupt vectors into a much small set of interrupts that 
+are actually used. For the sake of discussion, let's 
+imagine two new configuration settings:
+
+* ``CONFIG_ARCH_MINIMAL_VECTORTABLE``: Enables IRQ mapping
+* ``CONFIG_ARCH_NUSER_INTERRUPTS``: The number of IRQs after mapping.
+
+Then it could allocate the interrupt vector table to be 
+size ``CONFIG_IRQ_NMAPPED_IRQ`` instead of the much bigger 
+``NR_IRQS``:
+
+.. code-block:: c 
+
+    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
+    struct irq_info_s g_irqvector[CONFIG_ARCH_NUSER_INTERRUPTS];
+    #else
+    struct irq_info_s g_irqvector[NR_IRQS];
+    #endif
+
+The ``g_irqvector[]`` table is accessed in only three places:
+
+``irq_attach()``
+----------------
+
+``irq_attach()`` receives the physical vector number along 
+with the information needed later to dispatch interrupts:
+
+.. code-block:: c
+
+    int irq_attach(int irq, xcpt_t isr, FAR void *arg);
+
+Logic in ``irq_attach()`` would map the incoming physical 
+vector number to a table index like:
+
+.. code-block:: c 
+
+    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
+    int ndx = g_irqmap[irq];
+    #else
+    int ndx = irq;
+    #endif
+
+where ``up_mapirq[]`` is an array indexed by the physical 
+interrupt vector number and contains the new, mapped 
+interrupt vector table index. This array must be 
+provided by platform-specific code.
+
+``irq_attach()`` would this use this index to set the ``g_irqvector[]``.
+
+.. code-block:: c 
+
+    g_irqvector[ndx].handler = isr;
+    g_irqvector[ndx].arg     = arg;
+
+``irq_dispatch()``
+------------------
+
+``irq_dispatch()`` is called by MCU logic when an interrupt is received:
+
+.. code-block:: c 
+
+    void irq_dispatch(int irq, FAR void *context);
+
+Where, again irq is the physical interrupt vector number.
+
+``irq_dispatch()`` would do essentially the same thing as 
+``irq_attach()``. First it would map the irq number to 
+a table index:
+
+.. code-block:: c 
+
+    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
+    int ndx = g_irqmap[irq];
+    #else
+    int ndx = irq;
+    #endif
+
+Then dispatch the interrupt handling to the attached 
+interrupt handler. NOTE that the physical vector 
+number is passed to the handler so it is completely 
+unaware of the underlying `shell` game:
+
+.. code-block:: c 
+
+    vector = g_irqvector[ndx].handler;
+    arg    = g_irqvector[ndx].arg;
+    
+    vector(irq, context, arg);
+
+``irq_initialize()``
+--------------------
+
+``irq_initialize()``: simply set the ``g_irqvector[]`` table 
+a known state on power-up. It would only have to distinguish 
+the difference in sizes.
+
+.. code-block:: c 
+
+    #ifdef CONFIG_ARCH_MINIMAL_VECTORTABLE
+    #  define TAB_SIZE CONFIG_ARCH_NUSER_INTERRUPTS
+    #else
+    #  define TAB_SIZE NR_IRQS
+    #endif
+    
+    for (i = 0; i < TAB_SIZE; i++)
+
+``g_mapirq[]``
+--------------
+
+An implementation of ``up_mapirq()`` might be something like:
+
+.. code-block:: c 
+
+    #include <nuttx/irq.h>
+
+    const irq_mapped_t g_irqmap[NR_IRQS] =
+    {
+    ... IRQ to index mapping values ...
+    };
+
+``g_irqmap[]`` is a array of mapped irq table indices. It 
+contains the mapped index value and is itself indexed 
+by the physical interrupt vector number. It provides 
+an ``irq_mapped_t`` value in the range of 0 to 
+``CONFIG_ARCH_NUSER_INTERRUPTS`` that is the new, mapped 
+index into the vector table. Unsupported IRQs would 
+simply map to an out of range value like ``IRQMAPPED_MAX``. 
+So, for example, if ``g_irqmap[37] == 24``, then the hardware 
+interrupt vector 37 will be mapped to the interrupt vector 
+table at index 24. if ``g_irqmap[42] == IRQMAPPED_MAX``, then 
+hardware interrupt vector 42 is not used and if it occurs 
+will result in an unexpected interrupt crash.
+
+Hardware Vector Remapping
+=========================
+
+`[This technical approach is discussed here but is 
+discouraged because of technical "Complications" and 
+"Dubious Performance Improvements" discussed at the 
+end of this section.]`
+
+Most ARMv7-M architectures support two mechanism for handling interrupts:
+
+* The so-called `common` vector handler logic enabled with 
+  ``CONFIG_ARMV7M_CMNVECTOR=y`` that can be found in 
+  ``arch/arm/src/armv7-m/``, and
+* MCU-specific interrupt handling logic. For the 
+  STM32, this logic can be found at ``arch/arm/src/stm32/gnu/stm32_vectors.S``.
+
+The `common` vector logic is slightly more efficient, 
+the MCU-specific logic is slightly more flexible.
+
+If we don't use the `common` vector logic enabled with 
+``CONFIG_ARMV7M_CMNVECTOR=y``, but instead the more 
+flexible MCU-specific implementation, then we can 
+also use this to map the large set of hardware 
+interrupt vector numbers to a smaller set of software 
+interrupt numbers. This involves minimal changes to 
+the OS and does not require any magic software lookup 
+table. But is considerably more complex to implement.
+
+This technical approach requires changes to three files:
+
+* A new header file at ``arch/arm/include/stm32``, say 
+  ``xyz_irq.h`` for the purposes of this discussion. 
+  This new header file is like the other IRQ definition 
+  header files in that directory except that it 
+  defines only the IRQ number of the interrupts after 
+  remapping. So, instead of having the 100 IRQ number 
+  definitions of the original IRQ header file based on 
+  the physical vector numbers, this header file would 
+  define ``only`` the small set of 20 ``mapped`` IRQ numbers in 
+  the range from 0 through 19. It would also set ``NR_IRQS`` 
+  to the value 20.
+* A new header file at ``arch/arm/src/stm32/hardware``, say 
+  ``xyz_vector.h``. It would be similar to the other vector 
+  definitions files in that directory: It will consist 
+  of a sequence of 100 ``VECTOR`` and ``UNUSED`` macros. It will 
+  define ``VECTOR`` entries for the 20 valid interrupts and 
+  80 ``UNUSED`` entries for the unused interrupt vector numbers. 
+  More about this below.
+* Modification of the ``stm32_vectors.S`` file. These changes 
+  are trivial and involve only the conditional inclusion 
+  of the new, special ``xyz_vectors.h`` header file.
+
+**REVISIT**: This needs to be updated. Neither the ``xyz_vector.h`` 
+files nor the ``stm32_vectors.S`` exist in the current realization. 
+This has all been replaced with the common vector handling at 
+``arch/arm/src/armv7-m``.
+
+Vector Definitions
+==================
+
+In ``arch/arm/src/stm32/gnu/stm32_vector.S``, notice that the 
+``xyz_vector.h`` file will be included twice. Before each 
+inclusion, the macros ``VECTOR`` and ``UNUSED`` are defined.
+
+The first time that ``xyz_vector.h`` included, it defines the 
+hardware vector table. The hardware vector table consists 
+of ``NR_IRQS`` 32-bit addresses in an array. This is 
+accomplished by setting:
+
+.. code-block:: c 
+
+    #undef VECTOR
+    #define VECTOR(l,i) .word l
+    
+    #undef UNUSED
+    #define UNUSED(i)   .word stm32_reserved
+
+Then including ``xyz_vector.h``. So consider the following 
+definitions in the original file:
+
+.. code-block:: c
+
+    ...
+    VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global 
interrupt */
+    VECTOR(stm32_usart2, STM32_IRQ_USART2) /* Vector 16+38: USART2 global 
interrupt */
+    VECTOR(stm32_usart3, STM32_IRQ_USART3) /* Vector 16+39: USART3 global 
interrupt */
+    ...
+
+Suppose that we wanted to support only USART1 and that 
+we wanted to have the IRQ number for USART1 to be 12. 
+That would be accomplished in the ``xyz_vector.h`` header 
+file like this:
+
+.. code-block:: c
+
+    ...
+    VECTOR(stm32_usart1, STM32_IRQ_USART1) /* Vector 16+37: USART1 global 
interrupt */
+    UNUSED(0)                              /* Vector 16+38: USART2 global 
interrupt */
+    UNUSED(0)                              /* Vector 16+39: USART3 global 
interrupt */
+    ...
+
+Where the value of ``STM32_IRQ_USART1`` was defined to 
+be 12 in the ``arch/arm/include/stm32/xyz_irq.h`` header 
+file. When ``xyz_vector.h`` is included by ``stm32_vectors.S`` 
+with the above definitions for ``VECTOR`` and ``UNUSED``, the 
+following would result:
+
+.. code-block:: c 
+
+    ...
+    .word stm32_usart1
+    .word stm32_reserved
+    .word stm32_reserved
+    ...
+
+These are the settings for vector 53, 54, and 55, 
+respectively. The entire vector table would be populated 
+in this way. ``stm32_reserved``, if called would result in 
+an "unexpected ISR" crash. ``stm32_usart1``, if called will 
+process the USART1 interrupt normally as we will see below.
+
+Interrupt Handler Definitions
+-----------------------------
+
+in the vector table, all of the valid vectors are set to 
+the address of a `handler` function. All unused vectors 
+are force to vector to ``stm32_reserved``. Currently, only 
+vectors that are not supported by the hardware are 
+marked ``UNUSED``, but you can mark any vector ``UNUSED`` in 
+order to eliminate it.
+
+The second time that ``xyz_vector.h`` is included by 
+``stm32_vector.S``, the `handler` functions are generated. 
+Each of the valid vectors point to the matching handler 
+function. In this case, you do NOT have to provide 
+handlers for the ``UNUSED`` vectors, only for the used 
+``VECTOR`` vectors. All of the unused vectors will go 
+to the common ``stm32_reserved`` handler. The remaining 
+set of handlers is very sparse.
+
+These are the values of ``UNUSED`` and ``VECTOR`` macros on the 
+second time the ``xzy_vector.h`` is included by ``stm32_vectors.S``:
+
+.. code-block:: asm
+
+    .macro HANDLER, label, irqno
+        .thumb_func
+    label:
+        mov r0, #\irqno
+        b       exception_common
+    .endm
+    
+    #undef VECTOR
+    #define VECTOR(l,i) HANDLER l, i
+    
+    #undef UNUSED
+    #define UNUSED(i)
+
+In the above USART1 example, a single handler would be 
+generated that will provide the IRQ number 12. Remember 
+that 12 is the expansion of the macro ``STM32_IRQ_USART1`` 
+that is provided in the ``arch/arm/include/stm32/xyz_irq.h`` 
+header file:
+
+.. code-block:: asm 
+
+        .thumb_func
+    stm32_usart1:
+        mov r0, #12
+        b       exception_common
+
+Now, when vector 16+37 occurs it is mapped to IRQ 12 
+with no significant software overhead.
+
+A Complication
+--------------
+
+A complication in the above logic has been noted by David Sidrane: 
+When we access the NVIC in ``stm32_irq.c`` in order to enable 
+and disable interrupts, the logic requires the physical 
+vector number in order to select the NVIC register and 
+the bit(s) the modify in the NVIC register.
+
+This could be handled with another small IRQ lookup table 
+(20 ``uint8_t`` entries in our example situation above). But 
+then this approach is not so much better than the `Software 
+Vector Mapping` described about which does not suffer from 
+this problem. Certainly enabling/disabling interrupts in a 
+much lower rate operation and at least does not put the 
+lookup in the critical interrupt path.
+
+Another option suggested by David Sidrane is equally ugly:
+
+* Don't change the ``arch/arm/include/stm32`` IRQ definition file.
+* Instead, encode the IRQ number so that it has both 
+  the index and physical vector number:
+
+.. code-block:: c 
+
+    ...
+    VECTOR(stm32_usart1, STM32_IRQ_USART1 << 8 | STM32_INDEX_USART1)
+    UNUSED(0)
+    UNUSED(0)
+    ...
+
+The STM32_INDEX_USART1 would have the value 12 and 
+STM32_IRQ_USART1 would be as before (53). This encoded 
+value would be received by ``irq_dispatch()`` and it would 
+decode both the index and the physical vector number. 
+It would use the index to look up in the ``g_irqvector[]`` 
+table but would pass the physical vector number to the 
+interrupt handler as the IRQ number.
+
+A lookup would still be required in ``irq_attach()`` in 
+order to convert the physical vector number back to 
+an index (100 ``uint8_t`` entries in our example). So 
+some lookup is unavoidable.
+
+Based upon these analysis, my recommendation is that 
+we do not consider the second option any further. The 
+first option is cleaner, more portable, and generally 
+preferable.is well worth that.
+
+Dubious Performance Improvements
+--------------------------------
+
+The intent of this second option was to provide a higher 
+performance mapping of physical interrupt vectors to IRQ 
+numbers compared to the pure software mapping of option 1. However, 
+in order to implement this approach, we had 
+to use the less efficient, non-common vector handling 
+logic. That logic is not terribly less efficient, the 
+cost is probably only a 16 bit load immediate instruction 
+and branch to another location in FLASH (which will cause 
+the CPU pipeline to be flushed).
+
+The variant of option 2 where both the physical vector number 
+and vector table index are encoded would require even more 
+processing in ``irq_dispatch()`` in order to decode the 
+physical vector number and vector table index. 
+Possible just AND and SHIFT instructions.
+
+However, the minimal cost of the first pure software 
+mapping approach was possibly as small as a single 
+indexed byte fetch from FLASH in ``irq_attach()``. 
+Indexing is, of course, essentially `free` in the ARM 
+ISA, the primary cost would be the FLASH memory access. 
+So my first assessment is that the performance of both 
+approaches is the essentially the same. If anything, the 
+first approach is possibly the more performant if 
+implemented efficiently.
+
+Both options would require some minor range checking in 
+``irq_attach()`` as well.
+
+Because of this and because of the simplicity of the 
+first option, I see no reason to support or consider 
+this second option any further.
+
+Complexity and Generalizability
+-------------------------------
+
+Option 2 is overly complex; it depends on a deep understanding 
+on how the MCU interrupt logic works and on a high level of 
+Thumb assembly language skills.
+
+Another problem with option 2 is that really only applies to 
+the Cortex-M family of processors and perhaps others that 
+support interrupt vectored interrupts in a similar fashion. 
+It is not a general solution that can be used with any CPU 
+architectures.
+
+And even worse, the MCU-specific interrupt handling logic 
+that this support depends upon is is very limited. As soon 
+as the common interrupt handler logic was added, I stopped 
+implementing the MCU specific logic in all newer ARMv7-M 
+ports. So that MCU specific interrupt handler logic is 
+only present for EFM32, Kinetis, LPC17, SAM3/4, STM32, 
+Tiva, and nothing else. Very limited!
+
+These are further reasons why option 2 is no recommended and 
+will not be supported explicitly.

Reply via email to