From: Robin Jarry <[email protected]>
Date: Tuesday, 19 May 2026 at 3:43 PM
To: [email protected] <[email protected]>; Jerin Jacob <[email protected]>; Kiran Kumar 
Kokkilagadda <[email protected]>; Nithin Kumar Dabilpuram 
<[email protected]>; Zhirun Yan <[email protected]>
Cc: Vladimir Medvedkin <[email protected]>; Christophe Fontaine 
<[email protected]>; David Marchand <[email protected]>; Konstantin 
Ananyev <[email protected]>; Maxime Leroy <[email protected]>
Subject: [EXTERNAL] [PATCH dpdk] graph: replace circular buffer with 
priority-based bitmap scheduling


Replace the FIFO circular buffer used to track pending nodes with
a bitmap and a priority-sorted schedule table. Each node can now have
a scheduling priority (int16_t, default 0, lower value means visited
first). Source nodes are forced to INT16_MIN so they always run first.

At graph creation time, nodes are sorted by (priority, id) and assigned
a bit position (sched_idx). During the walk, a bitmap tracks which nodes
have pending objects. Scanning from the lowest bit naturally visits
nodes in priority order.

This improves batching in fan-out-then-converge topologies. When
eth_input classifies packets to both mpls_input and ipv4_input, the old
FIFO order could process ipv4_input before mpls_input, causing
ipv4_input to be visited twice (once before and once after the MPLS
label is popped). With mpls_input at a higher priority (lower value), it
runs first and its output accumulates in ipv4_input which is then
visited only once with all packets.

The bitmap set operation is idempotent (OR on an already-set bit is
a no-op) which removes the need for the idx == 0 guards that the
circular buffer required to avoid duplicate enqueue.

Suggested-by: Vladimir Medvedkin <[email protected]>
Signed-off-by: Robin Jarry <[email protected]>
Cc: Christophe Fontaine <[email protected]>
Cc: David Marchand <[email protected]>
Cc: Konstantin Ananyev <[email protected]>
Cc: Maxime Leroy <[email protected]>
---
 doc/guides/prog_guide/graph_lib.rst           |   37 +-
 .../prog_guide/img/graph_mem_layout.svg       | 1823 +++++++----------
 lib/graph/graph.c                             |   19 +-
 lib/graph/graph_debug.c                       |   12 +-
 lib/graph/graph_populate.c                    |  117 +-
 lib/graph/graph_private.h                     |   27 +-
 lib/graph/node.c                              |    2 +
 lib/graph/rte_graph.h                         |    1 +
 lib/graph/rte_graph_model_mcore_dispatch.h    |   34 +-
 lib/graph/rte_graph_model_rtc.h               |   65 +-
 lib/graph/rte_graph_worker.h                  |    2 +-
 lib/graph/rte_graph_worker_common.h           |   81 +-
 12 files changed, 984 insertions(+), 1236 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst 
b/doc/guides/prog_guide/graph_lib.rst
index 8409e7666e85..9c6d8679b686 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -117,13 +117,22 @@ next_node[]:
 The dynamic array to store the downstream nodes connected to this node. 
Downstream
 node should not be current node itself or a source node.

+priority:
+^^^^^^^^^
+
+The scheduling priority of the node (``int16_t``, default 0). Nodes with lower
+priority values are visited first during the graph walk. This allows control
+over the order in which pending nodes are processed, which can improve packet
+batching in topologies where multiple paths converge on the same node.
+
 Source node:
 ^^^^^^^^^^^^

 Source nodes are static nodes created using ``RTE_NODE_REGISTER`` by passing
 ``flags`` as ``RTE_NODE_SOURCE_F``.
-While performing the graph walk, the ``process()`` function of all the source
-nodes will be called first. So that these nodes can be used as input nodes for 
a graph.
+Source nodes are automatically assigned the lowest possible priority
+(``INT16_MIN``) so that their ``process()`` function is always called first
+during the graph walk. This ensures they act as input nodes for a graph.

 nb_xstats:
 ^^^^^^^^^^
@@ -396,12 +405,26 @@ Graph object memory layout
 Understanding the memory layout helps to debug the graph library and
 improve the performance if needed.

-Graph object consists of a header, circular buffer to store the pending stream
-when walking over the graph, variable-length memory to store the ``rte_node`` 
objects,
-and variable-length memory to store the xstat reported by each ``rte_node``.
+A graph object consists of a header, a scheduling table mapping bit positions 
to
+node offsets, pending and source bitmaps for tracking which nodes need
+processing, variable-length memory to store the ``rte_node`` objects, and
+variable-length memory to store the xstat reported by each ``rte_node``.

-The graph_nodes_mem_create() creates and populate this memory. The functions
-such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this memory
+Nodes are sorted by ``(priority, node_id)`` at graph creation time and each
+node is assigned a bit position in the pending bitmap. During the graph walk,
+the bitmap is scanned from the lowest bit, so nodes with lower priority values
+are visited first. Source nodes are always assigned the lowest priority
+(``INT16_MIN``) to ensure they run before any processing node.
+
+This priority-based ordering improves batching in fan-out-then-converge
+topologies. For example, if ``eth_input`` classifies packets to both
+``mpls_input`` and ``ipv4_input``, giving ``mpls_input`` a lower priority value
+ensures it runs first. Its output accumulates in ``ipv4_input`` which is then
+visited only once with all packets, instead of being visited twice (before and
+after MPLS label popping).
+
+The ``graph_fp_mem_create()`` function creates and populates this memory. The
+functions such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this 
memory
 to enable fastpath services.


diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 6911ea8abeed..6dc1402e6bd0 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -334,20 +334,6 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
        return graph_mem_fixup_node_ctx(graph);
 }

-static bool
-graph_src_node_avail(struct graph *graph)
-{
-       struct graph_node *graph_node;
-
-       STAILQ_FOREACH(graph_node, &graph->node_list, next)
-               if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
-                   (graph_node->node->lcore_id == RTE_MAX_LCORE ||
-                    graph->lcore_id == graph_node->node->lcore_id))
-                       return true;
-
-       return false;
-}
-
 RTE_EXPORT_SYMBOL(rte_graph_model_mcore_dispatch_core_bind)
 int
 rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, int lcore)
@@ -375,9 +361,8 @@ rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, 
int lcore)
        graph->graph->dispatch.lcore_id = graph->lcore_id;
        graph->socket = rte_lcore_to_socket_id(lcore);

-       /* check the availability of source node */
-       if (!graph_src_node_avail(graph))
-               graph->graph->head = 0;
+       /* Rebuild source bitmap with only source nodes bound to this lcore */
+       graph_src_bitmap_rebuild(graph);

        return 0;

diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index e3b8cccdc1f0..8e99fa1b0fb8 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -15,8 +15,8 @@ graph_dump(FILE *f, struct graph *g)

        fprintf(f, "graph <%s>\n", g->name);
        fprintf(f, "  id=%" PRIu32 "\n", g->id);
-       fprintf(f, "  cir_start=%" PRIu32 "\n", g->cir_start);
-       fprintf(f, "  cir_mask=%" PRIu32 "\n", g->cir_mask);
+       fprintf(f, "  sched_table_off=%" PRIu32 "\n", g->sched_table_off);
+       fprintf(f, "  nb_sched_words=%" PRIu16 "\n", g->nb_sched_words);
        fprintf(f, "  addr=%p\n", g);
        fprintf(f, "  graph=%p\n", g->graph);
        fprintf(f, "  mem_sz=%zu\n", g->mem_sz);
@@ -63,14 +63,14 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)

        fprintf(f, "graph <%s> @ %p\n", g->name, g);
        fprintf(f, "  id=%" PRIu32 "\n", g->id);
-       fprintf(f, "  head=%" PRId32 "\n", (int32_t)g->head);
-       fprintf(f, "  tail=%" PRId32 "\n", (int32_t)g->tail);
-       fprintf(f, "  cir_mask=0x%" PRIx32 "\n", g->cir_mask);
        fprintf(f, "  nb_nodes=%" PRId32 "\n", g->nb_nodes);
+       fprintf(f, "  nb_sched_words=%" PRIu16 "\n", g->nb_sched_words);
        fprintf(f, "  socket=%d\n", g->socket);
        fprintf(f, "  fence=0x%" PRIx64 "\n", g->fence);
        fprintf(f, "  nodes_start=0x%" PRIx32 "\n", g->nodes_start);
-       fprintf(f, "  cir_start=%p\n", g->cir_start);
+       fprintf(f, "  sched_table=%p\n", g->sched_table);
+       fprintf(f, "  pending=%p\n", g->pending);
+       fprintf(f, "  src_pending=%p\n", g->src_pending);

        rte_graph_foreach_node(count, off, g, n) {
                if (!all && n->idx == 0)
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 026daecb2122..45bc7704fede 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -3,6 +3,8 @@
  */


+#include <stdlib.h>
+
 #include <rte_common.h>
 #include <rte_errno.h>
 #include <rte_malloc.h>
@@ -15,19 +17,27 @@ static size_t
 graph_fp_mem_calc_size(struct graph *graph)
 {
        struct graph_node *graph_node;
-       rte_node_t val;
+       uint16_t nwords;
        size_t sz;

        /* Graph header */
        sz = sizeof(struct rte_graph);
-       /* Source nodes list */
-       sz += sizeof(rte_graph_off_t) * graph->src_node_count;
-       /* Circular buffer for pending streams of size number of nodes */
-       val = rte_align32pow2(graph->node_count * sizeof(rte_graph_off_t));
-       sz = RTE_ALIGN(sz, val);
-       graph->cir_start = sz;
-       graph->cir_mask = rte_align32pow2(graph->node_count) - 1;
-       sz += val;
+
+       /* Schedule table: node offset indexed by sched_idx */
+       sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+       graph->sched_table_off = sz;
+       sz += sizeof(rte_graph_off_t) * graph->node_count;
+
+       /* Pending and source pending bitmaps */
+       nwords = (graph->node_count + 63) / 64;
+       graph->nb_sched_words = nwords;
+       sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+       graph->pending_off = sz;
+       sz += sizeof(uint64_t) * nwords;
+       sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+       graph->src_pending_off = sz;
+       sz += sizeof(uint64_t) * nwords;
+
        /* Fence */
        sz += sizeof(RTE_GRAPH_FENCE);
        sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
@@ -54,20 +64,44 @@ graph_fp_mem_calc_size(struct graph *graph)
 }

 static void
-graph_header_popluate(struct graph *_graph)
+graph_header_populate(struct graph *_graph)
 {
        struct rte_graph *graph = _graph->graph;

-       graph->tail = 0;
-       graph->head = (int32_t)-_graph->src_node_count;
-       graph->cir_mask = _graph->cir_mask;
        graph->nb_nodes = _graph->node_count;
-       graph->cir_start = RTE_PTR_ADD(graph, _graph->cir_start);
+       graph->nb_sched_words = _graph->nb_sched_words;
+       graph->sched_table = RTE_PTR_ADD(graph, _graph->sched_table_off);
+       graph->pending = RTE_PTR_ADD(graph, _graph->pending_off);
+       graph->src_pending = RTE_PTR_ADD(graph, _graph->src_pending_off);
        graph->nodes_start = _graph->nodes_start;
        graph->socket = _graph->socket;
        graph->id = _graph->id;
        memcpy(graph->name, _graph->name, RTE_GRAPH_NAMESIZE);
        graph->fence = RTE_GRAPH_FENCE;
+
+       memset(graph->pending, 0, sizeof(uint64_t) * _graph->nb_sched_words);
+       memset(graph->src_pending, 0, sizeof(uint64_t) * 
_graph->nb_sched_words);
+}
+
+static int16_t
+graph_node_effective_priority(const struct graph_node *gn)
+{
+       if (gn->node->flags & RTE_NODE_SOURCE_F)
+               return INT16_MIN;
+       return gn->node->priority;
+}
+
+static int
+graph_node_priority_cmp(const void *a, const void *b)
+{
+       const struct graph_node *const *na = a;
+       const struct graph_node *const *nb = b;
+       int16_t pa = graph_node_effective_priority(*na);
+       int16_t pb = graph_node_effective_priority(*nb);
+
+       if (pa != pb)
+               return (int)pa - (int)pb;
+       return (int)(*na)->node->id - (int)(*nb)->node->id;
 }

 static void
@@ -76,15 +110,26 @@ graph_nodes_populate(struct graph *_graph)
        rte_graph_off_t xstat_off = _graph->xstats_start;
        rte_graph_off_t off = _graph->nodes_start;
        struct rte_graph *graph = _graph->graph;
-       struct graph_node *graph_node;
+       struct graph_node **sorted, *graph_node;
        rte_edge_t count, nb_edges;
        rte_node_t pid;
+       uint32_t n;

-       STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
+       /* Build a sorted array of graph_node pointers by (priority, id) */
+       sorted = calloc(_graph->node_count, sizeof(*sorted));
+       RTE_VERIFY(sorted != NULL);
+       n = 0;
+       STAILQ_FOREACH(graph_node, &_graph->node_list, next)
+               sorted[n++] = graph_node;
+       qsort(sorted, n, sizeof(*sorted), graph_node_priority_cmp);
+
+       for (n = 0; n < _graph->node_count; n++) {
+               graph_node = sorted[n];
                struct rte_node *node = RTE_PTR_ADD(graph, off);
                memset(node, 0, sizeof(*node));
                node->fence = RTE_GRAPH_FENCE;
                node->off = off;
+               node->sched_idx = n;
                if (graph_pcap_is_enable()) {
                        node->process = graph_pcap_dispatch;
                        node->original_process = graph_node->node->process;
@@ -123,8 +168,14 @@ graph_nodes_populate(struct graph *_graph)
                off += sizeof(struct rte_node *) * nb_edges;
                off = RTE_ALIGN(off, RTE_CACHE_LINE_SIZE);
                node->next = off;
+
+               /* Fill the schedule table */
+               graph->sched_table[n] = node->off;
+
                __rte_node_stream_alloc(graph, node);
        }
+
+       free(sorted);
 }

 struct rte_node *
@@ -179,12 +230,11 @@ graph_node_nexts_populate(struct graph *_graph)
 }

 static int
-graph_src_nodes_offset_populate(struct graph *_graph)
+graph_src_bitmap_populate(struct graph *_graph)
 {
        struct rte_graph *graph = _graph->graph;
        struct graph_node *graph_node;
        struct rte_node *node;
-       int32_t head = -1;
        const char *name;

        STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
@@ -195,7 +245,7 @@ graph_src_nodes_offset_populate(struct graph *_graph)
                                SET_ERR_JMP(EINVAL, fail, "%s not found", name);

                        __rte_node_stream_alloc(graph, node);
-                       graph->cir_start[head--] = node->off;
+                       __rte_node_pending_set(graph->src_pending, node);
                }
        }

@@ -204,17 +254,42 @@ graph_src_nodes_offset_populate(struct graph *_graph)
        return -rte_errno;
 }

+void
+graph_src_bitmap_rebuild(struct graph *_graph)
+{
+       struct rte_graph *graph = _graph->graph;
+       struct graph_node *graph_node;
+       struct rte_node *node;
+       const char *name;
+
+       memset(graph->src_pending, 0,
+              sizeof(uint64_t) * graph->nb_sched_words);
+
+       STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
+               if (!(graph_node->node->flags & RTE_NODE_SOURCE_F))
+                       continue;
+               if (graph_node->node->lcore_id != RTE_MAX_LCORE &&
+                   graph_node->node->lcore_id != _graph->lcore_id)
+                       continue;
+               name = graph_node->node->name;
+               node = graph_node_name_to_ptr(graph, name);
+               if (node == NULL)
+                       continue;
+               __rte_node_pending_set(graph->src_pending, node);
+       }
+}
+
 static int
 graph_fp_mem_populate(struct graph *graph)
 {
        int rc;

-       graph_header_popluate(graph);
+       graph_header_populate(graph);
        if (graph_pcap_is_enable())
                graph_pcap_init(graph);
        graph_nodes_populate(graph);
        rc = graph_node_nexts_populate(graph);
-       rc |= graph_src_nodes_offset_populate(graph);
+       rc |= graph_src_bitmap_populate(graph);

        return rc;
 }
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 26cdc6637192..df6f83b20261 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -49,6 +49,7 @@ struct node {
        STAILQ_ENTRY(node) next;      /**< Next node in the list. */
        char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
        uint64_t flags;               /**< Node configuration flag. */
+       int16_t priority;             /**< Scheduling priority. */
        unsigned int lcore_id;
        /**< Node runs on the Lcore ID used for mcore dispatch model. */
        rte_node_process_t process;   /**< Node process function. */
@@ -98,19 +99,23 @@ struct graph {
        const struct rte_memzone *mz;
        /**< Memzone to store graph data. */
        rte_graph_off_t nodes_start;
-       /**< Node memory start offset in graph reel. */
+       /**< Node memory start offset in graph memory. */
        rte_graph_off_t xstats_start;
-       /**< Node xstats memory start offset in graph reel. */
+       /**< Node xstats memory start offset in graph memory. */
        rte_node_t src_node_count;
        /**< Number of source nodes in a graph. */
        struct rte_graph *graph;
        /**< Pointer to graph data. */
        rte_node_t node_count;
        /**< Total number of nodes. */
-       uint32_t cir_start;
-       /**< Circular buffer start offset in graph reel. */
-       uint32_t cir_mask;
-       /**< Circular buffer mask for wrap around. */
+       uint32_t sched_table_off;
+       /**< Schedule table start offset in graph memory. */
+       uint32_t pending_off;
+       /**< Pending bitmap start offset in graph memory. */
+       uint32_t src_pending_off;
+       /**< Source pending bitmap start offset in graph memory. */
+       uint16_t nb_sched_words;
+       /**< Number of uint64_t words in pending bitmaps. */
        rte_graph_t id;
        /**< Graph identifier. */
        rte_graph_t parent_id;
@@ -378,6 +383,16 @@ int graph_fp_mem_create(struct graph *graph);
  */
 int graph_fp_mem_destroy(struct graph *graph);

+/**
+ * @internal
+ *
+ * Rebuild the source pending bitmap based on lcore affinity.
+ *
+ * @param graph
+ *   Pointer to the internal graph object.
+ */
+void graph_src_bitmap_rebuild(struct graph *graph);
+
 /* Lookup functions */
 /**
  * @internal
diff --git a/lib/graph/node.c b/lib/graph/node.c
index e3359fe490a5..b5599143b37b 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -153,6 +153,7 @@ __rte_node_register(const struct rte_node_register *reg)
        if (rte_strscpy(node->name, reg->name, RTE_NODE_NAMESIZE) < 0)
                goto free_xstat;
        node->flags = reg->flags;
+       node->priority = reg->priority;
        node->process = reg->process;
        node->init = reg->init;
        node->fini = reg->fini;
@@ -216,6 +217,7 @@ node_clone(struct node *node, const char *name)

        /* Clone the source node */
        reg->flags = node->flags;
+       reg->priority = node->priority;
        reg->process = node->process;
        reg->init = node->init;
        reg->fini = node->fini;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 7e433f466129..6cd32ec22284 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -496,6 +496,7 @@ struct rte_node_register {
        char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
        uint64_t flags;               /**< Node configuration flag. */
 #define RTE_NODE_SOURCE_F (1ULL << 0) /**< Node type is source. */
+       int16_t priority; /**< Scheduling priority (lower = visited first, 
default 0). */


This will break the ABI. Please run ABI check and see/fix.




        rte_node_process_t process; /**< Node process function. */
        rte_node_init_t init;       /**< Node init function. */
        rte_node_fini_t fini;       /**< Node fini function. */
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.h 
b/lib/graph/rte_graph_model_mcore_dispatch.h
index f9ff3daa88ec..50a473564b56 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.h
+++ b/lib/graph/rte_graph_model_mcore_dispatch.h
@@ -77,9 +77,13 @@ int 
rte_graph_model_mcore_dispatch_node_lcore_affinity_set(const char *name,
                                                           unsigned int 
lcore_id);

 /**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
  * of the nodes and collect the stats.
  *
+ * Nodes are visited in scheduling order (lowest priority value first).
+ * Source nodes are seeded into the pending bitmap at the start of each walk.
+ * Nodes with different lcore affinity are dispatched to their target lcore.
+ *
  * @param graph
  *   Graph pointer returned from rte_graph_lookup function.
  *
@@ -88,20 +92,28 @@ int 
rte_graph_model_mcore_dispatch_node_lcore_affinity_set(const char *name,
 static inline void
 rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
 {
-       const rte_graph_off_t *cir_start = graph->cir_start;
-       const rte_node_t mask = graph->cir_mask;
-       uint32_t head = graph->head;
+       const uint16_t nwords = graph->nb_sched_words;
        struct rte_node *node;
+       uint16_t word, bit;

        if (graph->dispatch.wq != NULL)
                __rte_graph_mcore_dispatch_sched_wq_process(graph);

-       while (likely(head != graph->tail)) {
-               node = (struct rte_node *)RTE_PTR_ADD(graph, 
cir_start[(int32_t)head++]);
+       /* Seed pending bitmap with source nodes bound to this lcore */
+       for (word = 0; word < nwords; word++)
+               graph->pending[word] |= graph->src_pending[word];

-               /* skip the src nodes which not bind with current worker */
-               if ((int32_t)head < 1 && node->dispatch.lcore_id != 
graph->dispatch.lcore_id)
-                       continue;
+       for (;;) {
+               /* find first word with any pending bit */
+               for (word = 0; word < nwords; word++)
+                       if (graph->pending[word])
+                               break;
+               if (word == nwords)
+                       break; /* no more pending nodes */
+
+               bit = rte_ctz64(graph->pending[word]);
+               graph->pending[word] &= ~(1ULL << bit);
+               node = __rte_graph_pending_node(graph, word, bit);

                /* Schedule the node until all task/objs are done */
                if (node->dispatch.lcore_id != RTE_MAX_LCORE &&
@@ -111,11 +123,7 @@ rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
                        continue;

                __rte_node_process(graph, node);
-
-               head = likely((int32_t)head > 0) ? head & mask : head;
        }
-
-       graph->tail = 0;
 }

 #ifdef __cplusplus
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 4b6236e301e3..38feb3e1ca88 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -6,9 +6,12 @@
 #include "rte_graph_worker_common.h"

 /**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
  * of the nodes and collect the stats.
  *
+ * Nodes are visited in scheduling order (lowest priority value first).
+ * Source nodes are seeded into the pending bitmap at the start of each walk.
+ *
  * @param graph
  *   Graph pointer returned from rte_graph_lookup function.
  *
@@ -17,30 +20,52 @@
 static inline void
 rte_graph_walk_rtc(struct rte_graph *graph)
 {
-       const rte_graph_off_t *cir_start = graph->cir_start;
-       const rte_node_t mask = graph->cir_mask;
-       uint32_t head = graph->head;
+       const uint16_t nwords = graph->nb_sched_words;
        struct rte_node *node;
+       uint16_t word, bit;

        /*
-        * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
-        * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
-        * in a circular buffer fashion.
+        * Nodes are assigned a bit position (sched_idx) sorted by (priority,
+        * node_id) at graph creation time. Source nodes are forced to INT16_MIN
+        * priority so they always come first.
         *
-        *      +-----+ <= cir_start - head [number of source nodes]
-        *      |     |
-        *      | ... | <= source nodes
-        *      |     |
-        *      +-----+ <= cir_start [head = 0] [tail = 0]
-        *      |     |
-        *      | ... | <= pending streams
-        *      |     |
-        *      +-----+ <= cir_start + mask
+        * sched_table[] maps bit positions to node offsets:
+        *
+        *   pending[]         sched_table[]
+        *   +----------+      +------------------+
+        *   | word 0   | ---> | src_node_0       | bit 0 (prio=INT16_MIN)
+        *   | 1100...1 |      | src_node_1       | bit 1 (prio=INT16_MIN)
+        *   |          |      | mpls_input       | bit 2 (prio=-10)
+        *   |          |      | ipv4_input       | bit 3 (prio=0)
+        *   |          |      | ...              |
+        *   +----------+      +------------------+
+        *   | word 1   | ---> | ip4_rewrite      | bit 64 (prio=10)
+        *   | ...      |      | ...              |
+        *   +----------+      +------------------+
+        *
+        * Walk: for each word, find lowest set bit (rte_ctz64), process that
+        * node, clear the bit, re-read the word (processing may have set new
+        * bits), repeat.
+        *
+        * After each node is processed, restart scanning from word 0 since
+        * processing may set bits in any word, including earlier ones.
         */
-       while (likely(head != graph->tail)) {
-               node = (struct rte_node *)RTE_PTR_ADD(graph, 
cir_start[(int32_t)head++]);
+
+       /* Seed pending bitmap with source nodes */
+       for (word = 0; word < nwords; word++)
+               graph->pending[word] |= graph->src_pending[word];
+
+       for (;;) {
+               /* find first word with any pending bit */
+               for (word = 0; word < nwords; word++)
+                       if (graph->pending[word])
+                               break;
+               if (word == nwords)
+                       break; /* no more pending nodes */
+
+               bit = rte_ctz64(graph->pending[word]);
+               graph->pending[word] &= ~(1ULL << bit);
+               node = __rte_graph_pending_node(graph, word, bit);
                __rte_node_process(graph, node);
-               head = likely((int32_t)head > 0) ? head & mask : head;
        }
-       graph->tail = 0;
 }
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index b0f952a82cc9..e513d7a655d9 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -14,7 +14,7 @@ extern "C" {
 #endif

 /**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
  * of the nodes and collect the stats.
  *
  * @param graph
diff --git a/lib/graph/rte_graph_worker_common.h 
b/lib/graph/rte_graph_worker_common.h
index 4ab53a533e4c..e52a37ce5e84 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -49,15 +49,14 @@ SLIST_HEAD(rte_graph_rq_head, rte_graph);
  */
 struct __rte_cache_aligned rte_graph {
        /* Fast path area. */
-       uint32_t tail;               /**< Tail of circular buffer. */
-       uint32_t head;               /**< Head of circular buffer. */
-       uint32_t cir_mask;           /**< Circular buffer wrap around mask. */
        rte_node_t nb_nodes;         /**< Number of nodes in the graph. */
-       rte_graph_off_t *cir_start;  /**< Pointer to circular buffer. */
        rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
+       rte_graph_off_t *sched_table; /**< Node offset indexed by sched_idx. */
+       uint64_t *pending;           /**< Bitmap of pending nodes. */
+       uint64_t *src_pending;       /**< Bitmap of source nodes (constant). */
+       uint16_t nb_sched_words;     /**< Number of uint64_t words in pending 
bitmaps. */
        uint8_t model;               /**< graph model */
-       uint8_t reserved1;           /**< Reserved for future use. */
-       uint16_t reserved2;          /**< Reserved for future use. */
+       /* 26 bytes padding */
        union {
                /* Fast schedule area for mcore dispatch model */
                struct {
@@ -98,6 +97,7 @@ struct __rte_cache_aligned rte_node {
        rte_node_t id;          /**< Node identifier. */
        rte_node_t parent_id;   /**< Parent Node identifier. */
        rte_edge_t nb_edges;    /**< Number of edges from this node. */
+       uint16_t sched_idx;     /**< Bit position in pending bitmap. */
        uint32_t realloc_count; /**< Number of times realloced. */

        char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
@@ -132,7 +132,7 @@ struct __rte_cache_aligned rte_node {
                }; /**< Node Context. */
                uint16_t size;          /**< Total number of objects available. 
*/
                uint16_t idx;           /**< Number of objects used. */
-               rte_graph_off_t off;    /**< Offset of node in the graph reel. 
*/
+               rte_graph_off_t off;    /**< Offset of node in the graph 
memory. */
                uint64_t total_cycles;  /**< Cycles spent in this node. */
                uint64_t total_calls;   /**< Calls done to this node. */
                uint64_t total_objs;    /**< Objects processed by this node. */
@@ -187,12 +187,12 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
 /**
  * @internal
  *
- * Enqueue a given node to the tail of the graph reel.
+ * Process a node's pending objects and collect stats.
  *
  * @param graph
  *   Pointer Graph object.
  * @param node
- *   Pointer to node object to be enqueued.
+ *   Pointer to node object to be processed.
  */
 static __rte_always_inline void
 __rte_node_process(struct rte_graph *graph, struct rte_node *node)
@@ -220,21 +220,42 @@ __rte_node_process(struct rte_graph *graph, struct 
rte_node *node)
 /**
  * @internal
  *
- * Enqueue a given node to the tail of the graph reel.
+ * Get a pointer to a node from the scheduling table.
  *
  * @param graph
  *   Pointer Graph object.
+ * @param word
+ *   Offset in the pending bitmap.
+ * @param bit
+ *   Bit number.
+ *
+ * @return
+ *   Pointer to the node.
+ */
+static __rte_always_inline struct rte_node *
+__rte_graph_pending_node(struct rte_graph *graph, uint16_t word, uint16_t bit)
+{
+       const uint16_t index = (word * sizeof(*graph->pending) * CHAR_BIT) + 
bit;
+       const rte_graph_off_t node_offset = graph->sched_table[index];
+       return RTE_PTR_ADD(graph, node_offset);
+}
+
+/**
+ * @internal
+ *
+ * Mark a node as pending in the graph scheduling bitmap.
+ *
+ * @param bitmap
+ *   Either graph->pending or graph->src_pending.
  * @param node
- *   Pointer to node object to be enqueued.
+ *   Pointer to node object to be marked pending.
  */
 static __rte_always_inline void
-__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
+__rte_node_pending_set(uint64_t *bitmap, struct rte_node *node)
 {
-       uint32_t tail;
-
-       tail = graph->tail;
-       graph->cir_start[tail++] = node->off;
-       graph->tail = tail & graph->cir_mask;
+       const uint16_t word = node->sched_idx / (sizeof(*bitmap) * CHAR_BIT);
+       const uint16_t bit = node->sched_idx % (sizeof(*bitmap) * CHAR_BIT);
+       bitmap[word] |= 1ULL << bit;
 }

 /**
@@ -242,8 +263,8 @@ __rte_node_enqueue_tail_update(struct rte_graph *graph, 
struct rte_node *node)
  *
  * Enqueue sequence prologue function.
  *
- * Updates the node to tail of graph reel and resizes the number of objects
- * available in the stream as needed.
+ * Marks the node as pending in the scheduling bitmap and resizes the number
+ * of objects available in the stream as needed.
  *
  * @param graph
  *   Pointer to the graph object.
@@ -259,9 +280,7 @@ __rte_node_enqueue_prologue(struct rte_graph *graph, struct 
rte_node *node,
                            const uint16_t idx, const uint16_t space)
 {

-       /* Add to the pending stream list if the node is new */
-       if (idx == 0)
-               __rte_node_enqueue_tail_update(graph, node);
+       __rte_node_pending_set(graph->pending, node);

        if (unlikely(node->size < (idx + space)))
                __rte_node_stream_alloc_size(graph, node, node->size + space);
@@ -293,7 +312,7 @@ __rte_node_next_node_get(struct rte_node *node, rte_edge_t 
next)

 /**
  * Enqueue the objs to next node for further processing and set
- * the next node to pending state in the circular buffer.
+ * the next node to pending state in the scheduling bitmap.
  *
  * @param graph
  *   Graph pointer returned from rte_graph_lookup().
@@ -321,7 +340,7 @@ rte_node_enqueue(struct rte_graph *graph, struct rte_node 
*node,

 /**
  * Enqueue only one obj to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
  *
  * @param graph
  *   Graph pointer returned from rte_graph_lookup().
@@ -347,7 +366,7 @@ rte_node_enqueue_x1(struct rte_graph *graph, struct 
rte_node *node,

 /**
  * Enqueue only two objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
  * Same as rte_node_enqueue_x1 but enqueue two objs.
  *
  * @param graph
@@ -377,7 +396,7 @@ rte_node_enqueue_x2(struct rte_graph *graph, struct 
rte_node *node,

 /**
  * Enqueue only four objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
  * Same as rte_node_enqueue_x1 but enqueue four objs.
  *
  * @param graph
@@ -414,7 +433,7 @@ rte_node_enqueue_x4(struct rte_graph *graph, struct 
rte_node *node,

 /**
  * Enqueue objs to multiple next nodes for further processing and
- * set the next nodes to pending state in the circular buffer.
+ * set the next nodes to pending state in the scheduling bitmap.
  * objs[i] will be enqueued to nexts[i].
  *
  * @param graph
@@ -472,7 +491,7 @@ rte_node_next_stream_get(struct rte_graph *graph, struct 
rte_node *node,
 }

 /**
- * Put the next stream to pending state in the circular buffer
+ * Put the next stream to pending state in the scheduling bitmap
  * for further processing. Should be invoked after rte_node_next_stream_get().
  *
  * @param graph
@@ -495,9 +514,7 @@ rte_node_next_stream_put(struct rte_graph *graph, struct 
rte_node *node,
                return;

        node = __rte_node_next_node_get(node, next);
-       if (node->idx == 0)
-               __rte_node_enqueue_tail_update(graph, node);
-
+       __rte_node_pending_set(graph->pending, node);
        node->idx += idx;
 }

@@ -530,7 +547,7 @@ rte_node_next_stream_move(struct rte_graph *graph, struct 
rte_node *src,
                src->objs = dobjs;
                src->size = dsz;
                dst->idx = src->idx;
-               __rte_node_enqueue_tail_update(graph, dst);
+               __rte_node_pending_set(graph->pending, dst);
        } else { /* Move the objects from src node to dst node */
                rte_node_enqueue(graph, src, next, src->objs, src->idx);
        }
--
2.54.0


Reply via email to