Re: [PATCH v12 0/6] flow_parser: add shared parser library

Stephen Hemminger Tue, 05 May 2026 15:00:20 -0700

On Tue,  5 May 2026 20:39:07 +0200
Lukas Sismis <[email protected]> wrote:


> This series extracts the testpmd flow CLI parser into a reusable library,
> enabling external applications to parse rte_flow rules using testpmd syntax.
> 
> Motivation
> ----------
> External applications like Suricata IDS [1] need to express hardware filtering
> rules in a consistent, human-readable format. Rather than inventing custom
> syntax, reusing testpmd's well-tested flow grammar provides immediate
> compatibility with existing documentation and user knowledge.
> 
> Note: This library provides only one way to create rte_flow structures.
> Applications can also construct rte_flow_attr, rte_flow_item[], and
> rte_flow_action[] directly in C code.
> 
> Design
> ------
> The library (librte_flow_parser) exposes the following APIs:
> - rte_flow_parser_parse_attr_str(): Parse attributes only
> - rte_flow_parser_parse_pattern_str(): Parse patterns only
> - rte_flow_parser_parse_actions_str(): Parse actions only
> 
> Testpmd is updated to use the library, ensuring a single
> maintained parser implementation.

I wish it was not just a port of testpmd code to a library but had
been done as a clean implementation; that said the current version is
much better.

AI had lots of feedback. The part that matters to me is the new dependency
chain; and also having a syntax that looks too much like testpmd.

Would prefer that only the flow part of the string was passed.

---

I have enough material for a comprehensive architectural review. Let me write 
it up.

---

Below is my deep-dive review of the v12 series. Architecturally, this is a 
major step beyond earlier versions in terms of polish (header split, doc page, 
real example, real tests), but the underlying structure is still essentially 
the testpmd `cmdline_flow.c` lifted wholesale into `lib/ethdev`, with a thin 
"simple API" wrapper that synthesizes fake testpmd commands. Several of the 
deeper structural problems remain, plus there are concrete bugs.

```
Series-level architectural review
=================================

Context
-------

The series moves the ~14,400-line app/test-pmd/cmdline_flow.c into a
new lib/ethdev/rte_flow_parser.c, exposes a "simple" string-to-rte_flow
API in lib/ethdev/rte_flow_parser.h, exposes a fuller cmdline-aware API
in lib/ethdev/rte_flow_parser_config.h, and adds cmdline glue in
lib/cmdline/rte_flow_parser_cmdline.[ch]. testpmd (patch 4) is then
ported to consume the library via a dispatch callback that maps
RTE_FLOW_PARSER_CMD_* enum values back onto its existing port_flow_*
functions.

The main functional change versus past versions is that the simple API
(parse_attr_str / parse_pattern_str / parse_actions_str /
parse_flow_rule) now exists separately from the full command grammar.
Internally however the simple API is still implemented by string-
synthesizing a fake "flow validate 0 ..." command, running it through
the full parser, and harvesting one output field. So the architectural
center of gravity is unchanged: this is the testpmd grammar exposed as
a library.

Errors
======

Patch 3/6 -- ethdev: add flow parser library
---------------------------------------------

A1. lib/cmdline now depends on lib/ethdev (layer inversion).

  lib/cmdline/meson.build:
    -deps += ['net']
    +deps += ['net', 'ethdev']

  lib/cmdline is a foundational utility used by examples, internal
  tools, and tests that have nothing to do with networking. Pulling
  ethdev into it just so rte_flow_parser_cmdline.c can call into
  rte_flow_parser_* exports inverts the layering. Every consumer of
  libcmdline now links libethdev.

  The cmdline/flow-parser glue belongs in either lib/ethdev (and the
  header in lib/ethdev too, with ethdev depending on cmdline -- the
  natural direction) or in a new top-level lib/flow_parser that
  depends on both. It does not belong inside lib/cmdline.

A2. The "simple API" returns aliased pointers into a single 4096-byte
    static buffer, with no way for a caller to retain a result.

  lib/ethdev/rte_flow_parser.c:
    #define FLOW_PARSER_SIMPLE_BUF_SIZE 4096
    static uint8_t flow_parser_simple_parse_buf[FLOW_PARSER_SIMPLE_BUF_SIZE];
    ...
    static int parser_simple_parse(const char *cmd, ... **out) {
        memset(flow_parser_simple_parse_buf, 0, sizeof(...));
        ret = rte_flow_parser_parse(cmd,
                (struct rte_flow_parser_output *)flow_parser_simple_parse_buf,
                sizeof(flow_parser_simple_parse_buf));
        ...
        *out = (struct rte_flow_parser_output *)flow_parser_simple_parse_buf;
    }

  Then rte_flow_parser_parse_pattern_str() does:
        *pattern = out->args.vc.pattern;     /* points into buf */
        *pattern_n = out->args.vc.pattern_n;

  Three resulting problems:

  (a) Two consecutive calls alias. Any caller that wants to hold two
      parsed patterns simultaneously (e.g. parse two flows and call
      rte_flow_create() on each) cannot, without writing their own
      deep-copy via rte_flow_conv(). This is a pervasive footgun and
      is only loosely documented as "Points to internal storage valid
      until the next parse call."

  (b) The 4096-byte cap silently rejects any flow rule whose
      serialized output exceeds the buffer (returns -ENOBUFS), with
      no way for the caller to predict what fits.

  (c) The simple API itself uses parse_pattern_str inside a setter
      example flow in the example program:

        ret = rte_flow_parser_parse_pattern_str(..., &items, &items_n);
        if (ret == 0)
            ret = rte_flow_parser_raw_encap_conf_set(0, items, items_n);

      This particular sequence is safe today because raw_encap_conf_set
      doesn't re-enter the parser, but nothing in the API contract
      prevents a future parser call between (A) and (B), and the
      example pattern teaches users a fragile idiom.

  Suggested direction: provide a caller-supplied output mode -- e.g.
  rte_flow_parser_parse_pattern_str(src, items, items_cap, &items_n)
  where the caller provides storage. Or return an opaque handle owning
  the storage (rte_flow_parser_pattern_new / _free), modeled on
  rte_flow_conv().

A3. All cmdline tab-completion hooks for dynamic IDs are stubbed out
    to return zero candidates, with no override mechanism.

  lib/ethdev/rte_flow_parser.c:
    static inline int
    parser_port_id_is_invalid(uint16_t port_id)
    {
        (void)port_id;
        return 0;            /* always "valid" */
    }
    static inline uint16_t
    parser_flow_rule_count(uint16_t port_id) { return 0; }
    static inline int
    parser_flow_rule_id_get(...) { return -ENOENT; }
    /* same pattern for pattern/actions templates, tables, queues,
     * RSS queues, indirect actions */

  These are the routines comp_rule_id, comp_pattern_template_id,
  comp_actions_template_id, comp_table_id, comp_queue_id, etc. all
  call into when cmdline asks for tab-completion candidates. With
  these stubs in place, the library version of testpmd interactive
  mode can never tab-complete a rule ID, template ID, table ID,
  queue ID, or indirect action ID -- a regression versus the
  current testpmd cmdline_flow.c which queries port_flow_list,
  port_flow_template_list, etc. directly.

  patch 4 (testpmd integration) does not register or override these
  stubs anywhere. There is no callback registration interface for
  the application to provide "give me rule IDs for port N" / "give
  me table IDs for port N" / etc.

  A complete library extraction needs an introspection ops struct
  that the application registers alongside rte_flow_parser_config,
  e.g.:

    struct rte_flow_parser_introspect_ops {
        uint16_t (*flow_rule_count)(uint16_t port_id);
        int      (*flow_rule_id_get)(uint16_t port_id, unsigned idx,
                                     uint64_t *rule_id);
        /* templates, tables, queues, indirect actions, ... */
    };
    int rte_flow_parser_introspect_register(
            const struct rte_flow_parser_introspect_ops *ops);

A4. testpmd dispatch repurposes rte_flow_attr.reserved as a side
    channel for relaxed_matching.

  app/test-pmd/flow_parser.c (patch 4):
    case RTE_FLOW_PARSER_CMD_PATTERN_TEMPLATE_CREATE:
        port_flow_pattern_template_create(in->port,
            in->args.vc.pat_templ_id,
            &((const struct rte_flow_pattern_template_attr) {
                .relaxed_matching = in->args.vc.attr.reserved,
                .ingress  = in->args.vc.attr.ingress,
                .egress   = in->args.vc.attr.egress,
                .transfer = in->args.vc.attr.transfer,
            }),
            in->args.vc.pattern);

  rte_flow_attr.reserved is documented in lib/ethdev/rte_flow.h as
  reserved and required to be zero. Smuggling
  pattern_template_attr.relaxed_matching through that field couples
  the library output to a hack and breaks the moment anyone sets
  rte_flow_attr.reserved for any other purpose, or starts validating
  it. The output struct already has a vc.{pat_templ,act_templ,table}
  arm -- relaxed_matching belongs there, not overlaid on attr.reserved.

A5. Public preprocessor macros in rte_flow_parser_config.h are
    unprefixed and one of them collides with a generic name.

  lib/ethdev/rte_flow_parser_config.h:
    #define ACTION_RAW_ENCAP_MAX_DATA   512
    #define RAW_ENCAP_CONFS_MAX_NUM     8
    #define ACTION_RSS_QUEUE_NUM        128
    #define ACTION_VXLAN_ENCAP_ITEMS_NUM 6
    #define ACTION_NVGRE_ENCAP_ITEMS_NUM 5
    #define ACTION_IPV6_EXT_PUSH_MAX_DATA 512
    #define IPV6_EXT_PUSH_CONFS_MAX_NUM 8
    #define ACTION_SAMPLE_ACTIONS_NUM   10
    #define RAW_SAMPLE_CONFS_MAX_NUM    8
    #ifndef RSS_HASH_KEY_LENGTH
    #define RSS_HASH_KEY_LENGTH         64
    #endif

  Every one of these is now exported by an installed public header
  with no RTE_FLOW_PARSER_ prefix. RSS_HASH_KEY_LENGTH in particular
  is a generic name almost guaranteed to collide -- any application
  that defined its own RSS_HASH_KEY_LENGTH=40 (matching its hardware)
  before including this header would silently get 40 inside flow
  parser slots, with the parser's slot layout corrupted by mismatched
  array sizes.

  All of these need an RTE_FLOW_PARSER_ prefix and the #ifndef guard
  on RSS_HASH_KEY_LENGTH should be removed -- it is an invitation to
  define-mismatch bugs.

A6. Doc/header inconsistency: parser_parse() declared in config.h
    but documented as living in cmdline.h.

  doc/guides/prog_guide/flow_parser_lib.rst:
    "rte_flow_parser_parse() from rte_flow_parser_cmdline.h parses
    complete flow CLI commands ..."

  but the declaration is in lib/ethdev/rte_flow_parser_config.h
  (line 15342 of the diff). The doc is also internally inconsistent:
  the same .rst earlier says "Additional functions for full command
  parsing and cmdline integration are available in
  rte_flow_parser_cmdline.h. These include rte_flow_parser_parse()
  ..." -- which is wrong twice.

A7. local_cmd_flow->help_str = ... mutates the cmdline instruction.

  lib/cmdline/rte_flow_parser_cmdline.c:
    if (local_cmd_flow != NULL)
        local_cmd_flow->help_str = help ? help : name;

  This mutates the cmdline_parse_inst_t passed to
  rte_flow_parser_cmdline_register(). Idiomatic cmdline usage in DPDK
  declares cmdline_parse_inst_t variables as static const aggregates
  (see lib/cmdline/cmdline_parse.h examples and the rest of testpmd).
  Passing such an instance here writes to read-only memory and
  segfaults. The .rst note ("The library writes to inst->help_str
  dynamically ... must remain valid for the lifetime of the cmdline
  session") flags the lifetime question but does not flag the
  mutability requirement, which is the actually fatal one.

  The fix is to keep help_str storage internal to the library and
  return it via a side channel (e.g. an out-pointer in the get_help
  callback) rather than mutating the caller's instruction.

Patch 4/6 -- app/testpmd: use flow parser from ethdev
------------------------------------------------------

A8. Tab completion regression for dynamic IDs.

  As described in A3, removing app/test-pmd/cmdline_flow.c and
  replacing it with the library means testpmd's interactive flow
  command-line loses tab completion on rule IDs, pattern/actions
  template IDs, table IDs, queue IDs, RSS queue IDs, and indirect
  action IDs. Today's cmdline_flow.c calls port_flow_list,
  port_flow_template_list, port_flow_template_table_list, etc. The
  replacement library calls parser_flow_rule_count, etc., which
  return 0.

  Until the introspection callback (A3) is in place, this patch
  needs at minimum a release note entry explicitly calling out the
  loss of tab completion. As written, the change description in the
  patch does not mention it.

Warnings
========

Patch 2/6 -- ethdev: add RSS type helper APIs
----------------------------------------------

W1. The "all" entry hardcodes the OR of every RTE_ETH_RSS_* protocol
    bit and will silently go stale.

  lib/ethdev/rte_ethdev.c:
    { "all", RTE_ETH_RSS_ETH | RTE_ETH_RSS_VLAN | RTE_ETH_RSS_IP |
            RTE_ETH_RSS_TCP | RTE_ETH_RSS_UDP | RTE_ETH_RSS_SCTP |
            RTE_ETH_RSS_L2_PAYLOAD | RTE_ETH_RSS_L2TPV3 |
            RTE_ETH_RSS_ESP | RTE_ETH_RSS_AH | RTE_ETH_RSS_PFCP |
            RTE_ETH_RSS_GTPU | RTE_ETH_RSS_ECPRI | RTE_ETH_RSS_MPLS |
            RTE_ETH_RSS_L2TPV2 | RTE_ETH_RSS_IB_BTH },

  Whenever a new RTE_ETH_RSS_* protocol bit is added, this list will
  drift unless the contributor remembers this table. There is an
  existing convention RTE_ETH_RSS_PROTO_MASK in rte_ethdev.h that
  collects these; consider using it (or extending it) so the table
  tracks the canonical mask automatically.

W2. rte_eth_rss_type_to_str(0) returns "none" but
    rte_eth_rss_type_to_str(RTE_ETH_RSS_IPV4 | RTE_ETH_RSS_IPV6)
    returns NULL.

  The "to_str" function uses == on the table, so it succeeds only
  for values exactly present as a table entry. The Doxygen says
  "RSS type value (RTE_ETH_RSS_*)" which a caller will reasonably
  read as accepting any combination of RTE_ETH_RSS_* bits. The
  doc should explicitly state that only single-entry table values
  round-trip; arbitrary OR combinations return NULL.

Patch 3/6 -- ethdev: add flow parser library
---------------------------------------------

W3. enum rte_flow_parser_command + enum parser_token + token-to-cmd
    switch is a three-way invariant.

  Adding a new flow command requires:
    - new entry in enum parser_token         (private)
    - new entry in enum rte_flow_parser_command (public)
    - new case in parser_token_to_command()
  with the comment in flow parser .c file flagging this explicitly.

  This is a maintenance burden and an ABI risk -- forgetting the
  third step silently maps the new command to
  RTE_FLOW_PARSER_CMD_UNKNOWN. Consider whether the public enum
  could be derived from the private one (table-driven) so there is
  a single source of truth.

W4. rte_flow_parser_parse_attr_str() synthesizes a full validate
    command including pattern and actions just to harvest the attr.

  lib/ethdev/rte_flow_parser.c:
    ret = parser_format_cmd(&cmd, "flow validate 0 ",
                src, " pattern eth / end actions drop / end");

  This wraps the user input with a hardcoded port id 0 and a
  default pattern/actions that the simple API immediately throws
  away. If the testpmd grammar ever cross-validates attr against
  pattern/actions (e.g. "drop" not allowed on egress + transfer),
  the simple API breaks for combinations that should be valid in
  isolation. This is the architectural fragility of the synthesize-
  and-strip approach in concrete form.

W5. parser_format_cmd uses libc malloc on every simple-API call.

  lib/ethdev/rte_flow_parser.c:
    static int parser_format_cmd(char **dst, ...) {
        len = strlen(prefix) + strlen(body) + strlen(suffix) + 1;
        *dst = malloc(len);
        ...
        snprintf(*dst, len, "%s%s%s", prefix, body, suffix);

  Plain malloc, not rte_malloc, is appropriate here since the simple
  API claims to work without rte_eal_init. But the cost is a malloc/
  free pair per parse call. For an API that may be used to bulk-load
  flow rules from a config file or remote control plane, this is
  pessimal. Consider a stack-or-VLA-based formatter, since the input
  string length is already known.

W6. parser_str_strip_trailing_end heuristic strips at most one
    "/ end".

  lib/ethdev/rte_flow_parser.c:
    /* parser_str_strip_trailing_end ... */
    if (strncmp(p - 3, "end", 3) != 0)
        return strlen(src);

  Inputs like "drop / end / end" or "drop / end\t/end " strip only
  the outermost. The function's comment claims tolerance for any
  whitespace placement; it does not flag that only one trailing
  "/ end" is stripped. This is fine in practice for human input but
  surprising for programmatically generated input.

W7. No rte_flow_parser_config_unregister().

  rte_flow_parser_config_register replaces the previous registration
  and frees indirect action list configurations created by prior
  parsing sessions, but there is no unregister entry point. A test
  harness that wants a clean shutdown -- or a long-lived process
  that wants to release the SLIST of indlst_conf entries -- has to
  re-register a zeroed config to flush. Add an unregister API and
  call it from indlst_conf_cleanup at the same time.

W8. struct rte_flow_parser_vxlan_encap_conf and friends mix bit-
    fields and uint8_t arrays in a public header.

  struct rte_flow_parser_vxlan_encap_conf {
      uint32_t select_ipv4:1;
      uint32_t select_vlan:1;
      uint32_t select_tos_ttl:1;
      uint8_t  vni[3];
      ...
  };

  C bit-field layout is implementation-defined (order, alignment,
  signedness of unnamed bit-fields). For a public ABI this is
  tolerable on DPDK's supported toolchains but fragile across them.
  At minimum, reserve the remaining 29 bits explicitly:
      uint32_t reserved:29;
  to lock in the layout. A more conservative choice is plain
  uint8_t flags; with named bits.

W9. char type[16] in struct rte_flow_parser_tunnel_ops and char
    file[128]/filename[128] in rte_flow_parser_output use unnamed
    magic constants.

  struct rte_flow_parser_tunnel_ops {
      uint32_t id;
      char     type[16];
      ...
  };
  ... struct { char file[128]; ... } dump;
  ... struct { ... char filename[128]; } flex;

  These bake fixed maxima into the ABI. Define and document
  RTE_FLOW_PARSER_TUNNEL_TYPE_LEN, RTE_FLOW_PARSER_DUMP_FILE_LEN,
  etc. so contributors don't have to grep to see "is 16 enough for
  any future tunnel name?".

W10. The output struct's union arm vc has multiple raw pointer
     fields whose ownership is undocumented.

  struct rte_flow_parser_output {
      ...
      union {
          struct {
              ...
              struct rte_flow_item   *pattern;
              struct rte_flow_action *actions;
              struct rte_flow_action *masks;
              uint8_t                *data;
              ...
          } vc;
          struct {
              uint64_t *rule;
              uint64_t  rule_n;
              ...
          } destroy;
          ...
      } args;
  };

  All these pointers point either into the caller-supplied output
  buffer (rte_flow_parser_parse) or into the static simple-API
  buffer (parser_simple_parse). None of this is documented per
  field. Add a per-field comment ("points into the result_size
  buffer; valid until the next call") so a caller can see the
  contract without reading the implementation.

W11. parser_token_to_command() default branch logs ERR with no rate
     limit.

  static enum rte_flow_parser_command
  parser_token_to_command(enum parser_token token) {
      switch (token) {
      ...
      default:
          RTE_LOG_LINE(ERR, ETHDEV, "unknown parser token %u",
                  (unsigned int)token);
          return RTE_FLOW_PARSER_CMD_UNKNOWN;
      }
  }

  An attacker-controlled or fuzzed input that lands a parse on an
  unmapped token can flood the log. Use RTE_LOG_LINE_DP or downgrade
  to DEBUG.

W12. parser_ctx_set_raw_common uses 0x06 / 0x11 instead of
     IPPROTO_TCP / IPPROTO_UDP.

  case RTE_FLOW_ITEM_TYPE_UDP:
      size = sizeof(struct rte_udp_hdr);
      proto = 0x11;
      break;
  case RTE_FLOW_ITEM_TYPE_TCP:
      size = sizeof(struct rte_tcp_hdr);
      proto = 0x06;
      break;

  Other branches in the same function correctly use named constants
  (RTE_ETHER_TYPE_IPV4, IPPROTO_ROUTING). Use IPPROTO_TCP /
  IPPROTO_UDP for consistency.

Patch 4/6 -- app/test-pmd: use flow parser from ethdev
-------------------------------------------------------

W13. testpmd_flow_dispatch() (app/test-pmd/flow_parser.c) is missing
     a release note for the dropped 14k-line app/test-pmd/cmdline_flow.c.

  The release_notes for 26.07 mention the new flow parser library
  but should also note the testpmd-internal restructuring, since
  third-party patches against cmdline_flow.c will need rebasing.

Info
====

I1. Patch 1 ordering. The cmdline-stddef.h patch (patch 1/6) is
    independent of the rest of the series. It would land cleanly on
    its own and shorten this series's review surface.

I2. Series cover letter is missing from the bundle (no [PATCH v12 0/6]
    visible). For a 14k+ line series, a v12 cover letter summarizing
    what changed since v11 helps reviewers focus.

I3. Test coverage. test_flow_parser.c and test_flow_parser_simple.c
    exercise the parse path well, but I do not see a test that
    deliberately triggers the -ENOBUFS path on the simple API
    (large pattern overflowing the 4096-byte static buffer) or the
    aliasing pattern in A2(a). A negative test confirming that two
    consecutive simple-API calls invalidate the first result would
    document the contract concretely.

I4. enum rte_flow_parser_command places
    RTE_FLOW_PARSER_CMD_INDIRECT_ACTION_FLOW_CONF_CREATE alone at
    the end after the meter-policy block, breaking the per-section
    grouping established earlier. Move it into the indirect-action
    block.

I5. struct rte_flow_parser_action_vxlan_encap_data uses an anonymous
    union for ipv4/ipv6 items. This is the right size optimization
    but readers have to know to spell .item_ipv4 / .item_ipv6
    without a containing union name. Either name the union or add
    a comment.

Summary
=======

The split into rte_flow_parser.h (simple) and rte_flow_parser_config.h
(full) is the right direction and a real improvement over earlier
versions. The remaining structural issues are:

  - cmdline -> ethdev layer inversion (A1)
  - aliased pointer return semantics with a 4KB static buffer (A2)
  - tab-completion stubs with no override hook (A3, A8)
  - hack repurposing of rte_flow_attr.reserved in testpmd glue (A4)
  - unprefixed/colliding public macro names (A5)
  - mutated cmdline instruction (A7)

A2 and A3 are the two that most directly justify the "cut/paste"
critique: they reflect that the parser was extracted from testpmd
without designing a context-handle ownership model or a callback
seam for application-side state. Resolving them likely needs the
parser to take a context handle (alloc/free) and an introspection
ops struct in addition to the existing config registration.
```

A few notes on findings I considered but did not include:

- I did not flag the `RTE_LOG_LINE` calls in `indirect_action_flow_conf_create` 
for stale severity since they are correct uses.
- The `for (i = n - 1; i >= 0; --i)` loop with `int i` and `uint32_t n` is safe 
for the realistic input range so I did not flag it as integer truncation.
- The `select_ipv4:1`-style bit-fields are also present in existing public DPDK 
structs (`rte_flow_attr`), so I downgraded that to a Warning rather than an 
Error.
- Skipped patches 1/6 (trivial stddef.h include), 5/6 (example) and 6/6 (tests) 
since I had no findings beyond I3 and the test framework usage looked correct 
(`REGISTER_FAST_TEST` with `NOHUGE_OK, ASAN_OK`).

No `Reviewed-by` is given since the series has multiple Errors and Warnings 
outstanding.

Re: [PATCH v12 0/6] flow_parser: add shared parser library

Reply via email to