> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Thursday, May 18, 2023 10:46 PM
> To: Zhang, Qi Z <qi.z.zh...@intel.com>; Ori Kam <or...@nvidia.com>;
> dev@dpdk.org
> Cc: techbo...@dpdk.org; Richardson, Bruce <bruce.richard...@intel.com>;
> Burakov, Anatoly <anatoly.bura...@intel.com>; Wiles, Keith
> <keith.wi...@intel.com>; Liang, Cunming <cunming.li...@intel.com>; Wu,
> Jingjing <jingjing...@intel.com>; Zhang, Helin <helin.zh...@intel.com>;
> Mcnamara, John <john.mcnam...@intel.com>; Xu, Rosen
> <rosen...@intel.com>; nd <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: seeking community input on adapting DPDK to P4Runtime
> backend
>
> <snip>
>
> > >
> > > Hi Zhang,
> > >
> > > rte_flow is an excellent candidate for implementing P4.
> > > We ran some internal tests that show great promise in this regard.
> > >
> > > I would be very happy to supply any needed information and have a
> > > discussion on how to continue with this project.
> >
> > Thank you Ori! Please see my comments below.
> >
> > Regards
> > Qi
> >
> > >
> > > Please see inline detailed answers.
> > >
> > > Best,
> > > Ori Kam
> > >
> > >
> > > > -----Original Message-----
> > > > From: Zhang, Qi Z <qi.z.zh...@intel.com>
> > > > Sent: Monday, May 8, 2023 9:40 AM
> > > > Subject: seeking community input on adapting DPDK to P4Runtime
> > > > backend
> > > >
> > > > Hi:
> > > >
> > > > Our team is currently working on developing a DPDK PMD for a
> > > > P4-programmed network controller, based on customer feedback to
> > > > integrate DPDK into the P4Runtime backend
> > > > [https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html].
> > > >
> > > > (*) However, we are facing challenges in adapting DPDK's rte_flow
> > > > API to the P4Runtime API, primarily due to the transition from a
> > > > table-based API with fields of arbitrary bit width at arbitrary
> > > > offsets to a protocol-based API (more detail is given in the
> > > > post-script).
> > > >
> > > > We are seeking suggestions and best practices from the open-source
> > > > community to help us with this integration. Specifically, we are
> > > > interested in learning:
> > > >
> > > > (*) If anyone has previously attempted to map rte_flow to P4-based
> > > > devices.
> > >
> > > We did try, successfully.
> > >
> > > > (*) Thoughts on how to map from table-based matching to
> > > > protocol-based matching like in rte_flow.
> > >
> > > rte_flow is table based (groups). Now, with the introduction of the
> > > template API, rte_flow is even more table based (we added the concept
> > > of tables), which is just what P4 requires.
> >
> > Yes, the rte_flow template can be used to map a sequence of patterns
> > to a P4 table and a sequence of actions to a P4 action. However, using
> > a fixed rte_flow template can be problematic when handling different
> > P4 programs in the same driver.
> > To provide more flexibility, the mapping of patterns and actions can
> > be externalized into a configuration file, or into a part of the
> > firmware that the driver can learn it from, allowing for customization
> > based on the specific requirements of each P4 pipeline. Actually, we
> > have enabled this approach in order to accommodate different P4
> > programs.
> >
> > However, an alternative approach to consider is whether it would be
> > feasible to directly expose the P4 table and action names or IDs to
> > the application, rather than relying on rte_flow templates. This
> > approach offers several potential benefits:
> >
> > Integration with P4Runtime Backend: By exposing the P4 table and
> > action names or IDs directly, DPDK could be easily integrated as a
> > P4Runtime backend. This eliminates the need for translation from the
> > P4Runtime API to rte_flow templates in the application, simplifying
> > the integration process.
> >
> > Elimination of Manual Mapping: Exposing the P4 table and action names
> > or IDs to the application would remove the requirement for the
> > engineering team to manually map each pipeline to specific rte_flow
> > templates. This is particularly beneficial in cases where hardware
> > vendors provide customers with a toolchain to create their own P4
> > pipelines but do not necessarily own the P4 programs. By eliminating
> > the dependency on rte_flow templates, this approach reduces the
> > complexity of using DPDK as the driver.
> >
> > To be more specific, the proposed API for exposing P4 table and action
> > names or IDs directly to the application could be as follows:
> >
> > /* Get the table info */
> > struct rte_p4_table_info tbl_info;
> > rte_p4_table_info_get_by_name(port_id, "decap_vxlan_tcp_table",
> >                               &tbl_info);
> >
> > /* Create the key */
> > struct rte_p4_table_key *key;
> > rte_p4_table_key_create(port_id, tbl_info.id, &key);
> >
> > /* Set the key fields */
> > rte_p4_table_key_field_set_by_name(port_id, key, "wire_port",
> >                                    &wire_port, 2);
> > rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_src",
> >                                    &tun_ip_src, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "tun_ip_dst",
> >                                    &tun_ip_dst, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "vni", &vni, 3);
> > rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_src",
> >                                    &ipv4_src, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "ipv4_dst",
> >                                    &ipv4_dst, 4);
> > rte_p4_table_key_field_set_by_name(port_id, key, "src_port",
> >                                    &src_port, 2);
> > rte_p4_table_key_field_set_by_name(port_id, key, "dst_port",
> >                                    &dst_port, 2);
> >
> > /* Get the action spec info */
> > struct rte_p4_action_spec_info as_info;
> > rte_p4_action_spec_info_get_by_name(port_id, "decap_vxlan_fwd",
> >                                     &as_info);
> >
> > /* Create the action */
> > struct rte_p4_action *action;
> > rte_p4_action_create(port_id, as_info.id, &action);
> >
> > /* Set the action fields */
> > rte_p4_table_action_field_set_by_name(port_id, action, "mod_id",
> >                                       &mod_id, 3);
> > rte_p4_table_action_field_set_by_name(port_id, action, "port_id",
> >                                       &target_port_id, 2);
> >
> > /* Add the entry */
> > rte_p4_table_entry_add(port_id, tbl_info.id, key, action);
>
> These do not look P4 specific. They could just be generic APIs. Could
> we have these as rte_flow APIs?

Agreed, the goal is not necessarily to have P4-specific APIs, but rather
to expose a set of table-driven APIs that align with the programmable
hardware pipeline. This approach would allow for more flexibility and
customization compared to relying on existing protocol-based APIs.

Both options, extending the existing rte_flow API to expose the required
table-driven features or introducing a set of dedicated table-driven
APIs, appear to be viable solutions to me.

Thanks
Qi

> >
> > > >
> > > > (*) Any ideas on how to extend or expand the rte_flow APIs to
> > > > better accommodate P4-based or other table-matching based devices.
> > > >
> > >
> > > Let's discuss any issues you have.
> > >
> > > > Your insights and feedback would be greatly appreciated!
> > > >
> > > > ======================= Post-Script ============================
> > > >
> > > > More details on the problem below, for anyone interested.
> > > >
> > > > In P4, flow offloading can be implemented using the P4Runtime API,
> > > > which provides a standard interface for controlling and
> > > > configuring the data plane behavior of network devices. P4Runtime
> > > > allows network operators to dynamically add, modify, and remove
> > > > flow rules in the hardware forwarding tables of P4-enabled devices.
> > > >
> > > > The P4Runtime API is a table-based API; it assumes the packet
> > > > processing pipeline consists of one or more key/action units
> > > > (tables). In P4Runtime, each table defines the fields to be
> > > > matched and the actions to be taken on incoming packets. During
> > > > compilation, the P4 compiler assigns a unique uint32 ID to each
> > > > table, action, and field, which is associated with its
> > > > corresponding string name. These IDs have no inherent
> > > > relationship to any network protocol but instead serve as a means
> > > > to identify different components of a P4 program within the
> > > > P4Runtime API.
> > > >
> > > This is the concept of tables and groups in rte_flow.
> > >
> > > > If we choose to use rte_flow as the low-level API for P4Runtime, a
> > > > translation layer is needed in the application to map the P4
> > > > tables and actions to the corresponding rte_flow rules. However,
> > > > this translation layer can be problematic as it is not easily
> > > > scalable. When the P4 pipeline is refined or updated, the
> > > > translation rules may also need to be updated, which can result in
> > > > errors and reduced efficiency.
> > > >
> > > I don't understand why.
> > >
> > > > On the other hand, a hardware vendor that provides a P4-enabled
> > > > device is required to implement an rte_flow interface in their
> > > > DPDK PMD. Typically, the P4 compiler generates hints for the
> > > > driver on how to map P4 tables to hardware resources, and how to
> > > > convert table entry add/modify/delete actions into low-level
> > > > hardware configurations. However, because rte_flow is
> > > > protocol-based, it poses an additional challenge for driver
> > > > developers, who must create another translation layer to convert
> > > > rte_flow tokens into P4 object identifiers. This translation layer
> > > > must be carefully designed and implemented to ensure optimal
> > > > performance and scalability, and to ensure that the driver can
> > > > efficiently handle the dynamic nature of P4 programs.
> > > >
> > > Right, but some of the translation can be done in code shared by all
> > > PMDs, and the translation is static for a given compilation, so
> > > inserting rules can be super fast with no need for extra work.
> > >
> > > > To better understand the problem, let's consider the following
> > > > example that demonstrates how to use the P4Runtime API to program
> > > > a rule for processing a VXLAN packet. The rule matches a VXLAN
> > > > packet, decapsulates the tunnel header, and forwards it to a
> > > > specific port.
> > > >
> > > > The P4 source code below describes the VXLAN decap table
> > > > decap_vxlan_tcp_table, which matches the outer IP address, VNI,
> > > > inner IP address, and inner TCP port. For each rule, four action
> > > > specifications can be selected. We will focus on one action
> > > > specification decap_vxlan_fwd that performs decapsulation and
> > > > forwards the packet to a specific port.
> > > >
> > > > table decap_vxlan_tcp_table {
> > > >     key = {
> > > >         hdrs.ipv4[meta.depth-1].src_ip: exact @name("tun_ip_src");
> > > >         hdrs.ipv4[meta.depth-1].dst_ip: exact @name("tun_ip_dst");
> > > >         hdrs.vxlan[meta.depth-1].vni  : exact @name("vni");
> > > >         hdrs.ipv4[meta.depth].src_ip  : exact @name("ipv4_src");
> > > >         hdrs.ipv4[meta.depth].dst_ip  : exact @name("ipv4_dst");
> > > >         hdrs.tcp.sport                : exact @name("src_port");
> > > >         hdrs.tcp.dport                : exact @name("dst_port");
> > > >     }
> > > >     actions = {
> > > >         @tableonly decap_vxlan_fwd;
> > > >         @tableonly decap_vxlan_dnat_fwd;
> > > >         @tableonly decap_vxlan_snat_fwd;
> > > >         @defaultonly set_exception;
> > > >     }
> > > > }
> > > Translate to rte_flow:
> > > template pattern relaxed_mode = 1 pattern = Ipv4_src / ipv4_dst /
> > >     vni / ipv4_src / ipv4_dst / tcp_sport / tcp_dport
> > > map structure = {
> > >     tun_ip_src = &pattern[ipv4_src]
> > >     ....
> > > }
> > > > ...
> > > >
> > > > action decap_vxlan_fwd(PortId_t port_id) {
> > > >     meta.mod_action = (bit<11>)VXLAN_DECAP_OUTER_IPV4;
> > > >     send_to_port(port_id);
> > > > }
> > > >
> > > Same as above, just with an action template.
> > >
> > > > Below is an example of the hint that the compiler will generate
> > > > for the decap_vxlan_tcp_table:
> > > >
> > > > Table ID: 8454144
> > > > Name: decap_vxlan_tcp_table
> > > > Field ID  Name        Match Type  Bit Width  Byte Width  Byte Order
> > > > 1         tun_ip_src  exact       32         4           network
> > > > 2         tun_ip_dst  exact       32         4           network
> > > > 3         vni         exact       24         3           network
> > > > 4         ipv4_src    exact       32         4           network
> > > > 5         ipv4_dst    exact       32         4           network
> > > > 6         src_port    exact       16         2           network
> > > > 7         dst_port    exact       16         2           network
> > > > Spec ID   Name
> > > > 8519716   decap_vxlan_fwd
> > > > 8519718   decap_vxlan_dnat_fwd
> > > > 8519720   decap_vxlan_snat_fwd
> > > > 8519695   set_exception
> > > >
> > > > And the hint for the action spec "decap_vxlan_fwd" is as below:
> > > >
> > > > Spec ID: 8519716
> > > > Name: decap_vxlan_fwd
> > > > Field ID  Name     Bit Width  Byte Width  Byte Order
> > > > 1         port_id  32         4           host
> > > >
> > > > Please note that different compilers may assign different IDs.
> > > >
> > > > Below is an example of how to program a rule using the P4Runtime
> > > > API in JSON format. This rule matches the fields above and directs
> > > > packets to port 5.
> > > >
> > > > {
> > > >   "type": 1, // INSERT
> > > >   "entity": {
> > > >     "table_entry": {
> > > >       "table_id": 8454144,
> > > >       "match": [
> > > >         { "field_id": 1, "exact": { "value": [10, 0, 0, 1] } },  // outer src IP = 10.0.0.1
> > > >         { "field_id": 2, "exact": { "value": [10, 0, 0, 2] } },  // outer dst IP = 10.0.0.2
> > > >         { "field_id": 3, "exact": { "value": [0, 0, 10] } },     // vni = 10
> > > >         { "field_id": 4, "exact": { "value": [192, 0, 0, 1] } }, // inner src IP = 192.0.0.1
> > > >         { "field_id": 5, "exact": { "value": [192, 0, 0, 2] } }, // inner dst IP = 192.0.0.2
> > > >         { "field_id": 6, "exact": { "value": [0, 200] } },       // tcp src port = 200
> > > >         { "field_id": 7, "exact": { "value": [0, 201] } }        // tcp dst port = 201
> > > >       ],
> > > >       "action": {
> > > >         "action": {
> > > >           "action_id": 8519716,
> > > >           "params": [
> > > >             { "param_id": 1, "value": [5, 0, 0, 0] }
> > > >           ]
> > > >         }
> > > >       },
> > > >       ...
> > > >     }
> > > >   } ...
> > > > }
> > > >
> > > > Please note that this is only a part of the full command. For more
> > > > information, please refer to the p4runtime.proto [2].
> > > >
> > > > 1. https://p4.org/p4-spec/p4runtime/main/P4Runtime-Spec.html
> > > > 2. https://github.com/p4lang/p4runtime/blob/main/proto/p4/v1/p4runtime.proto
> > > >
> > > > Thank you for your attention to this matter.
> > > >
> > >
> > > I think that we should schedule a meeting to see how many gaps we
> > > really have between rte_flow and P4, and how we can improve rte_flow
> > > to allow the best experience.
> >
> > Sounds like a good idea!
> >
> > > > Regards
> > > > Qi
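
For anyone following the post-script, the relationship between the compiler hint and the byte arrays in the JSON rule can be sketched in a few lines of C. This is only an illustration, not an existing DPDK or P4Runtime API: the names p4_field_hint, p4_find_field and p4_encode_field are made up for this example, the table rows are copied from the hint quoted above, and "host" byte order is assumed to mean little-endian because the port_id parameter appears as [5, 0, 0, 0].

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration only; these are not DPDK definitions. */
enum p4_byte_order {
    P4_ORDER_NETWORK, /* most significant byte first */
    P4_ORDER_HOST_LE  /* assumption: "host" order is little-endian */
};

struct p4_field_hint {
    uint32_t id;              /* field ID assigned by the P4 compiler */
    const char *name;         /* @name() string from the P4 source */
    uint8_t byte_width;       /* serialized width in bytes (1..4 here) */
    enum p4_byte_order order; /* byte order stated in the hint */
};

/* Key fields of decap_vxlan_tcp_table, copied from the quoted hint. */
static const struct p4_field_hint decap_vxlan_tcp_key[] = {
    { 1, "tun_ip_src", 4, P4_ORDER_NETWORK },
    { 2, "tun_ip_dst", 4, P4_ORDER_NETWORK },
    { 3, "vni",        3, P4_ORDER_NETWORK },
    { 4, "ipv4_src",   4, P4_ORDER_NETWORK },
    { 5, "ipv4_dst",   4, P4_ORDER_NETWORK },
    { 6, "src_port",   2, P4_ORDER_NETWORK },
    { 7, "dst_port",   2, P4_ORDER_NETWORK },
};

#define P4_NB_KEY_FIELDS \
    (sizeof(decap_vxlan_tcp_key) / sizeof(decap_vxlan_tcp_key[0]))

/* Resolve a field name to its hint row; returns NULL if the name is unknown. */
static const struct p4_field_hint *
p4_find_field(const char *name)
{
    for (size_t i = 0; i < P4_NB_KEY_FIELDS; i++) {
        if (strcmp(decap_vxlan_tcp_key[i].name, name) == 0)
            return &decap_vxlan_tcp_key[i];
    }
    return NULL;
}

/*
 * Serialize the low byte_width bytes of value into out, in the byte order
 * given by the hint. Returns the number of bytes written, or -1 if an
 * argument is invalid or the buffer is too small.
 */
static int
p4_encode_field(const struct p4_field_hint *f, uint32_t value,
                uint8_t *out, size_t out_len)
{
    if (f == NULL || out == NULL || f->byte_width == 0 ||
        f->byte_width > 4 || out_len < f->byte_width)
        return -1;

    for (uint8_t i = 0; i < f->byte_width; i++) {
        unsigned int shift = (f->order == P4_ORDER_NETWORK)
            ? 8u * (unsigned int)(f->byte_width - 1 - i)
            : 8u * i;
        out[i] = (uint8_t)(value >> shift);
    }
    return f->byte_width;
}
```

With this sketch, looking up "vni" and encoding the value 10 yields the three bytes 0, 0, 10, and encoding 200 for "src_port" yields 0, 200, which are the arrays used in the JSON rule. Nothing in the lookup or the encoding depends on a protocol definition, only on the name, width and byte order from the hint, which is the property the table-driven proposal relies on.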