Hi Jerin:

BR
Rongwei

> -----Original Message-----
> From: Jerin Jacob <jerinjac...@gmail.com>
> Sent: Wednesday, December 21, 2022 20:45
> To: Rongwei Liu <rongw...@nvidia.com>
> Cc: Matan Azrad <ma...@nvidia.com>; Slava Ovsiienko
> <viachesl...@nvidia.com>; Ori Kam <or...@nvidia.com>;
> NBU-Contact-Thomas Monjalon (EXTERNAL) <tho...@monjalon.net>;
> Ferruh Yigit <ferruh.yi...@amd.com>; Andrew Rybchenko
> <andrew.rybche...@oktetlabs.ru>; dev@dpdk.org; Raslan Darawsheh
> <rasl...@nvidia.com>
> Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active or
> standby
>
> External email: Use caution opening links or attachments
>
>
> On Wed, Dec 21, 2022 at 5:35 PM Rongwei Liu <rongw...@nvidia.com> wrote:
> >
> > Hi Jerin:
> >
> > BR
> > Rongwei
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > Sent: Wednesday, December 21, 2022 19:00
> > > To: Rongwei Liu <rongw...@nvidia.com>
> > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to active
> > > or standby
> > >
> > > On Wed, Dec 21, 2022 at 3:02 PM Rongwei Liu <rongw...@nvidia.com>
> > > wrote:
> > > >
> > > > Hi Jerin:
> > >
> > > Hi Rongwei
> > >
> > > > BR
> > > > Rongwei
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > > > Sent: Wednesday, December 21, 2022 17:13
> > > > > To: Rongwei Liu <rongw...@nvidia.com>
> > > > > Subject: Re: [RFC v3 2/2] ethdev: add API to set process to
> > > > > active or standby
> > > > >
> > > > > On Wed, Dec 21, 2022 at 2:31 PM Rongwei Liu
> > > > > <rongw...@nvidia.com> wrote:
> > > > > >
> > > > > > Users may want to change the DPDK process to different
> > > > > > versions
> > > > >
> > > > > Different versions of DPDK? If there is any ABI change, how is
> > > > > this supported?
> > > > >
> > > > A new member was introduced into rte_eth_dev_info, but it
> > > > shouldn't break the ABI since it uses reserved fields.
> > >
> > > That is just for rte_eth_dev_info. What about ABI changes in other
> > > ethdev structures and rte_flow structures across different DPDK
> > > ABI versions?
> > >
> > Besides this, there is no other dependency on ABI changes.
> >
> > Assume a DPDK process A is running version v21.11 and we plan to
> > upgrade to version v22.11. Let's call the v22.11 instance process B.
>
> OK. That's a relief. I understand the use case now.
>
> Why not simply use the standard DPDK multiprocess model then?
> The primary process acts as a server for the slow path API. Secondary
> processes can come and go (i.e. can be updated at runtime) and act as
> clients that update rules via the primary-secondary communication
> mechanism.
>

Just imagine that process A and process B have an ABI breakage, such as
different sizes and members for rte_flow_item_*** and
rte_flow_action_***. How can we quickly make primary/secondary ABI
compatible across different versions?
It would be a huge effort and difficult to implement, at least in my
opinion.
What do you think?

> >
> > Now, process A has been running for a long time and has a lot of
> > rules configured. It has the "active" role per this API definition.
> > Process B starts; it should call this API to set itself to the
> > "standby" role, and the user can then program the flow rules as
> > they want. Different NIC vendors may have different
> > recommendations. Nvidia currently suggests programming process B
> > with group 0 rules only.
> >
> > The user should sync all desired configuration from process A to
> > process B. Process A then starts to yield traffic, for example
> > "delete all group 0 rules" for Nvidia NICs, or quits.
> > After that, process B calls this API to set itself to the "active"
> > role, and the hot upgrade is finished.
> >
> > > > > > such as hot upgrade.
> > > > > > There is a strong requirement to simplify the logic and
> > > > > > shorten the traffic downtime as much as possible.
> > > > > >
> > > > > > This update introduces new rte_eth process role definitions:
> > > > > > active or standby.
> > > > > >
> > > > > > The active role means rules are programmed to HW immediately,
> > > > > > and no
> > > > >
> > > > > Why does it have to be specific only to rte_flow rules? If it
> > > > > is specific to rte_flow, why is it in the rte_eth_process_
> > > > > name space?
> > > > For now, this design focuses on flow rule offloading and traffic
> > > > redirection.
> > > > When switching process versions, it's important to make sure
> > > > which application receives and handles the traffic.
> > >
> > > Changing the DPDK version at runtime goes beyond the rte_flow
> > > driver.
> >
> > It's not about changing the DPDK version but about upgrading DPDK
> > from one PMD version to another.
> > Does the preceding example answer your question?
> >
> > > > The change should be effective across all probed eth devices;
> > > > that's why it was put under the rte_eth_process_ (for all
> > > > rte_eth_dev) name space.
> > > > >
> > > > > Also, if we are moving the standby, what about the rule whose
> > > > > ABI is changed between versions?
> > > > Like the comments mentioned: "Before role transition, all the
> > > > rules set by the active process should be flushed first."
> > >
> > > What happens to existing rte_flow flow handles that were created
> > > with version X?
> > > Also, what if the new version Y has an ABI change in the
> > > rte_flow_pattern and rte_flow_action structures?
> > >
> > > For me, if a DPDK version change is needed, simply reload the
> > > application. This API will soon bloat, and it will be a mess if we
> > > start handling different DPDK versions that are not ABI compatible
> > > at all.
> > >
> > Yes, you are right. Reloading the application is the easiest way,
> > but it may leave a long time window in which traffic is lost: no
> > traffic arrives at either process A or process B.
> > We are trying to simplify the reloading logic and minimize the
> > traffic downtime as much as possible.
> > The approach may differ hugely between NIC vendors, so I think it
> > would be better if DPDK provided an abstract API.
> >
> > If process A and process B have different ABIs, it doesn't matter.
> > 1. Calling this API from process A means the older ABI.
> > 2. Calling this API from process B means the newer ABI.
> > It has a process concept and working scope.
> >
> > > > > > behavior changed. This is the default state.
> > > > > > The standby role means rules are queued in the HW. If no
> > > > > > active roles alive or back to active, the rules are effective
> > > > > > immediately.
> > > > > >
> > > > > > Signed-off-by: Rongwei Liu <rongw...@nvidia.com>
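
For reference, the role semantics quoted above (active rules take effect
immediately; standby rules stay queued until no active process is alive
or the process goes back to active) and the process A / process B
hot-upgrade sequence can be sketched as a small model in plain C. This
is only an illustration under assumptions: it does not call DPDK, every
name in it (proc_role, nic_model, nic_attach, proc_effective_rules,
hot_upgrade_demo) is hypothetical and not taken from the RFC, and
process A yielding traffic is modeled simply as process A quitting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical role names; the RFC only says "active" or "standby". */
enum proc_role { PROC_ROLE_ACTIVE, PROC_ROLE_STANDBY };

/* One DPDK process as seen by this toy model. */
struct proc {
	enum proc_role role;
	bool alive;
	unsigned int nb_rules; /* rules this process has programmed */
};

/* Toy stand-in for the NIC: it only tracks the attached processes. */
struct nic_model {
	struct proc *procs[4];
	size_t nb_procs;
};

/* Attach a process; "active" is the default state, as in the patch text. */
static void nic_attach(struct nic_model *nic, struct proc *p)
{
	assert(nic->nb_procs < 4);
	p->role = PROC_ROLE_ACTIVE;
	p->alive = true;
	p->nb_rules = 0;
	nic->procs[nic->nb_procs++] = p;
}

/* True if at least one live process currently holds the active role. */
static bool nic_has_live_active(const struct nic_model *nic)
{
	for (size_t i = 0; i < nic->nb_procs; i++)
		if (nic->procs[i]->alive &&
		    nic->procs[i]->role == PROC_ROLE_ACTIVE)
			return true;
	return false;
}

/*
 * Number of rules of process p that are effective right now:
 * active rules always are; standby rules stay queued while any
 * active process is alive.
 */
static unsigned int proc_effective_rules(const struct nic_model *nic,
					 const struct proc *p)
{
	if (!p->alive)
		return 0;
	if (p->role == PROC_ROLE_ACTIVE)
		return p->nb_rules;
	return nic_has_live_active(nic) ? 0 : p->nb_rules;
}

/* Walk through the A -> B hot upgrade; returns 0 if every step holds. */
static int hot_upgrade_demo(void)
{
	struct nic_model nic = { .nb_procs = 0 };
	struct proc a, b;

	nic_attach(&nic, &a);           /* old version, active by default */
	a.nb_rules = 100;
	if (proc_effective_rules(&nic, &a) != 100)
		return -1;

	nic_attach(&nic, &b);           /* new version starts */
	b.role = PROC_ROLE_STANDBY;     /* B sets itself to standby */
	b.nb_rules = 100;               /* configuration synced from A */
	if (proc_effective_rules(&nic, &b) != 0)
		return -2;              /* B's rules are only queued */

	a.alive = false;                /* A yields traffic by quitting */
	if (proc_effective_rules(&nic, &b) != 100)
		return -3;              /* no active alive: B takes over */

	b.role = PROC_ROLE_ACTIVE;      /* B becomes active: upgrade done */
	if (proc_effective_rules(&nic, &b) != 100)
		return -4;
	return 0;
}
```

Calling hot_upgrade_demo() from a main() is enough to run the sequence.
In the model, process B never inspects process A's structures; the only
thing the two processes share is the role each one declares, which is
the point being made about ABI differences between A and B.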