Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework

Jiayu Hu Fri, 26 May 2017 00:26:25 -0700

Hi Konstantin,

On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> 
> Hi Jiayu,
> 
> > 
> > Hi Konstantin,
> > 
> > Thanks for your comments. My replies/questions are below.
> > 
> > BRs,
> > Jiayu
> > 
> > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > Hi Jiayu,
> > > My comments/questions below.
> > > Konstantin
> > >
> > > >
> > > > For applications, DPDK GRO provides three external functions to
> > > > enable/disable GRO:
> > > > - rte_gro_init: initialize GRO environment;
> > > > - rte_gro_enable: enable GRO for a given port;
> > > > - rte_gro_disable: disable GRO for a given port.
> > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > initizalize GRO environment. After that, applications can call
> > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > specific ports.
> > >
> > > I think this is too restrictive and wouldn't meet various user's needs.
> > > User might want to:
> > > - enable/disable GRO for particular RX queue
> > > - or even setup different GRO types for different RX queues,
> > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 
> > > 1, etc.
> > 
> > The reason for enabling/disabling GRO per-port instead of per-queue is that 
> > LINUX
> > controls GRO per-port. To control GRO per-queue indeed can provide more 
> > flexibility
> > to applications. But are there any scenarios that different queues of a 
> > port may
> > require different GRO control (i.e. GRO types and enable/disable GRO)?
> 
> I think yes.
> 
> > 
> > > - For various reasons, user might prefer not to use RX callbacks for 
> > > various reasons,
> > >   But invoke gro() manually at somepoint in his code.
> > 
> > An application-used GRO library can enable more flexibility to 
> > applications. Besides,
> > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > rte_eth_rx_burst returns actually received packet number or GROed packet 
> > number. And
> > the same issue happens in GSO, and even more seriously. This is because 
> > applications
> > , like VPP, always rely on the return value of rte_eth_tx_burst to decide 
> > further
> > operations. If applications can direcly call GRO/GSO libraries, this issue 
> > won't exist.
> > And DPDK is a library, which is not a holistic system like LINUX. We don't 
> > need to do
> > the same as LINUX. Therefore, maybe it's a better idea to directly provide 
> > SW
> > segmentation/reassembling libraries to applications.
> > 
> > > - Many users would like to control size (number of flows/items per flow),
> > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > - User would need a way to flush all or only timeout packets from 
> > > particular GRO tables.
> > >
> > > So I think that API needs to extended to become be much more fine-grained.
> > > Something like that:
> > >
> > > struct rte_gro_tbl_param {
> > >    int32_t socket_id;
> > >    size_t max_flows;
> > >    size_t max_items_per_flow;
> > >    size_t max_pkt_size;
> > >    uint64_t packet_timeout_cycles;
> > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > >   <probably type specific params>
> > >   ...
> > > };
> > >
> > > struct rte_gro_tbl;
> > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param 
> > > *param);
> > >
> > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > 
> > Yes, I agree with you. It's necessary to provide more fine-grained control 
> > APIs to
> > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP 
> > packets?
> 
> For any packets that sits in the gro_table for too long.
> 
> > 
> > >
> > > /*
> > >  * process packets, might store some packets inside the GRO table,
> > >  * returns number of filled entries in pkt[]
> > >  */
> > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf 
> > > *pkt[], uint32_t num);
> > >
> > > /*
> > >   * retirieves up to num timeouted packets from the table.
> > >   */
> > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, 
> > > struct rte_mbuf *pkt[], uint32_t num);
> > 
> > Currently, we implement GRO as RX callback, whose processing logic is 
> > simple:
> > receive burst packets -> perform GRO -> return to application. GRO stops 
> > after
> > finishing processing received packets. If we provide rte_gro_tbl_timeout, 
> > when
> > and who will call it?
> 
> I mean the following scenario:
> We receive a packet, find it is eligible for GRO and put it into gro_table
> in expectation - there would be more packets for the same flow.
> But it could happen that we would never (or for some long time) receive
> any new packets for that flow.
> So the first packet would never be delivered to the upper layer,
> or delivered too late.
> So we need a mechanism to extract such not merged packets
> and push them to the upper layer.
> My thought it would be application responsibility to call it from time to 
> time.
> That's actually another reason why I think we shouldn't use application
> to always use RX callbacks for GRO.


Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
which merges N inputted packets at a time. After finishing processing these
packets, it returns all of them and reset hashing tables. Therefore, there
are no packets in hashing tables after rte_gro_reassemble_burst returns.

If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
function, like rte_gro_reassemble, which processes one given packet at a
time and won't reset hashing tables. Applications decide when to flush packets
in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used 
to flush the packets. Do you mean that?
> 
> > 
> > >
> > > And if you like to keep them as helper functions:
> > >
> > > int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
> > > void rte_gro_port_disable(uint8_t port);
> > >
> >
>  > Though from my perspective, it is out of scope of that library,
> > > and I'd leave it to the upper layer to decide which callbacks and
> > > in what order have to be installed on particular port.
> > 
> > In my opinion, GRO library includes two parts. One is in charge of 
> > reassembling
> > packets, the other is in charge of implementing GRO as RX callback. And
> > rte_gro_port_enable/disable belongs to the second part. You mean we should 
> > let
> > applications to register RX callback, and GRO library just provides 
> > reassembling
> > related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)?
> 
> Yes, that would be my preference.
> Let the application developer to decide how/when to
> call GRO functions (callback, direct call, some combination).
> 
> > 
> > >
> > > ....
> > >
> > > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > > new file mode 100644
> > > > index 0000000..996b382
> > > > --- /dev/null
> > > > +++ b/lib/librte_gro/rte_gro.c
> > > > @@ -0,0 +1,219 @@
> > > > +#include <rte_ethdev.h>
> > > > +#include <rte_mbuf.h>
> > > > +#include <rte_hash.h>
> > > > +#include <stdint.h>
> > > > +#include <rte_malloc.h>
> > > > +
> > > > +#include "rte_gro.h"
> > > > +#include "rte_gro_common.h"
> > > > +
> > > > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> > > > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
> > >
> > > Why not to initialize these arrays properly at building time (here)?
> > 
> > Yes, I should declare them as static variables.
> > 
> > >
> > > > +
> > > > +struct rte_gro_status *gro_status;
> > > > +
> > > > +/**
> > > > + * Internal function. It creates one hashing table for all
> > > > + * DPDK-supported GRO types, and all of them are stored in an object
> > > > + * of struct rte_gro_tbl.
> > > > + *
> > > > + * @param name
> > > > + *  Name for GRO lookup table
> > > > + * @param nb_entries
> > > > + *  Element number of each hashing table
> > > > + * @param socket_id
> > > > + *  socket id
> > > > + * @param gro_tbl
> > > > + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> > > > + *  inside rte_gro_tbl_setup.
> > > > + * @return
> > > > + *  If create successfully, return a positive value; if not, return
> > > > + *  a negative value.
> > > > + */
> > > > +static int
> > > > +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> > > > +               uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> > > > +{
> > > > +       gro_tbl_create_fn create_tbl_fn;
> > > > +       const uint32_t len = strlen(name) + 10;
> > > > +       char tbl_name[len];
> > > > +       int i;
> > > > +
> > > > +       for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > > +               sprintf(tbl_name, "%s_%u", name, i);
> > > > +               create_tbl_fn = tbl_create_functions[i];
> > > > +               if (create_tbl_fn && (create_tbl_fn(name,
> > > > +                                               nb_entries,
> > > > +                                               socket_id,
> > > > +                                               &(gro_tbl->
> > > > +                                                       
> > > > lkp_tbls[i].hash_tbl))
> > > > +                                       < 0)) {
> > >
> > > Shouldn't you destroy already created tables here?
> > 
> > Yes, I should add codes to destory created tables before creating new ones.
> > 
> > >
> > > > +                       return -1;
> > > > +               }
> > > > +               gro_tbl->lkp_tbls[i].gro_type = i;
> > > > +       }
> > > > +       return 1;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Internal function. It frees all the hashing tables stored in
> > > > + * the given struct rte_gro_tbl object.
> > > > + */
> > > > +static void
> > > > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > > > +{
> > > > +       int i;
> > > > +
> > > > +       if (gro_tbl == NULL)
> > > > +               return;
> > > > +       for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > > +               rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> > > > +               gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> > > > +               gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> > > > +       }
> > > > +}
> > > > +
> > > > +/**
> > > > + * Internal function. It performs all supported GRO types on inputted
> > > > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> > > > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> > > > + * packets. Packets of unsupported GRO types won't be processed. For
> > > > + * ethernet devices, which want to support GRO, this function is used 
> > > > to
> > > > + * registered as RX callback for all queues.
> > > > + *
> > > > + * @param pkts
> > > > + *  Packets to reassemble.
> > > > + * @param nb_pkts
> > > > + *  The number of packets to reassemble.
> > > > + * @param gro_tbl
> > > > + *  pointer points to an object of struct rte_gro_tbl, which has been
> > > > + *  initialized by rte_gro_tbl_setup.
> > > > + * @return
> > > > + *  Packet number after GRO. If reassemble successfully, the value is
> > > > + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> > > > + *  parameters are invalid, return 0.
> > > > + */
> > > > +static uint16_t
> > > > +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> > > > +               uint16_t queue __rte_unused,
> > > > +               struct rte_mbuf **pkts,
> > > > +               uint16_t nb_pkts,
> > > > +               uint16_t max_pkts __rte_unused,
> > > > +               void *gro_tbl)
> > > > +{
> > > > +       if ((gro_tbl == NULL) || (pkts == NULL)) {
> > > > +               printf("invalid parameters for GRO.\n");
> > > > +               return 0;
> > > > +       }
> > > > +       uint16_t nb_after_gro = nb_pkts;
> > >
> > > Here and everywhere - please move variable definitions to the top of the 
> > > block.
> > 
> > Thanks. I will modify them in next version.
> > 
> > >
> > > > +
> > > > +       return nb_after_gro;
> > > > +}
> > > > +
> > > > +void
> > > > +rte_gro_init(void)
> > > > +{
> > > > +       uint8_t nb_port, i;
> > > > +       uint16_t nb_queue;
> > > > +       struct rte_eth_dev_info dev_info;
> > > > +
> > > > +       /* if init already, return immediately */
> > > > +       if (gro_status) {
> > > > +               printf("repeatly init GRO environment\n");
> > > > +               return;
> > > > +       }
> > > > +
> > > > +       gro_status = (struct rte_gro_status *)rte_zmalloc(
> > > > +                       NULL,
> > > > +                       sizeof(struct rte_gro_status),
> > > > +                       0);
> > > > +
> > > > +       nb_port = rte_eth_dev_count();
> > >
> > > Number of ports and number of configured queues per port can change 
> > > dynamically.
> > > In fact, I don't think we need that function and global gro_status at all.
> > > To add/delete RX callback for particular port/queue - you can just scan 
> > > over exisiting callbacks
> > > Searching for particular cb_func and cb_arg values.
> > 
> > Yes, it's better to store these parameters (e.g. hashing table pointers) as 
> > cb_arg. But
> > we can't remove RX callback by searching for particular cb_func inside GRO 
> > library, since
> > these operations need locking protection and the lock variable (i.e. 
> > rte_eth_rx_cb_lock)
> > is unavailable to GRO library. To my knowledge, the only way is to provide 
> > a new API in
> > lib/librte_ether/rte_ethdev.c to support removing RX callback via cb_func 
> > or cb_arg values.
> > You mean we need to add this API?
> 
> Yes my though was to add something like:
> rte_eth_find_rx_callback()                        /*find specific callback *./
> or rte_eth_remove_get_rx_callbacks()  /*get a copy  of list of all currently 
> installed callback */ 
> 
> But if we move installing GRO callbacks out of scope of the library, we 
> probably wouldn't need it.
> 
> > 
> > >
> > > > +       gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> > > > +                       NULL,
> > > > +                       nb_port * sizeof(struct gro_port_status),
> > > > +                       0);
> > > > +       gro_status->nb_port = nb_port;
> > > > +
> > > > +       for (i = 0; i < nb_port; i++) {
> > > > +               rte_eth_dev_info_get(i, &dev_info);
> > > > +               nb_queue = dev_info.nb_rx_queues;
> > > > +               gro_status->ports[i].gro_tbls =
> > > > +                       (struct rte_gro_tbl **)rte_zmalloc(
> > > > +                                       NULL,
> > > > +                                       nb_queue * sizeof(struct 
> > > > rte_gro_tbl *),
> > > > +                                       0);
> > > > +               gro_status->ports[i].gro_cbs =
> > > > +                       (struct rte_eth_rxtx_callback **)
> > > > +                       rte_zmalloc(
> > > > +                                       NULL,
> > > > +                                       nb_queue *
> > > > +                                       sizeof(struct 
> > > > rte_eth_rxtx_callback *),
> > > > +                                       0);
> > > > +       }
> > > > +}
> > > > +
> > > > +}
> > > ....
> > > > diff --git a/lib/librte_gro/rte_gro_common.h 
> > > > b/lib/librte_gro/rte_gro_common.h
> > > ....
> > >
> > > > +
> > > > +typedef int (*gro_tbl_create_fn)(
> > > > +               char *name,
> > > > +               uint32_t nb_entries,
> > > > +               uint16_t socket_id,
> > > > +               struct rte_hash **hash_tbl);
> > > > +
> > >
> > > If you have create_fn, then you'll probably need a destroy_fn too.
> > > Second thing why always assume that particular GRO implementation would
> > > always use rte_hash to store it's data?
> > > Probably better hide that inside something neutral, like 'struct 
> > > gro_data;' or so.
> > 
> > Agree. I will modify it.
> > 
> > >
> > > > +typedef int32_t (*gro_reassemble_fn)(
> > > > +               struct rte_hash *hash_tbl,
> > > > +               struct gro_item_list *item_list);
> > > > +#endif
> > > > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > > > index b5215c0..8956821 100644
> > > > --- a/mk/rte.app.mk
> > > > +++ b/mk/rte.app.mk
> > > > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += 
> > > > -lrte_ring
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)               += -lrte_gro
> > > >
> > > >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> > > > --
> > > > 2.7.4

Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework

Reply via email to