Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework

Ananyev, Konstantin Mon, 29 May 2017 05:53:33 -0700

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Monday, May 29, 2017 11:23 AM
> To: Ananyev, Konstantin <[email protected]>
> Cc: [email protected]; Wiles, Keith <[email protected]>; 
> [email protected]
> Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Sunday, May 28, 2017 12:51 AM
> > To: Hu, Jiayu <[email protected]>
> > Cc: [email protected]; Wiles, Keith <[email protected]>;
> > [email protected]
> > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> >
> >
> > Hi Jiayu,
> >
> > >
> > > Hi Konstantin,
> > >
> > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Hu, Jiayu
> > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > To: Ananyev, Konstantin <[email protected]>
> > > > > Cc: [email protected]; Wiles, Keith <[email protected]>;
> > [email protected]
> > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > > >
> > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > > > Hi Jiayu,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Hu, Jiayu
> > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > To: Ananyev, Konstantin <[email protected]>
> > > > > > > Cc: [email protected]; Wiles, Keith <[email protected]>;
> > [email protected]
> > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin
> > wrote:
> > > > > > > >
> > > > > > > > Hi Jiayu,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Konstantin,
> > > > > > > > >
> > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > >
> > > > > > > > > BRs,
> > > > > > > > > Jiayu
> > > > > > > > >
> > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> > Konstantin wrote:
> > > > > > > > > > Hi Jiayu,
> > > > > > > > > > My comments/questions below.
> > > > > > > > > > Konstantin
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For applications, DPDK GRO provides three external 
> > > > > > > > > > > functions
> > to
> > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > Before using GRO, applications should explicitly call
> > rte_gro_init to
> > > > > > > > > > > initizalize GRO environment. After that, applications can 
> > > > > > > > > > > call
> > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to
> > disable GRO for
> > > > > > > > > > > specific ports.
> > > > > > > > > >
> > > > > > > > > > I think this is too restrictive and wouldn't meet various 
> > > > > > > > > > user's
> > needs.
> > > > > > > > > > User might want to:
> > > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> > IPV6/TCP for queue 1, etc.
> > > > > > > > >
> > > > > > > > > The reason for enabling/disabling GRO per-port instead of per-
> > queue is that LINUX
> > > > > > > > > controls GRO per-port. To control GRO per-queue indeed can
> > provide more flexibility
> > > > > > > > > to applications. But are there any scenarios that different
> > queues of a port may
> > > > > > > > > require different GRO control (i.e. GRO types and 
> > > > > > > > > enable/disable
> > GRO)?
> > > > > > > >
> > > > > > > > I think yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > - For various reasons, user might prefer not to use RX 
> > > > > > > > > > callbacks
> > for various reasons,
> > > > > > > > > >   But invoke gro() manually at somepoint in his code.
> > > > > > > > >
> > > > > > > > > An application-used GRO library can enable more flexibility to
> > applications. Besides,
> > > > > > > > > when perform GRO in ethdev layer or inside PMD drivers, it is 
> > > > > > > > > an
> > issue that
> > > > > > > > > rte_eth_rx_burst returns actually received packet number or
> > GROed packet number. And
> > > > > > > > > the same issue happens in GSO, and even more seriously. This 
> > > > > > > > > is
> > because applications
> > > > > > > > > , like VPP, always rely on the return value of 
> > > > > > > > > rte_eth_tx_burst to
> > decide further
> > > > > > > > > operations. If applications can direcly call GRO/GSO 
> > > > > > > > > libraries,
> > this issue won't exist.
> > > > > > > > > And DPDK is a library, which is not a holistic system like 
> > > > > > > > > LINUX.
> > We don't need to do
> > > > > > > > > the same as LINUX. Therefore, maybe it's a better idea to
> > directly provide SW
> > > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > > >
> > > > > > > > > > - Many users would like to control size (number of 
> > > > > > > > > > flows/items
> > per flow),
> > > > > > > > > >   max allowed packet size, max timeout, etc., for different 
> > > > > > > > > > GRO
> > tables.
> > > > > > > > > > - User would need a way to flush all or only timeout packets
> > from particular GRO tables.
> > > > > > > > > >
> > > > > > > > > > So I think that API needs to extended to become be much more
> > fine-grained.
> > > > > > > > > > Something like that:
> > > > > > > > > >
> > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > >    int32_t socket_id;
> > > > > > > > > >    size_t max_flows;
> > > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > > >    size_t max_pkt_size;
> > > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > > >   <probably type specific params>
> > > > > > > > > >   ...
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct
> > rte_gro_tbl_param *param);
> > > > > > > > > >
> > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > >
> > > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> > grained control APIs to
> > > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is 
> > > > > > > > > it
> > for TCP packets?
> > > > > > > >
> > > > > > > > For any packets that sits in the gro_table for too long.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > /*
> > > > > > > > > >  * process packets, might store some packets inside the GRO
> > table,
> > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > >  */
> > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> > rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > >
> > > > > > > > > > /*
> > > > > > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > > > > > >   */
> > > > > > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, 
> > > > > > > > > > uint64_t
> > tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > >
> > > > > > > > > Currently, we implement GRO as RX callback, whose processing
> > logic is simple:
> > > > > > > > > receive burst packets -> perform GRO -> return to application.
> > GRO stops after
> > > > > > > > > finishing processing received packets. If we provide
> > rte_gro_tbl_timeout, when
> > > > > > > > > and who will call it?
> > > > > > > >
> > > > > > > > I mean the following scenario:
> > > > > > > > We receive a packet, find it is eligible for GRO and put it into
> > gro_table
> > > > > > > > in expectation - there would be more packets for the same flow.
> > > > > > > > But it could happen that we would never (or for some long time)
> > receive
> > > > > > > > any new packets for that flow.
> > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > or delivered too late.
> > > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > > and push them to the upper layer.
> > > > > > > > My thought it would be application responsibility to call it 
> > > > > > > > from
> > time to time.
> > > > > > > > That's actually another reason why I think we shouldn't use
> > application
> > > > > > > > to always use RX callbacks for GRO.
> > > > > > >
> > > > > > > Currently, we only provide one reassembly function,
> > rte_gro_reassemble_burst,
> > > > > > > which merges N inputted packets at a time. After finishing
> > processing these
> > > > > > > packets, it returns all of them and reset hashing tables. 
> > > > > > > Therefore,
> > there
> > > > > > > are no packets in hashing tables after rte_gro_reassemble_burst
> > returns.
> > > > > >
> > > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > > So gro_ressemble_burst() performs only inline processing on current
> > input packets
> > > > > > and doesn't try to save packets for future merging, correct?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > Such approach indeed is much lightweight and doesn't require any
> > extra timeouts and flush().
> > > > > > So my opinion let's keep it like that - nice and simple.
> > > > > > BTW, I think in that case we don't need any hashtables (or any other
> > persistent strucures)at all.
> > > > > > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to
> > perform on given array of packets.
> > > > >
> > > > > Beside GRO types that are desired to perform, maybe it also needs
> > max_pkt_size and
> > > > > some GRO type specific information?
> > > >
> > > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > > Passing something like struct gro_param seems enough.
> > >
> > > Yes, we can just pass gro_param and allocate hashing tables
> > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > desired GRO types are created and freed in each invocation
> > > of rte_gro_reassemble_burst. In GRO library, hashing tables
> > > are created by GRO type specific gro_tbl_create_fn. These
> > > gro_tbl_create_fn may allocate hashing table space via malloc
> > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > degrade GRO performance greatly.
> >
> > I don't' understand why do we need to put/extract each packet into/from
> > hash table at all.
> > We have N input packets that need to be grouped/sorted  by some criteria.
> > Surely that can be done without any hash-table involved.
> > What is the need for hash table and all the overhead it brings here?
> 
> In current design, I assume all GRO types use hash tables to merge
> packets. The key of the hash table is the criteria to merge packets.
> So the main difference for different GRO types' hash tables is the
> key definition.
> 
> And the reason for using hash tables is to speed up reassembly. Given
> There are N TCP packets inputted, the simplest way to process packets[i]
> Is to traverse processed packets[0]~packets[i-1] and try to find a one
> to merge. In the worst case, we need to check all of packets[0~i-1].
> In this case, the time complexity of processing N packets is O(N^2).
> If we use a hash table, whose key is the criteria to merge two packets,
> the time to find a packet that may be merged with packets[i] is O(1).


I understand that, but add/search inside the hash table,
plus resetting it for every burst of packets doesn't come for free also.
I think that for most common burst sizes (< 100 packets) 
hash overhead would significantly overweight the price of
worst case O(N^2) scan.
Also, if such worst case really worries you, you can  pre-sort
input packets first before starting the actual reassemble for them.
That might help to bring the price down.
Konstantin  

> 
> Do you think it's too complicated?
> 
> Jiayu
> 
> > Konstantin
> >
> > >
> > > But if we ask applications to input hashing tables, what we
> > > need to do is to reset them after finishing using in
> > > rte_gro_reassemble_burst, rather than to malloc and free each
> > > time. Therefore, I think this way is more efficient. How do you
> > > think?
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > If we provide rte_gro_tbl_timeout, we also need to provide another
> > reassmebly
> > > > > > > function, like rte_gro_reassemble, which processes one given
> > packet at a
> > > > > > > time and won't reset hashing tables. Applications decide when to
> > flush packets
> > > > > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that
> > can be used
> > > > > > > to flush the packets. Do you mean that?
> > > > > >
> > > > > > Yes, that's what I meant, but as I said above - I think your 
> > > > > > approach is
> > probably
> > > > > > more preferable - it is much simpler and lightweight.
> > > > > > Konstantin
> > > > > >

Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework

Reply via email to