[dpdk-dev] mbuf changes

Bruce Richardson Tue, 25 Oct 2016 12:13:57 +0100

On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote:
> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > Comments inline.
> > 
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> > 
> > 
> > > -----Original Message-----
> > > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> > > Sent: Tuesday, October 25, 2016 11:39 AM
> > > To: Bruce Richardson
> > > Cc: Wiles, Keith; Morten Brørup; dev at dpdk.org; Olivier Matz; Oleg
> > > Kuporosov
> > > Subject: Re: [dpdk-dev] mbuf changes
> > > 
> > > On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson wrote:
> > > > On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith wrote:
> > > [...]
> > > > > > > On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > > > > > > <mb at smartsharesystems.com> wrote:
> > > [...]
> > > > > > 5.
> > > > > >
> > > > > > > And here's something new to think about:
> > > > > >
> > > > > > > m->next already reveals if there are more segments to a packet.
> > > > > > > Which purpose does m->nb_segs serve that is not already covered
> > > > > > > by m->next?
> > > >
> > > > It is duplicate info, but nb_segs can be used to check the validity of
> > > > the next pointer without having to read the second mbuf cacheline.
> > > >
> > > > Whether it's worth having is something I'm happy enough to discuss,
> > > > though.
> > > 
> > > Although slower in some cases than a full blown "next packet" pointer,
> > > nb_segs can also be conveniently abused to link several packets and
> > > their segments in the same list without wasting space.
> > 
> > I don't understand that; can you please elaborate? Are you abusing
> > m->nb_segs as an index into an array in your application? If that is the 
> > case, and it is endorsed by the community, we should get rid of m->nb_segs 
> > and add a member for application specific use instead. 
> 
> Well, that's just an idea; I'm not aware of any application using this.
> However, the ability to link several packets with segments seems
> useful to me (e.g. buffering packets). Here's a diagram:
> 
>  .-----------.   .-----------.   .-----------.   .-----------.   .------
>  | pkt 0     |   | seg 1     |   | seg 2     |   | pkt 1     |   | pkt 2
>  |      next --->|      next --->|      next --->|      next --->| ...
>  | nb_segs 3 |   | nb_segs 1 |   | nb_segs 1 |   | nb_segs 1 |   |
>  `-----------'   `-----------'   `-----------'   `-----------'   `------
> 
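To make the diagram concrete, here is a minimal sketch of how such a list could be walked. It does not use the real rte_mbuf: struct toy_mbuf and the toy_* helpers are hypothetical stand-ins that carry only the next and nb_segs fields, and the sketch assumes nb_segs is at least 1 in the first segment of every packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rte_mbuf fields used here. */
struct toy_mbuf {
	struct toy_mbuf *next; /* next segment, or first segment of next packet */
	uint8_t nb_segs;       /* segment count, read from a packet's first segment */
};

/* Return the first segment of the packet following pkt, or NULL. */
static struct toy_mbuf *
toy_next_pkt(struct toy_mbuf *pkt)
{
	struct toy_mbuf *m = pkt;
	uint8_t i;

	if (pkt->nb_segs == 0)
		return NULL; /* malformed entry: stop rather than loop forever */
	/* Skip over every segment that belongs to the current packet. */
	for (i = 0; i < pkt->nb_segs && m != NULL; i++)
		m = m->next;
	return m;
}

/* Count the packets (not the segments) linked from head. */
static unsigned int
toy_count_pkts(struct toy_mbuf *head)
{
	unsigned int count = 0;

	while (head != NULL) {
		count++;
		head = toy_next_pkt(head);
	}
	return count;
}
```

Reaching the following packet costs one pointer dereference per segment, which is where this is slower than a dedicated "next packet" pointer.
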
> > > > One other point I'll mention is that we need to have a discussion on
> > > > how/where to add in a timestamp value into the mbuf. Personally, I
> > > > think it can be in a union with the sequence number value, but I also
> > > > suspect that 32 bits of a timestamp is not going to be enough for
> > > > many.
> > > >
> > > > Thoughts?
> > > 
> > > If we consider that timestamp representation should use nanosecond
> > > granularity, a 32-bit value is likely to wrap around too quickly to be
> > > useful. We can also assume that applications requesting timestamps may
> > > care more about latency than throughput; Oleg found that using the
> > > second cache line for this purpose had a noticeable impact [1].
> > > 
> > >  [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> > 
> > I agree with Oleg about the latency vs. throughput importance for such 
> > applications.
> > 
> > If you need high resolution timestamps, consider them to be generated by 
> > the NIC RX driver, possibly by the hardware itself 
> > (http://w3new.napatech.com/features/time-precision/hardware-time-stamp), so 
> > the timestamp belongs in the first cache line. And I am proposing that it 
> > should have the highest possible accuracy, which makes the value hardware 
> > dependent.
> > 
> > Furthermore, I am arguing that we leave it up to the application to keep 
> > track of the slowly moving bits (i.e. counting whole seconds, hours and 
> > calendar date) out of band, so we don't use precious space in the mbuf. The 
> > application doesn't need the NIC RX driver's fast path to capture which 
> > date (or even which second) a packet was received. Yes, it adds complexity 
> > to the application, but we can't set aside 64 bit for a generic timestamp. 
> > Or as a weird tradeoff: Put the fast moving 32 bit in the first cache line 
> > and the slow moving 32 bit in the second cache line, as a placeholder for 
> > the application to fill out if needed. Yes, it means that the application 
> > needs to check the time and update its variable holding the slow moving 
> > time once every second or so; but that should be doable without significant 
> > effort.
> 
> That's a good point. However, without a 64-bit value, the elapsed time
> between two arbitrary mbufs cannot be measured reliably for lack of
> context; one way or another, the low-resolution value is also needed.
> 
> Obviously, latency-sensitive applications are unlikely to perform lengthy
> buffering and require this, but I'm not sure about all the possible
> use-cases. Considering many NICs expose 64-bit timestamps, I suggest we do
> not truncate them.
> 
> I'm not a fan of the weird tradeoff either: PMDs will be tempted to fill the
> extra 32 bits whenever they can and negate the performance improvement of
> the first cache line.
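
As a rough illustration of the numbers in this exchange: a 32-bit nanosecond counter wraps every 2^32 ns, about 4.29 seconds. The sketch below shows both sides of the argument, under the assumption that timestamps count nanoseconds. The ts32_* names are hypothetical and not part of any DPDK API. ts32_extend rebuilds a 64-bit value from the 32 fast-moving bits plus a reference the application keeps out of band, and ts32_elapsed shows how a bare 32-bit difference goes wrong once the real gap exceeds one wrap period.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a 32-bit nanosecond counter wraps: 2^32 / 1e9. */
static double
ts32_wrap_seconds(void)
{
	return 4294967296.0 / 1e9;
}

/*
 * Extend a 32-bit timestamp to 64 bits using a recent 64-bit reference
 * maintained by the application. The result is only correct when the
 * true time lies within half a wrap period (about 2.1 s) of ref.
 * The unsigned-to-signed conversion assumes two's complement, which
 * holds on all mainstream compilers.
 */
static uint64_t
ts32_extend(uint64_t ref, uint32_t low)
{
	int32_t delta = (int32_t)(low - (uint32_t)ref);

	return ref + (uint64_t)(int64_t)delta;
}

/*
 * Elapsed time from 32-bit timestamps alone. Modular subtraction gives
 * the right answer only if the real gap is shorter than one wrap period.
 */
static uint32_t
ts32_elapsed(uint32_t earlier, uint32_t later)
{
	return later - earlier;
}
```

For example, two packets received 5 seconds apart yield ts32_elapsed() = 705032704 ns (about 0.7 s), because 5000000000 - 4294967296 = 705032704. Nothing in the two 32-bit values reveals that a wrap occurred in between.
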


I would tend to agree, and I don't really see any convenient way to
avoid putting a 64-bit field for the timestamp in cache-line 0. If we
are OK with having this overlap, or partially overlap, with the sequence
number, it will use up an extra 4B of storage in that cacheline. However,
nb_segs may be a good candidate for demotion, along with possibly the
port value, or the reference count.
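
A minimal sketch of the storage cost mentioned above, assuming the sequence number is a 32-bit field and ignoring any alignment padding the surrounding fields might add. The struct, union, and function names are hypothetical and do not reproduce the actual rte_mbuf layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slice of cache line 0 holding only a 32-bit sequence number. */
struct cl0_seqn_only {
	uint32_t seqn;
};

/* Hypothetical slice where a 64-bit timestamp shares storage with seqn. */
struct cl0_with_timestamp {
	union {
		uint64_t timestamp; /* full-width value as exposed by the NIC */
		uint32_t seqn;      /* overlaps part of the timestamp storage */
	} u;
};

/* Extra bytes the overlapping 64-bit field takes compared to seqn alone. */
static size_t
cl0_extra_bytes(void)
{
	return sizeof(struct cl0_with_timestamp) - sizeof(struct cl0_seqn_only);
}
```

On common 64-bit platforms cl0_extra_bytes() evaluates to 4, which is the space that would have to be freed by moving other fields out of the first cache line.
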

/Bruce
