On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote: > On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Br?rup wrote: > > Comments inline. > > > > Med venlig hilsen / kind regards > > - Morten Br?rup > > > > > > > -----Original Message----- > > > From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com] > > > Sent: Tuesday, October 25, 2016 11:39 AM > > > To: Bruce Richardson > > > Cc: Wiles, Keith; Morten Br?rup; dev at dpdk.org; Olivier Matz; Oleg > > > Kuporosov > > > Subject: Re: [dpdk-dev] mbuf changes > > > > > > On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson wrote: > > > > On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith wrote: > > > [...] > > > > > > On Oct 24, 2016, at 10:49 AM, Morten Br?rup > > > <mb at smartsharesystems.com> wrote: > > > [...] > > > > > > 5. > > > > > > > > > > > > And here?s something new to think about: > > > > > > > > > > > > m->next already reveals if there are more segments to a packet. > > > Which purpose does m->nb_segs serve that is not already covered by m- > > > >next? > > > > > > > > It is duplicate info, but nb_segs can be used to check the validity > > > of > > > > the next pointer without having to read the second mbuf cacheline. > > > > > > > > Whether it's worth having is something I'm happy enough to discuss, > > > > though. > > > > > > Although slower in some cases than a full blown "next packet" pointer, > > > nb_segs can also be conveniently abused to link several packets and > > > their segments in the same list without wasting space. > > > > I don?t understand that; can you please elaborate? Are you abusing > > m->nb_segs as an index into an array in your application? If that is the > > case, and it is endorsed by the community, we should get rid of m->nb_segs > > and add a member for application specific use instead. > > Well, that's just an idea, I'm not aware of any application using this, > however the ability to link several packets with segments seems > useful to me (e.g. buffering packets). Here's a diagram: > > .-----------. .-----------. .-----------. .-----------. .------ > | pkt 0 | | seg 1 | | seg 2 | | pkt 1 | | pkt 2 > | next --->| next --->| next --->| next --->| ... > | nb_segs 3 | | nb_segs 1 | | nb_segs 1 | | nb_segs 1 | | > `-----------' `-----------' `-----------' `-----------' `------ > > > > > One other point I'll mention is that we need to have a discussion on > > > > how/where to add in a timestamp value into the mbuf. Personally, I > > > > think it can be in a union with the sequence number value, but I also > > > > suspect that 32-bits of a timestamp is not going to be enough for > > > many. > > > > > > > > Thoughts? > > > > > > If we consider that timestamp representation should use nanosecond > > > granularity, a 32-bit value may likely wrap around too quickly to be > > > useful. We can also assume that applications requesting timestamps may > > > care more about latency than throughput, Oleg found that using the > > > second cache line for this purpose had a noticeable impact [1]. > > > > > > [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html > > > > I agree with Oleg about the latency vs. throughput importance for such > > applications. > > > > If you need high resolution timestamps, consider them to be generated by > > the NIC RX driver, possibly by the hardware itself > > (http://w3new.napatech.com/features/time-precision/hardware-time-stamp), so > > the timestamp belongs in the first cache line. And I am proposing that it > > should have the highest possible accuracy, which makes the value hardware > > dependent. > > > > Furthermore, I am arguing that we leave it up to the application to keep > > track of the slowly moving bits (i.e. counting whole seconds, hours and > > calendar date) out of band, so we don't use precious space in the mbuf. The > > application doesn't need the NIC RX driver's fast path to capture which > > date (or even which second) a packet was received. Yes, it adds complexity > > to the application, but we can't set aside 64 bit for a generic timestamp. > > Or as a weird tradeoff: Put the fast moving 32 bit in the first cache line > > and the slow moving 32 bit in the second cache line, as a placeholder for > > the application to fill out if needed. Yes, it means that the application > > needs to check the time and update its variable holding the slow moving > > time once every second or so; but that should be doable without significant > > effort. > > That's a good point, however without a 64 bit value, elapsed time between > two arbitrary mbufs cannot be measured reliably due to not enough context, > one way or another the low resolution value is also needed. > > Obviously latency-sensitive applications are unlikely to perform lengthy > buffering and require this but I'm not sure about all the possible > use-cases. Considering many NICs expose 64 bit timestaps, I suggest we do > not truncate them. > > I'm not a fan of the weird tradeoff either, PMDs will be tempted to fill the > extra 32 bits whenever they can and negate the performance improvement of > the first cache line.
I would tend to agree, and I don't really see any convenient way to avoid putting in a 64-bit field for the timestamp in cache-line 0. If we are ok with having this overlap/partially overlap with sequence number, it will use up an extra 4B of storage in that cacheline. However, nb_segs may be a good candidate for demotion, along with possibly the port value, or the reference count. /Bruce