Hi Olivier, > -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz > Sent: Tuesday, October 25, 2016 1:49 PM > To: Richardson, Bruce <bruce.richardson at intel.com>; Morten Br?rup > <mb at smartsharesystems.com> > Cc: Adrien Mazarguil <adrien.mazarguil at 6wind.com>; Wiles, Keith > <keith.wiles at intel.com>; dev at dpdk.org; Oleg Kuporosov > <olegk at mellanox.com> > Subject: Re: [dpdk-dev] mbuf changes > > > > On 10/25/2016 02:45 PM, Bruce Richardson wrote: > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Br?rup wrote: > >> Comments at the end. > >> > >> Med venlig hilsen / kind regards > >> - Morten Br?rup > >> > >>> -----Original Message----- > >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com] > >>> Sent: Tuesday, October 25, 2016 2:20 PM > >>> To: Morten Br?rup > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier Matz; Oleg > >>> Kuporosov > >>> Subject: Re: [dpdk-dev] mbuf changes > >>> > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Br?rup wrote: > >>>> Comments inline. > >>>> > >>>>> -----Original Message----- > >>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce > >>>>> Richardson > >>>>> Sent: Tuesday, October 25, 2016 1:14 PM > >>>>> To: Adrien Mazarguil > >>>>> Cc: Morten Br?rup; Wiles, Keith; dev at dpdk.org; Olivier Matz; Oleg > >>>>> Kuporosov > >>>>> Subject: Re: [dpdk-dev] mbuf changes > >>>>> > >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote: > >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Br?rup wrote: > >>>>>>> Comments inline. > >>>>>>> > >>>>>>> Med venlig hilsen / kind regards > >>>>>>> - Morten Br?rup > >>>>>>> > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com] > >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM > >>>>>>>> To: Bruce Richardson > >>>>>>>> Cc: Wiles, Keith; Morten Br?rup; dev at dpdk.org; Olivier Matz; > >>>>>>>> Oleg Kuporosov > >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes > >>>>>>>> > >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson > >>> wrote: > >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith > >>> wrote: > >>>>>>>> [...] > >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Br?rup > >>>>>>>> <mb at smartsharesystems.com> wrote: > >>>>>>>> [...] > >>>> > >>>>>>>>> One other point I'll mention is that we need to have a > >>>>>>>>> discussion on how/where to add in a timestamp value into > >>> the > >>>>>>>>> mbuf. Personally, I think it can be in a union with the > >>>>> sequence > >>>>>>>>> number value, but I also suspect that 32-bits of a > >>> timestamp > >>>>>>>>> is not going to be enough for > >>>>>>>> many. > >>>>>>>>> > >>>>>>>>> Thoughts? > >>>>>>>> > >>>>>>>> If we consider that timestamp representation should use > >>>>> nanosecond > >>>>>>>> granularity, a 32-bit value may likely wrap around too > >>> quickly > >>>>>>>> to be useful. We can also assume that applications requesting > >>>>>>>> timestamps may care more about latency than throughput, Oleg > >>>>> found > >>>>>>>> that using the second cache line for this purpose had a > >>>>> noticeable impact [1]. > >>>>>>>> > >>>>>>>> [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html > >>>>>>> > >>>>>>> I agree with Oleg about the latency vs. throughput importance > >>>>>>> for > >>>>> such applications. > >>>>>>> > >>>>>>> If you need high resolution timestamps, consider them to be > >>>>> generated by the NIC RX driver, possibly by the hardware itself > >>>>> (http://w3new.napatech.com/features/time-precision/hardware-time- > >>>>> stamp), so the timestamp belongs in the first cache line. And I am > >>>>> proposing that it should have the highest possible accuracy, which > >>>>> makes the value hardware dependent. > >>>>>>> > >>>>>>> Furthermore, I am arguing that we leave it up to the > >>> application > >>>>>>> to > >>>>> keep track of the slowly moving bits (i.e. counting whole seconds, > >>>>> hours and calendar date) out of band, so we don't use precious > >>> space > >>>>> in the mbuf. The application doesn't need the NIC RX driver's fast > >>>>> path to capture which date (or even which second) a packet was > >>>>> received. Yes, it adds complexity to the application, but we can't > >>>>> set aside 64 bit for a generic timestamp. Or as a weird tradeoff: > >>>>> Put the fast moving 32 bit in the first cache line and the slow > >>>>> moving 32 bit in the second cache line, as a placeholder for the > >>> application to fill out if needed. > >>>>> Yes, it means that the application needs to check the time and > >>>>> update its variable holding the slow moving time once every second > >>>>> or so; but that should be doable without significant effort. > >>>>>> > >>>>>> That's a good point, however without a 64 bit value, elapsed time > >>>>>> between two arbitrary mbufs cannot be measured reliably due to > >>> not > >>>>>> enough context, one way or another the low resolution value is > >>>>>> also > >>>>> needed. > >>>>>> > >>>>>> Obviously latency-sensitive applications are unlikely to perform > >>>>>> lengthy buffering and require this but I'm not sure about all the > >>>>>> possible use-cases. Considering many NICs expose 64 bit > >>> timestaps, > >>>>>> I suggest we do not truncate them. > >>>>>> > >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be tempted > >>>>>> to fill the extra 32 bits whenever they can and negate the > >>>>>> performance improvement of the first cache line. > >>>>> > >>>>> I would tend to agree, and I don't really see any convenient way > >>>>> to avoid putting in a 64-bit field for the timestamp in cache-line 0. > >>>>> If we are ok with having this overlap/partially overlap with > >>>>> sequence number, it will use up an extra 4B of storage in that > >>> cacheline. > >>>> > >>>> I agree about the lack of convenience! And Adrien certainly has a > >>> point about PMD temptations. > >>>> > >>>> However, I still don't think that a NICs ability to date-stamp a > >>> packet is sufficient reason to put a date-stamp in cache line 0 of > >>> the mbuf. Storing only the fast moving 32 bit in cache line 0 seems > >>> like a good compromise to me. > >>>> > >>>> Maybe you can find just one more byte, so it can hold 17 minutes > >>>> with nanosecond resolution. (I'm joking!) > >>>> > >>>> Please don't sacrifice the sequence number for the > >>>> seconds/hours/days > >>> part a timestamp. Maybe it could be configurable to use a 32 bit or > >>> 64 bit timestamp. > >>>> > >>> Do you see both timestamp and sequence numbers being used together? > >>> I would have thought that apps would either use one or the other? > >>> However, your suggestion is workable in any case, to allow the > >>> sequence number to overlap just the high 32 bits of the timestamp, > >>> rather than the low. > >> > >> In our case, I can foresee sequence numbers used for packet processing and > timestamps for timing analysis (and possibly for packet capturing, when being > used). For timing analysis, we don?t need long durations, e.g. 4 seconds with > 32 > bit nanosecond resolution suffices. And for packet capturing we are perfectly > capable of adding the slowly moving 32 bit of the timestamp to our output data > stream without fetching it from the mbuf. > >> > > We should keep in mind that today we have the seqn field but it is not used by > any PMD. In case it is implemented, would it be a per-queue sequence number? > Is it useful from an application view? > > This field is only used by the librte_reorder library, and in my opinion, we > should > consider moving it in the second cache line since it is not filled by the PMD. > > > > For the 32-bit timestamp case, it might be useful to have a > > right-shift value passed in to the ethdev driver. If we assume a NIC > > with nanosecond resolution, (or TSC value with resolution of that > > order of magnitude), then the app can choose to have 1 ns resolution > > with 4 second wraparound, or alternatively 4ns resolution with 16 > > second wraparound, or even microsecond resolution with wrap around of over > an hour. > > The cost is obviously just a shift op in the driver code per packet - > > hopefully with multiple packets done at a time using vector operations. > > > About the timestamp, we can manage to find 64 bits in the first cache line, > without sacrifying any field we have today. The question is more for the > fields > we may want to add later. > > To answer to the question of the size of the timestamp, the first question is > to > know what is the precision required for the applications using it?
As part of the 17.02 latency calculation feature, the requirement is to report latency in nanoseconds. So, +1 on keeping timestamp as 64bits. Since the feature is planned for 17.02, can we finalize on the timestamp position in the mbuf struct? Based on the decision, I am planning to make the change and send ABI notice if needed. Reshma > > I don't quite like the idea of splitting the timestamp in the 2 cache lines, > I think it > would not be easy to use. > > > Olivier