Comments inline. Med venlig hilsen / kind regards - Morten Br?rup
> -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz > Sent: Tuesday, October 25, 2016 2:49 PM > To: Bruce Richardson; Morten Br?rup > Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Oleg Kuporosov > Subject: Re: [dpdk-dev] mbuf changes > > > > On 10/25/2016 02:45 PM, Bruce Richardson wrote: > > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Br?rup wrote: > >> Comments at the end. > >> > >> Med venlig hilsen / kind regards > >> - Morten Br?rup > >> > >>> -----Original Message----- > >>> From: Bruce Richardson [mailto:bruce.richardson at intel.com] > >>> Sent: Tuesday, October 25, 2016 2:20 PM > >>> To: Morten Br?rup > >>> Cc: Adrien Mazarguil; Wiles, Keith; dev at dpdk.org; Olivier Matz; > Oleg > >>> Kuporosov > >>> Subject: Re: [dpdk-dev] mbuf changes > >>> > >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Br?rup wrote: > >>>> Comments inline. > >>>> > >>>>> -----Original Message----- > >>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce > >>>>> Richardson > >>>>> Sent: Tuesday, October 25, 2016 1:14 PM > >>>>> To: Adrien Mazarguil > >>>>> Cc: Morten Br?rup; Wiles, Keith; dev at dpdk.org; Olivier Matz; Oleg > >>>>> Kuporosov > >>>>> Subject: Re: [dpdk-dev] mbuf changes > >>>>> > >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote: > >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Br?rup wrote: > >>>>>>> Comments inline. > >>>>>>> > >>>>>>> Med venlig hilsen / kind regards > >>>>>>> - Morten Br?rup > >>>>>>> > >>>>>>> > >>>>>>>> -----Original Message----- > >>>>>>>> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com] > >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM > >>>>>>>> To: Bruce Richardson > >>>>>>>> Cc: Wiles, Keith; Morten Br?rup; dev at dpdk.org; Olivier Matz; > >>>>>>>> Oleg Kuporosov > >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes > >>>>>>>> > >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson > >>> wrote: > >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith > >>> wrote: > >>>>>>>> [...] > >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Br?rup > >>>>>>>> <mb at smartsharesystems.com> wrote: > >>>>>>>> [...] > >>>> > >>>>>>>>> One other point I'll mention is that we need to have a > >>>>>>>>> discussion on how/where to add in a timestamp value into > >>> the > >>>>>>>>> mbuf. Personally, I think it can be in a union with the > >>>>> sequence > >>>>>>>>> number value, but I also suspect that 32-bits of a > >>> timestamp > >>>>>>>>> is not going to be enough for > >>>>>>>> many. > >>>>>>>>> > >>>>>>>>> Thoughts? > >>>>>>>> > >>>>>>>> If we consider that timestamp representation should use > >>>>> nanosecond > >>>>>>>> granularity, a 32-bit value may likely wrap around too > >>> quickly > >>>>>>>> to be useful. We can also assume that applications requesting > >>>>>>>> timestamps may care more about latency than throughput, Oleg > >>>>> found > >>>>>>>> that using the second cache line for this purpose had a > >>>>> noticeable impact [1]. > >>>>>>>> > >>>>>>>> [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html > >>>>>>> > >>>>>>> I agree with Oleg about the latency vs. throughput importance > >>>>>>> for > >>>>> such applications. > >>>>>>> > >>>>>>> If you need high resolution timestamps, consider them to be > >>>>> generated by the NIC RX driver, possibly by the hardware itself > >>>>> (http://w3new.napatech.com/features/time-precision/hardware-time- > >>>>> stamp), so the timestamp belongs in the first cache line. And I > am > >>>>> proposing that it should have the highest possible accuracy, > which > >>>>> makes the value hardware dependent. > >>>>>>> > >>>>>>> Furthermore, I am arguing that we leave it up to the > >>> application > >>>>>>> to > >>>>> keep track of the slowly moving bits (i.e. counting whole > seconds, > >>>>> hours and calendar date) out of band, so we don't use precious > >>> space > >>>>> in the mbuf. The application doesn't need the NIC RX driver's > fast > >>>>> path to capture which date (or even which second) a packet was > >>>>> received. Yes, it adds complexity to the application, but we > can't > >>>>> set aside 64 bit for a generic timestamp. Or as a weird tradeoff: > >>>>> Put the fast moving 32 bit in the first cache line and the slow > >>>>> moving 32 bit in the second cache line, as a placeholder for the > >>> application to fill out if needed. > >>>>> Yes, it means that the application needs to check the time and > >>>>> update its variable holding the slow moving time once every > second > >>>>> or so; but that should be doable without significant effort. > >>>>>> > >>>>>> That's a good point, however without a 64 bit value, elapsed > time > >>>>>> between two arbitrary mbufs cannot be measured reliably due to > >>> not > >>>>>> enough context, one way or another the low resolution value is > >>>>>> also > >>>>> needed. > >>>>>> > >>>>>> Obviously latency-sensitive applications are unlikely to perform > >>>>>> lengthy buffering and require this but I'm not sure about all > the > >>>>>> possible use-cases. Considering many NICs expose 64 bit > >>> timestaps, > >>>>>> I suggest we do not truncate them. > >>>>>> > >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be tempted > >>>>>> to fill the extra 32 bits whenever they can and negate the > >>>>>> performance improvement of the first cache line. > >>>>> > >>>>> I would tend to agree, and I don't really see any convenient way > >>>>> to avoid putting in a 64-bit field for the timestamp in cache- > line 0. > >>>>> If we are ok with having this overlap/partially overlap with > >>>>> sequence number, it will use up an extra 4B of storage in that > >>> cacheline. > >>>> > >>>> I agree about the lack of convenience! And Adrien certainly has a > >>> point about PMD temptations. > >>>> > >>>> However, I still don't think that a NICs ability to date-stamp a > >>> packet is sufficient reason to put a date-stamp in cache line 0 of > >>> the mbuf. Storing only the fast moving 32 bit in cache line 0 seems > >>> like a good compromise to me. > >>>> > >>>> Maybe you can find just one more byte, so it can hold 17 minutes > >>>> with nanosecond resolution. (I'm joking!) > >>>> > >>>> Please don't sacrifice the sequence number for the > >>>> seconds/hours/days > >>> part a timestamp. Maybe it could be configurable to use a 32 bit or > >>> 64 bit timestamp. > >>>> > >>> Do you see both timestamp and sequence numbers being used together? > >>> I would have thought that apps would either use one or the other? > >>> However, your suggestion is workable in any case, to allow the > >>> sequence number to overlap just the high 32 bits of the timestamp, > >>> rather than the low. > >> > >> In our case, I can foresee sequence numbers used for packet > processing and timestamps for timing analysis (and possibly for packet > capturing, when being used). For timing analysis, we don?t need long > durations, e.g. 4 seconds with 32 bit nanosecond resolution suffices. > And for packet capturing we are perfectly capable of adding the slowly > moving 32 bit of the timestamp to our output data stream without > fetching it from the mbuf. > >> > > We should keep in mind that today we have the seqn field but it is not > used by any PMD. In case it is implemented, would it be a per-queue > sequence number? Is it useful from an application view? > > This field is only used by the librte_reorder library, and in my > opinion, we should consider moving it in the second cache line since it > is not filled by the PMD. Our DPDK project is not yet sufficiently progressed to be able to provide quality feedback about this. > > For the 32-bit timestamp case, it might be useful to have a > > right-shift value passed in to the ethdev driver. If we assume a NIC > > with nanosecond resolution, (or TSC value with resolution of that > > order of magnitude), then the app can choose to have 1 ns resolution > > with 4 second wraparound, or alternatively 4ns resolution with 16 > > second wraparound, or even microsecond resolution with wrap around of > over an hour. > > The cost is obviously just a shift op in the driver code per packet - > > hopefully with multiple packets done at a time using vector > operations. > > > About the timestamp, we can manage to find 64 bits in the first cache > line, without sacrifying any field we have today. The question is more > for the fields we may want to add later. > > To answer to the question of the size of the timestamp, the first > question is to know what is the precision required for the applications > using it? In our application, microsecond resolution probably suffices, although we do use nanoseconds today. And a wrapping counter is fine for us, so we don?t need the slow moving bits. For dedicated packet capture applications, please refer to e.g. Napatech's reasoning: http://w3new.napatech.com/features/time-precision/hardware-time-stamp > I don't quite like the idea of splitting the timestamp in the 2 cache > lines, I think it would not be easy to use. > > > Olivier