On 04/12/2011, at 11:31 PM, Lawrence Stewart wrote:

> On 12/03/11 12:02, Jung-uk Kim wrote:
>> On Friday 02 December 2011 07:27 pm, Jung-uk Kim wrote:
>>> On Thursday 01 December 2011 11:43 pm, Lawrence Stewart wrote:
>>>> On 12/02/11 03:43, Jung-uk Kim wrote:
>>>>> On Thursday 01 December 2011 10:11 am, Lawrence Stewart wrote:
>>>>>> On 11/30/11 05:09, Jung-uk Kim wrote:
>>>>>>> On Tuesday 29 November 2011 11:13 am, Lawrence Stewart wrote:
> [snip]
>>>>>>>> Here's the first of the patches:
>>>>>>>> 
>>>>>>>> http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf
>>>>>>>> _i nt act abi_10.x.r228130.patch
>>>>>>> 
>>>>>>> I only glanced at it but it looks very close to what I wanted
>>>>>>> to suggest.
>>>>>> 
>>>>>> Final candidate patch is at:
>>>>>> 
>>>>>> http://people.freebsd.org/~lstewart/patches/misc/ffclock_bpf_i
>>>>>> nt act abi_10.x.r228180.patch
>>>>>> 
>>>>>> Assuming it passes the "make tinderbox" build I'm currently
>>>>>> running and no further input is received from interested
>>>>>> parties, I plan to commit it in ~10 hours.
>>>>>> 
>>>>>> Changes since the r228130 patch I sent previously:
>>>>>> 
>>>>>> - The new flags in bpf.h are added unconditionally so that
>>>>>> they can always be referenced at compile time and a decision
>>>>>> made at runtime as to whether a flag will be set or not. Using
>>>>>> one of the new flags when the kernel doesn't have FFCLOCK
>>>>>> compiled in results in the flag being ignored. An app should
>>>>>> check for the existence of the "ffclock" kernel feature or the
>>>>>> "kern.sysclock" sysctl tree before attempting to use the
>>>>>> flags.
>>>>>> 
>>>>>> - This patch will hopefully be MFCed at some point, so I added
>>>>>> a CTASSERT to bpf.c to ensure that the ABI of structs
>>>>>> bpf_hdr32, bpf_hdr and bpf_xhdr remains intact when FFCLOCK is
>>>>>> enabled and the union of a ffcounter and struct
>>>>>> timeval32/timeval/bpf_ts is switched in.
>>>>>> 
>>>>>> - bpf_gettime() more comprehensively covers all the possible
>>>>>> cases of flag combination and does sensible things for each
>>>>>> case (even though some cases are rather silly).
>>>>>> 
>>>>>> - The snippet of code at the beginning of catchpacket() that
>>>>>> was manipulating the struct bintime derived from bpf_gettime()
>>>>>> was gross and has been removed in favour of selecting the
>>>>>> right {get}bin{up}time() function call in bpf_gettime().
>>>>> 
>>>>> I did that to reduce branching.  Since you are introducing more
>>>>> branches, it warrants a function pointer now.
>>> 
>>> There's another reason, BTW.  If mbufs are tagged with timestamps
>>> (where only monotonic timestamps are allowed), they must be
>>> converted within the bpf.c.  I forgot all the details. :-(
> 
> We should document this knowledge in some code comments.
> 
>>>> I see, thanks for the explanation. Could you elaborate a bit more
>>>> about how you would implement the function pointer idea? I'm also
>>>> curious in the !FFCLOCK case just how much overhead having the
>>>> 2-layer nested if/else adds. I'm not an very optimisation savvy
>>>> person, but I'm wondering if it's actually worth micro-optimising
>>>> this code.
>>>> 
>>>> My initial thoughts about your function pointer idea lead to
>>>> adding a function pointer in the bpf_d and setting it to the
>>>> appropriate function to get the timestamp from at bpf_d creation
>>>> or ioctl time. Whilst I like this idea, I can't see how it would
>>>> work given that the various functions involved in time/ffcounter
>>>> stamp generation all have different signatures.
>>>> 
>>>> We could have multiple variants of bpf_gettime() which each call
>>>> the appropriate underlying functions to generate the appropriate
>>>> stamp. Would add quite a lot of code but would reduce the
>>>> overhead of calling bpf_gettime() to an indirect function call +
>>>> the underlying stamp generation function call. This also solves
>>>> the problem of multiple function signatures.
>>> 
>>> Please see my patch:
>>> 
>>> http://people.freebsd.org/~jkim/bpf_ffclock.diff
>> 
>> I booted up the kernel and found it just crashes because of stupid
>> typos.  Now a new patch is uploaded in place.  Sorry.
>> 
>> Anyway, I see no regression nor ABI breakage. :-)
> 
> struct bpf_d being part of the ABI was the main thing I was concerned about, 
> so given that it isn't I like your approach a lot.
> 
> As noted by Julien, this approach does introduce problems with respect to the 
> follow up patch that adds a permanent bh_ffcounter member to the bpf header. 
> I thought the fromclock() API would sufficiently meet the needs of consumers 
> like BPF, but if we were to proceed with something like Jung-uk's proposed 
> patch I don't think that's true anymore.
> 
> We decided to bite the bullet and devise an API that is more compact and can 
> return all appropriate time information from all underlying sysclocks in an 
> efficient manner - something like a more generic version of 
> ffclock_abstime(). Julien and I spent quite some time today nutting out 
> details, and Julien has done a proof of concept patch against my proposed BPF 
> patch which looks good as a starting point for discussion:
> 
> http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/r228254/sysclock_snap.patch
> 
> It needs a bit more work and should be split into two patches (one 
> introducing the API and the other being the BPF changes).
> 
> With something like this though, the BPF patch becomes radically simplified 
> in the FFCLOCK case. We don't even need a function pointer cached in the 
> bpf_d anymore, but could cache a struct sysclock_snap there instead if we 
> really wanted to minimise overhead in bpf_gettime().
> 
> We'll have a go at refining the patch tomorrow hopefully, but wanted to put 
> this out there for early consideration.
> 
> Cheers,
> Lawrence


Hi Jung-uk (and all),

It took us a bit of time but we have a new patch. There are a lot of changes in 
the patch, and I will start by summarising our motivation for the changes. 
Hopefully I won't forget anything, but if I leave something unclear, don't 
hesitate to ask Lawrence or me.

Starting with the easy bits, we drop the union in the BPF header and followed 
something similar to what Ben and you proposed.

Next, the core motivation was to ensure that the timestamping function be 
called once only per packet captured, while keeping the ability to read both 
clocks and return the time from one of the two to the user. This forced us to 
redesign the timestamping function and one thing leading to another a fair bit 
of the BPF code.

There 3 orthogonal parameters to the timestamping function in the BPF context:
- which clock is used to return the time
- what is the format of the value returned
- and a performance issue (nothing to do with timestamping or timekeeping): 
reading the hardware counter or not

We handle the first one with a new sysclock_getsnapshot() which saves both 
clock parameters when timestamping and potentially reads the hardware counter.
The second one has been covered in previous discussion with the addition of 
BPF_T_FFCOUNTER.

The performance issue created problems. In the current code, the order with 
which the BPF devices are open on one interface impacts the timestamping 
function. Let's consider two BPF devices respectively open with 
BPF_TSTAMP_NORMAL and BPF_TSTAMP_FAST quality. If NORMAL is open after FAST, it 
is at the head of the list, the hardware counter is read, and FAST benefits 
from higher quality timestamps. If FAST is open after NORMAL, the timestamping 
function is called once and return last kernel tick time, then it is called a 
second time for NORMAL, this time accessing the hardware counter.

None of these cases are satisfactory. This lead to inconsistent behaviour based 
on the order the BPF are open. An application can impact the settings of other 
BPF devices and other applications / kernel consumers. The timestamping suffers 
from higher latency and worse, higher variability of the latency, which is bad.

We then took a different approach, which is inspired by your work on BPF. We 
believe the FAST/NORMAL settings should have 2 level of granularity: system 
wide and per interface. The system wide is the default behaviour (for example 
all BPF devices are NORMAL) and the interface one super-seeds the system one. 
This is implemented by the use of new sysctl living in the net.bpf tree.

This enables many scenarios such as:
- system default is FAST to improve performance because the system uses a slow 
timecounter (eg HPET or ACPI).
- the system has 3 interfaces. Interface 2 is put in NORMAL mode (need for 
accuracy but not much traffic), and the last can do hardware timestamping and 
is put in BPF_TSTAMP_EXTERNAL mode.

This then naturally led to store the performance setting 
(none/fast/normal/external) for each interface instead of each BPF descriptor.

Each BPF descriptor still maintains the other 2 settings: the choice of clock 
and the format of the timestamp independently from each other.

We believe this new patch keeps the spirit of the changes you proposed and 
improves it while accomodating the changes required by the feed-forward clock 
and future evolutions with things like IEEE 1588 compatible NICs.

We plan to push this patch as soon as possible. We would really appreciate if 
you could comment on this long email and pach very soon.

The patch can be found here:
http://www.cubinlab.ee.unimelb.edu.au/~jrid/patches/r228429-v4/ffclock_bpf_intactabi.patch

We will adjust the documentation as well once we have reached consensus.

Julien (and Lawrence of course)



_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to