Re: [RFC] BPF timestamping

Bruce Evans Fri, 11 Jun 2010 06:08:49 -0700

On Thu, 10 Jun 2010, Jung-uk Kim wrote:

On Thursday 10 June 2010 05:45 am, Bruce Evans wrote:

On Wed, 9 Jun 2010, Jung-uk Kim wrote:

bpf(4) can only timestamp packets with microtime(9).  I want to
expand it to be able to use different formats and resolutions.  The
...

This has too many timestamp types, yet not one timestamp type which
is any good except possibly BPF_T_NONE, and not one monotonic
timestamp type.  Only external uses and compatibility require use
of CLOCK_REALTIME.
...

Please note that I am not trying to solve timecounter issues here.
The current BPF timestamping is not very good for two main
reasons: 1) it is too slow with some timecounter hardware, as you have
noted and 2) we have no API to change timestamp resolution, accuracy,
format, offset, or whatever *at all*.


The most common trick for the first problem is using getmicrotime(9)
instead of microtime() if the users don't care much about its
accuracy.  For those people who want to collect as many packets as
possible without spending fortunes, it works pretty well.  However,
suppose you have multiple interfaces.  You want good timestamps from
a slower controller (LAN side) and less accurate timestamps from a
super fast controller (WAN side), but you can't.  My patch solves
this problem by assigning a timestamping function per descriptor.  So,
you can use the same resolution but different accuracies, for
example.


I now think you should provide exactly the same timestamping features
as provided to userland by clock_gettime(2), clock_getres(2) and
clock_getaccprecres(2missing), using essentially the same interface
and code.  The userland interface involves clock ids of type clockid_t
with names like CLOCK_REALTIME instead of bpf-specific names and types.
Unfortunately it only supports the timespec format.

The second problem is a little bit harder for us to solve without breaking
libpcap and its consumers, as libpcap expects struct timeval and nothing
else.  That's why I had to introduce a new header format with compat
shims.  In fact, struct bpf_hdr (and struct pcap_sf_pkthdr) is really
obsolete and people have been talking about pcap NG for many years,
which can store timestamps in variable resolutions and offsets.


Does it prefer or support bintimes?

However, we can only use the default resolution even if libpcap gets
the new format because we are stuck with struct bpf_hdr[1].

BTW, I updated my patch, which includes monotonic clocks now.

        BPF_T_MICROTIME_MONOTONIC       microuptime(9)
        BPF_T_NANOTIME_MONOTONIC        nanouptime(9)
        BPF_T_BINTIME_MONOTONIC         binuptime(9)
        BPF_T_MICROTIME_MONOTONIC_FAST  getmicrouptime(9)
        BPF_T_NANOTIME_MONOTONIC_FAST   getnanouptime(9)
        BPF_T_BINTIME_MONOTONIC_FAST    getbinuptime(9)

http://people.freebsd.org/~jkim/bpf_tstamp2.diff

Thanks for the hint, Bruce, although you may say there are more bogus
clock types now. ;-)


Yes, there are far too many, but many are still missing:
- aliases BPF_T_*TIME_PRECISE for BPF_T_*TIME, corresponding to the
  aliases for clockid_t's.  This gives 18 clock ids
  per timecounter instead of only 12.  clock_gettime() only supports
  6 of these (it doesn't support the micro or bin time formats).
- aliases BPF_T_UPTIME* for BPF_T_*TIME_MONOTONIC.  This gives 27
  clock ids per timecounter instead of only 18.  clock_gettime()
  only supports 9 of these.
- BPF_T_SECOND corresponding to CLOCK_SECOND.  clock_gettime()
  supports this.
- BPF_T_THREAD_CPUTIME corresponding to CLOCK_THREAD_CPUTIME_ID, but
  without the bogus _ID suffix.  The latter gives the runtime of the current
  thread in nanoseconds.  This might be almost useful for bpf if all the
  packets are stamped by the same kernel or user thread.  Then it would
  function as a packet id with extra info about the time spent processing
  packets.
- BPF_T_VIRTUAL and BPF_T_PROF corresponding to CLOCK_VIRTUAL and
  CLOCK_PROF.  The latter give user and user+sys times for processes.
  They would be about as useful as BPF_T_THREAD_CPUTIME for bpf.
- the total is now 31 for bpf (19 missing) and 13 for clock_gettime().
- multiply this by the number of timecounters.  Non-primary timecounters
  should be available iff something has a use for them.
- raw cputicker timestamps.  CLOCK_THREAD_CPUTIME_ID's timer uses these.
  These are not available in userland.  They are easily available in the
  kernel, by calling cpu_tick().  Scaling them is nontrivial.
- raw timecounter reads.  These are already available in userland via
  sysctlbyname("kern.timecounter.tc.<name>.counter", ...).  Strangely,
  they are hard to call from the kernel.

By using normal clock ids and calling kern_clock_gettime(), you can
avoid lots of duplication (including documentation of the bpf clock
ids) and automatically support new normal clock ids.  However, I
can't see how to implement the following features as efficiently:
- direct scaling to the final precision (kern_clock_gettime() only
  returns timespecs -- see above)
- delayed scaling to the final precision (bpf seems to make timestamps
  as binuptimes and scale them later)
- avoiding going through layers and switches.  bpf goes through several
  layers and switches now, but perhaps it can go directly to the
  *time() function in kern_tc.c via a single function pointer, where
  kern_clock_gettime() and delayed scaling have to use a switch or
  an indexed function pointer since their clock id is highly variable.
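
To make "delayed scaling" concrete, here is a self-contained sketch, not
the bpf code.  It assumes a timestamp is kept as seconds plus a 64-bit
binary fraction (the idea behind the kernel's struct bintime) and is
scaled to the requested format only later, through a switch on a format
code.  All demo_* names are hypothetical.  The arithmetic follows the
usual approach of multiplying the upper 32 bits of the fraction by the
target unit count and shifting right by 32.

```c
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Seconds plus a binary fraction of a second (frac / 2^64). */
struct demo_bintime {
	time_t		sec;
	uint64_t	frac;
};

/* Hypothetical format selector, fixed per consumer. */
enum demo_format { DEMO_FMT_MICRO, DEMO_FMT_NANO };

/* Scale to nanoseconds using the top 32 bits of the fraction. */
static void
demo_bt_to_timespec(const struct demo_bintime *bt, struct timespec *ts)
{
	ts->tv_sec = bt->sec;
	ts->tv_nsec = (long)(((uint64_t)1000000000 *
	    (uint32_t)(bt->frac >> 32)) >> 32);
}

/* Scale to microseconds the same way. */
static void
demo_bt_to_timeval(const struct demo_bintime *bt, struct timeval *tv)
{
	tv->tv_sec = bt->sec;
	tv->tv_usec = (long)(((uint64_t)1000000 *
	    (uint32_t)(bt->frac >> 32)) >> 32);
}

/*
 * Delayed scaling: the stored stamp is converted only when it is
 * handed out.  Returns the sub-second part in the units of fmt,
 * or -1 for an unknown format.
 */
static long
demo_subsecond(const struct demo_bintime *bt, enum demo_format fmt)
{
	struct timespec ts;
	struct timeval tv;

	switch (fmt) {
	case DEMO_FMT_MICRO:
		demo_bt_to_timeval(bt, &tv);
		return ((long)tv.tv_usec);
	case DEMO_FMT_NANO:
		demo_bt_to_timespec(bt, &ts);
		return (ts.tv_nsec);
	}
	return (-1);
}
```

The switch in demo_subsecond() is the per-packet dispatch cost discussed
above; direct scaling would instead call a single conversion chosen in
advance through one function pointer.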

Bruce
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
