Re: [Ntop] another "is this known" question - attempt to update RRD for the same time twice

Rick Jones Thu, 28 Apr 2011 18:09:06 -0700

On Wed, 2011-04-27 at 14:14 -0700, Rick Jones wrote:
> On Mon, 2011-04-11 at 17:08 -0700, Rick Jones wrote:
> > I'm noticing messages such as:
> > 
> > Mon Apr 11 16:13:14 2011  **WARNING** RRD:
> > rrd_update(/usr/local/var/ntop/rrd/interfaces/sFlow-device.2/sFlow/48/ifOutBroadcastPkts.rrd)
> >  error: 
> > /usr/local/var/ntop/rrd/interfaces/sFlow-device.2/sFlow/48/ifOutBroadcastPkts.rrd:
> >  illegal attempt to update using time 1302563594 when last update time is 
> > 1302563594 (minimum one second step)
> > 
> > appearing when ntop is running - they seem to travel in fairly large
> > groups - one entry per statistic per port it seems (I don't have that
> > much scroll-back and ntop is not running as a daemon so it isn't going
> > to syslog (iirc)).
> > 
> > At one level the error message is encouraging - it suggests that one can
> > indeed store to the RRD as often as once a second :) but it is still
> > worrisome regardless.
> 
> As part of learning more about RRD, I have been writing my own little
> "store the sflow counters to an RRD" program, and was noticing similar
> errors there, eventhough I was creating the RRD with a start time of
> even as much as 2 seconds prior to "current" (using --start -2).
> 
> I have a hypothesis as to why I am seeing it in my program but am still
> not familiar enough with the ntop code to say if it is happening there,
> but sFlow counters tend to travel in groups.  If one snaps the arrival
> timestamp of the sFlow PDU when making the recvfrom() call, say to use
> as the timestamp for the update... but uses "current" time in the create
> (even subtracting some seconds from it) by the time one has created five
> or six new RRDs, several seconds can pass and so the sixth or seventh
> can be well beyond the arrival timestamp.  I won't inundate the list
> with the strace, but here is some output from my little program:
> 
> (successful updates are actually silent)
> 
> 2011-04-27-13:42:33.307211: stat_and_create_generic_counter_db creating
> new RRD /tmp/counters/192.168.1.7/0/31/generic.rrd
> 2011-04-27-13:42:33.754664: stat_and_create_generic_counter_db creating
> new RRD /tmp/counters/192.168.1.7/0/33/generic.rrd
> 2011-04-27-13:42:34.335818: stat_and_create_generic_counter_db creating
> new RRD /tmp/counters/192.168.1.7/0/37/generic.rrd
> 2011-04-27-13:42:34.925467: stat_and_create_generic_counter_db creating
> new RRD /tmp/counters/192.168.1.7/0/38/generic.rrd
> 2011-04-27-13:42:35.377701: stat_and_create_generic_counter_db creating
> new RRD /tmp/counters/192.168.1.7/0/40/generic.rrd
> 2011-04-27-13:42:35.916924: stat_and_create_generic_counter_db creating
> new RRD /tmp/counters/192.168.1.7/0/41/generic.rrd
> 2011-04-27-13:42:36.344945: stat_and_create_generic_counter_db creating
> new RRD /tmp/counters/192.168.1.7/0/19/generic.rrd
> 2011-04-27-13:42:36.912633: update_generic_counter_db() rrd_update()
> returned '/tmp/counters/192.168.1.7/0/19/generic.rrd: illegal attempt to
> update using time 1303936953 when last update time is 1303936954
> (minimum one second step)'
> 2011-04-27-13:42:36.912814: update_generic_counter_db failure storing to
> generic counter db line 1512 argc 32 position 3
> 
> I suppose that part of "the answer" is to use the rrcached, but the way
> I've fixed the issue in my little program is to use the PDU's arrival
> timestamp less one second.
> 
> Oh, and one other thing I've noticed - "contemporary" (perhaps for
> versions of contemporary going back several years) allow update
> timestamps to include milliseconds.  While the minimum step remains one
> second, this helps avoid issues when one is using a step of one second,
> and the actual arrival times are fluttering back and forth across the
> one second interval.


And another possible source for those - if the application "goes out to
lunch" and does not read from the socket(s) for a length of time greater
than the sFlow counter sampling interval.  When that happens (such as,
for example stopping program output to the terminal with Ctrl-S :)
multiple sFlow PDUs containing samples for the same entity/entities can
be queued-up in the socket buffer.

There are two ways to address that.  The first, and what I've done in my
own little program, is to enable SO_TIMESTAMP on the socket(s) on which
the sFlow PDUs are being received.  By using the arrival timestamp
generated by the stack instead of one snapped by the program, the PDUs
will not suddenly seem to time compress after the application as
"returned from lunch."

The other option, which is a bit more complicated, involves making use
of the millisecond uptime field of the sFlow PDU header to compute the
time deltas and use them when generating the timestamp given to RRD.  I
suppose if one were particularly clever one could track the rate at
which time was advancing on the sFlow agent and compare that with the
rate at which it was advancing at the collector...

rick jones

_______________________________________________
Ntop mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop

Re: [Ntop] another "is this known" question - attempt to update RRD for the same time twice

Reply via email to