On Mon, Nov 7, 2011 at 9:24 AM, Ben Pfaff <b...@nicira.com> wrote:
> On Sun, Nov 06, 2011 at 09:56:10PM -0800, Jesse Gross wrote:
>> On Fri, Nov 4, 2011 at 4:43 PM, Ben Pfaff <b...@nicira.com> wrote:
>> > NetFlow active timeouts were only mixed in with flow expiration for
>> > convenience: both processes need to iterate all the facets.  But
>> > an upcoming commit will change flow expiration to work in terms of
>> > a new "subfacet" entity, so they will no longer fit together well.
>> >
>> > This change could be seen as an optimization, since NetFlow active
>> > timeouts don't ordinarily have to run as often as flow expiration,
>> > especially when the flow expiration rate is stepped up due to a
>> > large volume of flows.
>>
>> This has a pretty significant effect on the accuracy of the timeouts
>> that I'm not sure is intended.  Currently, active timeouts are done on
>> a per-flow basis starting from time of first use.  However, this
>> essentially starts a per-bridge timer on first configuration that must
>> first expire in order to check the per-flow timer.  So with the
>> default timeout of 10 minutes, the first active timeout will occur
>> somewhere between 10 and 20 minutes after first use.  This only
>> happens for the first one though since they will tend to synchronize.
>> However, I think that there is a potential for the two timers to
>> desynchronize, resulting in apparently random doubling of intervals.
>> For example, netflow_run() is also called from gen_netflow_rec() when
>> it fills up a packet but does not check the return code, skipping the
>> active timeout if a timer tick occurred in that window.  Finally, the
>> current active timeout code distributes reporting over a large span of
>> time but this concentrates all of them at once, which could cause a
>> load spike in the collector if a number of switches are brought up at
>> the same time.
>
> Hmm.
>
> Maybe I should just do NetFlow reporting once a second (as it was
> before).  What do you think?

I think either that or actually tracking when the next timeout will
occur are the only real solutions.  However, I think the only
efficient way to do correct timeouts is to again combine this with the
flow expiration code, which gets us back to where we were before.
When you say do reporting once a second, do you mean essentially the
same as in this patch but with 1 second instead of the active timeout
interval, or going back to the original version?
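
To make the 10-to-20-minute range concrete, here is a rough model in
Python.  It is only a sketch, not OVS code: the function name is made
up, and it assumes the bridge is configured at time 0, that checks
land on exact multiples of the check interval, and that no check is
ever skipped.

```python
ACTIVE_TIMEOUT = 600  # seconds; the 10-minute default discussed above


def first_report_delay(first_use, check_interval, active_timeout=ACTIVE_TIMEOUT):
    """Seconds from a flow's first use to its first active-timeout report.

    Hypothetical model: flows are only examined on ticks at multiples of
    check_interval (counted from bridge configuration at time 0), and a
    flow is reported on the first tick at which it is at least
    active_timeout seconds old.
    """
    earliest = first_use + active_timeout
    # Integer ceiling division: index of the first tick at or after `earliest`.
    tick_index = -(-earliest // check_interval)
    return tick_index * check_interval - first_use


# Per-bridge timer that fires once per active timeout (as in this patch):
print(first_report_delay(1, ACTIVE_TIMEOUT))    # 1199, almost 20 minutes
print(first_report_delay(599, ACTIVE_TIMEOUT))  # 601, just over 10 minutes

# Checking once a second instead:
print(first_report_delay(1, 1))                 # 600
```

Under those assumptions a one-second check keeps the error under a
second, while a check interval equal to the active timeout can nearly
double the delay.  A skipped tick, as in the gen_netflow_rec() case,
would add another full interval on top, which this model does not
cover.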
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
