On Wed, May 30, 2012 at 07:28:49PM -0700, Ethan Jackson wrote: > Looks good, only minor comments. > > > + * We use a variation on the "Space-Saving" algorithm in Metwally et al., > > + * "Efficient Computation of Frequent and Top-k Elements in Data Streams", > > ACM > > + * Transactions on Database Systems 31:3 (2006). This algorithm yields > > + * perfectly accurate results when the data stream's unique values (in this > > + * case, port numbers) fit into our data structure, and degrades gracefully > > + * even for challenging distributions (e.g. Zipf). > > My reading of the paper, which may be erroneous, lead me to believe > that they did quite well with Zipfian distributions. Their worst case > is perfectly random distributions.
Yes, I call Zipf distributions challenging because a naive algorithm would find them challenging. The "space-saving" algorithm does well with them. > > +enum { N_SKETCHES = 8 }; /* Number of elements per channel. */ > > It's hard to say, but I wonder if this value is a bit low. I could > imagine situations in which we overflow the sketches and basically > have no useful information. That said, this is a good conservative > value from a performance perspective, we can always implement the full > space savings data structure later if we want to increase the number > of counters to 256 or something. It's a pretty low value, but it's enough to allow us to estimate the top sources of packets in any given channel, which is the goal. 256 would make sense if we were goal were to log for later data-mining type analysis. 8 seems more reasonable for the kind of "rule of thumb" manual analysis that I actually do in practice when someone reports problems. "What's the top source of packets?" is the question I ask, followed by "Any other sources with a comparable quantity?" and the logging here should answer both questions nicely. > > +/* Logs information about a packet that was recently lost in 'ch' (in > > + * 'dpif_'). */ > > +static void > > +report_loss(struct dpif *dpif_, struct dpif_channel *ch) > > I think it would be quite useful to expose this information in an > appctl command. May also be useful to automatically log this every > once in a while even if we aren't experiencing problems. Neither of > these suggestions should block merging of this patch, just a thought. I agree with the first, not sure about the latter. > > + VLOG_ERR("%s: lost packet with hash %d%s", > > + dpif_name(dpif_), ch - dpif->channels, ds_cstr(&s)); > > I don't fully understand this log message. Why is ch - dpif->channels > called a hash? I don't see how that information is useful either. I > would think just the dpif_name and the dynamic-string would be > sufficient. ch - dpif->channels is a hash of the port number because the channels are assigned as a function of the port number. Would s/with hash/in channel/ make it clearer? > > +} > > + > > Redundant newline. I'll fix that. Thanks, Ben. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev