Re: [pmacct-discussion] Flexible aggregation

Paolo Lucente Sat, 13 Jun 2009 04:58:08 -0700

Hi Chris,

On Sat, Jun 13, 2009 at 02:08:56PM +0300, Chris Wilson wrote:


> I guess so; I was thinking that Aguri seems to store its output in text 
> files rather than a database, and perhaps provides more dynamic/automatic 
> filtering, but seems to be a research project and not highly supported or 
> maintained.

It seems to do this summary of summaries; such aggregate summaries
seem to be produced when the application receives a HUP signal.
It sounds like the application saves everything at its maximum
resolution, say with all primitives enabled, and then aggregates it
as a "frontend" feature (when presenting statistics).

> We are using this feature to filter out small flows, but the problem is 
> that they are not accounted for at all, so the database contents e.g. 
> SUM(bytes) no longer reflect the interface totals.
> 
> What I would ideally like to see, but I realise that it's hard is 
> something like this:
> 
> Initial filter selects flows over a certain size and non-selected flows 
> can either be discarded (as now) or reaggregated by zeroing a selected 
> feature, e.g. the destination port, and combined into a new single record 
> if there is more than one of them. These, more highly aggregated records 
> then continue down the preprocess chain, and if they fail to match a later 
> condition then they can be aggregated again in a different way, e.g. by 
> zeroing the destination IP address, and so on, until we end up with a 
> single record where all the features were aggregated.
> 
> For example, sql_preprocess might look something like this:
> 
> minb = 10000, zero_dstip, minb = 10000, zero_dstport, minb = 10000, 
> zero_srcport, minb = 10000, zero_srcip
> 
> Then any flows which together do not add up to enough bytes to pass the 
> minb filters, even after aggregation, end up in a record where all the 
> selector fields are zeroed out. Since there is no final minb condition, 
> this row would always be added to the database, never rejected, so 
> SUM(bytes) would again equal the interface counters for any given time 
> range.

I explored this valid approach some time ago (years!); by zeroing some
previously selected aggregation primitives, duplicates are likely to be
created. The trick is to "resolve" such duplicates before offering them
to the SQL database - via a sub-aggregation operation. The cache is not
sorted - making any sub-aggregation operation very expensive (scaling
linearly with the number of aggregates being offered); the idea here is
to index the cache, perform the sub-aggregation and offer the result of
this to the SQL database.
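
To make the idea concrete, here is a minimal Python sketch of the
cascade (filter by minb, zero one primitive, merge the duplicates this
creates, repeat). It is an illustration only, not pmacct code: the
function name, the field layout and the zero values are all assumptions,
and a plain dictionary stands in for the "indexed cache".

```python
from collections import defaultdict

# Hypothetical flow key layout and "zeroed" values; not pmacct internals.
FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port")
ZERO = {"src_ip": "0.0.0.0", "dst_ip": "0.0.0.0", "src_port": 0, "dst_port": 0}


def cascade(flows, minb, zero_order):
    """Return (key, bytes) rows for a dict mapping key tuples to byte counts.

    Entries with at least minb bytes are kept as they are. The rest get
    one more field zeroed per round and are merged when their keys collide.
    """
    kept = []
    pending = dict(flows)
    for field in zero_order:
        idx = FIELDS.index(field)
        merged = defaultdict(int)  # the dict acts as the index for duplicates
        for key, nbytes in pending.items():
            if nbytes >= minb:
                kept.append((key, nbytes))
            else:
                zeroed = key[:idx] + (ZERO[field],) + key[idx + 1:]
                merged[zeroed] += nbytes  # sub-aggregation of duplicates
        pending = dict(merged)
    # No final minb check: whatever is left is always written out,
    # so the byte total of the output equals the byte total of the input.
    kept.extend(pending.items())
    return kept
```

With the dictionary as index, merging each small entry is a constant-time
lookup on average rather than a scan of the whole cache, which is the
point of indexing before sub-aggregating.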

In summary, it's not something quick to do but it can be done - maybe
something good for inclusion within the 0.12 trunk later in the year. 
At this stage, this feature can't be included in the first pre-release
version (0.12.0p1) but I can plan it along the rocky way to the first
official release, 0.12.0. Maybe already in 0.12.0p2. How does it sound?

Let me spend a couple of words on a different aspect: the above approach
implies everything ends up in the same SQL table - which has pros and
cons; the pro is simplicity (one table for everything); the con is that
one might want to have sub-aggregated data clearly separated into a
different table to, say, apply different policies. This is something that
can be done today with pmacct, as 'sql_preprocess' also offers the "max"
version of the "min" features you are using. It means having, for example,
two SQL plugins, writing to different SQL tables, aggregating data
differently and using complementary sql_preprocess features (so that in
the end, by summing data in both tables, one ends up with the full
picture). Would this be a feasible approach for you?

Cheers,
Paolo

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
