Hi Chris,

On Sat, Jun 13, 2009 at 02:08:56PM +0300, Chris Wilson wrote:
> I guess so; I was thinking that Aguri seems to store its output in text
> files rather than a database, and perhaps provides more dynamic/automatic
> filtering, but seems to be a research project and not highly supported or
> maintained.

It seems it does this summary of summaries; such aggregate summaries seem to be produced when the application receives a HUP signal. It sounds like the application saves everything at its maximum resolution, say with all primitives enabled, and then aggregates it as a "frontend" feature (when presenting statistics).

> We are using this feature to filter out small flows, but the problem is
> that they are not accounted for at all, so the database contents e.g.
> SUM(bytes) no longer reflect the interface totals.
>
> What I would ideally like to see, but I realise that it's hard is
> something like this:
>
> Initial filter selects flows over a certain size and non-selected flows
> can either be discarded (as now) or reaggregated by zeroing a selected
> feature, e.g. the destination port, and combined into a new single record
> if there is more than one of them. These, more highly aggregated records
> then continue down the preprocess chain, and if they fail to match a later
> condition then they can be aggregated again in a different way, e.g. by
> zeroing the destination IP address, and so on, until we end up with a
> single record where all the features were aggregated.
>
> For example, sql_preprocess might look something like this:
>
> minb = 10000, zero_dstip, minb = 10000, zero_dstport, minb = 10000,
> zero_srcport, minb = 10000, zero_srcip
>
> Then any flows which together do not add up to enough bytes to pass the
> minb filters, even after aggregation, end up in a record where all the
> selector fields are zeroed out. Since there is no final minb condition,
> this row would always be added to the database, never rejected, so
> SUM(bytes) would again equal the interface counters for any given time
> range.
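To make the proposal quoted above concrete, here is a minimal Python sketch of the cascade. It is only an illustration, not pmacct code: the `zero_*` steps are the hypothetical keywords from the proposal, the names `merge` and `cascade` are made up for this example, and a zeroed field is represented by `0`. Duplicates created by zeroing are resolved through a dictionary keyed on the remaining fields.

```python
from collections import defaultdict

# Key layout assumed for this sketch: (src_ip, dst_ip, src_port, dst_port)
FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port")


def merge(records):
    """Sum the byte counters of records that share the same key."""
    totals = defaultdict(int)
    for key, nbytes in records:
        totals[key] += nbytes
    return list(totals.items())


def cascade(records, min_bytes, zero_order):
    """Apply 'minb, zero_<field>' repeatedly, with no final minb check."""
    kept = []
    pending = merge(records)
    for field in zero_order:
        idx = FIELDS.index(field)
        small = []
        for key, nbytes in pending:
            if nbytes >= min_bytes:
                kept.append((key, nbytes))      # passes the minb filter
            else:
                zeroed = key[:idx] + (0,) + key[idx + 1:]
                small.append((zeroed, nbytes))  # re-aggregate more coarsely
        pending = merge(small)                  # resolve duplicates
    kept.extend(pending)                        # leftovers are always kept
    return kept


flows = [
    (("10.0.0.1", "10.0.0.2", 1234, 80), 20000),
    (("10.0.0.1", "10.0.0.3", 1234, 80), 6000),
    (("10.0.0.1", "10.0.0.4", 1234, 80), 5000),
    (("10.0.0.9", "10.0.0.2", 5555, 22), 100),
]
result = cascade(flows, 10000, ("dst_ip", "dst_port", "src_port", "src_ip"))
```

With this input, the two 6000- and 5000-byte flows are combined once the destination IP is zeroed, and the 100-byte flow ends up in the all-zero record, so the byte total of `result` equals the byte total of `flows`.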
I explored this valid approach some time ago (years!); by zeroing some of the previously selected aggregation primitives, duplicates are likely to be created. The trick is to "resolve" such duplicates before offering them to the SQL database, via a sub-aggregation operation. The cache is not sorted, which makes any sub-aggregation operation very expensive (scaling linearly with the number of aggregates being offered); the idea here is to index the cache, perform the sub-aggregation and offer the result to the SQL database.

In summary, it's not something quick to do but it can be done; maybe something good for inclusion within the 0.12 trunk later in the year. At this stage, this feature can't be included in the first pre-release version (0.12.0p1), but I can plan it along the rocky way to the first official release, 0.12.0. Maybe already in 0.12.0p2. How does it sound?

Let me spend a couple of words on a different aspect: the above approach implies everything ends up in the same SQL table, which has pros and cons. The pro is simplicity (one table for everything); the con is that one might want to have sub-aggregated data clearly separated into a different table to, say, apply different policies. This is something that can be done today with pmacct, as 'sql_preprocess' also offers the "max" version of the "min" features you are using. It means having, for example, two SQL plugins, writing to different SQL tables, aggregating data differently and using complementary sql_preprocess features (so that, in the end, summing the data in both tables gives the full picture). Would this be a feasible approach for you?

Cheers,
Paolo

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
