Hello Michael and Wim,
please excuse the long email that follows.
On Thu, Nov 25, 2004 at 08:52:08PM +1100, Michael Ralston wrote:
> Yeah, pruning the database solves half the problem... The storage... But it
> doesn't solve the io of writing out the records... I wonder how hard it
> would be to make pmacctd sort traffic by quantity and only log the biggest
> traffic. Alternatively I wonder if you could use the command line data fetch
> thingo and sort the data yourself... Again that is going to be cpu intensive
> I suppose... Maybe putting a sorted data structure like a heap into pmacctd
> would help...
Denial of service and pmacct. It's a rather interesting thread: I don't have a
solution in mind yet. But yes, it's a real issue and I've run into it too.
So, let me share some thoughts, and if we manage to come up with a 'convincing
enough' idea I will be happy to implement it quickly.
First of all, we may have different situations to solve: (a) people who fire up
pmacctd to 'see' what's happening on their network *now*; usually these guys
have no interest in 'normal' flows transiting through the network; they need
live data. (b) people who use pmacctd for persistent accounting on their
networks; usually these guys may have their activities disrupted by abnormal
conditions because of the sudden explosion of resource needs. They need
historical data.
(1) 'Write then prune' is a good solution if either plenty of resources are
available or the denial of service is not 'too denial'. Its cons are intuitive
(IO constraints, DB performance, etc.); however, it also has convincing pros
(the good half of the work): you have all the traffic and can then go on
analyzing it with the help of SQL.
(2) Apply some kind of rules when purging the cache of the SQL plugin into the
DB. By rules I mean 'give me the TOP ten guys hogging resources' or 'give me
only the boxes whose bytes counter exceeds x bytes'. As a premise, this
solution is not suitable for (b). The good half of the solution: you will be
able to dramatically reduce the number of rows that get pushed into the DB; but
do you still have any kind of control over which data will end up in your DB?
If I imagine a network populated by some big email server, I suspect the
answer will be 'no'.
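
To make (2) a bit more concrete, here is a minimal Python sketch of the two
rules, not pmacctd code. The cache layout (a dict keyed by host pair) and the
function names are assumptions made up for illustration. heapq.nlargest keeps
only n candidates while scanning, which is roughly the heap idea Michael
mentioned.

```python
import heapq

# Hypothetical cache: (src_host, dst_host) -> bytes counter.
# This layout is an assumption for illustration, not pmacct's real structure.
cache = {
    ("198.51.100.7", "192.168.0.10"): 9_500_000,
    ("198.51.100.8", "192.168.0.11"): 12_000,
    ("203.0.113.5", "192.168.0.25"): 4_200_000,
    ("203.0.113.9", "192.168.0.30"): 800,
}


def top_talkers(cache, n):
    """Rule 'give me the TOP n': the n entries with the largest counters."""
    return heapq.nlargest(n, cache.items(), key=lambda item: item[1])


def over_threshold(cache, min_bytes):
    """Rule 'only boxes whose bytes counter exceeds min_bytes'."""
    return {key: count for key, count in cache.items() if count > min_bytes}


if __name__ == "__main__":
    for (src, dst), count in top_talkers(cache, 2):
        print(src, dst, count)
    print(len(over_threshold(cache, 1_000_000)), "rows above threshold")
```

Only the selected entries would then be turned into queries; everything else
is dropped, which is exactly why this does not fit (b).
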
(3) Reduce the number of primitives in place; in particular take out
'src_port', 'dst_port' and 'proto'. Say, keep only 'src_host' and 'dst_host'.
Additionally, filter out foreign (internet) hosts by defining your local
networks in a 'networks_file'. What will be the result? If the denial of
service is either 1-1 or M-1 (one foreign host to one local machine, many
internet hosts to one local machine) you will see a single, really pumped
counter. If the denial of service is either 1-M or M-M (one foreign host to
many local machines or many foreign hosts to many local machines) then you will
see a number of quasi-equally distributed counters. With temporal breakdown
enabled ('sql_history') and some knowledge of your network you should be able
to classify 'suspected' targets. Once the target machines have been identified,
go on, say, with tcpdump and see what's happening with the help of protocol
dissectors and so on. Summarizing: having much less information in the DB
avoids all the drawbacks of (1); however, the detailed picture will not be in
your hands, and the additional step is to use, say, tcpdump.
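
As a rough illustration of how the counters from (3) could be read, the Python
sketch below sums bytes per local destination and reports how much of the
total the biggest counter holds. The flow tuples, the LOCAL_NETWORKS list and
the function names are assumptions for the example; in practice the rows would
come out of the DB and the networks out of your 'networks_file'.

```python
import ipaddress
from collections import Counter

# Assumption: local networks, as you would list them in a 'networks_file'.
LOCAL_NETWORKS = [ipaddress.ip_network("192.168.0.0/24")]


def is_local(host):
    """True if the address belongs to one of the local networks."""
    addr = ipaddress.ip_address(host)
    return any(addr in net for net in LOCAL_NETWORKS)


def bytes_per_local_target(flows):
    """Sum bytes per local destination; flows are (src, dst, bytes) tuples."""
    totals = Counter()
    for _src, dst, nbytes in flows:
        if is_local(dst):
            totals[dst] += nbytes
    return totals


def top_share(totals):
    """Return (host, share of all bytes) for the largest counter."""
    total = sum(totals.values())
    if total == 0:
        return None, 0.0
    host, nbytes = totals.most_common(1)[0]
    return host, nbytes / total


# M-1 example: three foreign hosts hammering a single local machine.
flows = [
    ("198.51.100.1", "192.168.0.10", 4000),
    ("198.51.100.2", "192.168.0.10", 5000),
    ("203.0.113.3", "192.168.0.10", 6000),
    ("203.0.113.4", "192.168.0.20", 500),
    ("192.168.0.10", "198.51.100.7", 999),  # foreign destination, ignored
]

if __name__ == "__main__":
    print(top_share(bytes_per_local_target(flows)))
```

A share close to 1 points to the 1-1 or M-1 case; a low share spread over many
hosts points to 1-M or M-M. Where to draw the line depends on what is normal
for your network.
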
(4) This point applies only to (b) because it's feasible for historical data
but not for live data. It's a variant of (1). While purging the cache into the
DB, see how many queries you are about to produce (never mind whether they
will be UPDATEs or INSERTs). If they exceed a given threshold, push them into a
file instead of into the DB, and optionally raise an alert to humans (getting
rid of the triggering mechanism already implemented). Then you may
'asynchronously' contribute the data to the DB once the load on the environment
has eased. Such a solution addresses IO constraints when a peak is at your
door.
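
The logic of (4) could look more or less like the Python sketch below. It is
only a sketch under assumptions: rows are plain dicts, 'write_to_db' stands in
for whatever issues the UPDATE/INSERT, and the spool file is one JSON object
per line. None of these names exist in pmacct.

```python
import json


def purge_cache(rows, threshold, write_to_db, spool_path):
    """Write rows to the DB, or spool them to a file if there are too many.

    Returns "db" or "spooled" so the caller can decide whether to alert.
    """
    if len(rows) <= threshold:
        for row in rows:
            write_to_db(row)
        return "db"
    # Too many queries for one purge: append to the spool file instead.
    with open(spool_path, "a", encoding="utf-8") as spool:
        for row in rows:
            spool.write(json.dumps(row) + "\n")
    return "spooled"


def replay_spool(spool_path, write_to_db):
    """Later, when the peak is over, push the spooled rows into the DB."""
    count = 0
    with open(spool_path, encoding="utf-8") as spool:
        for line in spool:
            write_to_db(json.loads(line))
            count += 1
    return count
```

Appending sequentially to a file is much cheaper than a burst of queries, and
the replay can be run (and rate limited) whenever it suits the DB.
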
Given all this, my personal answer, my two cents to the discussion, would be
that the problem doesn't have a unique solution because we are not facing a
unique problem. I think the solution is to be found in a mix of (1), (3) and
(4), depending on the specific environment.
About protocols: if you enable the 'proto' primitive, protocols listed in
'pmacct-data.h' will get written as nice strings (tcp, icmp, udp, etc.); those
not listed there will simply be written as numbers.
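
In other words, the behaviour is a lookup with a numeric fallback. A tiny
Python illustration follows; the table is a made-up three-entry excerpt using
the standard IP protocol numbers, not the actual contents of 'pmacct-data.h'.

```python
# Hypothetical excerpt of a protocol table (standard IP protocol numbers).
PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}


def protocol_label(number):
    """Return the protocol name if listed, otherwise the number as a string."""
    return PROTOCOL_NAMES.get(number, str(number))


if __name__ == "__main__":
    print(protocol_label(6))
    print(protocol_label(250))
```
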
Cheers,
Paolo