Hi Anil,

I see the graph of your provider is showing (and counting) traffic from
the last 31 days. Are you doing sums for that whole timespan? Your query
below does not carry any time reference. Maybe some of the date and time
functions available in MySQL might help:

http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html

Also consider your provider might count port-level traffic, they might
be including L2 headers:

Dst MAC(6b), Src MAC(6b), Length/Type(2b), 802.1q(4b) = 18 bytes

Or even full ethernet frames:

Preamble(7b), Delimiter(1b), Dst MAC(6b), Src MAC(6b), Length/Type(2b),
802.1q(4b), FCS/CRC(4b) = 30 bytes

And non-IP packets (ie. ARPs). All of this can contribute to some skew
in the reported traffic levels. The best is be to ask them in order to
make guesses more educated. Very roughly, considering an i-mix of ~400
bytes, 18 bytes is 4.5% of each packet while 30 bytes is 7.5%. 42GB (the
difference between your accounting and that of your provider) is around
8.9% of 468GB (your provider grand total for the last 31 days). Still
non-IP traffic (ie. ARPs) has to be factored in. Maybe you have a better
idea about your i-mix and can do more precise calculations. 

About the 95th-percentile, you might want to read the following link:

http://en.wikipedia.org/wiki/Burstable_billing

In a nutshell, that is not a constant value and hence you can't simply
multiply that by a time amount and get bandwidth consumption. Quoting
from the link: "Bandwidth is measured (or sampled) from the switch or
router and recorded in a log file. In most cases this is done every 5
minutes. At the end of the month, the samples are sorted from highest
to lowest, and the top 5% (which equal to approximately 36 hours of a
30-day billing cycle) of data is thrown away. The next highest
measurement becomes the 'billable utilization' for the month."

Cheers,
Paolo


On Wed, Apr 29, 2009 at 07:47:44PM -0700, Anil wrote:
> Thanks Paolo.
> 
> I just did another test, and used these queries:
> 
> SELECT sum(bytes) FROM acct WHERE ip_src LIKE '208.64.%' AND ip_dst =
> '0.0.0.0';
> SELECT sum(bytes) FROM acct WHERE ip_src = '0.0.0.0' AND ip_dst LIKE
> '208.64.%';
> 
> That should get all the input+output. The output of those queries (after
> parsing them through python):
> Input: 55Gb
> Output: 370Gb
> Total: 426Gb
> 
> I am not sure exactly how my provider is capturing the data. They provide me
> usage graphs on cacti (see attachment). Note though, they do 95 percentile
> and they also indicate total bandwidth in the graph in Gb.
> 
> Their graph shows:
> 68Gb Input, 400Gb Output
> 
> So, that also seems like there is another 40Gb this past month that is going
> IN/OUT to internal subnet?? The whole subnet is on this one server, so
> anything talking to the outside should be to my provider's router:
> 
> Provider <---> Server (with 208.64 subnet)
> 
> Also, perhaps I am not understanding something here. The graph also shows
> 5mbit in+out with 95%. So, isn't 5mbit constant rate could be like 320gb * 5
> = 1600Gb. However the numbers above show only 430Gb.
> 
> 
> Thanks
> 
> 
> On Wed, Apr 29, 2009 at 1:38 PM, Paolo Lucente <[email protected]> wrote:
> 
> > Hi Anil,
> >
> > In principle what you are doing looks correct: perhaps for a cleaner
> > result i would use filters to separate the traffic, two plugins, two
> > SQL tables - one for incoming the other for outgoing traffic. Anyway,
> > the result of your SQL query is to account for traffic between your
> > servers and the rest of the world. It leaves out traffic internal to
> > your subnets.
> >
> > In saying that your counters don't match with those of your provider,
> > can you be more precise? Some exaples (byte and packet counters, you
> > vs your provider) and more information would help to shed some light;
> > things that pop to my mind:
> >
> > * Which method does your provider use to account data, ie. SNMP, a
> >  SPAN port or rather either sFlow or NetFlow.
> > * How you are comparing things. For example, your SQL query didn't
> >  specify any timeframe (ie. all yesterday traffic from midnight to
> >  midnight). Do you compare daily traffic levels?
> >
> > Cheers,
> > Paolo
> >
> >
> > On Wed, Apr 29, 2009 at 08:03:32AM -0700, Anil wrote:
> > > I've been capturing the bandwidth usage on my server for the past few
> > months
> > > now, and when I correlate it with my network provider's bandwidth graphs,
> > > they don't correlate. I am sure their graphs are right.
> > >
> > > Here is a snapshot of some of the data in the database:
> > >
> > > mysql> select * from acct limit 5;
> > >
> > +---------+---------+---------------+---------------+----------+----------+----------+---------+-----------+---------------------+---------------------+
> > > | mac_src | mac_dst | ip_src        | ip_dst        | src_port | dst_port
> > |
> > > ip_proto | packets | bytes     | stamp_inserted      | stamp_updated
> >   |
> > >
> > +---------+---------+---------------+---------------+----------+----------+----------+---------+-----------+---------------------+---------------------+
> > > |         |         | 199.89.XX.XX | 0.0.0.0       |        0 |        0
> > |
> > > ip       |  138004 | 178895357 | 2009-04-01 09:05:00 | 2009-04-01
> > 09:10:01 |
> > >
> > > |         |         | 0.0.0.0       | 199.89.XX.XX |        0 |        0
> > |
> > > ip       |    2145 |    303722 | 2009-04-01 09:05:00 | 2009-04-01
> > 09:10:01 |
> > >
> > > |         |         | 199.89.XX.XX | 0.0.0.0       |        0 |        0
> > |
> > > ip       |    1950 |    179998 | 2009-04-01 09:05:00 | 2009-04-01
> > 09:10:01 |
> > >
> > > |         |         | 0.0.0.0       | 199.89.XX.XX |        0 |        0
> > |
> > > ip       |   85468 |   7506758 | 2009-04-01 09:05:00 | 2009-04-01
> > 09:10:01 |
> > >
> > > |         |         | 0.0.0.0       | 199.89.XX.XX |        0 |        0
> > |
> > > ip       |    3638 |    845965 | 2009-04-01 09:05:00 | 2009-04-01
> > 09:10:01 |
> > >
> > >
> > +---------+---------+---------------+---------------+----------+----------+----------+---------+-----------+---------------------+---------------------+
> > > 5 rows in set (0.00 sec)
> > >
> > >
> > > r...@vps2:/usr/local/pmacct/etc# cat networks.def
> > > !
> > > ! local networks
> > > !
> > > 208.64.XX.YY/26
> > > 208.64.XX.XX/27
> > >
> > > There is data in the table for the 208.64 subnets, so ignore that it
> > doesn't
> > > show in the SELECT above.
> > >
> > > r...@vps2:/usr/local/pmacct/etc# cat pmacctd.conf
> > > interface: e1000g0
> > > daemonize: true
> > > !aggregate: sum_host
> > > aggregate: src_host,dst_host
> > > plugins: mysql
> > > !plugins: memory
> > > !
> > > sql_host:
> > > sql_db:
> > > sql_table: acct
> > > sql_user:
> > > sql_passwd:
> > > sql_table_version: 1
> > > !
> > > sql_refresh_time: 300
> > > sql_history: 5m
> > > sql_history_roundoff: m
> > > networks_file: /usr/local/pmacct/etc/networks.def
> > >
> > > I do something like this when calculating total bandwidth:
> > >
> > > SELECT sum(bytes) FROM acct WHERE ip_src = '%s' AND ip_dst = '0.0.0.0';
> > > SELECT sum(bytes) FROM acct WHERE ip_src = '0.0.0.0' AND ip_dst = '%s';
> > >
> > > Then, I add both the in+out to capture usage for a specific IP address.
> > >
> > > Is this the right approach? I am wondering if I am doing something wrong
> > > here.
> > >
> > > Thanks,
> > > Anil

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

Reply via email to