[ ... Moved to -net ... ]

"James E. Housley" wrote:
> I am just trying to count bytes in and out, to keep track of usage and
> head off a large overage and a larger bill than necessary.  Counting
> packets is worthless.  But just do the math: with a GigE NIC, at what
> data rate do you start overflowing the counters too quickly?  I suppose
> there is another possibility, that the ti GigE driver is counting the
> data multiple times.  But I don't think so, because at 200Mbits/sec the
> counter should overflow in 172 seconds.  And this machine is easily
> doing this most of the day.
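Just to spell out the arithmetic behind that 172 second figure: a 32-bit
byte counter wraps at 2^32 bytes, and 200Mbits/sec is 25MB/sec, so the
wrap takes 4294967296 / 25000000, or about 171.8 seconds.  A throwaway
userland program (nothing to do with the actual ti driver code, just the
division) that does the same math:

	#include <stdio.h>
	#include <stdint.h>

	/*
	 * Seconds until a 32-bit byte counter wraps, given a sustained
	 * data rate in megabits per second.
	 */
	static double
	wrap_seconds(double rate_mbps)
	{
		double bytes_per_sec = rate_mbps * 1000.0 * 1000.0 / 8.0;

		return (((double)UINT32_MAX + 1.0) / bytes_per_sec);
	}

	int
	main(void)
	{
		printf("200 Mbit/s:  %.1f s\n", wrap_seconds(200.0));
		printf("1000 Mbit/s: %.1f s\n", wrap_seconds(1000.0));
		return (0);
	}

which comes out to roughly 171.8 seconds at 200Mbits/sec and 34.4
seconds at full GigE wire speed.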
Do you get billed on retransmits?  I'm pretty sure that they are not
counted, unless you have a Tigon II, and have rewritten the firmware,
or have a non-disclosure with another vendor, a license for their
firmware, and have rewritten the firmware.

I think the place to count this stuff is at the router.  If your router
*is* the FreeBSD box, then it makes sense for you to do the counting;
but it doesn't make sense for the rest of us to do the counting.

Is your problem with packet granularity that it doesn't give you better
than an estimate, based on an average packet size, of the amount of data
you send?  You can keep a modular counter based on kilobytes (or even on
megabytes), where you keep an exact byte count at that level of
granularity, but don't reflect that granularity out into the counter
itself.  In other words, make it accurate, but not precise.  (There is a
rough sketch of what I mean further down in this message.)

> That all sounds reasonable.  And it makes sense to move the counters
> under existing locks.  But 32-bit machines are going to be around for
> a while longer, and fast network connections are going to get faster
> and more common.  Maybe the counters should be completely removed from
> the 32-bit architectures, since they give such misleading results, and
> only kept on the 64-bit machines.  That way no one will be confused by
> the data.
>
> Of course I am not completely serious about removing the counters, but
> it is not hard to make them very wrong.

The problem with this is that it appears that you have a very specific
problem domain that, if fully mapped, will damage the performance for
the rest of us: a system with in-the-neighborhood-of-gigabit throughput,
for which every byte is counted against you as part of a cost metric
(most people at that level have an optical cable in from a NAP, and
really don't care about bytes transited, because they are one of the top
tier backbone providers).  I would be much more comfortable with you
slowing yourself down, and not the rest of us.

To my mind, the ability to meter based on this type of metric acts
against flat rate pipes, and comes down on the wrong side of the
technology wall between the users, who want to buy based on the size of
the water pipe, and the providers, who want to charge based on how much
water goes through the pipe, so that they can get their tax on every
drop.  In other words, if I had my way, it would be technologically
impossible to meter based on a metric like this (it's the one merit of a
direct ATM interconnect, IMO: the inability to even store accounting
records fast enough without a supercomputer).

"...Of course I am not completely serious about removing the
counters..."  8-)

Frankly, I have a hard time believing that you really have the problem
that you think you have.  Specifically, I have worked on Gigabit
equipment, and while it's nice and impressive sounding to be able to say
"GigaBit!" in an "I've just had my cake frosted!" excited voice, in
practice there's not a real colocation center on the planet that would
let you talk out their pipes at anywhere near that rate, and you would
be really hard put to find one that could talk fast enough to allow you
to pump a fully saturated 100Mbit interface out of your box.  I helped
put a single Gigabit box in front of three of the top ten porn sites in
the U.S., and even while the damn thing was starting up under load, and
before startup had fully completed, it never got over 7% load; in
operation, it ran at around 4% load steady-state, and that's *CPU load*
on a 1GHz Pentium, not even network load (which was a hell of a lot
less).
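Coming back to the modular counter idea above, here is a rough sketch of
the kind of thing I mean.  The names and the structure are made up for
illustration; this is not anything that exists in the ifnet code today.
Bytes accumulate in a small residue field, and only whole kilobytes get
folded into the (still 32-bit) statistics counter, which therefore wraps
about a thousand times more slowly than a raw byte counter:

	#include <stdio.h>
	#include <stdint.h>

	/*
	 * Hypothetical per-interface accounting: exact to 1KB, but the
	 * exported counter only advances in whole kilobytes.
	 */
	struct if_kb_counter {
		uint32_t if_kbytes;        /* whole kilobytes transferred */
		uint32_t if_byte_residue;  /* leftover bytes, always < 1024 */
	};

	/* Would be called per packet, under the same lock as the other stats. */
	static void
	if_kb_add(struct if_kb_counter *c, uint32_t nbytes)
	{
		uint32_t total = c->if_byte_residue + nbytes;

		c->if_kbytes += total >> 10;        /* fold in whole KB */
		c->if_byte_residue = total & 0x3ff; /* keep the sub-KB remainder */
	}

	int
	main(void)
	{
		struct if_kb_counter c = { 0, 0 };

		if_kb_add(&c, 1500);   /* one full-size Ethernet frame */
		if_kb_add(&c, 9000);   /* one jumbo frame */
		printf("%u KB + %u bytes\n", c.if_kbytes, c.if_byte_residue);
		return (0);
	}

That prints "10 KB + 260 bytes", i.e. exactly the 10500 bytes that went
in, but the counter you export and read only ever ticks over in
kilobytes, so it is accurate without being precise.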
--

This is getting way off topic, but here is a business case illustration.

Are you perhaps doing what the Q/A people at a previous job were doing,
and stress-testing the crap out of a machine on a Gigabit LAN, at or
near wire speeds, when in the field the equipment is *NEVER*, *EVER*
going to have to handle anywhere near even 1/40th of the load you are
placing on it?

While it is natural -- even, in some ways, admirable -- for Q/A people
to want to test their products to destruction, you are going to
manufacture sev-1 bugs where none will exist in deployment at customer
sites, and these putative "show stoppers" will cost you in time to
market and other areas where you really can't afford to be pissing away
time over nothing (e.g. if your sales force gets wind of them, they will
lose confidence in the product's ability to make your customers happy,
when in fact no such problem really exists).

While customers are likely to set up a test network like yours, and
stress test it, they are unlikely to be able to duplicate your load.  In
practice, this means anything that can't be repeated with standard test
tools (e.g. http_load, etc.) in under 24 hours will not show up, even
under their "stress test" scenarios.  Customers care about equipment not
failing, not about equipment being infallible.

For example, if you have a problem that occurs once a day at that level
of amplification, in the field it will perhaps occur once every month
and a half, assuming that your customer keeps the load up at that level,
and the problem is unrelated to resource starvation (e.g. a small memory
leak in an uncommon failure mode, where an allocation is not freed, when
the machine is under stress).  In other words, if someone were to DDOS
them for a month and a half at a dual OC3 facility like Exodus or UUNet
in San Jose, and they did absolutely nothing to stop or curb the attack,
then you might expect to see the problem in the field about once in that
month and a half.

If your problem occurs once a week, then that grows to almost a year
before the problem is seen in the field, assuming that there are no
upgrades or anything else requiring a "bugfix"... so half-life that: you
have six months to come up with a fix for it, and push it off as a
"security upgrade" -- technically, it is one -- assuming the customer
plugs in your box and then forgets it.  If they reboot or reconfigure,
requiring a reboot, then the clock starts all over again.

In fact, this assumes linear amplification: "multiply the data rate by
40, and you multiply the failure rate by 40"; in the real world, the
relationship is exponential: something you only see at the stress
breaking point of your product will be almost impossible to repeat in
the field.

-- Terry