On Sun, 2009-02-15 at 02:05 +0100, Karsten Bräckelmann wrote:
> Lindsay, if you end up doing some benchmarking, please let us know. I
> wouldn't be surprised if you're actually the first one to do this across
> the Internet. :)
>

Just a thought. Getting message sizes and counts for traffic between a client and server isn't easy unless both are already instrumented to collect this information, so the best approach may be two-pronged:

1) Write a Perl or awk script that processes /var/log/maillog.* and gathers
   message size statistics. The regex 'spamd.*bytes.$' will pick the relevant
   log lines, and the message size is the second-to-last field. The script
   would count messages in size bands, e.g. 0-10KB, 10-100KB, 100KB-1MB,
   1MB-250MB, >250MB, to get some size and frequency statistics.

2) Pick a message from each band and run it through spamc manually while
   using Wireshark to capture both the spamc-spamd traffic and the
   spamd-MySQL traffic.

Combining the message sizes and counts from the two streams should give you enough information to correctly size the traffic flows.

====
Question to developers on this list:

Why is a message that exceeds the maximum size skipped entirely? Is there a case for passing its headers through spamd and then combining the returned headers with the body in spamc? It would give a bit more protection and doesn't look too difficult to do, since spamd is already capable of handling just the headers.

Martin
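
P.S. For what it's worth, here is a rough sketch of step 1, written in Python rather than Perl or awk. Treat it as untested against real logs. It assumes the log files are uncompressed, that the size in the second-to-last whitespace-separated field is a plain byte count, and that KB and MB mean multiples of 1024. The function names (band_for, count_bands) are mine, not anything from SpamAssassin.

```python
import re
import sys
from collections import Counter

# Regex from the post: selects spamd log lines that end in "bytes."
SPAMD_LINE = re.compile(r"spamd.*bytes.$")

KB = 1024        # assumption: KB = 1024 bytes (could equally be 1000)
MB = 1024 * KB

# (exclusive upper bound in bytes, band label), smallest band first
BANDS = [
    (10 * KB, "0-10KB"),
    (100 * KB, "10-100KB"),
    (1 * MB, "100KB-1MB"),
    (250 * MB, "1MB-250MB"),
]
OVERFLOW = ">250MB"


def band_for(size):
    """Return the label of the size band that a byte count falls into."""
    for upper, label in BANDS:
        if size < upper:
            return label
    return OVERFLOW


def count_bands(lines):
    """Count matching log lines per size band."""
    counts = Counter()
    for line in lines:
        line = line.rstrip()
        if not SPAMD_LINE.search(line):
            continue
        fields = line.split()
        # The message size is the second-to-last field; skip odd lines.
        if len(fields) < 2 or not fields[-2].isdigit():
            continue
        counts[band_for(int(fields[-2]))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            totals.update(count_bands(handle))
    for label in [label for _, label in BANDS] + [OVERFLOW]:
        print(f"{label:>10}  {totals[label]}")
```

Save it under any name you like and pass the log files as arguments, e.g. /var/log/maillog.*; it prints one count per band. Rotated logs that have been gzipped would need to be decompressed first.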