RE: Bayes & Apache James server

2011-07-29 Thread Kelson Vibber
> -----Original Message----- > From: David F. Skoll [mailto:d...@roaringpenguin.com] > > It's probably more efficient to have the thing that would block more mail run > first. On our installation, for example, ClamAV stops less than 0.1% of all > mail > (yes, you read that right), so running it f

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* Walter Hurry : > On Fri, 29 Jul 2011 22:44:14 +0200, Patrick Ben Koetter wrote: > > > * Walter Hurry : > >> On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: > >> > >> > Using an asynchronous approach using different databases is > >> > interesting, but as I understand the solution

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* David F. Skoll : > On Fri, 29 Jul 2011 22:41:18 +0200 > Patrick Ben Koetter wrote: > > That's ~230 msg/sec. Ever took it to 500 msg/sec? > > No, we lack the hardware to do that. The 230 msgs/sec rate was > reached by a customer with a lot more money for hardware than we have. :) Isn't that th

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Walter Hurry
On Fri, 29 Jul 2011 22:44:14 +0200, Patrick Ben Koetter wrote: > * Walter Hurry : >> On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: >> >> > Using an asynchronous approach using different databases is >> > interesting, but as I understand the solution discussed addresses >> > read

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 22:41:18 +0200 Patrick Ben Koetter wrote: > That's where your product and SA differ, right? SA writes more to > PostgreSQL e.g. it also stores Bayes tokens in PostgreSQL. Right. > That's ~230 msg/sec. Ever took it to 500 msg/sec? No, we lack the hardware to do that. The 230

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* Walter Hurry : > On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: > > > Using an asynchronous approach using different databases is interesting, > > but as I understand the solution discussed addresses read performance. I > > am interested in write performance. How far could you tak

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* David F. Skoll : > On Fri, 29 Jul 2011 21:56:03 +0200 > Patrick Ben Koetter wrote: > > > I am interested in write performance. How far could > > you take it before PSQL topped out? Any special hardware in use? > > We're not writing very much to PostgreSQL. For each message, we > write a small

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 21:56:03 +0200 Patrick Ben Koetter wrote: > I am interested in write performance. How far could > you take it before PSQL topped out? Any special hardware in use? We're not writing very much to PostgreSQL. For each message, we write a small row containing the internal incide

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Walter Hurry
On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: > Using an asynchronous approach using different databases is interesting, > but as I understand the solution discussed addresses read performance. I > am interested in write performance. How far could you take it before > PSQL topped o

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* David F. Skoll : > > Claiming SA "ignores large sites" because it doesn't have a complex > > CDB backend is ridiculous. > > I'm not at all claiming SA ignores large sites. I'm claiming that people > with *your* attitude ("Other 99.9% of users don't really care...") are > ignoring large sites. c

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 22:35:01 +0300 Henrik K wrote: [...] > Feel free to donate your code for SA and stop the pointless bashing. Um? I'm not "bashing" SA. I think it's a fine piece of work. All I asked is if anyone has made a CDB back-end for SA and I explained why I thought it might be a goo

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Henrik K
On Fri, Jul 29, 2011 at 03:12:40PM -0400, David F. Skoll wrote: > On Fri, 29 Jul 2011 22:02:10 +0300 > Henrik K wrote: > > > Let's be serious. Only people that really need it are the ones with a > > custom high volume distributed spam appliance thing. Other 99.9% of > > users don't really care if

Re: Bayes & Apache James server

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 15:08:34 -0400 Adam Moffett wrote: > I've often mused about which should run first, but never did any sort > of testing. Is it pretty much the general consensus that it's less > wasteful for the AV to scan the spam than to have SA scan the malware? It's probably more effici
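The ordering argument in this thread (run first whichever filter blocks more mail) can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the two filters are treated as independent, the per-message costs are hypothetical placeholders, and only the "less than 0.1%" ClamAV block rate comes from the discussion. The SpamAssassin block rate is likewise an assumed figure.

```python
def expected_cost(first, second):
    """Expected per-message cost when `first` runs before `second`.

    Each filter is a (cost_seconds, block_rate) tuple. The second filter
    only runs on mail that the first filter did not block.
    """
    first_cost, first_block_rate = first
    second_cost, _ = second
    return first_cost + (1.0 - first_block_rate) * second_cost


# Hypothetical numbers: equal costs isolate the effect of the block rate.
clamav = (0.05, 0.001)        # 0.1% block rate, as quoted in the thread
spamassassin = (0.05, 0.80)   # assumed block rate, for illustration only

sa_first = expected_cost(spamassassin, clamav)
av_first = expected_cost(clamav, spamassassin)
print(f"SA first: {sa_first:.5f} s/msg, AV first: {av_first:.5f} s/msg")
```

With unequal costs the same formula implies running first the filter with the smaller cost-to-block-rate ratio, so a very cheap scanner can still be worth running first even if it blocks little.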

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 22:02:10 +0300 Henrik K wrote: > Let's be serious. Only people that really need it are the ones with a > custom high volume distributed spam appliance thing. Other 99.9% of > users don't really care if Bayes lookups take 100ms or whatever. It's > peanuts compared to other proc

Re: Bayes & Apache James server

2011-07-29 Thread Adam Moffett
On 07/29/2011 02:13 PM, Kelson Vibber wrote: > Also, to complete the system, I recall there were some AV-mailets at the age. If possible use them before SA to catch message carrying viruses. Absolutely - we've got ClamAV running first, before anything touches SA, and using some of the SaneS

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Henrik K
On Fri, Jul 29, 2011 at 01:00:52PM -0400, David F. Skoll wrote: > > That's why I was wondering if anyone had looked at using CDB with SA's > Bayes module. Let's be serious. Only people that really need it are the ones with a custom high volume distributed spam appliance thing. Other 99.9% of user

RE: Bayes & Apache James server

2011-07-29 Thread Kelson Vibber
> That said, I would suggest to not decouple bayes from SA, since I wouldn't > see any advantage > in this approach and you would rather miss the a bayes score from the SA > totals. You would > end having more FPs due to the bayesian mailer running apart and needing > special score > thresholds

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 12:45:53 -0400 Michael Scheidell wrote: > you need custom code to sync bayes? do expires? or just interesting > entries in local.cf? Ah, I should have mentioned we don't use SpamAssassin's Bayes module. We use our own Bayes implementation. That's why I was wondering if an

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 12:41 PM, David F. Skoll wrote: On Fri, 29 Jul 2011 12:31:01 -0400 Michael Scheidell wrote: ok, but are you using cdb or postgresql for bayes? cdb for the Bayes data; PostgreSQL for the journal table. Regards, David. you need custom code to sync bayes? do expires? or just intere

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 12:31:01 -0400 Michael Scheidell wrote: > ok, but are you using cdb or postgresql for bayes? cdb for the Bayes data; PostgreSQL for the journal table. Regards, David.

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 12:20 PM, David F. Skoll wrote: This INSERT-only operation cannot block under PostgreSQL MVCC. ok, but are you using cdb or postgresql for bayes? -- Michael Scheidell, CTO o: 561-999-5000 d: 561-948-2259 >*| *SECNAP Network Security Corporation * Best Mobile Solutions Product

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:59:14 -0400 Michael Scheidell wrote: > in mysql, we don't journal. what does that journaling time do to SA > processing times? I'd hate to think we go from 1 s/email processing > time to 60 seconds or something while journal is locked. Journalling *improves* training spee
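The journal-then-rebuild pattern described in this thread can be sketched roughly as follows. This is not the poster's implementation and does not use CDB or PostgreSQL: a JSON-lines file stands in for the journal table and a JSON file stands in for the read-only Bayes database. All function names and file layouts here are hypothetical. The point it illustrates is that training only appends, while a periodic job rebuilds the whole snapshot and swaps it in atomically, so readers never see a half-written database.

```python
import json
import os
import tempfile
from collections import Counter


def append_training(journal_path, tokens, is_spam):
    """Record one training event. Appending never touches the snapshot."""
    event = {"spam": is_spam, "tokens": sorted(set(tokens))}
    with open(journal_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def load_snapshot(snapshot_path):
    """Read the current token counts, or an empty database if none exists."""
    if not os.path.exists(snapshot_path):
        return {"spam": {}, "ham": {}}
    with open(snapshot_path, encoding="utf-8") as fh:
        return json.load(fh)


def apply_journal(journal_path, snapshot_path):
    """Fold pending journal entries into a fresh snapshot, then swap it in."""
    snapshot = load_snapshot(snapshot_path)
    spam, ham = Counter(snapshot["spam"]), Counter(snapshot["ham"])

    # Move the journal aside first so new events go to a fresh file.
    work_path = journal_path + ".processing"
    if os.path.exists(journal_path):
        os.replace(journal_path, work_path)
    if os.path.exists(work_path):
        with open(work_path, encoding="utf-8") as fh:
            for line in fh:
                event = json.loads(line)
                (spam if event["spam"] else ham).update(event["tokens"])

    # Write the whole database to a temp file, then rename atomically.
    target_dir = os.path.dirname(os.path.abspath(snapshot_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"spam": dict(spam), "ham": dict(ham)}, fh)
    os.replace(tmp_path, snapshot_path)

    # Drop the processed journal only after the new snapshot is in place.
    if os.path.exists(work_path):
        os.remove(work_path)
```

A scheduler (cron, for example) would call `apply_journal` every few minutes, matching the 5-10 minute interval mentioned in the thread. A production version would also need locking between concurrent rebuild jobs, which this sketch omits.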

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 11:47 AM, David F. Skoll wrote: CDB is *very* fast. If you journal your Bayes training and run the journal every 5-10 minutes, CDB can easily keep up even with a 2GB Bayes database. in mysql, we don't journal. what does that journaling time do to SA processing times? I'd hate to thin

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:36:52 -0400 Michael Scheidell wrote: > On 7/29/11 11:33 AM, David F. Skoll wrote: > > Has anyone investigated writing a CDB backend for SpamAssassin's > > Bayes implementation? I'm guessing the need to rewrite the DB each > > time makes it a bit complex. > esp for people

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 11:33 AM, David F. Skoll wrote: Has anyone investigated writing a CDB backend for SpamAssassin's Bayes implementation? I'm guessing the need to rewrite the DB each time makes it a bit complex. esp for people with 2gb db's? -- Michael Scheidell, CTO o: 561-999-5000 d: 561-948-2259

Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:26:57 -0400 Michael Scheidell wrote: > if you use mysql.pm for other things (sql params, user's, etc), it > still doesn't seem to make sense to use sdbm AND mysql. We use PostgreSQL for a number of things, but we found that CDB is much faster than all competitors for Bayes

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Michael Scheidell
Can this really be true? On 7/29/11 5:28 AM, Axb wrote: On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can its not just faster than DB, but faster the innodb/mysql.pm? one of the things I like about innodb/mysql.pm i

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread John Hardin
On Fri, 29 Jul 2011, monolit939 wrote: it will be problem, because when I use: sa-learn --backup > /tmp/bayes_export I get: ls -l /tmp/bayes_export -rw-r--r-- 1 root root 77 2011-07-29 15:37 /tmp/bayes_export # the file has just 77B BUT when I use: su mail -c 'sa-learn --backup > /tmp/bayes_exp

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Axb
On 2011-07-29 16:16, monolit939 wrote: Axb wrote: On 2011-07-29 15:50, monolit939 wrote: Axb wrote: On 2011-07-29 15:03, monolit939 wrote: Axb wrote: On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increa

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Axb wrote: > > On 2011-07-29 15:50, monolit939 wrote: >> >> >> Axb wrote: >>> >>> On 2011-07-29 15:03, monolit939 wrote: Axb wrote: > > On 2011-07-29 11:14, monolit939 wrote: >> >> Hello, >> >> I have found test which says the change of type of Spamassassi

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Axb
On 2011-07-29 15:50, monolit939 wrote: Axb wrote: On 2011-07-29 15:03, monolit939 wrote: Axb wrote: On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB form

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Axb wrote: > > On 2011-07-29 15:03, monolit939 wrote: >> >> >> Axb wrote: >>> >>> On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley D

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Axb
On 2011-07-29 15:03, monolit939 wrote: Axb wrote: On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB format to SDBM format). I want to ask you if somebody has s

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Axb wrote: > > On 2011-07-29 11:14, monolit939 wrote: >> >> Hello, >> >> I have found test which says the change of type of Spamassassin database >> can >> increase performance almost three times (from Berkeley DB format to SDBM >> format). I want to ask you if somebody has some experience with

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Axb wrote: > > On 2011-07-29 11:14, monolit939 wrote: >> >> Hello, >> >> I have found test which says the change of type of Spamassassin database >> can >> increase performance almost three times (from Berkeley DB format to SDBM >> format). I want to ask you if somebody has some experience with

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Axb
On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB format to SDBM format). I want to ask you if somebody has some experience with conversion of standard Spamassassin

Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB format to SDBM format). I want to ask you if somebody has some experience with conversion of standard Spamassassin bayes database. I have found just this

Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB format to SDBM format). I want to ask you if somebody has some experience with conversion of standard Spamassassin bayes database. I have found just this

Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB format to SDBM format). I want to ask you if somebody has some experience with conversion of standard Spamassassin bayes database. I have found just this

Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB format to SDBM format). I want to ask you if somebody has some experience with conversion of standard Spamassassin bayes database. I have found just this

RE: Bayes & Apache James server

2011-07-29 Thread Giampaolo Tomassoni
> From: Kelson Vibber [mailto:k...@tollfreeforwarding.com] > > ...omissis... > > If so, would you recommend: > 1. Sticking with SA's Bayesian filter? > 2. Running SpamAssassin without Bayes, then James' BayesianAnalysis > mailet? > 3. Running James's BayesianAnalysis mailet first, then SpamAssas