RE: Conversion Spamassassin(bayes) database to SDBM

2011-08-05 Thread Lars Jørgensen
> Hello, thanks for the post. Firstly, you are wrong about the performance of my > computer - I don't have a supercomputer. I didn't run 10 000 000 messages > through spamc/spamd. In fact the number is 100 000 000 and it means the max. > size of message I run through spamc/spamd (notice that the number is b

Re: Conversion Spamassassin(bayes) database to SDBM

2011-08-04 Thread monolit939
Martin Gregorie-2 wrote: > > On Mon, 2011-08-01 at 12:30 -0700, monolit wrote: >> I tried to measure performance of Spamassassin by using an SDBM database, >> to improve performance. This site >> http://wiki.apache.org/spamassassin/BayesBenchmarkResults >> BayesBenchmarkResults claims

Re: Conversion Spamassassin(bayes) database to SDBM

2011-08-01 Thread Martin Gregorie
On Mon, 2011-08-01 at 12:30 -0700, monolit wrote: > I tried to measure performance of Spamassassin by using an SDBM database, > to improve performance. This site > http://wiki.apache.org/spamassassin/BayesBenchmarkResults > BayesBenchmarkResults claims that by using an SDBM database instead

Re: Conversion Spamassassin(bayes) database to SDBM

2011-08-01 Thread RW
On Mon, 1 Aug 2011 07:50:14 -0700 (PDT) monolit939 wrote: > 2) stop spamassassin > 3) start spamassassin > 4) start the script > #! /bin/bash > for i in $(ls /path/to/emails); do > spamc -c -s 1000< $i > done > > The results: > real 84m55.472s > user 0m17.145s > sys 0m34.466s >
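
The quoted loop feeds spamc the bare names produced by $(ls /path/to/emails), so it only works when run from inside that directory, and it breaks on file names containing spaces. A minimal sketch of the same benchmark that globs full paths instead is below. The function name run_corpus and the SCANNER override (handy for a dry run without spamd) are hypothetical; the spamc options are copied from the quoted script, where -c only reports the score and -s sets spamc's maximum message size in bytes (larger messages are not scanned).

```shell
#!/bin/bash
# Hypothetical benchmark helper: pipe every file in a directory through a
# scanner command and print how many files were processed.
# SCANNER defaults to the spamc invocation from the quoted script.
run_corpus() {
  local dir="$1" f count=0 scanner
  scanner=${SCANNER:-spamc -c -s 1000}
  # Glob with the directory prefix so each path is valid no matter what
  # the current working directory is, and names with spaces stay intact.
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # $scanner is left unquoted on purpose so it splits into command + options.
    # spamc -c exits non-zero for spam, so that status is not treated as an error.
    $scanner < "$f" > /dev/null || true
    count=$((count + 1))
  done
  echo "$count"
}
```

In bash, `time run_corpus /path/to/emails` then gives real/user/sys figures comparable to the ones quoted, with spamd stopped and restarted between runs as described.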

Re: Conversion Spamassassin(bayes) database to SDBM

2011-08-01 Thread monolit
Axb wrote: > > On 2011-08-01 16:50, monolit939 wrote: >> >> >> Axb wrote: >>> >>> On 2011-08-01 9:52, monolit939 wrote: Axb wrote: > > wrong! > > http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt > > see "bayes_path" > > in yo

Re: Conversion Spamassassin(bayes) database to SDBM

2011-08-01 Thread Axb
On 2011-08-01 16:50, monolit939 wrote: Axb wrote: On 2011-08-01 9:52, monolit939 wrote: Axb wrote: wrong! http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt see "bayes_path" in your case: bayes_path /var/mail/.spamassassin/bayes Hello, firstly, I have to than

Re: Conversion Spamassassin(bayes) database to SDBM

2011-08-01 Thread monolit939
Axb wrote: > > On 2011-08-01 9:52, monolit939 wrote: >> >> >> Axb wrote: >>> >>> wrong! >>> >>> http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt >>> >>> see "bayes_path" >>> >>> in your case: >>> bayes_path /var/mail/.spamassassin/bayes >>> >> >> Hello, >> >> firstly, I h

Re: Conversion Spamassassin(bayes) database to SDBM

2011-08-01 Thread Axb
On 2011-08-01 9:52, monolit939 wrote: Axb wrote: wrong! http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt see "bayes_path" in your case: bayes_path /var/mail/.spamassassin/bayes Hello, firstly, I have to thank you for your advice. I added bayes_path /var/mail/.spama

Re: Conversion Spamassassin(bayes) database to SDBM

2011-08-01 Thread monolit939
Axb wrote: > > wrong! > > http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.txt > > see "bayes_path" > > in your case: > bayes_path /var/mail/.spamassassin/bayes > Hello, firstly, I have to thank you for your advice. I added bayes_path /var/mail/.spamassassin/bayes to loca
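
Axb's suggestion amounts to one line in the site configuration; the value is a path prefix, and SpamAssassin appends suffixes such as _toks and _seen to it. A small sketch that adds the line only if no bayes_path is set yet is below. The function name add_bayes_path and the default LOCAL_CF location are assumptions: local.cf lives in different places on different installations, so check yours first.

```shell
#!/bin/bash
# Hypothetical helper: append a bayes_path line to local.cf unless one exists.
# LOCAL_CF is an assumed location; override it for your installation.
add_bayes_path() {
  local cf="${LOCAL_CF:-/etc/mail/spamassassin/local.cf}"
  local path="${1:-/var/mail/.spamassassin/bayes}"
  # Skip if any (non-commented) bayes_path line is already present.
  if ! grep -q '^[[:space:]]*bayes_path[[:space:]]' "$cf" 2>/dev/null; then
    printf 'bayes_path %s\n' "$path" >> "$cf"
  fi
}
```

After the change, `spamassassin --lint` checks the configuration, and spamd has to be restarted to pick it up.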

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-31 Thread monolit939
John Hardin wrote: > > On Fri, 29 Jul 2011, monolit939 wrote: > >> it will be a problem, because when I use: >> sa-learn --backup > /tmp/bayes_export >> I get: >> ls -l /tmp/bayes_export >> -rw-r--r-- 1 root root 77 2011-07-29 15:37 /tmp/bayes_export # the file >> has >> just 77B >> >> BUT when

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* Walter Hurry : > On Fri, 29 Jul 2011 22:44:14 +0200, Patrick Ben Koetter wrote: > > > * Walter Hurry : > >> On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: > >> > >> > Using an asynchronous approach using different databases is > >> > interesting, but as I understand the solution

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* David F. Skoll : > On Fri, 29 Jul 2011 22:41:18 +0200 > Patrick Ben Koetter wrote: > > That's ~230 msg/sec. Ever took it to 500 msg/sec? > > No, we lack the hardware to do that. The 230 msgs/sec rate was > reached by a customer with a lot more money for hardware than we have. :) Isn't that th

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Walter Hurry
On Fri, 29 Jul 2011 22:44:14 +0200, Patrick Ben Koetter wrote: > * Walter Hurry : >> On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: >> >> > Using an asynchronous approach using different databases is >> > interesting, but as I understand the solution discussed addresses >> > read

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 22:41:18 +0200 Patrick Ben Koetter wrote: > That's where your product and SA differ, right? SA writes more to > PostgreSQL e.g. it also stores Bayes tokens in PostgreSQL. Right. > That's ~230 msg/sec. Ever took it to 500 msg/sec? No, we lack the hardware to do that. The 230

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* Walter Hurry : > On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: > > > Using an asynchronous approach using different databases is interesting, > > but as I understand the solution discussed addresses read performance. I > > am interested in write performance. How far could you tak

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* David F. Skoll : > On Fri, 29 Jul 2011 21:56:03 +0200 > Patrick Ben Koetter wrote: > > > I am interested in write performance. How far could > > you take it before PSQL topped out? Any special hardware in use? > > We're not writing very much to PostgreSQL. For each message, we > write a small

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 21:56:03 +0200 Patrick Ben Koetter wrote: > I am interested in write performance. How far could > you take it before PSQL topped out? Any special hardware in use? We're not writing very much to PostgreSQL. For each message, we write a small row containing the internal incide

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Walter Hurry
On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: > Using an asynchronous approach using different databases is interesting, > but as I understand the solution discussed addresses read performance. I > am interested in write performance. How far could you take it before > PSQL topped o

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* David F. Skoll : > > Claiming SA "ignores large sites" because it doesn't have a complex > > CDB backend is ridiculous. > > I'm not at all claiming SA ignores large sites. I'm claiming that people > with *your* attitude ("Other 99.9% of user don't really care...") are > ignoring large sites. c

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 22:35:01 +0300 Henrik K wrote: [...] > Feel free to donate your code for SA and stop the pointless bashing. Um? I'm not "bashing" SA. I think it's a fine piece of work. All I asked is if anyone has made a CDB back-end for SA and I explained why I thought it might be a goo

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Henrik K
On Fri, Jul 29, 2011 at 03:12:40PM -0400, David F. Skoll wrote: > On Fri, 29 Jul 2011 22:02:10 +0300 > Henrik K wrote: > > > Let's be serious. Only people that really need it are the ones with a > > custom high volume distributed spam appliance thing. Other 99.9% of > > users don't really care if

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 22:02:10 +0300 Henrik K wrote: > Let's be serious. Only people that really need it are the ones with a > custom high volume distributed spam appliance thing. Other 99.9% of > users don't really care if Bayes lookups take 100ms or whatever. It's > peanuts compared to other proc

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Henrik K
On Fri, Jul 29, 2011 at 01:00:52PM -0400, David F. Skoll wrote: > > That's why I was wondering if anyone had looked at using CDB with SA's > Bayes module. Let's be serious. Only people that really need it are the ones with a custom high volume distributed spam appliance thing. Other 99.9% of user

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 12:45:53 -0400 Michael Scheidell wrote: > you need custom code to sync bayes? do expires? or just interesting > entries in local.cf? Ah, I should have mentioned we don't use SpamAssassin's Bayes module. We use our own Bayes implementation. That's why I was wondering if an

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 12:41 PM, David F. Skoll wrote: On Fri, 29 Jul 2011 12:31:01 -0400 Michael Scheidell wrote: ok, but are you using cdb or postgresql for bayes? cdb for the Bayes data; PostgreSQL for the journal table. Regards, David. you need custom code to sync bayes? do expires? or just intere

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 12:31:01 -0400 Michael Scheidell wrote: > ok, but are you using cdb or postgresql for bayes? cdb for the Bayes data; PostgreSQL for the journal table. Regards, David.

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 12:20 PM, David F. Skoll wrote: This INSERT-only operation cannot block under PostgreSQL MVCC. ok, but are you using cdb or postgresql for bayes? -- Michael Scheidell, CTO o: 561-999-5000 d: 561-948-2259 >*| *SECNAP Network Security Corporation * Best Mobile Solutions Product

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:59:14 -0400 Michael Scheidell wrote: > in mysql, we don't journal. what does that journaling time do to SA > processing times? I'd hate to think we go from 1 s/email processing > time to 60 seconds or something while journal is locked. Journalling *improves* training spee

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 11:47 AM, David F. Skoll wrote: CDB is *very* fast. If you journal your Bayes training and run the journal every 5-10 minutes, CDB can easily keep up even with a 2GB Bayes database. in mysql, we don't journal. what does that journaling time do to SA processing times? I'd hate to thin
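
A CDB file is a constant database: it cannot be updated in place, which is the "need to rewrite the DB each time" raised in this thread. Running the journal therefore means writing a complete new file and swapping it in. The thread does not say how David's implementation does the swap; the sketch below shows the generic rebuild-then-rename pattern under that assumption. The function name atomic_rebuild and the build command are hypothetical placeholders for whatever writes the new database to stdout (real cdb tools such as cdbmake do their own temp-file-and-rename).

```shell
#!/bin/bash
# Hypothetical helper: run a build command into a temp file, then rename it
# over the target. Readers that already opened the old file keep using it;
# new readers get the complete new file, never a half-written one.
atomic_rebuild() {
  local target="$1" tmp
  shift
  # Temp file in the same directory, so mv is a rename(2) on one filesystem.
  # Note: mktemp creates it with mode 0600; adjust if other users read it.
  tmp=$(mktemp "$(dirname "$target")/.rebuild.XXXXXX") || return 1
  if "$@" > "$tmp"; then
    mv -f "$tmp" "$target"
  else
    # Build failed: keep the old database and drop the partial file.
    rm -f "$tmp"
    return 1
  fi
}
```

Run from cron every 5-10 minutes as David describes, lookups never wait on the rebuild; the cost is that training only takes effect at the next swap.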

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:36:52 -0400 Michael Scheidell wrote: > On 7/29/11 11:33 AM, David F. Skoll wrote: > > Has anyone investigated writing a CDB backend for SpamAssassin's > > Bayes implementation? I'm guessing the need to rewrite the DB each > > time makes it a bit complex. > esp for people

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 11:33 AM, David F. Skoll wrote: Has anyone investigated writing a CDB backend for SpamAssassin's Bayes implementation? I'm guessing the need to rewrite the DB each time makes it a bit complex. esp for people with 2gb db's?

Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:26:57 -0400 Michael Scheidell wrote: > if you use mysql.pm for other things (sql params, user's, etc), it > still doesn't seem to make sense to use sdbm AND mysql. We use PostgreSQL for a number of things, but we found that CDB is much faster than all competitors for Bayes

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Michael Scheidell
Can this really be true? On 7/29/11 5:28 AM, Axb wrote: On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can it's not just faster than DB, but faster than the innodb/mysql.pm? one of the things I like about innodb/mysql.pm i

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread John Hardin
On Fri, 29 Jul 2011, monolit939 wrote: it will be a problem, because when I use: sa-learn --backup > /tmp/bayes_export I get: ls -l /tmp/bayes_export -rw-r--r-- 1 root root 77 2011-07-29 15:37 /tmp/bayes_export # the file has just 77B BUT when I use: su mail -c 'sa-learn --backup > /tmp/bayes_exp
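
The 77-byte export is what you get when sa-learn runs as a user (here root) whose Bayes database is empty; running it as the owner of the real database, as in the su mail example, produces the full dump. A sketch of the backup/restore route to another storage module, with a size check to catch that case, is below. The function names, MIN_BYTES threshold and default paths are assumptions for illustration, and the SDBM module name presumes your SpamAssassin version ships Mail::SpamAssassin::BayesStore::SDBM.

```shell
#!/bin/bash
# Hypothetical migration helpers around sa-learn --backup / --restore.
BAYES_USER=${BAYES_USER:-mail}          # owner of the real Bayes DB (assumed)
EXPORT=${EXPORT:-/tmp/bayes_export}     # dump location, as in the thread
MIN_BYTES=${MIN_BYTES:-1024}            # arbitrary "not empty" threshold

# A tiny dump usually means sa-learn opened an empty database.
export_looks_valid() {
  local size
  [ -f "$1" ] || return 1
  size=$(wc -c < "$1")
  [ "$size" -ge "$MIN_BYTES" ]
}

# Step 1: dump while the old storage module is still configured.
export_bayes() {
  su "$BAYES_USER" -c 'sa-learn --backup' > "$EXPORT" || return 1
  export_looks_valid "$EXPORT"
}

# Step 2 (manual): switch the module in local.cf, e.g.
#   bayes_store_module Mail::SpamAssassin::BayesStore::SDBM

# Step 3: load the dump into the new store as the same user.
restore_bayes() {
  su "$BAYES_USER" -c "sa-learn --restore $EXPORT"
}
```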

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Axb
On 2011-07-29 16:16, monolit939 wrote: Axb wrote: On 2011-07-29 15:50, monolit939 wrote: Axb wrote: On 2011-07-29 15:03, monolit939 wrote: Axb wrote: On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increa

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Axb wrote: > > On 2011-07-29 15:50, monolit939 wrote: >> >> >> Axb wrote: >>> >>> On 2011-07-29 15:03, monolit939 wrote: Axb wrote: > > On 2011-07-29 11:14, monolit939 wrote: >> >> Hello, >> >> I have found test which says the change of type of Spamassassi

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Axb
On 2011-07-29 15:50, monolit939 wrote: Axb wrote: On 2011-07-29 15:03, monolit939 wrote: Axb wrote: On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB form

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Axb wrote: > > On 2011-07-29 15:03, monolit939 wrote: >> >> >> Axb wrote: >>> >>> On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley D

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Axb
On 2011-07-29 15:03, monolit939 wrote: Axb wrote: On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB format to SDBM format). I want to ask you if somebody has s

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread monolit939
Axb wrote: > > On 2011-07-29 11:14, monolit939 wrote: >> >> Hello, >> >> I have found test which says the change of type of Spamassassin database >> can >> increase performance almost three times (from Berkeley DB format to SDBM >> format). I want to ask you if somebody has some experience with

Re: Conversion Spamassassin(bayes) database to SDBM

2011-07-29 Thread Axb
On 2011-07-29 11:14, monolit939 wrote: Hello, I have found test which says the change of type of Spamassassin database can increase performance almost three times (from Berkeley DB format to SDBM format). I want to ask you if somebody has some experience with conversion of standard Spamassassin