Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? Probably related to fix for bug 7191

2017-11-07 Thread David Gessel
Original Message Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? NYTProf results TxRep.pm 1720440 vs 1651114 From: RW To: users@spamassassin.apache.org Date: Tue Nov 07 2017 03:44:50 GMT+0300 (AST

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-07 Thread Matus UHLAR - fantomas
On 04.11.17 16:09, David Gessel wrote: so days later, still chunking away, not making much progress. 1. did you enable bayes_learn_to_journal? 2. do you still run multiple sa-learn jobs in parallel? 3. do you still feed thousands of spam messages to it? there is possibility of storing bayes da

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? NYTProf results TxRep.pm 1720440 vs 1651114

2017-11-06 Thread RW
On Tue, 7 Nov 2017 00:59:12 +0300 David Gessel wrote: > FreeBSD is currently installing TxRep.pm rev 1651114 from Jan 12 > 15:17:46 2015 (it is the only revision that has only whitespace > differences, all leading padding, there are code differences between > installed and 1650327 (previous) and 1

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? NYTProf results TxRep.pm 1720440 vs 1651114

2017-11-06 Thread David Gessel
continue through 1720440. However, 1720440 seems to cause massive performance issues. == With FreeBSD installed TxRep.pm (1651114) == # sa-learn --clear # sa-learn --dump ERROR: Bayes dump returned an error, please re-run with -D for more information (folder has 236 messages) # perl -T -d:NYTProf

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-04 Thread David Gessel
Original Message Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? From: David Jones To: users@spamassassin.apache.org Date: Sat Nov 04 2017 16:35:02 GMT+0300 (AST) > On 11/04/2017 08:09 AM, Da

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-04 Thread David Gessel
Original Message Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? From: Matus UHLAR - fantomas To: users@spamassassin.apache.org Date: Tue Oct 31 2017 23:05:23 GMT+0300 (AST) > dovecot's antspa

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-04 Thread David Jones
there or not there a few times over various versions and so may be slightly meaningful to something) 0021 # use bytes; I'm not sufficiently perl savvy to have any idea whether that's useful to my performance issues or not, but it's an easy enough mod to try. Any thoughts? Can you s

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-04 Thread David Gessel
times over various versions and so may be slightly meaningful to something) 0021 # use bytes; I'm not sufficiently perl savvy to have any idea whether that's useful to my performance issues or not, but it's an easy enough mod to try. Any thoughts? -David Original Message

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-01 Thread David Gessel
Oh, I wiped the bayes data and started over already once, it isn't (or shouldn't be) that big a deal. Disk performance: seems OK to me. # diskinfo -t /dev/aacd0 /dev/aacd0 512 # sectorsize 73295462400 # mediasize in bytes (68G)

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-01 Thread David Jones
On 11/01/2017 04:40 PM, David Gessel wrote: Bill, Thanks for the advice. I'm not too worried about the permissions config, though I will make the mods once I get performance up to the point where bayes is usable at all - I wouldn't want to lose all those sweet, sweet token

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-01 Thread David Gessel
Bill, Thanks for the advice. I'm not too worried about the permissions config, though I will make the mods once I get performance up to the point where bayes is usable at all - I wouldn't want to lose all those sweet, sweet tokens to some unauthorized write premissio

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-01 Thread RW
> way, way below what it should be. A hint that suggests it isn't any > sort of processing performance issue is that CPU load barely > registers for perl/sa-learn. First set bayes_auto_expire 0 This is a good idea anyway as auto-expiry can cause problems during scanning. Running sa

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-01 Thread gessel
k results suggest is within the range of normal. I'm sure > I'm doing something really wrong, but not sure what. sa-learn is more suited to individual or small batches of messages. You'll get significantly improved performance using spamc -L spam (or ham, or forget). Aside from

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-01 Thread RW
sults suggest is within the range of normal. I'm sure > > I'm doing something really wrong, but not sure what. > > sa-learn is more suited to individual or small batches of messages. > You'll get significantly improved performance using spamc -L spam (or > ham, or

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-11-01 Thread David Gessel
Original Message Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? From: Matus UHLAR - fantomas To: users@spamassassin.apache.org Date: Tue Oct 31 2017 23:05:23 GMT+0300 (AST) >>> On 31.10

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread Bill Cole
On 31 Oct 2017, at 7:27 (-0400), David Gessel wrote: bayes_file_mode 0777 Don't do that. I know the SiteWideBayes page recommends that, but it's wrong. It's a bad idea to EVER make ANY file mode 0777 on any normal system. Something mangled your Bayes DB. Anything running on that system *cou

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread Matus UHLAR - fantomas
On 31.10.17 01:35, David Gessel wrote: amavisd-new-2.11.0_2,1 I'm finding the command /usr/local/bin/sa-learn --spam --showdots /mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to if you use amavis, you must train amavis' bayes database (/var/lib/amavis/.spamassassin/ here), no

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread David Gessel
Thank you very much for your help! A few answers inline. Original Message Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? From: Matus UHLAR - fantomas To: users@spamassassin.apache.org Date: Tue Oct

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread David Gessel
Original Message Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? From: Matus UHLAR - fantomas To: users@spamassassin.apache.org Date: Tue Oct 31 2017 13:21:10 GMT+0300 (AST) > > 1. spamc requi

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread David Gessel
at. Original Message Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? From: Reindl Harald To: David Gessel , users@spamassassin.apache.org Date: Tue Oct 31 2017 06:12:43 GMT+0300 (AST) > > > Am 3

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread Matus UHLAR - fantomas
t not sure what. On 31.10.17 08:44, Kevin Golding wrote: sa-learn is more suited to individual or small batches of messages. You'll get significantly improved performance using spamc -L spam (or ham, or forget). 1. spamc requires to be fed with individual messages. 2. spamc communicates
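Point 1 above (spamc is fed one message per invocation) means a Maildir has to be walked file by file. A minimal Python sketch, assuming spamd is running and has learning via spamc enabled; the helper names and the dry-run default are illustrative and do not come from the thread:

```python
import subprocess
from pathlib import Path


def learn_commands(maildir, learn_type="spam"):
    """Yield (argv, message_path) pairs, one spamc call per message file."""
    if learn_type not in ("spam", "ham", "forget"):
        raise ValueError(f"unsupported learn type: {learn_type}")
    for sub in ("cur", "new"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue
        for msg in sorted(folder.iterdir()):
            if msg.is_file():
                yield ["spamc", "-L", learn_type], msg


def train(maildir, learn_type="spam", dry_run=True):
    """Feed each message to spamc on stdin; return how many were handled."""
    count = 0
    for argv, msg in learn_commands(maildir, learn_type):
        if not dry_run:
            # spamc reads exactly one message from standard input.
            with msg.open("rb") as fh:
                subprocess.run(argv, stdin=fh, check=False)
        count += 1
    return count
```

With dry_run left at True the function only counts the messages it would submit, which is a cheap way to check the folder layout before training for real.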

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread David Gessel
Original Message Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? From: Kevin Golding To: users@spamassassin.apache.org Date: Tue Oct 31 2017 11:44:20 GMT+0300 (AST) > On Mon, 30 Oct 2017 22:35

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread Kevin Golding
g, but not sure what. sa-learn is more suited to individual or small batches of messages. You'll get significantly improved performance using spamc -L spam (or ham, or forget).

Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-31 Thread Matus UHLAR - fantomas
On 31.10.17 01:35, David Gessel wrote: amavisd-new-2.11.0_2,1 I'm finding the command /usr/local/bin/sa-learn --spam --showdots /mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to if you use amavis, you must train amavis' bayes database (/var/lib/amavis/.spamassassin/ here), not

very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal?

2017-10-30 Thread David Gessel
FreeBSD 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r322073: Sat Aug 5 01:44:09 PDT 2017 spamassassin-3.4.1_10 amavisd-new-2.11.0_2,1 I'm finding the command /usr/local/bin/sa-learn --spam --showdots /mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to complete... by a while I mean it

Re: txrep training performance

2017-08-01 Thread Jesse Norell
For anyone interested, I largely resolved the performance issues with sa-learn training when using txrep with a little mysql server tuning. As a reference point, training with ~6400 messages (most of which had already been learned) took about 14 minutes for both txrep+bayes, and about 3.5 minutes

Re: txrep training performance

2017-07-12 Thread Jesse Norell
to run with txrep > enabled (use_txrep 1), but only 13 minutes with txrep disabled > (use_txrep 0). One of my main gripes with the old AWL was that it > didn't learn/correct when training messages, so I love that txrep does > that, but does anyone have any tips to improve txrep training

txrep training performance

2017-07-12 Thread Jesse Norell
). One of my main gripes with the old AWL was that it didn't learn/correct when training messages, so I love that txrep does that, but does anyone have any tips to improve txrep training performance? Either tweaks/improvements on my end, or even a little thought on logic redesign in that

Re: High CPU utilization and performance decrease after recent sa-update. RESOLVED

2012-09-07 Thread John Hardin
On Thu, 6 Sep 2012, Piotr Kapiszewski wrote: We noticed a big increase in user CPU utilization on our MX servers since Sep 2nd sa-update. On a typical day we process over 2 million emails on our mail cluster. The issue was an old version of __SARE_URI_VISIT_US in 70_sare_uri1.cf If you have

Re: High CPU utilization and performance decrease after recent sa-update.

2012-09-06 Thread Henrik K
On Thu, Sep 06, 2012 at 06:19:52PM +, Piotr Kapiszewski wrote: > > Hi, > > We noticed a big increase in user CPU utilization on our MX servers since Sep > 2nd sa-update. On a typical day we process over 2 million emails on our mail > cluster. Our debugging has so far isolated the problem t

Re: High CPU utilization and performance decrease after recent sa-update.

2012-09-06 Thread Mark Martinec
Piotr, > We noticed a big increase in user CPU utilization on our MX servers since > Sep 2nd sa-update. On a typical day we process over 2 million emails on > our mail cluster. Our debugging has so far isolated the problem to: > 1) iXhash was a problem module, so we disabled it (the remote locat

Re: High CPU utilization and performance decrease after recent sa-update.

2012-09-06 Thread darxus
On 09/06, Piotr Kapiszewski wrote: >$sa_local_tests_only = 1 (amavis hook) SpamAssassin is wrong about three times as often without network tests. But if you're crippling the network tests as much as you mentioned, might as well use the score set which is optimized for having the network tests

High CPU utilization and performance decrease after recent sa-update.

2012-09-06 Thread Piotr Kapiszewski
Hi, We noticed a big increase in user CPU utilization on our MX servers since Sep 2nd sa-update. On a typical day we process over 2 million emails on our mail cluster. Our debugging has so far isolated the problem to: 1) iXhash was a problem module, so we disabled it (the remote location it

Re: Identifying actual performance on rules

2012-01-23 Thread John Hardin
On Tue, 24 Jan 2012, Karsten Bräckelmann wrote: On Mon, 2012-01-23 at 08:03 -0800, John Hardin wrote: On Sun, 22 Jan 2012, Munroe Sollog wrote: I am trying to locate the reason for performance spikes. I have read the various wiki pages, and they suggest solutions but not a way to identify the

Re: Identifying actual performance on rules

2012-01-23 Thread Karsten Bräckelmann
On Mon, 2012-01-23 at 08:03 -0800, John Hardin wrote: > On Sun, 22 Jan 2012, Munroe Sollog wrote: > > > I am trying to locate the reason for performance spikes. I have read the > > various wiki pages, and they suggest solutions but not a way to identify > > the bott

Re: Identifying actual performance on rules

2012-01-23 Thread John Hardin
On Sun, 22 Jan 2012, Munroe Sollog wrote: I am trying to locate the reason for performance spikes. I have read the various wiki pages, and they suggest solutions but not a way to identify the bottleneck. Is there a way to increase logging so that I can begin to identify or rule out the actual

Identifying actual performance on rules

2012-01-22 Thread Munroe Sollog
I am trying to locate the reason for performance spikes. I have read the various wiki pages, and they suggest solutions but not a way to identify the bottleneck. Is there a way to increase logging so that I can begin to identify or rule out the actual performance bottlenecks? Munroe Sollog

Re: Performance Problems Upgrading From 3.2.5 to 3.3.1 on CentOS 5/6

2011-11-18 Thread Karsten Bräckelmann
On Fri, 2011-11-18 at 19:36 +0100, Karsten Bräckelmann wrote: > On Fri, 2011-11-18 at 08:16 +, Tom wrote: > > (apologies if the html doesn't end up translating well!) Damn, sorry. My attempt at pruning the large tables seriously fucked up the formatting. :/ > > output from top, after running

Re: Performance Problems Upgrading From 3.2.5 to 3.3.1 on CentOS 5/6

2011-11-18 Thread Karsten Bräckelmann
On Fri, 2011-11-18 at 08:16 +, Tom wrote: > Here's the stats from my cluster at the moment (8am) (these figures will > ramp up considerably!) (apologies if the html doesn't end up > translating well!) > > Server > Load Avg > Processed/Min > Busy Child Proc > Proc Time > 10.44.219.192 > 0.34 > 4

Re: Performance Problems Upgrading From 3.2.5 to 3.3.1 on CentOS 5/6

2011-11-17 Thread Karsten Bräckelmann
On Thu, 2011-11-17 at 15:55 +, Tom wrote: > SPAMDOPTIONS="-d -L -i 10.44.219.208 -A 10.44.217.0/20 -m 40 -q -x -u > spamd --min-children=40" Do you really run a single spamd server, serving a /20 of potential SMTP servers? Also, you configured spamd to try hard and always keep exactly 40 chi

Re: Performance Problems Upgrading From 3.2.5 to 3.3.1 on CentOS 5/6

2011-11-17 Thread Martin Hepworth
rt the service, the load average on > the server starts climbing until it roughly equals the number of children > we have configured, and performance starts to get pretty bad. This seems > to happen whether I use the round-robin method or the default scaling > method. > > I ca

Performance Problems Upgrading From 3.2.5 to 3.3.1 on CentOS 5/6

2011-11-17 Thread Tom
art the service, the load average on the server starts climbing until it roughly equals the number of children we have configured, and performance starts to get pretty bad. This seems to happen whether I use the round-robin method or the default scaling method. I can't really see why my lo

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

2011-10-25 Thread John Hardin
On Tue, 25 Oct 2011, RW wrote: On Tue, 25 Oct 2011 06:28:41 -0700 (PDT) John Hardin wrote: Seconded. MTAs typically have efficient facilities for white- or black-listing specific email addresses. Use the capabilities of your MTA and glue layer to completely bypass SA for those addresses since

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

2011-10-25 Thread RW
On Tue, 25 Oct 2011 06:28:41 -0700 (PDT) John Hardin wrote: > Seconded. MTAs typically have efficient facilities for white- or > black-listing specific email addresses. Use the capabilities of your > MTA and glue layer to completely bypass SA for those addresses since > you _know_ you want to r

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

2011-10-25 Thread Benny Pedersen
On Tue, 25 Oct 2011 11:21:07 +0200, Robert Schetterer wrote: you should choose another way for whitelisting, i.e bypass spamassassin for trusted server ips etc anyway why not using i.e. whitelist_from *@somebody.co ? this open forges to numbers of equal senders recipient, never seen in my logs

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

2011-10-25 Thread John Hardin
On Tue, 25 Oct 2011, Robert Schetterer wrote: Am 25.10.2011 09:51, schrieb SuperDuper: I am planning on exporting a list of our client's email addresses into a file with 5000 separate lines as such: whitelist_from cli...@somebody.co you should choose another way for whitelisting, i.e bypass

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

2011-10-25 Thread Martin Gregorie
On Tue, 2011-10-25 at 00:51 -0700, SuperDuper wrote: > I am planning on exporting a list of our client's email addresses into a file > with 5000 separate lines as such: > whitelist_from cli...@somebody.co > I do essentially the same thing with an SA plugin and rule plus a database. Background: I

Re: 5000 x whitelist_from or whitelist_auth entries - performance hit?

2011-10-25 Thread Robert Schetterer
nd 6Gb RAM - > processor fairly underutilised at the moment. Is 5000 whitelist entries > expected to have a dramatic performance influence? > > Also, further to this, will replacing the whitelist_from with whitelist_auth > make a dramatic difference? > > Approximately what percentage

5000 x whitelist_from or whitelist_auth entries - performance hit?

2011-10-25 Thread SuperDuper
s expected to have a dramatic performance influence? Also, further to this, will replacing the whitelist_from with whitelist_auth make a dramatic difference? Approximately what percentage of servers out there are configured correctly so that whitelist_auth works correctly? -- View this

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
> >> > interesting, but as I understand the solution discussed addresses > >> > read performance. I am interested in write performance. How far could > >> > you take it before PSQL topped out? Any special hardware in use? > >> > >> If it were me, I wou

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* David F. Skoll : > On Fri, 29 Jul 2011 22:41:18 +0200 > Patrick Ben Koetter wrote: > > That's ~230 msg/sec. Ever took it to 500 msg/sec? > > No, we lack the hardware to do that. The 230 msgs/sec rate was > reached by a customer with a lot more money for hardware than we have. :) Isn't that th

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Walter Hurry
ion discussed addresses >> > read performance. I am interested in write performance. How far could >> > you take it before PSQL topped out? Any special hardware in use? >> >> If it were me, I wouldn't be using psql, but libpq. > > I take it it's faster. (I

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 22:41:18 +0200 Patrick Ben Koetter wrote: > That's where your product an SA differ, right? SA writes more to > PostgreSQL e.g. it also stores Bayes tokens in PostgreSQL. Right. > That's ~230 msg/sec. Ever took it to 500 msg/sec? No, we lack the hardware to do that. The 230

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* Walter Hurry : > On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: > > > Using an asynchronous approach using different databases is interesting, > > but as I understand the solution discussed addresses read performance. I > > am interested in write perfor

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
* David F. Skoll : > On Fri, 29 Jul 2011 21:56:03 +0200 > Patrick Ben Koetter wrote: > > > I am interested in write performance. How far could > > you take it before PSQL topped out? Any special hardware in use? > > We're not writing very much to PostgreSQL.

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 21:56:03 +0200 Patrick Ben Koetter wrote: > I am interested in write performance. How far could > you take it before PSQL topped out? Any special hardware in use? We're not writing very much to PostgreSQL. For each message, we write a small row containing t

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Walter Hurry
On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote: > Using an asynchronous approach using different databases is interesting, > but as I understand the solution discussed addresses read performance. I > am interested in write performance. How far could you take it before >

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Patrick Ben Koetter
r don't really care...") are > ignoring large sites. claiming this, claiming that ... Having a cluster (of SA nodes) share a (Bayes) database is a performance challenge for larger sites. The problem is not specific to SA or Bayes in particular. Using an asynchronous approach using diff

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 22:35:01 +0300 Henrik K wrote: [...] > Feel free to donate your code for SA and stop the pointless bashing. Um? I'm not "bashing" SA. I think it's a fine piece of work. All I asked is if anyone has made a CDB back-end for SA and I explained why I thought it might be a goo

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Henrik K
ck your Bayes database to update it, scanning processes can stall > and accumulate very quickly. Not really a problem with SQL. > It's true that untuned SpamAssassin's performance is fine for small > sites. But I don't think software developers should aim for small > sites

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
n stall and accumulate very quickly. It's true that untuned SpamAssassin's performance is fine for small sites. But I don't think software developers should aim for small sites and ignore large sites. Regards, David.

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Henrik K
On Fri, Jul 29, 2011 at 01:00:52PM -0400, David F. Skoll wrote: > > That's why I was wondering if anyone had looked at using CDB with SA's > Bayes module. Let's be serious. Only people that really need it are the ones with a custom high volume distributed spam appliance thing. Other 99.9% of user

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 12:45:53 -0400 Michael Scheidell wrote: > you need custom code to sync bayes? do expires? or just interesting > entries in local.cf? Ah, I should have mentioned we don't use SpamAssassin's Bayes module. We use our own Bayes implementation. That's why I was wondering if an

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 12:41 PM, David F. Skoll wrote: On Fri, 29 Jul 2011 12:31:01 -0400 Michael Scheidell wrote: ok, but are you using cdb or postgresql for bayes? cdb for the Bayes data; PostgreSQL for the journal table. Regards, David. you need custom code to sync bayes? do expires? or just intere

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 12:31:01 -0400 Michael Scheidell wrote: > ok, but are you using cdb or postgresql for bayes? cdb for the Bayes data; PostgreSQL for the journal table. Regards, David.

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 12:20 PM, David F. Skoll wrote: This INSERT-only operation cannot block under PostgreSQL MVCC. ok, but are you using cdb or postgresql for bayes? -- Michael Scheidell, CTO o: 561-999-5000 d: 561-948-2259 >*| *SECNAP Network Security Corporation * Best Mobile Solutions Product

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:59:14 -0400 Michael Scheidell wrote: > in mysql, we don't journal. what does that journaling time do to SA > processing times? I'd hate to think we go from 1 s/email processing > time to 60 seconds or something while journal is locked. Journalling *improves* training spee

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 11:47 AM, David F. Skoll wrote: CDB is *very* fast. If you journal your Bayes training and run the journal every 5-10 minutes, CDB can easily keep up even with a 2GB Bayes database. in mysql, we don't journal. what does that journaling time do to SA processing times? I'd hate to thin
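The journaling approach mentioned in this message can be illustrated with a small Python sketch: training only appends token entries to a journal, and a periodic job (every 5-10 minutes in the message) folds the journal into the token counts. This is a toy model under stated assumptions, not SpamAssassin's code or the poster's implementation; the class and method names are hypothetical.

```python
from collections import Counter


class JournaledBayes:
    """Toy model: append-only journal plus a periodically merged store."""

    def __init__(self):
        self.store = Counter()  # (token, kind) -> count, read by scanners
        self.journal = []       # pending (token, kind) entries

    def learn(self, tokens, kind):
        """Training only appends; the main store is not touched."""
        self.journal.extend((tok, kind) for tok in tokens)

    def apply_journal(self):
        """Fold pending entries into the store; return how many were merged."""
        pending, self.journal = self.journal, []
        self.store.update(pending)
        return len(pending)

    def count(self, token, kind):
        """Return the merged count for a token; pending entries are ignored."""
        return self.store[(token, kind)]
```

The point of the split is that a training call never rewrites the main store; only the periodic merge does, and it handles all pending entries in one pass.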

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:36:52 -0400 Michael Scheidell wrote: > On 7/29/11 11:33 AM, David F. Skoll wrote: > > Has anyone investigated writing a CDB backend for SpamAssassin's > > Bayes implementation? I'm guessing the need to rewrite the DB each > > time makes it a bit complex. > esp for people

Re: Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread Michael Scheidell
On 7/29/11 11:33 AM, David F. Skoll wrote: Has anyone investigated writing a CDB backend for SpamAssassin's Bayes implementation? I'm guessing the need to rewrite the DB each time makes it a bit complex. esp for people with 2gb db's? -- Michael Scheidell, CTO o: 561-999-5000 d: 561-948-2259

Performance of Bayes Storage Modules (was Re: Conversion Spamassassin(bayes) database to SDBM)

2011-07-29 Thread David F. Skoll
On Fri, 29 Jul 2011 11:26:57 -0400 Michael Scheidell wrote: > if you use mysql.pm for other things (sql params, user's, etc), it > still doesn't seem to make sense to use sdbm AND mysql. We use PostgreSQL for a number of things, but we found that CDB is much faster than all competitors for Bayes

Re: High Performance Bayes Database Configuration?

2011-06-23 Thread David F. Skoll
On Thu, 23 Jun 2011 13:01:54 +0100 Martin Gregorie wrote: > Does that mean you can take a copy of the current CDB database, > add the new tokens offline and then stop CDB, rename the copy so it > replaces the live copy and restart CDB? If so, that would give a > remarkably short gap in servi

Re: High Performance Bayes Database Configuration?

2011-06-23 Thread Martin Gregorie
On Thu, 2011-06-23 at 07:01 -0400, David F. Skoll wrote: > > yes, the cdb has to be rebuilt every time from scratch. > > Indeed. I mentioned that. > Does that mean you can take a copy of the current CDB database, add the new tokens offline and then stop CDB, rename the copy so it replaces
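The swap asked about here (build a new copy offline, then rename it over the live file) is commonly done with an atomic rename. A minimal Python sketch, assuming a plain JSON file stands in for the CDB file; it does not use any real CDB library, and the function name is hypothetical:

```python
import json
import os
import tempfile


def rebuild_and_swap(live_path, tokens):
    """Write the full database to a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(live_path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(tokens, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, live_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

On POSIX systems a process that already has the old file open keeps reading the old contents until it reopens the path, so under that assumption the rename itself would not require stopping the readers.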

MySQL defects (was Re: High Performance Bayes Database Configuration?)

2011-06-23 Thread David F. Skoll
On Thu, 23 Jun 2011 10:29:43 +0200 Matus UHLAR - fantomas wrote: > On 21.06.11 15:17, David F. Skoll wrote: > >No, not really, but MySQL is broken in so many ways I try to stay > >away from it. Many of the design flaws in > >http://sql-info.de/mysql/gotchas.html remain unfixed. For example, > >

Re: High Performance Bayes Database Configuration?

2011-06-23 Thread David F. Skoll
On Thu, 23 Jun 2011 10:26:48 +0200 Matus UHLAR - fantomas wrote: > On 21.06.11 14:19, David F. Skoll wrote: > >InnoDB may be better, but it's not as fast as CDB, and certainly > >slower than a local CDB file. > did you make any measurements or are you just guessing? I did not measure InnoDB spec

Re: High Performance Bayes Database Configuration?

2011-06-23 Thread a . smith
Quoting a.sm...@ukgrid.net: Quoting Per Jessen : Matus UHLAR - fantomas wrote: ... again, does this affect BAYES? Probably not, but David was asked to explain why he was wary of using mysql, and he did just that. If those don't apply to Bayes, then he has explained why he doesn't tr

Re: High Performance Bayes Database Configuration?

2011-06-23 Thread a . smith
Quoting Per Jessen : Matus UHLAR - fantomas wrote: ... again, does this affect BAYES? Probably not, but David was asked to explain why he was wary of using mysql, and he did just that. If those don't apply to Bayes, then he has explained why he doesn't trust MyISAM with his data

Re: High Performance Bayes Database Configuration?

2011-06-23 Thread Per Jessen
Matus UHLAR - fantomas wrote: >>On Tue, 21 Jun 2011 20:03:57 +0100 >>Dominic Benson wrote: >> >>> To be fair to MySQL, these days it is pretty solid. There are >>> potentially dangerous configuration options, but there are in >>> Postgres too, and you can turn them off. Have you had a bad >>> exp

Re: High Performance Bayes Database Configuration?

2011-06-23 Thread Matus UHLAR - fantomas
On Tue, 21 Jun 2011 20:03:57 +0100 Dominic Benson wrote: To be fair to MySQL, these days it is pretty solid. There are potentially dangerous configuration options, but there are in Postgres too, and you can turn them off. Have you had a bad experience with a recent version? On 21.06.11 15:17,

Re: High Performance Bayes Database Configuration?

2011-06-23 Thread Matus UHLAR - fantomas
On 21.06.11 14:19, David F. Skoll wrote: InnoDB may be better, but it's not as fast as CDB, and certainly slower than a local CDB file. did you make any measurements or are you just guessing? Just the TCP round-trip time will make MySQL slower than local CDB files. spamd caches the DB conne

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread David F. Skoll
Sorry for the second reply... On Tue, 21 Jun 2011 20:03:57 +0100 Dominic Benson wrote: > ...the database is shipped out to customers as part of an update service Actually yes, we do ship out a largish Bayes database as part of our update service. But we don't ship it out as a CDB file. It goe

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread David F. Skoll
On Tue, 21 Jun 2011 20:03:57 +0100 Dominic Benson wrote: > To be fair to MySQL, these days it is pretty solid. There are > potentially dangerous configuration options, but there are in > Postgres too, and you can turn them off. Have you had a bad > experience with a recent version? No, not reall

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread Dominic Benson
rom it.) To be fair to MySQL, these days it is pretty solid. There are potentially dangerous configuration options, but there are in Postgres too, and you can turn them off. Have you had a bad experience with a recent version? > >> When they lock, they lock the entire table, which is BAD fo

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread David F. Skoll
On Tue, 21 Jun 2011 21:12:58 +0300 Jari Fredriksson wrote: > Are you sure you are not using MyISAM tables? I don't use MySQL at all. (Our CRM system requires it, but apart from that, I stay away from it.) > When they lock, they lock the entire table, which is BAD for > perfor

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread Jari Fredriksson
;t _trust_ MySQL, and can guarantee that local .cdb files will > outperform it for this workload by a factor of 5 or more. > Are you sure you are not using MyISAM tables? When they lock, they lock the entire table, which is BAD for performance if there are multiple clients or threads usin

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread David F. Skoll
On Tue, 21 Jun 2011 21:06:49 +0300 Jari Fredriksson wrote: > > Our main database is 255MB, consisting of 6 220 831 tokens from > > 731 289 spam messages and 900 447 non-spam messages. Each user > > potentially has his/her own Bayes database in addition to the > > central one, so the total Bayes

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread Jari Fredriksson
21.6.2011 17:53, David F. Skoll kirjoitti: > On Tue, 21 Jun 2011 15:46:59 +0100 > a.sm...@ukgrid.net wrote: > >> Thats a bit harsh on MySQL isn't it? > > No. MySQL is horrible, no question about it. > >> Anyway, how big are the DB's of you guys?? > > Our main database is 255MB, consisting of 6

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread David F. Skoll
On Tue, 21 Jun 2011 15:46:59 +0100 a.sm...@ukgrid.net wrote: > Thats a bit harsh on MySQL isn't it? No. MySQL is horrible, no question about it. > Anyway, how big are the DB's of you guys?? Our main database is 255MB, consisting of 6 220 831 tokens from 731 289 spam messages and 900 447 non-sp

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread a . smith
stgreSQL---we would never use MySQL for any data we care about---we finally settled on Dan Bernstein's CDB format. It has by far the best performance. See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/ Take a look at the "Random Reads" timings. CDB is 6 times fa

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread Yet Another Ninja
ntation.) After trying Berkeley DB files and PostgreSQL---we would never use MySQL for any data we care about---we finally settled on Dan Bernstein's CDB format. It has by far the best performance. See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/ Take a look at the "Rand

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread Toni Mueller
Hi, On Tue, 21.06.2011 at 07:30:51 -0700, Marc Perkel wrote: > Thanks David but I need real time updating and it's spread across > multiple servers. So need PostgreSQL or MySQL. just a shot in the dark: Maybe you can use mnesia, which is distributed and should be _quite_ fast. It's not SQL, ho

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread David F. Skoll
On Tue, 21 Jun 2011 07:30:51 -0700 Marc Perkel wrote: > Thanks David but I need real time updating and it's spread across > multiple servers. So need PostgreSQL or MySQL. That's what we used to think. It turns out that real-time updating is a waste of resources; journalling Bayes updates and t

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread Marc Perkel
and PostgreSQL---we would never use MySQL for any data we care about---we finally settled on Dan Bernstein's CDB format. It has by far the best performance. See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/ Take a look at the "Random Reads" timings. CDB is 6

Re: High Performance Bayes Database Configuration?

2011-06-21 Thread David F. Skoll
d never use MySQL for any data we care about---we finally settled on Dan Bernstein's CDB format. It has by far the best performance. See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/ Take a look at the "Random Reads" timings. CDB is 6 times faster than Berkele

High Performance Bayes Database Configuration?

2011-06-21 Thread Marc Perkel
Trying to get MySQL Bayes working in a high volume environment. Dedicated MySQL server with SSD drives. Can someone send me a sample my.cnf file and make other suggestions to keep it running without database corruption and other MySQL "features"? Or - should I be using some other DB? Thanks in a

Re: Mailspike Performance

2011-04-15 Thread RW
On Mon, 11 Apr 2011 22:39:11 -1000 "Warren Togami Jr." wrote: > We haven't had working statistics viewing for a few weeks, but now it > is fixed and I'm amazed by the performance of RCVD_IN_MSPIKE_BL. > > http://ruleqa.spamassassin.org/20110409-r1090

Re: Mailspike Performance

2011-04-15 Thread Justin Mason
On Thu, Apr 14, 2011 at 22:51, Adam Katz wrote: > RCVD_IN_MSPIKE_BL has 99% overlap with the SA3.3 set and 98% with the > SA3.2 set.  That leaves 0.6758% of spam uniquely hitting this DNSBL (1% > of its 67.5822%).  RCVD_IN_SEMBLACK has the same story, resulting in > 0.5138% unique spam from its 1%

Re: Mailspike Performance

2011-04-14 Thread Adam Katz
On 04/12/2011 01:39 AM, Warren Togami Jr. wrote: > We haven't had working statistics viewing for a few weeks, but now it > is fixed and I'm amazed by the performance of RCVD_IN_MSPIKE_BL. > > http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_MSPIKE_BL/detail >

Re: Mailspike Performance

2011-04-12 Thread Alex
Hi, > http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_HOSTKARMA_BL/detail > HOSTKARMA_BL overlaps with MSPIKE_BL 88% of the time, but detects far fewer > spam and with slightly more FP's.  Compared to last year, HOSTKARMA_BL's > safety rating has improved considerably on a sustain

Mailspike Performance

2011-04-12 Thread Warren Togami Jr.
We haven't had working statistics viewing for a few weeks, but now it is fixed and I'm amazed by the performance of RCVD_IN_MSPIKE_BL. http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_MSPIKE_BL/detail RCVD_IN_MSPIKE_BL has nearly the highest spam detection ratio of all th
