Original Message
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per
token really, really slow or roughly normal? NYTProf results TxRep.pm 1720440
vs 1651114
From: RW
To: users@spamassassin.apache.org
Date: Tue Nov 07 2017 03:44:50 GMT+0300 (AST)
On 04.11.17 16:09, David Gessel wrote:
so days later, still chunking away, not making much progress.
1. did you enable bayes_learn_to_journal?
2. do you still run multiple sa-learn jobs in parallel?
3. do you still feed thousands of spam messages to it?
there is a possibility of storing bayes da
On Tue, 7 Nov 2017 00:59:12 +0300
David Gessel wrote:
> FreeBSD is currently installing TxRep.pm rev 1651114 from Jan 12
> 15:17:46 2015 (it is the only revision that has only whitespace
> differences, all leading padding, there are code differences between
> installed and 1650327 (previous) and 1
continue through 1720440.
However, 1720440 seems to cause massive performance issues.
== With FreeBSD installed TxRep.pm (1651114) ==
# sa-learn --clear
# sa-learn --dump
ERROR: Bayes dump returned an error, please re-run with -D for more information
(folder has 236 messages)
# perl -T -d:NYTProf
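The profiling command above is cut off; a typical way to profile a single
sa-learn run under Devel::NYTProf looks like the following (the sa-learn path
and message path are illustrative):

```shell
# Profile one sa-learn invocation; writes ./nytprof.out
perl -T -d:NYTProf /usr/local/bin/sa-learn --spam /path/to/one/message

# Turn the profile dump into browsable per-subroutine/per-line HTML reports
nytprofhtml
```

The per-subroutine timings make it easy to see whether the time is going into
TxRep, Bayes I/O, or somewhere else entirely.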
Original Message
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per
token really, really slow or roughly normal?
From: David Jones
To: users@spamassassin.apache.org
Date: Sat Nov 04 2017 16:35:02 GMT+0300 (AST)
> On 11/04/2017 08:09 AM, Da
Original Message
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per
token really, really slow or roughly normal?
From: Matus UHLAR - fantomas
To: users@spamassassin.apache.org
Date: Tue Oct 31 2017 23:05:23 GMT+0300 (AST)
> dovecot's antspa
there or not there a few times over various
versions and so may be slightly meaningful to something)
0021 # use bytes;
I'm not sufficiently perl savvy to have any idea whether that's useful to my
performance issues or not, but it's an easy enough mod to try.
Any thoughts?
Can you s
-David
Original Message
Oh, I wiped the bayes data and started over already once, it isn't (or
shouldn't be) that big a deal.
Disk performance: seems OK to me.
# diskinfo -t /dev/aacd0
/dev/aacd0
512 # sectorsize
73295462400 # mediasize in bytes (68G)
On 11/01/2017 04:40 PM, David Gessel wrote:
Bill,
Thanks for the advice. I'm not too worried about the permissions config,
though I will make the mods once I get performance up to the point where bayes
is usable at all - I wouldn't want to lose all those sweet, sweet tokens to
some unauthorized write premissio
> way, way below what it should be. A hint that suggests it isn't any
> sort of processing performance issue is that CPU load barely
> registers for perl/sa-learn.
First set
bayes_auto_expire 0
This a good idea anyway as auto-expiry can cause problems during
scanning. Running sa
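Both this setting and the bayes_learn_to_journal option asked about earlier go
in local.cf; a minimal sketch for a site-wide Bayes setup:

```
bayes_auto_expire 0
bayes_learn_to_journal 1
```

With journalling on, sa-learn appends tokens to a journal instead of locking
the whole database for every message; expiry can then be run explicitly from
cron with sa-learn --force-expire.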
k results suggest is within the range of normal. I'm sure
> I'm doing something really wrong, but not sure what.
sa-learn is more suited to individual or small batches of messages.
You'll get significantly improved performance using spamc -L spam (or
ham, or forget).
Aside from
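As a sketch of the spamc route (the maildir path and the spamd setup are
assumptions about your site), feeding messages one at a time to a running
spamd:

```shell
# spamc -L sends each message to spamd for learning; one message per call
for msg in ~/Maildir/.Junk/cur/*; do
    spamc -L spam < "$msg" > /dev/null
done
```

Note that spamd must be started with --allow-tell for -L requests to be
accepted.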
sults suggest is within the range of normal. I'm sure
> > I'm doing something really wrong, but not sure what.
>
> sa-learn is more suited to individual or small batches of messages.
> You'll get significantly improved performance using spamc -L spam (or
> ham, or
Original Message
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per
token really, really slow or roughly normal?
From: Matus UHLAR - fantomas
To: users@spamassassin.apache.org
Date: Tue Oct 31 2017 23:05:23 GMT+0300 (AST)
>>> On 31.10
On 31 Oct 2017, at 7:27 (-0400), David Gessel wrote:
bayes_file_mode 0777
Don't do that. I know the SiteWideBayes page recommends that, but it's
wrong. It's a bad idea to EVER make ANY file mode 0777 on any normal
system. Something mangled your Bayes DB. Anything running on that system
*cou
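For a shared site-wide database, a less dangerous alternative to 0777 is a
group-writable mode; a sketch (the bayes_path value is an assumption, adjust
to your layout):

```
bayes_path /var/spamassassin/bayes/bayes
bayes_file_mode 0660
```

with the bayes directory owned by a group that every scanning user belongs to.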
On 31.10.17 01:35, David Gessel wrote:
amavisd-new-2.11.0_2,1
I'm finding the command /usr/local/bin/sa-learn --spam --showdots
/mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to
if you use amavis, you must train amavis' bayes database
(/var/lib/amavis/.spamassassin/ here), no
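One way to make sure the training lands in the database amavis actually reads
(the user name and paths here are site-specific assumptions):

```shell
# Run sa-learn as the amavis user so it opens amavis' own Bayes DB
su -m amavis -c 'sa-learn --spam --showdots /path/to/.Junk/cur'

# Or point sa-learn at that database explicitly
sa-learn --dbpath /var/lib/amavis/.spamassassin --spam /path/to/.Junk/cur
```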
Thank you very much for your help! A few answers inline.
Original Message
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per
token really, really slow or roughly normal?
From: Matus UHLAR - fantomas
To: users@spamassassin.apache.org
Date: Tue Oct
Original Message
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per
token really, really slow or roughly normal?
From: Matus UHLAR - fantomas
To: users@spamassassin.apache.org
Date: Tue Oct 31 2017 13:21:10 GMT+0300 (AST)
>
> 1. spamc requi
at.
Original Message
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per
token really, really slow or roughly normal?
From: Reindl Harald
To: David Gessel , users@spamassassin.apache.org
Date: Tue Oct 31 2017 06:12:43 GMT+0300 (AST)
>
>
> Am 3
t not sure what.
On 31.10.17 08:44, Kevin Golding wrote:
sa-learn is more suited to individual or small batches of messages.
You'll get significantly improved performance using spamc -L spam (or
ham, or forget).
1. spamc needs to be fed individual messages.
2. spamc communicates
Original Message
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per
token really, really slow or roughly normal?
From: Kevin Golding
To: users@spamassassin.apache.org
Date: Tue Oct 31 2017 11:44:20 GMT+0300 (AST)
> On Mon, 30 Oct 2017 22:35
g, but not sure what.
sa-learn is more suited to individual or small batches of messages. You'll
get significantly improved performance using spamc -L spam (or ham, or
forget).
On 31.10.17 01:35, David Gessel wrote:
amavisd-new-2.11.0_2,1
I'm finding the command /usr/local/bin/sa-learn --spam --showdots
/mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to
if you use amavis, you must train amavis' bayes database
(/var/lib/amavis/.spamassassin/ here), not
FreeBSD 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r322073: Sat Aug 5 01:44:09 PDT
2017
spamassassin-3.4.1_10
amavisd-new-2.11.0_2,1
I'm finding the command /usr/local/bin/sa-learn --spam --showdots
/mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to complete...
by a while I mean it
For anyone interested, I largely resolved the performance issues with
sa-learn training when using txrep with a little mysql server tuning.
As a reference point, training with ~6400 messages (most of which had
already been learned) took about 14 minutes for both txrep+bayes, and
about 3.5 minutes
to run with txrep
> enabled (use_txrep 1), but only 13 minutes with txrep disabled
> (use_txrep 0). One of my main gripes with the old AWL was that it
> didn't learn/correct when training messages, so I love that txrep does
> that, but does anyone have any tips to improve txrep training
). One of my main gripes with the old AWL was that it
didn't learn/correct when training messages, so I love that txrep does
that, but does anyone have any tips to improve txrep training
performance? Either tweaks/improvements on my end, or even a little
thought on logic redesign in that
On Thu, 6 Sep 2012, Piotr Kapiszewski wrote:
We noticed a big increase in user CPU utilization on our MX servers
since Sep 2nd sa-update. On a typical day we process over 2 million
emails on our mail cluster.
The issue was an old version of __SARE_URI_VISIT_US in 70_sare_uri1.cf
If you have
On Thu, Sep 06, 2012 at 06:19:52PM +, Piotr Kapiszewski wrote:
>
> Hi,
>
> We noticed a big increase in user CPU utilization on our MX servers since Sep
> 2nd sa-update. On a typical day we process over 2 million emails on our mail
> cluster. Our debugging has so far isolated the problem t
Piotr,
> We noticed a big increase in user CPU utilization on our MX servers since
> Sep 2nd sa-update. On a typical day we process over 2 million emails on
> our mail cluster. Our debugging has so far isolated the problem to:
> 1) iXhash was a problem module, so we disabled it (the remote locat
On 09/06, Piotr Kapiszewski wrote:
>$sa_local_tests_only = 1 (amavis hook)
SpamAssassin is wrong about three times as often without network tests.
But if you're crippling the network tests as much as you mentioned, might
as well use the score set which is optimized for having the network tests
Hi,
We noticed a big increase in user CPU utilization on our MX servers since Sep
2nd sa-update. On a typical day we process over 2 million emails on our mail
cluster. Our debugging has so far isolated the problem to:
1) iXhash was a problem module, so we disabled it (the remote location it
On Tue, 24 Jan 2012, Karsten Bräckelmann wrote:
On Mon, 2012-01-23 at 08:03 -0800, John Hardin wrote:
On Sun, 22 Jan 2012, Munroe Sollog wrote:
I am trying to locate reason for performance spikes. I have read the
various wiki pages, and they suggest solutions but not a way to identify
the
On Mon, 2012-01-23 at 08:03 -0800, John Hardin wrote:
> On Sun, 22 Jan 2012, Munroe Sollog wrote:
>
> > I am trying to locate reason for performance spikes. I have read the
> > various wiki pages, and they suggest solutions but not a way to identify
> > the bott
On Sun, 22 Jan 2012, Munroe Sollog wrote:
I am trying to locate reason for performance spikes. I have read the
various wiki pages, and they suggest solutions but not a way to identify
the bottleneck. Is there way to increase logging so that I can begin to
identify or rule out the actual
I am trying to locate reason for performance spikes. I have read the various
wiki pages, and they suggest solutions but not a way to identify the
bottleneck. Is there way to increase logging so that I can begin to identify
or rule out the actual performance bottlenecks?
Munroe Sollog
On Fri, 2011-11-18 at 19:36 +0100, Karsten Bräckelmann wrote:
> On Fri, 2011-11-18 at 08:16 +, Tom wrote:
> > (apologies if the html doesn't end up translating well!)
Damn, sorry. My attempt at pruning the large tables seriously fucked up
the formatting. :/
> > output from top, after running
On Fri, 2011-11-18 at 08:16 +, Tom wrote:
> Here's the stats from my cluster at the moment (8am) (these figures wll
> ramp up considerably!) (apologies if the html doesn't end up
> translating well!)
>
> Server           Load Avg    Processed/Min    Busy Child Proc    Proc Time
> 10.44.219.192    0.34        4
On Thu, 2011-11-17 at 15:55 +, Tom wrote:
> SPAMDOPTIONS="-d -L -i 10.44.219.208 -A 10.44.217.0/20 -m 40 -q -x -u
> spamd --min-children=40"
Do you really run a single spamd server, serving a /20 of potential SMTP
servers?
Also, you configured spamd to try hard and always keep exactly 40
chi
rt the service, the load average on
> the server starts climbing until it roughly equals the number of children
> we have configured, and performance starts to get pretty bad. This seems
> to happen whether I use the round-robin method or the default scaling
> method.
>
> I ca
art the service, the
load average on the server starts climbing until it roughly equals the
number of children we have configured, and performance starts to get
pretty bad. This seems to happen whether I use the round-robin method
or the default scaling method.
I can't really see why my lo
On Tue, 25 Oct 2011, RW wrote:
On Tue, 25 Oct 2011 06:28:41 -0700 (PDT)
John Hardin wrote:
Seconded. MTAs typically have efficient facilities for white- or
black-listing specific email addresses. Use the capabilities of your
MTA and glue layer to completely bypass SA for those addresses since
On Tue, 25 Oct 2011 06:28:41 -0700 (PDT)
John Hardin wrote:
> Seconded. MTAs typically have efficient facilities for white- or
> black-listing specific email addresses. Use the capabilities of your
> MTA and glue layer to completely bypass SA for those addresses since
> you _know_ you want to r
On Tue, 25 Oct 2011 11:21:07 +0200, Robert Schetterer wrote:
you should choose another way for whitelisting,
i.e bypass spamassassin for trusted server ips etc
anyway, why not use e.g. whitelist_from *@somebody.co ?
that opens forgery for any number of matching senders; never seen that in
my logs
On Tue, 25 Oct 2011, Robert Schetterer wrote:
Am 25.10.2011 09:51, schrieb SuperDuper:
I am planning on exporting a list of our client's email addresses into a file
with 5000 separate lines as such:
whitelist_from cli...@somebody.co
you should choose another way for whitelisting,
i.e bypass
On Tue, 2011-10-25 at 00:51 -0700, SuperDuper wrote:
> I am planning on exporting a list of our client's email addresses into a file
> with 5000 separate lines as such:
> whitelist_from cli...@somebody.co
>
I do essentially the same thing with an SA plugin and rule plus a
database.
Background: I
nd 6Gb RAM -
> processor fairly underutilised at the moment. Is 5000 whitelist entries
> expected to have a dramatic performance influence?
>
> Also, further to this, will replacing the whitelist_from with whitelist_auth
> make a dramatic difference?
>
> Approximately what percentage
s
expected to have a dramatic performance influence?
Also, further to this, will replacing the whitelist_from with whitelist_auth
make a dramatic difference?
Approximately what percentage of servers out there are configured correctly
so that whitelist_auth works correctly?
--
View this
gt; >> > interesting, but as I understand the solution discussed addresses
> >> > read performance. I am interested in write performance. How far could
> >> > you take it before PSQL topped out? Any special hardware in use?
> >>
> >> If it were me, I wou
* David F. Skoll :
> On Fri, 29 Jul 2011 22:41:18 +0200
> Patrick Ben Koetter wrote:
> > That's ~230 msg/sec. Ever took it to 500 msg/sec?
>
> No, we lack the hardware to do that. The 230 msgs/sec rate was
> reached by a customer with a lot more money for hardware than we have. :)
Isn't that th
ion discussed addresses
>> > read performance. I am interested in write performance. How far could
>> > you take it before PSQL topped out? Any special hardware in use?
>>
>> If it were me, I wouldn't be using psql, but libpq.
>
> I take it it's faster. (I
On Fri, 29 Jul 2011 22:41:18 +0200
Patrick Ben Koetter wrote:
> That's where your product and SA differ, right? SA writes more to
> PostgreSQL e.g. it also stores Bayes tokens in PostgreSQL.
Right.
> That's ~230 msg/sec. Ever took it to 500 msg/sec?
No, we lack the hardware to do that. The 230
* Walter Hurry :
> On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote:
>
> > Using an asynchronous approach using different databases is interesting,
> > but as I understand the solution discussed addresses read performance. I
> > am interested in write perfor
* David F. Skoll :
> On Fri, 29 Jul 2011 21:56:03 +0200
> Patrick Ben Koetter wrote:
>
> > I am interested in write performance. How far could
> > you take it before PSQL topped out? Any special hardware in use?
>
> We're not writing very much to PostgreSQL.
On Fri, 29 Jul 2011 21:56:03 +0200
Patrick Ben Koetter wrote:
> I am interested in write performance. How far could
> you take it before PSQL topped out? Any special hardware in use?
We're not writing very much to PostgreSQL. For each message, we
write a small row containing t
On Fri, 29 Jul 2011 21:56:03 +0200, Patrick Ben Koetter wrote:
> Using an asynchronous approach using different databases is interesting,
> but as I understand the solution discussed addresses read performance. I
> am interested in write performance. How far could you take it before
>
r don't really care...") are
> ignoring large sites.
claiming this, claiming that ...
Having a cluster (of SA nodes) share a (Bayes) database is a performance
challenge for larger sites. The problem is not specific to SA or Bayes in
particular.
Using an asynchronous approach using diff
On Fri, 29 Jul 2011 22:35:01 +0300
Henrik K wrote:
[...]
> Feel free to donate your code for SA and stop the pointless bashing.
Um? I'm not "bashing" SA. I think it's a fine piece of work. All I asked
is if anyone has made a CDB back-end for SA and I explained why I thought
it might be a goo
ck your Bayes database to update it, scanning processes can stall
> and accumulate very quickly.
Not really a problem with SQL.
> It's true that untuned SpamAssassin's performance is fine for small
> sites. But I don't think software developers should aim for small
> sites
n stall
and accumulate very quickly.
It's true that untuned SpamAssassin's performance is fine for small
sites. But I don't think software developers should aim for small
sites and ignore large sites.
Regards,
David.
On Fri, Jul 29, 2011 at 01:00:52PM -0400, David F. Skoll wrote:
>
> That's why I was wondering if anyone had looked at using CDB with SA's
> Bayes module.
Let's be serious. Only people that really need it are the ones with a custom
high volume distributed spam appliance thing. Other 99.9% of user
On Fri, 29 Jul 2011 12:45:53 -0400
Michael Scheidell wrote:
> you need custom code to sync bayes? do expires? or just interesting
> entries in local.cf?
Ah, I should have mentioned we don't use SpamAssassin's Bayes module. We
use our own Bayes implementation.
That's why I was wondering if an
On 7/29/11 12:41 PM, David F. Skoll wrote:
On Fri, 29 Jul 2011 12:31:01 -0400
Michael Scheidell wrote:
ok, but are you using cdb or postgresql for bayes?
cdb for the Bayes data; PostgreSQL for the journal table.
Regards,
David.
you need custom code to sync bayes? do expires? or just intere
On Fri, 29 Jul 2011 12:31:01 -0400
Michael Scheidell wrote:
> ok, but are you using cdb or postgresql for bayes?
cdb for the Bayes data; PostgreSQL for the journal table.
Regards,
David.
On 7/29/11 12:20 PM, David F. Skoll wrote:
This INSERT-only
operation cannot block under PostgreSQL MVCC.
ok, but are you using cdb or postgresql for bayes?
--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
| SECNAP Network Security Corporation
| Best Mobile Solutions Product
On Fri, 29 Jul 2011 11:59:14 -0400
Michael Scheidell wrote:
> in mysql, we don't journal. what does that journaling time do to SA
> processing times? I'd hate to think we go from 1 s/email processing
> time to 60 seconds or something while journal is locked.
Journalling *improves* training spee
On 7/29/11 11:47 AM, David F. Skoll wrote:
CDB is *very* fast. If you journal your Bayes training and run the
journal every 5-10 minutes, CDB can easily keep up even with a 2GB
Bayes database.
in mysql, we don't journal. what does that journaling time do to SA
processing times? I'd hate to thin
On Fri, 29 Jul 2011 11:36:52 -0400
Michael Scheidell wrote:
> On 7/29/11 11:33 AM, David F. Skoll wrote:
> > Has anyone investigated writing a CDB backend for SpamAssassin's
> > Bayes implementation? I'm guessing the need to rewrite the DB each
> > time makes it a bit complex.
> esp for people
On 7/29/11 11:33 AM, David F. Skoll wrote:
Has anyone investigated writing a CDB backend for SpamAssassin's Bayes
implementation? I'm guessing the need to rewrite the DB each time makes
it a bit complex.
esp for people with 2gb db's?
--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
On Fri, 29 Jul 2011 11:26:57 -0400
Michael Scheidell wrote:
> if you use mysql.pm for other things (sql params, user's, etc), it
> still doesn't seem to make sense to use sdbm AND mysql.
We use PostgreSQL for a number of things, but we found that CDB is
much faster than all competitors for Bayes
On Thu, 23 Jun 2011 13:01:54 +0100
Martin Gregorie wrote:
> Does that mean you can take a copy of the current CDB database,
> add the new tokens offline and then stop CDB, rename the copy so it
> replaces the live copy and restart CDB? If so, that would give a
> remarkably short gap in servi
On Thu, 2011-06-23 at 07:01 -0400, David F. Skoll wrote:
> > yes, the cdb has to be rebuilt every time from scratch.
>
> Indeed. I mentioned that.
>
Does that mean you can take a copy of the current CDB database, add
the new tokens offline and then stop CDB, rename the copy so it replaces
On Thu, 23 Jun 2011 10:29:43 +0200
Matus UHLAR - fantomas wrote:
> On 21.06.11 15:17, David F. Skoll wrote:
> >No, not really, but MySQL is broken in so many ways I try to stay
> >away from it. Many of the design flaws in
> >http://sql-info.de/mysql/gotchas.html remain unfixed. For example,
> >
On Thu, 23 Jun 2011 10:26:48 +0200
Matus UHLAR - fantomas wrote:
> On 21.06.11 14:19, David F. Skoll wrote:
> >InnoDB may be better, but it's not as fast as CDB, and certainly
> >slower than a local CDB file.
> did you make any measurements or are you just guessing?
I did not measure InnoDB spec
Quoting a.sm...@ukgrid.net:
Quoting Per Jessen :
Matus UHLAR - fantomas wrote:
... again, does this affect BAYES?
Probably not, but David was asked to explain why he was wary of using
mysql, and he did just that.
If those don't apply to Bayes, then he has explained why he doesn't
tr
Quoting Per Jessen :
Matus UHLAR - fantomas wrote:
... again, does this affect BAYES?
Probably not, but David was asked to explain why he was wary of using
mysql, and he did just that.
If those don't apply to Bayes, then he has explained why he doesn't
trust MyISAM with his data
Matus UHLAR - fantomas wrote:
>>On Tue, 21 Jun 2011 20:03:57 +0100
>>Dominic Benson wrote:
>>
>>> To be fair to MySQL, these days it is pretty solid. There are
>>> potentially dangerous configuration options, but there are in
>>> Postgres too, and you can turn them off. Have you had a bad
>>> exp
On Tue, 21 Jun 2011 20:03:57 +0100
Dominic Benson wrote:
To be fair to MySQL, these days it is pretty solid. There are
potentially dangerous configuration options, but there are in
Postgres too, and you can turn them off. Have you had a bad
experience with a recent version?
On 21.06.11 15:17,
On 21.06.11 14:19, David F. Skoll wrote:
InnoDB may be better, but it's not as fast as CDB, and certainly
slower than a local CDB file.
did you make any measurements or are you just guessing?
Just the TCP round-trip time will make MySQL slower than local CDB files.
spamd caches the DB conne
Sorry for the second reply...
On Tue, 21 Jun 2011 20:03:57 +0100
Dominic Benson wrote:
> ...the database is shipped out to customers as part of an update service
Actually yes, we do ship out a largish Bayes database as part of our
update service. But we don't ship it out as a CDB file. It goe
On Tue, 21 Jun 2011 20:03:57 +0100
Dominic Benson wrote:
> To be fair to MySQL, these days it is pretty solid. There are
> potentially dangerous configuration options, but there are in
> Postgres too, and you can turn them off. Have you had a bad
> experience with a recent version?
No, not reall
rom it.)
To be fair to MySQL, these days it is pretty solid. There are potentially
dangerous configuration options, but there are in Postgres too, and you can
turn them off. Have you had a bad experience with a recent version?
>
>> When they lock, they lock the entire table, which is BAD fo
On Tue, 21 Jun 2011 21:12:58 +0300
Jari Fredriksson wrote:
> Are you sure you are not using MyISAM tables?
I don't use MySQL at all. (Our CRM system requires it, but apart from
that, I stay away from it.)
> When they lock, they lock the entire table, which is BAD for
> perfor
;t _trust_ MySQL, and can guarantee that local .cdb files will
> outperform it for this workload by a factor of 5 or more.
>
Are you sure you are not using MyISAM tables? When they lock, they lock
the entire table, which is BAD for performance if there are multiple
clients or threads usin
On Tue, 21 Jun 2011 21:06:49 +0300
Jari Fredriksson wrote:
> > Our main database is 255MB, consisting of 6 220 831 tokens from
> > 731 289 spam messages and 900 447 non-spam messages. Each user
> > potentially has his/her own Bayes database in addition to the
> > central one, so the total Bayes
21.6.2011 17:53, David F. Skoll kirjoitti:
> On Tue, 21 Jun 2011 15:46:59 +0100
> a.sm...@ukgrid.net wrote:
>
>> That's a bit harsh on MySQL, isn't it?
>
> No. MySQL is horrible, no question about it.
>
>> Anyway, how big are the DB's of you guys??
>
> Our main database is 255MB, consisting of 6
On Tue, 21 Jun 2011 15:46:59 +0100
a.sm...@ukgrid.net wrote:
> That's a bit harsh on MySQL, isn't it?
No. MySQL is horrible, no question about it.
> Anyway, how big are the DB's of you guys??
Our main database is 255MB, consisting of 6 220 831 tokens from
731 289 spam messages and 900 447 non-sp
stgreSQL---we
would never use MySQL for any data we care about---we finally settled
on Dan Bernstein's CDB format. It has by far the best performance.
See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
Take a look at the "Random Reads" timings. CDB is 6 times fa
ntation.) After trying Berkeley DB files and PostgreSQL---we
would never use MySQL for any data we care about---we finally settled
on Dan Bernstein's CDB format. It has by far the best performance.
See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
Take a look at the "Rand
Hi,
On Tue, 21.06.2011 at 07:30:51 -0700, Marc Perkel
wrote:
> Thanks David but I need real time updating and it's spread across
> multiple servers. So need PostgreSQL or MySQL.
just a shot in the dark: Maybe you can use mnesia, which is distributed
and should be _quite_ fast. It's not SQL, ho
On Tue, 21 Jun 2011 07:30:51 -0700
Marc Perkel wrote:
> Thanks David but I need real time updating and it's spread across
> multiple servers. So need PostgreSQL or MySQL.
That's what we used to think. It turns out that real-time updating
is a waste of resources; journalling Bayes updates and t
and PostgreSQL---we
would never use MySQL for any data we care about---we finally settled
on Dan Bernstein's CDB format. It has by far the best performance.
See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
Take a look at the "Random Reads" timings. CDB is 6
d never use MySQL for any data we care about---we finally settled
on Dan Bernstein's CDB format. It has by far the best performance.
See: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
Take a look at the "Random Reads" timings. CDB is 6 times faster than
Berkele
Trying to get MySQL Bayes working in a high volume environment. Dedicated
MySQL server with SSD drives. Can someone send me a sample my.cnf file
and make other suggestions to keep it running without database corruption
and other MySQL "features"? Or - should I be using some other DB?
Thanks in a
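In the absence of a reply with a full my.cnf, a minimal InnoDB-oriented
fragment as a starting point (the values are illustrative assumptions, not
tuned recommendations for any particular load):

```
[mysqld]
# InnoDB gives row-level locking; MyISAM's table locks hurt concurrent scanners
default-storage-engine = InnoDB
# Large enough to keep the Bayes token working set in memory
innodb_buffer_pool_size = 2G
# Relax per-commit flushing; trades ~1s of durability for faster token writes
innodb_flush_log_at_trx_commit = 2
```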
On Mon, 11 Apr 2011 22:39:11 -1000
"Warren Togami Jr." wrote:
> We haven't had working statistics viewing for a few weeks, but now it
> is fixed and I'm amazed by the performance of RCVD_IN_MSPIKE_BL.
>
> http://ruleqa.spamassassin.org/20110409-r1090
On Thu, Apr 14, 2011 at 22:51, Adam Katz wrote:
> RCVD_IN_MSPIKE_BL has 99% overlap with the SA3.3 set and 98% with the
> SA3.2 set. That leaves 0.6758% of spam uniquely hitting this DNSBL (1%
> of its 67.5822%). RCVD_IN_SEMBLACK has the same story, resulting in
> 0.5138% unique spam from its 1%
On 04/12/2011 01:39 AM, Warren Togami Jr. wrote:
> We haven't had working statistics viewing for a few weeks, but now it
> is fixed and I'm amazed by the performance of RCVD_IN_MSPIKE_BL.
>
> http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_MSPIKE_BL/detail
>
Hi,
> http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_HOSTKARMA_BL/detail
> HOSTKARMA_BL overlaps with MSPIKE_BL 88% of the time, but detects far fewer
> spam and and with slightly more FP's. Compared to last year, HOSTKARMA_BL's
> safety rating has improved considerably on a sustain
We haven't had working statistics viewing for a few weeks, but now it is
fixed and I'm amazed by the performance of RCVD_IN_MSPIKE_BL.
http://ruleqa.spamassassin.org/20110409-r1090548-n/T_RCVD_IN_MSPIKE_BL/detail
RCVD_IN_MSPIKE_BL has nearly the highest spam detection ratio of all th