Hello,
I've decided to write to the SA users list about the status of the
rspamd project[1], since this is the second time rspamd has been
mentioned on this list. I was not subscribed to the list at the time,
so I cannot reply directly to the author of the original post.
The phrase quoted in the original post, "With similar rules, rspamd
is about ten times faster than SpamAssassin", was my mistake: it only
describes a comparison of SA and rspamd on a rather specific ruleset of
about 100 rules, selected by filtering the full SA ruleset against our
specific mail traffic. I am sorry for this phrase, which is not true in
the general case, because rspamd does not support all the features of
SA and does not have the same ruleset.
Nevertheless, while implementing rspamd I took into consideration the
main performance problems I had found in SA: too many regexp checks
for each action (for example, in the Received header parsing code), too
many repeated checks of the same text, and so on. Rspamd tries to fix
these problems by using specialized finite state machines, using tries
for pattern matching, having a rule planner that runs more probable
checks before less probable ones, and so on. Moreover, rspamd can use
thread pools for statistics and regexp checks, which allows it to scale
easily on multi-core machines. As a result, on the rules that we
selected for porting from SA to rspamd, rspamd was several times faster
than SA. In fact, we could not afford the check speed of SA with our
volume of mail and our number of servers, and rspamd solved that
problem at the time.
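
To illustrate the idea of using a trie for pattern matching, here is a
minimal sketch in Python. It is not taken from rspamd; the function
names build_trie and find_all and the sample patterns are made up for
this example. The point is only that all patterns live in one
structure, so the text is walked once per start offset instead of once
per pattern:

```python
# Illustrative sketch only, not rspamd code. All names are hypothetical.

def build_trie(patterns):
    """Build a nested-dict trie; the key None marks the end of a pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[None] = pattern
    return root


def find_all(trie, text):
    """Return a list of (offset, pattern) for every occurrence in text."""
    hits = []
    for start in range(len(text)):
        node = trie
        pos = start
        # Follow the trie as long as the next character has a branch.
        while pos < len(text) and text[pos] in node:
            node = node[text[pos]]
            pos += 1
            if None in node:
                hits.append((start, node[None]))
    return hits


trie = build_trie(["via", "viagra", "free money"])
print(find_all(trie, "get viagra and free money"))
```

A production matcher would more likely add failure links (as in
Aho-Corasick) to avoid restarting at every offset; the sketch above
leaves that out for brevity.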
Furthermore, I focused on maximum performance while writing the code
for other rspamd modules, such as DKIM, SPF and SURBL, trying to avoid
resource-greedy libraries (like opendkim or libspf2). The statistics
module is based on the Markovian Bayes algorithm with the OSB
tokenizer, as in crm114, which in my tests is more accurate than the
unigram Bayes that SA uses by default.
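
For readers unfamiliar with OSB (orthogonal sparse bigrams), here is a
rough sketch in Python of how such a tokenizer differs from a unigram
one. It is a simplification written for this message, not the crm114
or rspamd implementation: the function name osb_features, the tuple
representation and the default window of 5 are assumptions, and real
implementations hash the pairs rather than keep the words. Each token
is paired with each of the following tokens inside the window, and the
number of skipped tokens is kept as part of the feature:

```python
# Illustrative sketch only, not crm114 or rspamd code.

def osb_features(tokens, window=5):
    """Return (first, skipped, second) features for a list of tokens.

    'skipped' is the number of tokens between the two words, so the
    same pair at different distances yields different features.
    """
    features = []
    for i, first in enumerate(tokens):
        for gap in range(1, window):
            j = i + gap
            if j >= len(tokens):
                break
            features.append((first, gap - 1, tokens[j]))
    return features


for feature in osb_features(["buy", "cheap", "pills", "now"], window=3):
    print(feature)
```

A unigram tokenizer would produce only the four single words here,
while the sketch produces five word pairs that also carry some
information about word order and distance.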
In conclusion, I'd like to add a few words about the immature state of
the project. Unfortunately, I developed it with only a single client in
mind. Therefore, rspamd cannot be compared with SA in terms of the
number of features; however, it can be useful for those who do not
require every single feature of SA but want something oriented towards
performance and statistical checks. I am very keen to attract more
users to the rspamd project, so if you have any questions or want to
try rspamd, please feel free to contact me.
Finally, sorry for this message, which is not directly connected with
the SA project.
[1]: https://bitbucket.org/vstakhov/rspamd/
--
Vsevolod Stakhov