I see there would be problems with naming your project "RSA". Nevertheless, is
there any plan to provide the current rspamd features as a library, so that
third parties can develop their own message-handling interfaces around it?

Giampaolo

Vsevolod Stakhov <vsevo...@highsecure.ru> wrote:

Hello,

I've decided to write to the SA users list about the status of the rspamd
project[1], since this is the second time rspamd has been mentioned on this
list. However, I was not subscribed to the list, so I cannot reply directly
to the author of the original post.

The phrase quoted in the original post, "With similar rules, rspamd is
about ten times faster than SpamAssassin", was my mistake. It only
describes a comparison of SA and rspamd on a rather specific ruleset: one
selected by filtering the full SA ruleset against our particular mail
flow, which left about 100 rules. I apologize for this phrase, because it
is not true in the general case: rspamd does not support all of SA's
features and does not have the same ruleset.

Nevertheless, while implementing rspamd I took into account the main
performance problems I had found in SA: too many regexp checks for each
action (for example, in the Received header parsing code), too many
repeated checks of the same text, and so on. Rspamd tries to fix these
problems by using specialized finite state machines, using tries for
pattern matching, having a rules planner that runs more probable checks
before less probable ones, and so on. Moreover, rspamd can use thread
pools for statistical and regexp checks, which allows it to scale easily
on multi-core machines. As a result, on the rules we selected for porting
from SA to rspamd, rspamd was several times faster than SA. In fact, with
our mail volume and number of servers we could not afford SA's scanning
speed, and rspamd solved that problem at the time.
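To illustrate why tries help with pattern matching, here is a minimal
Python sketch, not rspamd's actual code: all literal patterns are stored
in one trie, so each starting offset in the text is checked against every
pattern in a single walk instead of running one regexp per pattern. The
names build_trie and find_all are hypothetical, and this naive scan
restarts at every offset; an Aho-Corasick automaton would avoid even that.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.pattern: Optional[str] = None  # set if a pattern ends at this node


def build_trie(patterns: List[str]) -> TrieNode:
    """Insert every pattern into one shared prefix tree (hypothetical helper)."""
    root = TrieNode()
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.children.setdefault(ch, TrieNode())
        node.pattern = pattern
    return root


def find_all(root: TrieNode, text: str) -> List[Tuple[int, str]]:
    """Return (start offset, pattern) for every pattern occurrence in text."""
    matches: List[Tuple[int, str]] = []
    for start in range(len(text)):
        node = root
        # One walk from this offset tests all patterns sharing a prefix.
        for pos in range(start, len(text)):
            nxt = node.children.get(text[pos])
            if nxt is None:
                break
            node = nxt
            if node.pattern is not None:
                matches.append((start, node.pattern))
    return matches


if __name__ == "__main__":
    trie = build_trie(["viagra", "via", "cheap"])
    print(find_all(trie, "cheap viagra"))
```

Note that "via" and "viagra" share the same trie path, so both are found
in one pass over the text starting at offset 6.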

Furthermore, I focused on maximum performance while writing the code for
other rspamd modules, for example DKIM, SPF or SURBL, trying to avoid
resource-greedy libraries (like opendkim or libspf2). The statistics
module is based on the Markovian Bayes algorithm with the OSB tokenizer
from crm114, which was more accurate in my tests than the unigram Bayes
classifier that SA uses by default.
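For readers unfamiliar with OSB (orthogonal sparse bigrams), the
following minimal Python sketch shows the idea; it is not rspamd's or
crm114's code. Instead of single words (unigrams), each token is paired
with each of the preceding tokens inside a sliding window, and the
distance between them is kept as part of the feature. The window of 5
tokens, the whitespace tokenization and the function name osb_features
are assumptions made for illustration.

```python
from typing import List, Tuple

# (earlier token, distance, current token)
Feature = Tuple[str, int, str]


def osb_features(text: str, window: int = 5) -> List[Feature]:
    """Return orthogonal sparse bigrams for text (hypothetical helper)."""
    tokens = text.split()  # naive whitespace tokenization (assumption)
    features: List[Feature] = []
    for i, current in enumerate(tokens):
        # Pair the current token with each of the previous window-1 tokens.
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break
            features.append((tokens[j], distance, current))
    return features


if __name__ == "__main__":
    for feature in osb_features("buy cheap pills now"):
        print(feature)
```

A classifier then counts these features per class rather than bare
words, so "cheap pills" at distance 1 and "cheap ... now" at distance 2
are distinct pieces of evidence, which is what lets a bigram model
capture some word order that a unigram model ignores.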

In conclusion, I'd like to say a few words about the immature state of
the project. Unfortunately, I developed it with only a single client in
mind. Therefore, rspamd cannot be compared with SA in terms of the number
of features; however, it can be useful for those who do not need every
single SA feature but want something oriented towards performance and
statistical checks. I'm very keen to attract more users to the rspamd
project, so if you have any questions or want to try rspamd, please feel
free to contact me.

Finally, I apologize for this message, which is not directly related to
the SA project.

[1]: https://bitbucket.org/vstakhov/rspamd/

-- 
Vsevolod Stakhov
