>From: Paul Stead <paul.st...@zeninternet.co.uk>
>Sent: Tuesday, May 24, 2016 9:55 AM
>To: users@spamassassin.apache.org
>Subject: SA Concepts - plugin for email semantics
>
>Hi guys,
>
>Based upon some information from others on the list I have put together
>a plugin for SA which canonicalises an email into its basic "concepts".
>Concepts are converted to tags, which Bayes can use as tokens to further
>help identify spammy/hammy characteristics.
>
>Here are some examples of tags from some emails today:
>
>---8<---
>X-SA-Concepts: experience regards money optout time-ref dear great home
>request member enjoy woman-adj important online click all-rights
>email-adr please price best hot-adj
>X-SA-Concepts: experience contact optout winner time-ref survey dear
>home privacy prize store thankyou important click gift chance please
>X-SA-Concepts: google law search-eng optout amazing order facebook
>goodtime privacy lotsofmoney request enjoy details service partner
>linkedin twitter trust contact time-ref great online click shop
>email-adr please customer newsletter news
>X-SA-Concepts: photos view-online money contact optout time-ref cost
>reply2me service details online click please
>X-SA-Concepts: friend hotwords trust experience regards contact time-ref
>medical woman drugs consultant pill mailto woman-adj secret health earn
>email-adr please security hot-adj day-of-week
>X-SA-Concepts: https mailto re euros regards money youtube invoice
>email-adr facebook best hair
>---8<---
>
>This plugin essentially adds an extra layer between the raw input
>characteristics and recognition types, clustering different
>characteristics into a more generic type. In effect this gives Bayes more
>of a two-layer neural network approach.
>
>When combined with Bayes learning, these email semantics (or Concepts)
>can be combined with the many other characteristics of that email and
>then compared to other email that came before it.
>
>https://github.com/fmbla/spamassassin-concepts
>
>I'd be really interested to hear your feedback/thoughts on this system
>and its approach.
>
>Paul

Good idea.
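A rough sketch of the canonicalisation Paul describes might look like the following. It matches the message text against a per-concept list of regexes and emits the matched concept names as one header line, which Bayes can then tokenise. The concept names come from his examples. The pattern table, the `extract_concepts` helper and the sample message are hypothetical and are not the plugin's actual code. Judging by the `concepts_dir` option, the plugin presumably loads its patterns from files instead.

```perl
use strict;
use warnings;

# Hypothetical concept table: each concept name maps to regexes that
# signal it. Illustrative only; not the plugin's real pattern files.
my %concept_patterns = (
    'money'       => [ qr/\b(?:cash|money|funds)\b/i ],
    'click'       => [ qr/\bclick\b/i ],
    'optout'      => [ qr/\bunsubscribe\b/i, qr/\bopt[- ]?out\b/i ],
    'lotsofmoney' => [ qr/\bmillions?\s+(?:of\s+)?(?:dollars|pounds)\b/i ],
);

# Hypothetical helper: return the sorted, de-duplicated list of
# concepts whose patterns match $text.
sub extract_concepts {
    my ($text, $patterns) = @_;
    my %matched;
    for my $concept (keys %$patterns) {
        for my $re (@{ $patterns->{$concept} }) {
            if ($text =~ $re) {
                $matched{$concept} = 1;
                last;    # one hit is enough for this concept
            }
        }
    }
    return sort keys %matched;
}

my $body = "Click here to claim MILLIONS of dollars! Unsubscribe below.";
my @concepts = extract_concepts($body, \%concept_patterns);
print "X-SA-Concepts: @concepts\n";    # X-SA-Concepts: click lotsofmoney optout
```

Because many different surface words collapse onto one concept tag, Bayes sees a shared token (for example `optout`) even when two messages phrase the same idea differently.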
I would like to test this out, so I put it on my CentOS 6 servers (perl v5.10.1) and got this:

May 24 10:59:51.850 [30158] warn: plugin: failed to parse plugin /etc/mail/spamassassin/Concepts.pm: Type of arg 1 to push must be array (not private variable) at /etc/mail/spamassassin/Concepts.pm line 84, near "$headl;"
May 24 10:59:51.850 [30158] warn: Type of arg 1 to push must be array (not private variable) at /etc/mail/spamassassin/Concepts.pm line 88, near ");"
May 24 10:59:51.850 [30158] warn: Type of arg 1 to keys must be hash (not hash element) at /etc/mail/spamassassin/Concepts.pm line 93, near "}) "
May 24 10:59:51.850 [30158] warn: Type of arg 1 to keys must be hash (not private variable) at /etc/mail/spamassassin/Concepts.pm line 104, near "$matched_concepts) "
May 24 10:59:51.850 [30158] warn: Type of arg 1 to push must be array (not hash element) at /etc/mail/spamassassin/Concepts.pm line 168, near "$re if"
May 24 10:59:51.850 [30158] warn: Type of arg 1 to keys must be hash (not private variable) at /etc/mail/spamassassin/Concepts.pm line 174, near "$concepts;"
May 24 10:59:51.850 [30158] warn: Compilation failed in require at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/PluginHandler.pm line 109.
May 24 10:59:52.472 [30158] warn: config: failed to parse line, skipping, in "/etc/mail/spamassassin/41_concepts.cf": concepts_dir /etc/mail/spamassassin/concepts
May 24 10:59:52.472 [30158] warn: Unrecognized escape \l passed through in regex; marked by <-- HERE in m/\l <-- HERE otsofmoney\b/ at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Conf/Parser.pm line 1388.
May 24 10:59:54.646 [30158] warn: lint: 1 issues detected, please rerun with debug enabled for more information

Thanks for sharing your code and the time you put into this,
Dave
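The `push`/`keys` warnings are consistent with the plugin passing references directly to those builtins, a form Perl only accepted experimentally from 5.14 through 5.22 ("autoderef"). Perl 5.10.1 rejects that form at compile time. The `concepts_dir` line is then probably rejected only because the plugin that registers that option never loaded. The code at lines 84 to 174 of Concepts.pm is not shown in this thread, so the sketch below is hypothetical. It reuses the variable names from the warnings and shows the explicit-dereference spelling that compiles on 5.10.1.

```perl
use strict;
use warnings;

# Hypothetical data, named loosely after the variables in the warnings.
my $headl            = [];                              # array ref
my $matched_concepts = { money => 1, click => 1 };      # hash ref
my %conf             = (concepts => { money => [] });   # hash element holding a hash ref

# Perl 5.14-5.22 (experimental autoderef) accepted forms such as
#   push $headl, 'x';   keys $matched_concepts;   keys $conf{concepts};
# Perl 5.10.1 rejects them at compile time and 5.24+ removed them,
# so dereference explicitly instead:
push @$headl, 'X-SA-Concepts: click money';
push @{ $conf{concepts}{money} }, qr/\bcash\b/i;

my @matched = sort keys %$matched_concepts;
my @known   = sort keys %{ $conf{concepts} };

print "matched: @matched\n";    # matched: click money
print "known: @known\n";        # known: money
print "patterns for money: ", scalar @{ $conf{concepts}{money} }, "\n";    # patterns for money: 1
```

The `@{ ... }` / `%{ ... }` forms work on every Perl 5 release, so they are the portable choice for a plugin that has to load on older distributions such as CentOS 6.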