On Sun, 31 Aug 2014, Eric Shubert wrote:
I've seen an uptick in spam lately with random low-contrast (hidden)
text. This appears to be lowering Bayes probabilities.

On 08/31/2014 10:26 PM, John Hardin wrote:
Learn them as spam. That will tend to eliminate that effect.

On 31.08.14 22:54, Eric Shubert wrote:
Been doing that (learning them) for quite a while. I've had that mechanism set up for several years now, and it's working fairly well (after I adjusted the scoring upwards for bayes rules).

It appears to me that the hidden text is being randomly generated. I even saw a random function of some sort in there. I presume it's been designed to 'poison' Bayes by virtue of the random text (and a sizable amount of it).

Note that even the code for low-contrast HTML may be caught as spam...
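
To show what "low contrast" means mechanically, here is a minimal Python sketch that flags an inline style whose text color nearly matches its background. It uses the WCAG relative-luminance contrast ratio as a stand-in measure; that choice is an assumption, not how SpamAssassin itself evaluates HTML colors. The 1.5 cutoff and the helper names (`contrast_ratio`, `is_low_contrast`, `hidden_by_style`) are hypothetical, and the regex only understands hex colors inside a single `style` string, so named colors, CSS classes, tiny fonts and other hiding tricks are out of scope.

```python
import re

# Hypothetical cutoff. WCAG asks for at least 4.5:1 for normal text;
# text this close to 1:1 is effectively invisible to a human reader.
LOW_CONTRAST_THRESHOLD = 1.5

# Matches "color:#abc" or "background-color:#aabbcc" in an inline style.
_COLOR_RE = re.compile(
    r"(?<![-\w])(color|background-color)\s*:\s*(#[0-9a-f]{3}(?:[0-9a-f]{3})?)\b",
    re.IGNORECASE,
)


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.0 formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    if len(h) == 3:  # expand shorthand like "fff" -> "ffffff"
        h = "".join(ch * 2 for ch in h)
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, 21.0 for black on white."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def is_low_contrast(fg: str, bg: str, threshold: float = LOW_CONTRAST_THRESHOLD) -> bool:
    return contrast_ratio(fg, bg) < threshold


def hidden_by_style(style: str) -> bool:
    """True if an inline style sets both colors and they nearly match."""
    colors = {prop.lower(): value for prop, value in _COLOR_RE.findall(style)}
    if "color" in colors and "background-color" in colors:
        return is_low_contrast(colors["color"], colors["background-color"])
    return False


print(hidden_by_style("color:#fefefe; background-color:#ffffff"))  # True
print(hidden_by_style("color:#000; background-color:#fff"))  # False
```

The same check that spots hidden filler can also fire on a legitimate template that happens to use faint text, which is why a contrast test like this is better treated as one weighted signal than as a verdict.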

Bayes poisoning has been considered a myth. With good training, and using
hapaxes (enabled by default), it can even help in detecting the spam.
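
The "myth" argument can be sketched in Python. The code below follows Gary Robinson's published token estimate f(w) = (s·x + n·p(w)) / (s + n) with Fisher chi-square combining, the general approach SpamAssassin's Bayes classifier is broadly based on. It is not SpamAssassin's code: the constants (`S_STRENGTH`, `MIN_STRENGTH`), the toy token database and the helper names are hypothetical. A never-seen random word gets f(w) = x = 0.5 and is dropped as uninformative, so a pile of gibberish does not dilute the spammy tokens; once such a message is learned as spam, the gibberish turns into hapaxes (tokens seen only once) that lean spammy.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical constants for illustration; SpamAssassin's real values differ.
S_STRENGTH = 0.45   # Robinson's s: how strongly to trust the prior x
X_UNKNOWN = 0.5     # Robinson's x: assumed probability for an unseen token
MIN_STRENGTH = 0.1  # skip tokens whose f(w) is within this distance of 0.5

TokenDB = Dict[str, Tuple[int, int]]  # token -> (spam count, ham count)


def token_prob(spam_c: int, ham_c: int, nspam: int, nham: int) -> float:
    """Robinson's f(w); returns X_UNKNOWN for a token never seen in training."""
    n = spam_c + ham_c
    if n == 0:
        return X_UNKNOWN
    b = spam_c / nspam  # fraction of spam messages containing the token
    g = ham_c / nham    # fraction of ham messages containing the token
    p = b / (b + g)
    return (S_STRENGTH * X_UNKNOWN + n * p) / (S_STRENGTH + n)


def chi2q(x2: float, dof: int) -> float:
    """P(chi-square with an even number `dof` of degrees of freedom >= x2)."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spamminess(tokens: Iterable[str], db: TokenDB, nspam: int, nham: int) -> float:
    """Fisher/Robinson combined indicator: near 1.0 = spam, near 0.0 = ham."""
    probs = []
    for tok in set(tokens):
        f = token_prob(*db.get(tok, (0, 0)), nspam, nham)
        if abs(f - 0.5) >= MIN_STRENGTH:  # neutral tokens carry no evidence
            probs.append(f)
    if not probs:
        return 0.5
    n = len(probs)
    h = chi2q(-2.0 * sum(math.log(f) for f in probs), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - f) for f in probs), 2 * n)
    return (1.0 + h - s) / 2.0


def learn(tokens: Iterable[str], db: TokenDB, is_spam: bool) -> None:
    """Count each distinct token once per trained message."""
    for tok in set(tokens):
        spam_c, ham_c = db.get(tok, (0, 0))
        db[tok] = (spam_c + 1, ham_c) if is_spam else (spam_c, ham_c + 1)


if __name__ == "__main__":
    db: TokenDB = {"viagra": (40, 1), "cheap": (30, 5), "meeting": (2, 50), "the": (100, 100)}
    nspam, nham = 100, 100
    spam_msg = ["the", "cheap", "viagra"]
    poisoned = spam_msg + ["zqxv%d" % i for i in range(500)]  # random filler
    print(spamminess(spam_msg, db, nspam, nham))
    print(spamminess(poisoned, db, nspam, nham))  # same score: filler is neutral
    learn(poisoned, db, is_spam=True)
    nspam += 1
    print(token_prob(*db["zqxv0"], nspam, nham))  # now a spam-leaning hapax
```

The filtering step (`MIN_STRENGTH`) is what neutralizes the padding: tokens with no history, and common words seen equally in ham and spam, never reach the chi-square combination.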

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
42.7 percent of all statistics are made up on the spot.
