On 03.08.09 23:17, MySQL Student wrote:
> We have accumulated quite a large list of whitelisted users, primarily
> because they were previously tagged incorrectly. I've extracted a copy
> of all whitelisted mail into a separate mbox.
> 
> Certainly there is some spam in there as well, but assuming I only
> learn the ham, would it make sense to train bayes using the emails
> from this folder?

If you separate out the spam first, yes.
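
A minimal sketch of that separation step, assuming someone has already reviewed the mbox by hand and noted the Message-IDs of the spam in it. The function name `split_mbox`, the file paths, and the `spam_ids` set are all hypothetical; only Python's standard `mailbox` module is used, nothing from SpamAssassin itself.

```python
import mailbox


def split_mbox(src_path, ham_path, spam_path, spam_ids):
    """Copy each message from src_path into ham_path or spam_path.

    spam_ids is a set of Message-ID header values that a human reviewer
    marked as spam (an assumed input; SpamAssassin does not produce it).
    Returns a (ham_count, spam_count) tuple.
    """
    src = mailbox.mbox(src_path, create=False)
    ham = mailbox.mbox(ham_path)    # created if it does not exist
    spam = mailbox.mbox(spam_path)  # created if it does not exist
    ham_count = spam_count = 0
    try:
        for msg in src:
            msg_id = (msg.get("Message-ID") or "").strip()
            if msg_id in spam_ids:
                spam.add(msg)
                spam_count += 1
            else:
                ham.add(msg)
                ham_count += 1
        # Write the new messages to disk before closing.
        ham.flush()
        spam.flush()
    finally:
        src.close()
        ham.close()
        spam.close()
    return ham_count, spam_count
```

The two resulting files could then be learned separately, e.g. with `sa-learn --ham --mbox ham.mbox` and `sa-learn --spam --mbox spam.mbox`, so the spam that slipped into the whitelisted folder is not learned as ham.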

> It's all business-related, but I'm concerned that it
> may have things in the email that caused it to be tagged in the first
> place, like excessive HTML, sent from a host with no reverse DNS, etc.
> -- all the reasons for it being whitelisted in the first place.

If you only whitelist, and don't advise users to fix whatever caused their
mail to be marked as spam, the same problems are likely to keep coming back.
Still, the training may be worth it.

> Looking at the logs before the addresses were added to the whitelist,
> I see quite a few that were BAYES_99, probably because they resemble
> mailing lists, such as those from networkworld, for example.

Do you train mailing-list mail as spam? Or do you mean lists that send spam?

> IOW, I
> wouldn't want to whitelist an email from networkworld.com, but one of
> the company's partners could send the company an email that had many
> of those characteristics.
> 
> Someone may also send them a one-line email with a small GIF as an
> attachment, such as their corporate logo in their signature. This
> would be a valid email, but also very much resembles the
> characteristics of a typical spam.
> 
> This is all being done to hopefully train bayes to better recognize
> corporate email, and hopefully cut down on the number of whitelisted
> senders that must be added in the future (or, corporate email that
> gets tagged then must be whitelisted).

Just do the training, together with advising users to set up reverse DNS,
send HTML+text instead of HTML only, etc. I think you are not the only one
who marks their mail as spam...

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
There's a long-standing bug relating to the x86 architecture that
allows you to install Windows.   -- Matthew D. Fuller
