On Wed, Aug 13, 2003 at 05:56:04PM +0200, Richard Lyons wrote:
| On Wednesday 13 August 2003 7:46 pm, Jeff Elkins wrote:
| > Most of the spam I receive is HTML format. Is there a fairly painless way
| > of sending anything formatted HTML to my trash folder?

The Content-Type: header *ought* to provide this information. However, when it comes to spam, you can't really assume that any rules will be followed.

| > I use kmail and sid.
|
| I use
| <body> contains "a href=" OR <body> contains "/form>" OR <body> contains
| "/body>"
|
| Or similar combinations of two or three of the commonest tags. For some
| reason filtering on "<html>" doesn't seem to work.

The reason is that a lot of spam is not well-formed HTML. Despite that, the spam is (apparently) effective because mail readers and browsers tend to render bad HTML fairly reasonably.

| Also you need to use small fragments because the tags are often
| broken across lines and then not identified.

This is an attempt by the spammers to bypass simple-minded text-matching blocks. One solution is to parse the HTML the way the mail reader would and match on the parsed version. Better yet, use spamassassin or spambayes to identify just the spam rather than all HTML mail, since unfortunately many people send non-spam mail in HTML or with both text/plain and text/html parts.

-D

-- 
"Don't use C; In my opinion, C is a library programming language not an app programming language."
- Owen Taylor (GTK+ developer)

http://dman13.dyndns.org/~dman/
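
A minimal sketch of the two ideas from the reply above (trusting the declared Content-Type, and parsing the HTML the way a mail reader would instead of substring-matching the raw body), written in Python with only the standard email and html.parser modules. This is an illustration under assumptions, not how KMail filters work: the names html_parts, parsed_tags and looks_like_html_mail, and the SUSPECT_TAGS set (which loosely mirrors the quoted filter), are hypothetical. Note that get_payload(decode=True) also undoes base64 and quoted-printable transfer encoding, including quoted-printable soft line breaks. Like the original filter, this detects HTML mail, not spam, so legitimate HTML mail matches too.

```python
from email import message_from_string
from email.message import Message
from html.parser import HTMLParser
from typing import Iterator, Set


def html_parts(msg: Message) -> Iterator[str]:
    """Yield the decoded text of every part whose declared type is text/html.

    This trusts the Content-Type header, which spam is free to get wrong.
    """
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes base64 / quoted-printable transfer encoding
            raw = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "us-ascii"
            yield raw.decode(charset, errors="replace")


class TagCollector(HTMLParser):
    """Record tag names as a lenient parser sees them, ignoring line breaks."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)

    def handle_endtag(self, tag):
        self.tags.add("/" + tag)


def parsed_tags(html: str) -> Set[str]:
    """Return start tags as 'name' and end tags as '/name'."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


# Hypothetical rule loosely mirroring the quoted filter: links, forms, </body>.
SUSPECT_TAGS = {"a", "/form", "/body"}


def looks_like_html_mail(msg: Message) -> bool:
    return any(SUSPECT_TAGS & parsed_tags(body) for body in html_parts(msg))


if __name__ == "__main__":
    raw = (
        "Content-Type: text/html\n"
        "\n"
        '<p>Buy now: <a\nhref="http://example.com/">click</a></p>\n'
    )
    msg = message_from_string(raw)
    body = next(html_parts(msg))
    print("a href=" in body)          # False: the tag is split across lines
    print(sorted(parsed_tags(body)))
    print(looks_like_html_mail(msg))  # True: the parser still sees <a>
```

The substring test misses the split tag, while the parser reports it. A message that is mislabelled text/plain would still slip through, which is the limitation of relying on the header.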