Re: 'non-pristine message'?

Theo Van Dinter 27 Oct 2004 14:55:46 -0000

On Wed, Oct 27, 2004 at 09:35:11AM -0400, Keith Hackworth wrote:
> > I'm guessing you want PMS::get_decoded_stripped_body_text_array().
> 
> Thanks, Theo - this may work for html only messages, which might be good
> enough for what I'm trying to do.  I need just the HTML version of the
> email.  No attachments, just the HTML body.  If the 1st part is multipart,
> I need the 1st html part.


If you want to limit what you're looking at in that way, you'd need to access
the Message object directly and use find_parts to grab just the first matching
part you're interested in.  The PMS functions work on all text/* parts, and
aren't limited to HTML.
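
As a rough illustration of "grab just the first matching part", here is a
minimal sketch in Python using the standard library's email package as a
stand-in. It is not SpamAssassin's Perl API; the function name
first_html_part is hypothetical, and skipping parts marked as attachments
is an assumption about what "no attachments" should mean.

```python
from email import message_from_bytes, policy


def first_html_part(raw: bytes):
    """Return the decoded text of the first text/html part, or None.

    Hypothetical helper: walks the MIME tree in document order and stops
    at the first non-attachment text/html part.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # Assumption: parts explicitly marked as attachments are not wanted.
        if part.get_content_disposition() == "attachment":
            continue
        # get_content() undoes the transfer encoding and charset.
        return part.get_content()
    return None
```

Walking the tree once and returning early avoids decoding every text/*
part when only one is needed.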

> Here's what I'm trying to do:
> I'm trying to find invalid html tags and if there are too many, bump the SA
> score up a bit.  I noticed a bunch of messages come in with obfu like this
> "v-wo<notatag>rd" in the body of the html message, which shows up as
> "v-word" on a normal webmail or outlook email client.  I want to see how
> many "notatag"s we're getting in a message.  I got the code on how to do
> it and it works fine, but it's just WAY too slow using PMS::get_message().

Yeah, that'll get you a bunch of stuff you really don't care about.
get_decoded_stripped... is also not the right thing, since it will have
stripped all the HTML tags.  I'd try get_decoded_body_text_array(),
or since you're writing code anyway, just use find_parts and grab the
[EMAIL PROTECTED]/[EMAIL PROTECTED] parts of the message.  You can then
easily call decode() on them (it's an object method) and get the raw HTML out.
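
Once the raw HTML is in hand, counting unrecognized tags could look
something like the sketch below. It is written in Python with the standard
html.parser module purely for illustration; KNOWN_TAGS is a deliberately
small, made-up whitelist, and count_unknown_tags is a hypothetical name,
not anything from SpamAssassin.

```python
from html.parser import HTMLParser

# Assumption: a short illustrative whitelist. A real check would need a
# much fuller list of valid HTML element names.
KNOWN_TAGS = {
    "html", "head", "title", "meta", "body", "p", "br", "a", "b", "i", "u",
    "font", "table", "tr", "td", "th", "img", "div", "span", "center", "hr",
    "ul", "ol", "li", "h1", "h2", "h3", "strong", "em", "style",
}


class UnknownTagCounter(HTMLParser):
    """Counts opening tags whose names are not in KNOWN_TAGS."""

    def __init__(self):
        super().__init__()
        self.unknown = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag not in KNOWN_TAGS:
            self.unknown += 1


def count_unknown_tags(html: str) -> int:
    counter = UnknownTagCounter()
    counter.feed(html)
    counter.close()
    return counter.unknown
```

A rule could then compare the count against some threshold; what that
threshold should be is left open here.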

Just curious though, why limit yourself to invalid html tags?  Why not just
target the html-tag-in-middle-of-word behavior?  And isn't this the same idea
as the backhair code?
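
To make the tag-in-middle-of-word idea concrete, here is a minimal regex
sketch in Python. It is only a heuristic under stated assumptions: "word"
means ASCII letters directly on both sides of a single tag, and the names
TAG_INSIDE_WORD and count_tags_inside_words are hypothetical. It is not how
the backhair code works.

```python
import re

# A tag with a letter immediately before it and a letter immediately after.
# Assumption: only ASCII letters count, and the tag contains no nested
# angle brackets.
TAG_INSIDE_WORD = re.compile(r"(?<=[A-Za-z])<[^<>]+>(?=[A-Za-z])")


def count_tags_inside_words(html: str) -> int:
    """Count tags that split a run of letters, valid tag name or not."""
    return len(TAG_INSIDE_WORD.findall(html))
```

Note that this also fires on legitimate markup such as a bold tag placed
mid-word, and it misses two tags placed back to back inside a word, so it
would need tuning before being scored.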

-- 
Randomly Generated Tagline:
"Exactly what it should've been, give people what they expect.  The third 
 one can be clever."               - John Hughes about Home Alone 2
