On Wed, Oct 27, 2004 at 09:35:11AM -0400, Keith Hackworth wrote:
> > I'm guessing you want PMS::get_decoded_stripped_body_text_array().
>
> Thanks, Theo - this may work for html only messages, which might be good
> enough for what I'm trying to do. I need just the HTML version of the
> email. No attachments, just the HTML body. If the 1st part is multipart,
> I need the 1st html part.
If you want to limit what you're looking at in that way, you'd need to
access the Message object directly and use find_parts to grab just the
first matching part you're interested in. The PMS functions work on all
text/* parts, and aren't limited to HTML.

> Here's what I'm trying to do:
> I'm trying to find invalid html tags and if there's too many, bump the sa
> score up a bit. I noticed a bunch of messages come in with obfu like this
> "v-wo<notatag>rd" in the body of the html message, which shows up as
> "v-word" on a normal webmail or outlook email client. I want to see how
> many "notatag"s we're getting in a message. I got the code on how to do
> it and it works fine, but it's just WAY too slow using PMS::get_message().

Yeah, that'll get you a bunch of stuff you really don't care about.
get_decoded_stripped... is also not the right thing, since it will have
stripped all the HTML tags. I'd try get_decoded_body_text_array(), or,
since you're writing code anyway, just use find_parts and grab the
[EMAIL PROTECTED]/[EMAIL PROTECTED] parts of the message. You can then
easily call decode() on them (object function) and get the raw HTML out.

Just curious though, why limit yourself to invalid html tags? Why not
just target the html-tag-in-middle-of-word behavior? And isn't this the
same idea as the backhair code?

--
Randomly Generated Tagline:
"Exactly what it should've been, give people what they expect. The third
 one can be clever." - John Hughes about Home Alone 2
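For illustration only, here is a minimal sketch of the approach suggested above: take only the first HTML part, decode it, and count tags wedged into the middle of a word. It is written in Python using the standard `email` package as a stand-in for SpamAssassin's Perl `find_parts`/`decode()`; it is not SpamAssassin's API. The function names, the regex, and the sample message are hypothetical assumptions.

```python
import re
from email import message_from_string, policy
from email.message import EmailMessage
from typing import Optional

# Hypothetical heuristic: an HTML tag with a word character immediately on
# both sides, e.g. the "<notatag>" in "v-wo<notatag>rd".
MID_WORD_TAG = re.compile(r"(?<=\w)<[^<>\s][^<>]*>(?=\w)")


def first_html_part(msg: EmailMessage) -> Optional[str]:
    """Return the decoded body of the first non-attachment text/html part.

    Loosely analogous to find_parts() + decode() in SpamAssassin, but built
    on Python's email package (an assumption, not SpamAssassin code).
    """
    for part in msg.walk():  # depth-first; multipart containers are skipped
        if part.get_content_type() == "text/html" and not part.is_attachment():
            return part.get_content()  # transfer- and charset-decoded str
    return None


def count_mid_word_tags(html: str) -> int:
    """Count tags that split a single word in two."""
    return len(MID_WORD_TAG.findall(html))


# Hypothetical sample: multipart/alternative with a plain and an HTML part.
RAW_SAMPLE = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "v-word\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Buy v-wo<notatag>rd and ch<x>eap st<y>uff now</p>\n"
    "--XX--\n"
)

if __name__ == "__main__":
    msg = message_from_string(RAW_SAMPLE, policy=policy.default)
    html = first_html_part(msg)
    print(count_mid_word_tags(html) if html is not None else 0)  # 3
```

The regex is deliberately crude. Legitimate markup such as `foo<br>bar` also counts, so a rule built on this would want a threshold ("too many" per message) rather than firing on a single hit.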