Hello, I want to use some machine learning stuff on mail messages. The first step is to get flattened text out of a mail message, and Python's email package does not do this as automatically as I would wish. Right now I have:
> import email
>
> def mail_preprocessor(raw_message):
>     msg = email.message_from_string(raw_message)
>     msg_body = ""
>
>     for part in msg.walk():
>         if part.get_content_type() == "text/plain":
>             # Assumption: Python 3, where get_payload(decode=True) returns bytes
>             charset = part.get_content_charset() or "utf-8"
>             payload = part.get_payload(decode=True)
>             msg_body += payload.decode(charset, errors="replace")
>
>     msg_body = msg_body.lower()
>     msg_body = msg_body.replace("\n", " ")
>     msg_body = msg_body.replace("\t", " ")
>     return msg_body

For getting text from HTML I could use BeautifulSoup. Right now I'm still a layer down (encapsulation etc.), at the RFC 2822 level. Does anybody know of a package or some code I can throw an email message at and get some kind of text back? Attachments can be discarded, and HTML I can take care of myself.

Thanks!
Florian

--
https://mail.python.org/mailman/listinfo/python-list
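
For reference, a minimal sketch of one possible approach using only the standard library, assuming Python 3.6 or later. Passing policy=email.policy.default to email.message_from_string yields an EmailMessage, whose get_body() picks a body part and skips attachments. The names flatten_mail and _TextExtractor are hypothetical, and html.parser stands in here for BeautifulSoup. This is not a complete solution: for instance, a part labelled with an unknown charset may still raise an error.

```python
import email
import re
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def flatten_mail(raw):
    """Return the lowercased, whitespace-normalized body text of a raw message."""
    msg = email.message_from_string(raw, policy=policy.default)
    # Prefer a text/plain body, fall back to text/html; attachments are ignored.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is None:
        return ""
    # get_content() decodes the transfer encoding and charset into a str.
    text = body.get_content()
    if body.get_content_type() == "text/html":
        parser = _TextExtractor()
        parser.feed(text)
        parser.close()
        text = " ".join(parser.chunks)
    # Collapse newlines, tabs and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", text).strip().lower()
```

Calling flatten_mail(raw_message) on the raw text of a message returns a single line of lowercased text, or an empty string if no text body is found. Unlike the loop over msg.walk(), it returns only one body part rather than concatenating every text/plain part.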