On Thu, Jun 25, 2009 at 1:12 PM, Jonas Eckerman <jonas_li...@frukt.org> wrote:
>> Already exists, check recent list history for "set_rendered".
>
> I thought that was for text only.
It is only for text.

> In any case, any plugin looking for images, or a PDF, will most likely look
> at MIME type and/or file name, and then use the "decode" method to get the
> data, and AFAICT the "set_rendered" method doesn't have any impact on any of
> that.

Of course. There are three states for the data in a Message::Node object:

- raw: whatever the email had originally; may be encoded, etc.
- decoded: the raw content with its transfer encoding (i.e., base64 or quoted-printable) undone; may be binary.
- rendered: the text content. If it was a text part, it's the same as decoded. If it was an HTML part, the decoded data gets "rendered" into text. If it's anything else, the rendered text is blank, because nothing else is supported.

The goal of the plugin calls and set_rendered is to let other plugins find parts they know how to convert into text, and set the rendered version of each such part accordingly. So if you want to do OCR on image/*, you can do that. If you want to convert PDF/DOC/whatever to text, you can do that. I would add that plugins should probably skip any part that already has rendered text available.

Rules, Bayes, etc. then take all the rendered parts and use them.

> I can't see how "set_rendered" would help in creating a functioning chain
> where one converter could put an arbitrary extracted object (image, pdf,
> whatever) where another converter could have a go at it.

Well, you wouldn't do that, because there's no point. ;) (Feel free to disagree with me, though.)

If a plugin wants to get image/* parts and do something with the contents, it can do that already. If a plugin wants to get application/octet-stream parts with a filename of "*.pdf" and do something with the contents, it can do that already.

If you want one plugin to do some work on a part's contents, store the result, and let another plugin pick it up and continue with other work ... there's no official method for that. You can store data as part of the Node object.
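The three data states and the rendering flow described above can be modeled in a few lines. This is a language-neutral sketch in Python, not SpamAssassin code. Real plugins are Perl modules, and `Node`, `render_all`, `pdf_plugin`, and the `scratch` dict are hypothetical names chosen to mirror the terms used in this message (`decode`, `set_rendered`). The HTML rendering is a crude tag strip, the "PDF extraction" is a placeholder string, and "lowest priority number runs first" is an assumption of this model only.

```python
import base64
import quopri
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical model of one MIME leaf part (not SpamAssassin's Message::Node)."""
    content_type: str
    raw: bytes                      # state 1: exactly what the email contained
    encoding: str = "7bit"          # the part's Content-Transfer-Encoding
    filename: Optional[str] = None
    rendered: Optional[str] = None  # state 3; None means "not rendered yet"
    scratch: dict = field(default_factory=dict)  # data a plugin stashes on this part

    def decode(self) -> bytes:
        """State 2: raw content with the transfer encoding undone (may be binary)."""
        if self.encoding == "base64":
            return base64.b64decode(self.raw)
        if self.encoding == "quoted-printable":
            return quopri.decodestring(self.raw)
        return self.raw

    def set_rendered(self, text: str) -> None:
        self.rendered = text


Handler = Callable[[Node], None]


def render_all(parts: List[Node], plugins: List[Tuple[int, Handler]]) -> List[str]:
    """Give every part rendered text; the returned list is what rules/Bayes would use."""
    # Built-in rendering: plain text as-is, HTML crudely stripped of its tags.
    for node in parts:
        if node.content_type in ("text/plain", "text/html"):
            text = node.decode().decode("utf-8", errors="replace")
            if node.content_type == "text/html":
                text = re.sub(r"<[^>]+>", "", text)
            node.set_rendered(text)
    # Plugin pass: every plugin sees every part, lowest priority number first.
    for _, handler in sorted(plugins, key=lambda p: p[0]):
        for node in parts:
            handler(node)
    # Anything nobody could render is unsupported, so its rendered text is blank.
    for node in parts:
        if node.rendered is None:
            node.set_rendered("")
    return [node.rendered for node in parts]


def pdf_plugin(node: Node) -> None:
    """Hypothetical converter that claims PDFs by MIME type or file name."""
    if node.rendered is not None:  # skip parts that already have rendered text
        return
    is_pdf = node.content_type == "application/pdf" or (
        node.content_type == "application/octet-stream"
        and (node.filename or "").lower().endswith(".pdf"))
    if is_pdf:
        data = node.decode()
        node.scratch["pdf_size"] = len(data)  # intermediate data stored on the node
        node.set_rendered("[text extracted from PDF]")  # stand-in for real extraction


parts = [
    Node("text/plain", b"Hello"),
    Node("text/html", base64.b64encode(b"<p>Buy <b>now</b></p>"), "base64"),
    Node("application/octet-stream", base64.b64encode(b"%PDF-1.4"), "base64", "offer.pdf"),
    Node("image/png", base64.b64encode(b"\x89PNG"), "base64"),
]
print(render_all(parts, [(0, pdf_plugin)]))
# ['Hello', 'Buy now', '[text extracted from PDF]', '']
```

Each plugin checks `rendered` itself before doing any work, following the advice above. The image part ends up blank because no plugin in this model claims image/*.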
You could potentially also write a temp file, though you'll want to be careful to clean it up as necessary.

But what would be a use case for that? I guess something like converting a PDF to a TIFF, then running OCR on the TIFF? I'd probably implement that as a single plugin with "ocr" as a function that gets called from both the PDF and TIFF handlers. Arguably, there could be multiple people developing plugins for different types, but you'd need some coordination of the register_method_priority() calls to figure out who goes in what order. (btw: I just found the register_method_priority() method. \o/)

Note: Do not try to add or remove parts in the tree. The tree is meant to represent the MIME structure of the mail, and each node corresponds to one specific MIME part. The tree is not meant to be a temporary data storage mechanism.

Hope this helps.
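The single-plugin approach suggested above (one shared "ocr" function called from both the PDF and TIFF handlers) might be shaped roughly like this. It is a Python model rather than SpamAssassin's Perl plugin API. `ocr`, `pdf_to_tiff`, `handle_pdf`, `handle_tiff`, and `HANDLERS` are hypothetical names, and both the OCR step and the PDF-to-TIFF conversion are stubs. The intermediate file lives in a `tempfile.TemporaryDirectory`, which is removed automatically, covering the temp-file cleanup concern.

```python
import os
import tempfile
from typing import Callable, Dict


def ocr(image_bytes: bytes) -> str:
    """Hypothetical shared OCR helper; a real one would call an OCR engine."""
    return f"[OCR of {len(image_bytes)} bytes]"


def pdf_to_tiff(pdf_bytes: bytes, workdir: str) -> bytes:
    """Hypothetical PDF-to-TIFF step that goes through a temp file in workdir."""
    path = os.path.join(workdir, "page.tiff")
    with open(path, "wb") as fh:
        fh.write(b"II*\x00" + pdf_bytes)  # stand-in: TIFF magic plus the input bytes
    with open(path, "rb") as fh:
        return fh.read()


def handle_tiff(data: bytes) -> str:
    """TIFF parts go straight to the shared OCR helper."""
    return ocr(data)


def handle_pdf(data: bytes) -> str:
    """PDF parts are converted first, then handed to the same OCR helper."""
    # TemporaryDirectory deletes the directory and its contents on exit,
    # even if the conversion or the OCR raises.
    with tempfile.TemporaryDirectory() as workdir:
        return ocr(pdf_to_tiff(data, workdir))


# One plugin, one dispatch table: no cross-plugin ordering to coordinate.
HANDLERS: Dict[str, Callable[[bytes], str]] = {
    "image/tiff": handle_tiff,
    "application/pdf": handle_pdf,
}

print(HANDLERS["image/tiff"](b"II*\x00abc"))  # [OCR of 7 bytes]
print(HANDLERS["application/pdf"](b"%PDF"))   # [OCR of 8 bytes]
```

Keeping both handlers in one plugin lets the PDF path pass its intermediate TIFF straight to `ocr`, instead of stashing it for a second plugin to find. That also avoids having to coordinate priorities between separately developed plugins.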