Quoting Timo Sirainen <t...@iki.fi>:

> 1. You notice a non-text/* content-type and initialize text extraction
> for the MIME part. Like:
>
> struct attachment_extract_context *
> attachment_extract_init(const char *content_type);
>
> 2. After this you feed all the input belonging to that MIME part to:
>
> int attachment_extract_add(struct attachment_extract_context *ctx,
>                            const struct message_block *input);
>
> Don't output anything to FTS backend at this point. The
> attachment_extract_add() would probably just basically write to a
> temporary file.
>
> 3. Finally you'll notice that the MIME part ends (either you get headers
> for the next MIME part or the entire message ends). Then finish the
> extraction, which actually executes whatever conversion binaries:
>
> int attachment_extract_finish(struct attachment_extract_context *ctx);
>
> 4. Get the resulting text to fts_backend_build_more() somehow. Either
> some attachment_extract_add_to_fts() which internally adds it or some
> kind of an iterator that returns the text in smaller blocks. Either
> would work.
>
> That kind of an API would also make it possible to pretty easily modify
> in future to not write temporary files for specific content types if
> it's not required.
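
For reference, the four steps above can be sketched roughly as follows. This
is only a minimal, standalone illustration and not the actual plugin code:
struct message_block is a simplified stand-in for Dovecot's real struct,
attachment_extract_read() and attachment_extract_deinit() are hypothetical
names for the iterator variant of step 4 and for cleanup, and the finish step
only rewinds the temporary file instead of running a real conversion binary.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Dovecot's struct message_block (assumption):
   only the body data and its length are modelled here. */
struct message_block {
	const unsigned char *data;
	size_t size;
};

struct attachment_extract_context {
	char *content_type; /* kept so finish() could choose a converter */
	FILE *tmp;          /* raw MIME part data is spooled here */
	int finished;
};

/* Step 1: start extraction for a non-text MIME part. */
struct attachment_extract_context *
attachment_extract_init(const char *content_type)
{
	struct attachment_extract_context *ctx = calloc(1, sizeof(*ctx));

	if (ctx == NULL)
		return NULL;
	ctx->content_type = malloc(strlen(content_type) + 1);
	ctx->tmp = tmpfile();
	if (ctx->content_type == NULL || ctx->tmp == NULL) {
		if (ctx->tmp != NULL)
			fclose(ctx->tmp);
		free(ctx->content_type);
		free(ctx);
		return NULL;
	}
	strcpy(ctx->content_type, content_type);
	return ctx;
}

/* Step 2: append one block of the MIME part to the temporary file.
   Nothing is sent to the FTS backend yet. */
int attachment_extract_add(struct attachment_extract_context *ctx,
			   const struct message_block *input)
{
	assert(ctx != NULL && !ctx->finished);
	if (input->size == 0)
		return 0;
	if (fwrite(input->data, 1, input->size, ctx->tmp) != input->size)
		return -1;
	return 0;
}

/* Step 3: the MIME part has ended. A real implementation would run the
   conversion binary for ctx->content_type here; this placeholder treats
   the spooled bytes as the extracted text and rewinds for reading. */
int attachment_extract_finish(struct attachment_extract_context *ctx)
{
	assert(ctx != NULL);
	if (fflush(ctx->tmp) != 0 || fseek(ctx->tmp, 0, SEEK_SET) != 0)
		return -1;
	ctx->finished = 1;
	return 0;
}

/* Step 4 (hypothetical iterator): copy the next chunk of extracted text
   into buf. Returns the number of bytes copied, 0 at the end. */
size_t attachment_extract_read(struct attachment_extract_context *ctx,
			       char *buf, size_t buf_size)
{
	assert(ctx != NULL && ctx->finished);
	return fread(buf, 1, buf_size, ctx->tmp);
}

/* Hypothetical cleanup: closing the stream also removes the temp file. */
void attachment_extract_deinit(struct attachment_extract_context *ctx)
{
	if (ctx == NULL)
		return;
	fclose(ctx->tmp);
	free(ctx->content_type);
	free(ctx);
}
```

In the plugin, each chunk returned by the read loop would be handed on to
fts_backend_build_more(). Because callers only ever see the context pointer,
the temporary file could later be skipped for content types that don't need
it without changing the API.
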
I tried your approach and I think it is working pretty well. Now I only
need to look carefully at the output of the external programs and build
the XML correctly to send to Solr.

Thanks, Timo.

Regards,
Rui Carneiro

--
Portugalmail, Comunicações S.A.
www.portugalmail.net