Thank you for all the tips. The design looks clearer to me now.

I have one more question. I looked into fts_build_want_index_part() and saw that I need to add a flag to message_part_flags. What value should I choose? My first approach was to follow your scheme and set MESSAGE_PART_FLAG_ATTACHMENT = 0x16. Is there any problem with this?
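One thing worth checking about that value: if message_part_flags is a bitmask enum in which every member occupies exactly one bit, then 0x16 (decimal 22, binary 10110) is not a single bit but the combination 0x10 | 0x04 | 0x02, so it would overlap with whatever flags already use those bits. Below is a minimal sketch of that check. The EXAMPLE_* names and their values are hypothetical stand-ins for illustration, not copied from Dovecot's headers, and the helper functions are my own.

```c
#include <assert.h>

/* Hypothetical stand-ins for already-defined flags. The real names and
   values in Dovecot's enum message_part_flags may differ. */
enum example_part_flags {
	EXAMPLE_FLAG_A = 0x01,
	EXAMPLE_FLAG_B = 0x02,
	EXAMPLE_FLAG_C = 0x04,
	EXAMPLE_FLAG_D = 0x08
};

/* Mask of every bit the stand-in enum already uses (0x0f). */
#define EXAMPLE_USED_MASK \
	(EXAMPLE_FLAG_A | EXAMPLE_FLAG_B | EXAMPLE_FLAG_C | EXAMPLE_FLAG_D)

/* Returns 1 if exactly one bit is set in value, otherwise 0. */
static int is_single_bit(unsigned int value)
{
	return value != 0 && (value & (value - 1)) == 0;
}

/* Returns the bits of candidate that collide with already-used bits. */
static unsigned int overlapping_bits(unsigned int candidate,
				     unsigned int used_mask)
{
	return candidate & used_mask;
}

/* Compares the value from the question (0x16) with a single-bit value. */
static void example_check_candidates(void)
{
	/* 0x16 sets three bits at once, two of them already taken here. */
	assert(!is_single_bit(0x16));
	assert(overlapping_bits(0x16, EXAMPLE_USED_MASK) ==
	       (EXAMPLE_FLAG_B | EXAMPLE_FLAG_C));

	/* 0x10 is a single bit that is free in this stand-in enum. */
	assert(is_single_bit(0x10));
	assert(overlapping_bits(0x10, EXAMPLE_USED_MASK) == 0);
}
```

With these stand-in values, a part flagged 0x16 would also test positive for EXAMPLE_FLAG_B and EXAMPLE_FLAG_C. Whether a collision like this explains the behaviour I describe below depends on which bits the real enum already uses, so the new flag should be checked against the actual definition.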
I have already changed parse_content_type() to set ctx->part->flags correctly, but if I use my custom flag, Dovecot assumes that all attachment lines are headers. I also tried setting ctx->part->flags to TEXT, and the FTS backend was then fed correctly with all attachment lines. I don't know whether this is related to the value of MESSAGE_PART_FLAG_ATTACHMENT or whether I am missing something (like setting block.hdr = NULL, or some more code to handle new flags).

Thank you,
Rui Carneiro

On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen <[email protected]> wrote:

> On Mon, 2009-04-13 at 11:18 +0100, Rui Carneiro wrote:
> > I didn't understood yet what is the plugin's design and how the plugins are
> > called from the core system and I was wondering if anyone could help me with
> > that.
>
> fts-storage.c hooks into all the functions in mail-storage API that it
> needs to. Currently indexing isn't done while messages are being saved,
> but instead just before searching. The searching functions are:
>
> - fts_mailbox_search_init() tries to figure out if FTS can optimize the
> search. If it does, it tries to figure out if FTS index is up-to-date
> and if not, starts the search.
>
> - fts_mailbox_search_next_nonblock() continues the indexing (or
> searching after indexing) for a while. The idea is that IMAP connection
> is able to process other commands while doing a long-running search. So
> fts plugin indexes FTS_SEARCH_NONBLOCK_COUNT (50) messages at a time. It
> would be nice if that value was dynamically calculated and also based on
> bytes instead of messages, but that's maybe too much trouble.
>
> - fts_mailbox_search_next_update_seq() uses the fts search results and
> updates mail-storage's search stuff so that it doesn't go through
> messages that don't match.
>
> - fts_build_mail() indexes a single mail. It parses the messages and
> returns the data in small blocks. For text/* and message/rfc822 parts
> those blocks are currently sent to FTS backend. This is where I think
> you should look into hooking your attachment parsing. Change
> fts_build_want_index_part() to look for more content-types that you're
> interested in and then before feeding the blocks to FTS backend put them
> through your own converter function, something like:
>
> int attachment_extract_text(struct attachment_extract_context *ctx,
>                             const struct message_block *input,
>                             struct message_block *output);

-- 
mobile: +351 963446125
mail: [email protected]
mail: [email protected]
website: http://paginas.fe.up.pt/~ei04073
