On 15/09/13 11:41, Michael McCandless wrote:
> Your understanding is correct: there are two ways to affect the indexed position.
Thanks for the confirmation, took me a while to figure that out :-)
> Either approach would work, but if you do the single-field approach, the challenge is in making a TokenFilter that knows when one chunk ended so it could set the position increment.
Yes, I'd have to find a way to pass some metadata into the tokenizer before feeding it each chunk. Kinda messy.
> I think it'd be easier to just add multiple field instances?
Yes, that's the conclusion I came to. It's easy enough to do: I'm using JavaMail to recursively traverse the mail file so I can separate out each mail and handle multipart mails and attachments, and I then feed each part into Tika.
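For anyone finding this in the archive, here is a rough picture of why multiple field instances keep the chunks apart. It is a plain-Java model of the idea, not Lucene's indexing code: the class `PositionGapSketch` and the method `assignPositions` are made up for illustration, and the exact positions Lucene assigns may differ. In Lucene itself, as far as I understand it, the gap between instances of the same field comes from `Analyzer.getPositionIncrementGap(String fieldName)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (an assumption, not Lucene's implementation): each chunk
// is treated as a separate instance of the same field, and a fixed gap is
// added to the running position between consecutive instances.
public class PositionGapSketch {

    // Hypothetical helper: returns the positions given to the tokens of each chunk.
    public static List<List<Integer>> assignPositions(List<String[]> chunks, int gap) {
        List<List<Integer>> result = new ArrayList<>();
        int next = 0; // position the next token will receive
        boolean first = true;
        for (String[] tokens : chunks) {
            if (!first) {
                next += gap; // leave a hole between field instances
            }
            first = false;
            List<Integer> positions = new ArrayList<>();
            for (int i = 0; i < tokens.length; i++) {
                positions.add(next++); // each token advances the position by one
            }
            result.add(positions);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> chunks = Arrays.asList(
                new String[] {"hello", "world"},    // e.g. the mail body
                new String[] {"goodbye", "moon"});  // e.g. an attachment
        System.out.println(assignPositions(chunks, 100));
        // prints [[0, 1], [102, 103]]
    }
}
```

With a gap larger than any phrase slop you plan to use, a phrase or span query for "world goodbye" cannot match across the boundary between the two chunks, which is the effect the single-field approach would need a custom TokenFilter to achieve.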
Thank you for the information :-)

-- 
Alan Burlison
--