Re: Offset Questions

2008-03-07 Thread Steve Suppe
Hi Erick, thanks for the response. I think I'm starting to get the hang of this. That's a really good insight, but I'm wondering how to handle it when a document can have multiple instances of the same field: for example, instead of an author, the names of cities mentioned in the text. But, as you said, I co

Re: Offset Questions (Follow-Up)

2008-03-07 Thread Erick Erickson
Our mails are crossing. Not that I know of. But why don't you just index (or maybe just store) a separate field containing your offset information? Something like title_offset with, say, a comma-separated pair denoting the character position and length, which you then read in and parse at search time.
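Erick's title_offset suggestion could look roughly like the sketch below. It is plain Java with no Lucene classes: `OffsetField`, `Span`, `encode` and `decode` are hypothetical names, and actually writing the string into a stored Lucene field is left out. It assumes 0-based character positions and encodes the pair as "start,length".

```java
public class OffsetField {

    /** Hypothetical value type: a start position and length in the full document text. */
    public static final class Span {
        public final int start;
        public final int length;

        public Span(int start, int length) {
            this.start = start;
            this.length = length;
        }

        /** Exclusive end position, convenient for String.substring. */
        public int end() {
            return start + length;
        }
    }

    /** Encodes a span as the string "start,length", e.g. "19,17". */
    public static String encode(Span span) {
        return span.start + "," + span.length;
    }

    /** Parses a stored "start,length" string back into a Span. */
    public static Span decode(String stored) {
        String[] parts = stored.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected start,length but got: " + stored);
        }
        return new Span(Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()));
    }

    public static void main(String[] args) {
        String text = "this is a book and HERE IS THE TITLE.";
        String title = "HERE IS THE TITLE";

        // At index time: compute the 0-based position of the title and encode it.
        String stored = encode(new Span(text.indexOf(title), title.length()));
        System.out.println(stored); // 19,17

        // At search time: parse the stored value and cut the title back out of the text.
        Span span = decode(stored);
        System.out.println(text.substring(span.start, span.end())); // HERE IS THE TITLE
    }
}
```

Storing the length rather than the end position follows Erick's wording; either works as long as the reader and writer agree.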

Re: Offset Questions

2008-03-07 Thread Erick Erickson
What is your analyzer doing? Let's assume you're trying to index the title and that your entire text is "this is a book and HERE IS THE TITLE." I *think* your underlying analyzer should be returning 4 tokens with starts of 20 for HERE, 25 for IS, 28 for THE and 32 for TITLE, with appropriate en
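To illustrate Erick's point, here is a minimal sketch in plain Java (no Lucene classes; `OffsetTokenizer`, `Tok` and `tokenize` are hypothetical names). It splits the title on non-alphanumeric characters and shifts every offset by the position where the title begins in the full text, so the offsets refer to the whole document rather than to the title alone. Lucene token offsets count from 0, with the end offset one past the last character, so the 1-based starts 20, 25, 28 and 32 above come out here as 19, 24, 27 and 31.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetTokenizer {

    /** A token with start (inclusive) and end (exclusive) offsets into the full text. */
    public static final class Tok {
        public final String term;
        public final int start;
        public final int end;

        public Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    /**
     * Splits fieldText into runs of letters/digits and reports each token's
     * offsets shifted by baseOffset, i.e. relative to the enclosing document text.
     */
    public static List<Tok> tokenize(String fieldText, int baseOffset) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        int n = fieldText.length();
        while (i < n) {
            // Skip separators (whitespace, punctuation).
            while (i < n && !Character.isLetterOrDigit(fieldText.charAt(i))) i++;
            int start = i;
            // Consume one token.
            while (i < n && Character.isLetterOrDigit(fieldText.charAt(i))) i++;
            if (i > start) {
                tokens.add(new Tok(fieldText.substring(start, i), baseOffset + start, baseOffset + i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "this is a book and HERE IS THE TITLE.";
        String title = "HERE IS THE TITLE";
        int base = text.indexOf(title); // 19: where the title starts in the full text

        // Prints HERE[19,23), IS[24,26), THE[27,30), TITLE[31,36), one per line.
        for (Tok t : tokenize(title, base)) {
            System.out.println(t);
        }
    }
}
```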

Re: Offset Questions (Follow-Up)

2008-03-07 Thread Steve Suppe
OK, I think I understand what's going on: it looks like I am able to set the token for the full author name (say, "Steve Suppe") with the correct offsets, but the analyzer takes it one step further and tokenizes 'Steve' and 'Suppe' separately, which is giving me a lot more generated offsets and is confus
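The behaviour Steve describes can be illustrated without Lucene itself. The sketch below uses a made-up document text and hypothetical method names (`keywordTokens`, `whitespaceTokens`) to contrast treating the whole author value as one token (roughly what Lucene's `KeywordAnalyzer` does) with splitting it on whitespace, which yields one offset pair per word.

```java
import java.util.ArrayList;
import java.util.List;

public class AuthorTokens {

    /** Keyword-style: the entire field value becomes a single token. */
    public static List<String> keywordTokens(String value, int base) {
        List<String> out = new ArrayList<>();
        out.add(value + "[" + base + "," + (base + value.length()) + ")");
        return out;
    }

    /** Whitespace-style: every space-separated word becomes its own token. */
    public static List<String> whitespaceTokens(String value, int base) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = value.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(value.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(value.charAt(i))) i++;
            if (i > start) {
                out.add(value.substring(start, i) + "[" + (base + start) + "," + (base + i) + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "A book by Steve Suppe.";   // hypothetical document text
        String author = "Steve Suppe";
        int base = text.indexOf(author);           // 10 (0-based)

        System.out.println(keywordTokens(author, base));    // [Steve Suppe[10,21)]
        System.out.println(whitespaceTokens(author, base)); // [Steve[10,15), Suppe[16,21)]
    }
}
```

The single keyword-style token keeps one offset pair for the whole name, while the word-level split is what produces the extra offsets.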

Offset Questions

2008-03-07 Thread Steve Suppe
Hi all, I'm trying to index documents so that a) all the documents are indexed 'normally' (that is, I can search for documents that match certain words), and b) parts of the document that I consider important, such as author and title, are ALSO stored in their own indexed fields. I have (a)