Re: Question about startOffset and endOffset

2008-05-12 Thread Brendan Grainger
Hi Erick, Thanks for the reply. The use case I have is this: Say you have a synonym expansion like this: ac -> air conditioning And to keep it simple, a document where the first term is ac. When analyzing the document I currently create a token stream that looks something like this for the
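To make the use case concrete: if "ac" is expanded into "air" and "conditioning" and each synonym token copies the offsets of the original "ac" token, then every one of them maps back to the same characters of the source text. The sketch below uses a hypothetical, simplified `Tok` class rather than Lucene's `Token`. The `expand` and `covered` helpers and the sample text "ac unit is broken" are also illustrative assumptions. It shows what a highlighter would mark for each token.

```java
import java.util.ArrayList;
import java.util.List;

public class SynonymOffsets {
    // Hypothetical, simplified stand-in for a token (NOT the Lucene Token API).
    static final class Tok {
        final String term;
        final int start; // startOffset: index of the first char in the original text
        final int end;   // endOffset: index one past the last char
        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical expansion: "ac" -> "air", "conditioning", each copying the original offsets.
    static List<Tok> expand(Tok original) {
        List<Tok> out = new ArrayList<>();
        out.add(original);
        if (original.term.equals("ac")) {
            out.add(new Tok("air", original.start, original.end));
            out.add(new Tok("conditioning", original.start, original.end));
        }
        return out;
    }

    // The slice of the original text a token's offsets cover (what a highlighter would mark).
    static String covered(String text, Tok t) {
        return text.substring(t.start, t.end);
    }

    public static void main(String[] args) {
        String text = "ac unit is broken";
        for (Tok t : expand(new Tok("ac", 0, 2))) {
            System.out.println(t.term + " -> \"" + covered(text, t) + "\"");
        }
    }
}
```

Running `main` prints each of `ac`, `air` and `conditioning` mapped to `"ac"`: a hit on either synonym highlights the word that actually appears in the document.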

Re: Question about startOffset and endOffset

2008-05-12 Thread Karl Wettin
Erick Erickson wrote: Offhand, I expect this will affect span queries, phrase queries, and who knows what else? Maybe scoring? I believe that the offsets are just metadata stored with the term vectors, used by the highlighter etc. Phrase and span queries use term position in the stream (p
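Karl's distinction can be illustrated with a toy phrase matcher. This is a minimal sketch, assuming a hypothetical `Tok` class that carries offsets and an absolute position; it is not how Lucene's `PhraseQuery` is implemented. The matcher looks only at positions, so zeroing every offset leaves the result unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseByPosition {
    // Hypothetical, simplified token: term, character offsets, absolute position.
    static final class Tok {
        final String term;
        final int start, end, pos;
        Tok(String term, int start, int end, int pos) {
            this.term = term; this.start = start; this.end = end; this.pos = pos;
        }
    }

    // True if the phrase terms occur at consecutive positions.
    // Only positions are consulted; start/end offsets play no part.
    static boolean phraseMatches(List<Tok> stream, List<String> phrase) {
        for (Tok first : stream) {
            if (!first.term.equals(phrase.get(0))) continue;
            boolean ok = true;
            for (int i = 1; i < phrase.size() && ok; i++) {
                ok = false;
                for (Tok t : stream) {
                    if (t.pos == first.pos + i && t.term.equals(phrase.get(i))) { ok = true; break; }
                }
            }
            if (ok) return true;
        }
        return false;
    }

    // Same terms and positions, but every offset set to 0.
    static List<Tok> zeroOffsets(List<Tok> stream) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : stream) out.add(new Tok(t.term, 0, 0, t.pos));
        return out;
    }

    public static void main(String[] args) {
        // Tokens for the text "air conditioning unit".
        List<Tok> real = Arrays.asList(
            new Tok("air", 0, 3, 0),
            new Tok("conditioning", 4, 16, 1),
            new Tok("unit", 17, 21, 2));
        List<String> phrase = Arrays.asList("conditioning", "unit");
        System.out.println(phraseMatches(real, phrase));                        // true
        System.out.println(phraseMatches(zeroOffsets(real), phrase));           // true
        System.out.println(phraseMatches(real, Arrays.asList("air", "unit")));  // false
    }
}
```

The zeroed stream still matches, which fits the claim that offsets do not drive phrase or span matching. A highlighter relying on those zero offsets would still mark the wrong text, though.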

Re: Question about startOffset and endOffset

2008-05-12 Thread Erick Erickson
Is this a theoretical question or is there a use-case you're trying to support? If the latter, a statement of the problem you're trying to solve would be helpful. If the former, setting all your start offsets to 0 seems wrong. You're essentially saying that all tokens are at the beginning of the d

Question about startOffset and endOffset

2008-05-12 Thread Brendan Grainger
Hi, I have a TokenStream that inserts synonym tokens into the stream when matched. One thing I am wondering about is what is the effect of the startOffset and endOffset. I have something like this: Token synonymToken = new Token(originalToken.startOffset(), originalToken.endOffset(), "SYN
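The approach in the question, building the synonym token from `originalToken.startOffset()` and `originalToken.endOffset()`, can be sketched as below. The sketch assumes a hypothetical `Tok` class with a position increment field. Giving the injected token an increment of 0 is the usual way to stack it at the same position as the original; in Lucene this is done with `setPositionIncrement(0)` on the injected token. The synonym pair "ac" -> "aircon" is purely illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SynonymInjector {
    // Hypothetical, simplified token (NOT Lucene's Token class).
    static final class Tok {
        final String term;
        final int start, end, posInc;
        Tok(String term, int start, int end, int posInc) {
            this.term = term; this.start = start; this.end = end; this.posInc = posInc;
        }
    }

    // Emit each token; if it has a synonym, follow it with a synonym token that copies
    // the original's offsets and has position increment 0 (stacked at the same position).
    static List<Tok> inject(List<Tok> in, Map<String, String> synonyms) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t);
            String syn = synonyms.get(t.term);
            if (syn != null) out.add(new Tok(syn, t.start, t.end, 0));
        }
        return out;
    }

    // Absolute positions: running sum of position increments, starting from -1.
    static int[] positions(List<Tok> toks) {
        int[] p = new int[toks.size()];
        int pos = -1;
        for (int i = 0; i < toks.size(); i++) {
            pos += toks.get(i).posInc;
            p[i] = pos;
        }
        return p;
    }

    public static void main(String[] args) {
        List<Tok> in = Arrays.asList(new Tok("ac", 0, 2, 1), new Tok("unit", 3, 7, 1));
        List<Tok> out = inject(in, Collections.singletonMap("ac", "aircon"));
        int[] pos = positions(out);
        for (int i = 0; i < out.size(); i++) {
            Tok t = out.get(i);
            System.out.println(t.term + " pos=" + pos[i] + " offsets=" + t.start + "-" + t.end);
        }
    }
}
```

This prints `ac` and `aircon` at position 0 with offsets 0-2, and `unit` at position 1 with offsets 3-7. The synonym shares both the position and the highlighted span of the word it replaces.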