On 16.07.2012 13:07, karsten-s...@gmx.de wrote:

Dear Karsten,

> abstract of your post:
> you need the offset to perform your search/ranking, just as the position is
> needed for phrase queries.
> You are using reader.getTermFreqVector to get the offset.
> This is too slow for your application and you are thinking about switching to
> version 4.0.

Yes, that's about it.

> imho you should use payloads.
> You could also switch to version 4, because in version 4 you can store the
> offset for each term, like the position in version 3.x.
> But this is basically the same as using payloads:
>  * http://lucene.apache.org/core/3_6_0/fileformats.html#Positions
>  * http://lucene.apache.org/core/4_0_0-ALPHA/core/org/apache/lucene/codecs/lucene40/Lucene40PostingsFormat.html#Positions

I now use payloads and this fulfils my functional requirements. I was
hoping to avoid that because I am also storing other information in the
payload, which makes it feel a bit messy, especially as it seemed
sensible to me to actually make use of the Offsets field since it already
exists. Anyway, the problem is solved so far, thank you very much!

I still wonder what the purpose of the Offsets field is, as it is so
inefficient to access. It seems like a wasteful redundancy to even store
the offsets during indexing, considering that I also store them in the
payload. Or am I missing something?

Best,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform


