subject:"Incorrect Token Offset when using multiple fieldable instance"

Re: Incorrect Token Offset when using multiple fieldable instance

2008-07-02 Thread Michael McCandless

Toph wrote: Michael McCandless-2 wrote: We could alternatively extend TokenStream so you could query it for the final offset, then fix indexing to use that value instead of the endOffset of the last token that it saw. Querying the tokenstream for the final offset would be good, but then w
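The "final offset" idea quoted above can be sketched in plain Java. This is only an illustration under stated assumptions: OffsetStream, WhitespaceStream, finalOffset() and baseOffsets() are hypothetical names invented for this sketch, not Lucene's TokenStream API, and the tokenizer is a toy whitespace splitter.

```java
import java.util.Arrays;
import java.util.List;

final class FinalOffsetSketch {
    /** Hypothetical stream contract (an assumption, not Lucene's TokenStream). */
    interface OffsetStream {
        /** Returns {start, end} of the next token, or null when exhausted. */
        int[] next();

        /** Offset just past the last character of the input, token or not. */
        int finalOffset();
    }

    /** Toy whitespace tokenizer over a String. */
    static final class WhitespaceStream implements OffsetStream {
        private final String text;
        private int pos = 0;

        WhitespaceStream(String text) {
            this.text = text;
        }

        public int[] next() {
            while (pos < text.length() && text.charAt(pos) == ' ') pos++;
            if (pos == text.length()) return null;
            int start = pos;
            while (pos < text.length() && text.charAt(pos) != ' ') pos++;
            return new int[] {start, pos};
        }

        public int finalOffset() {
            return text.length();
        }
    }

    /** Base offset of each same-named field instance, advanced by finalOffset(). */
    static int[] baseOffsets(List<String> values) {
        int[] bases = new int[values.size()];
        int base = 0;
        for (int i = 0; i < values.size(); i++) {
            bases[i] = base;
            OffsetStream stream = new WhitespaceStream(values.get(i));
            while (stream.next() != null) {
                // tokens would be indexed here with base + start, base + end
            }
            // Advance by what the stream reports, not by the last token's end.
            base += stream.finalOffset();
        }
        return bases;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(baseOffsets(Arrays.asList("ab cd   ", "ef"))));
    }
}
```

With the two values "ab cd   " and "ef", the second instance starts at offset 8, the full length of the first value, even though the last token of the first value ends at 5.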

Re: Incorrect Token Offset when using multiple fieldable instance

2008-07-02 Thread Toph

ntWriter directly or available as an option? Chris -- View this message in context: http://www.nabble.com/Incorrect-Token-Offset-when-using-multiple-fieldable-instance-tp15833468p18238566.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Incorrect Token Offset when using multiple fieldable instance

2008-07-02 Thread Michael McCandless

nd of each field, but a better API would be if "offset" became a pair of ints, first being the index of the Field for getFields(name) and the second being the offset in that instance of the field. Christopher -- View this message in context: http://www.nabble.com/Incorrect-Token-Off
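The pair-of-ints suggestion above could look roughly like the following. This is a sketch under assumptions: FieldOffset, index() and resolve() are hypothetical names made up for illustration, the String array stands in for the values that getFields(name) would return, and tokenization is a plain whitespace split.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldOffsetSketch {
    /** Hypothetical offset: which field instance, plus offsets local to it. */
    static final class FieldOffset {
        final int fieldIndex; // index into the same-named field instances
        final int start;      // start offset inside that instance
        final int end;        // end offset inside that instance

        FieldOffset(int fieldIndex, int start, int end) {
            this.fieldIndex = fieldIndex;
            this.start = start;
            this.end = end;
        }

        /** Recovers the token text from the original field values. */
        String resolve(String[] values) {
            return values[fieldIndex].substring(start, end);
        }
    }

    /** Whitespace-tokenizes each value, keeping offsets local to its instance. */
    static List<FieldOffset> index(String[] values) {
        List<FieldOffset> out = new ArrayList<>();
        for (int f = 0; f < values.length; f++) {
            String v = values[f];
            int i = 0;
            while (i < v.length()) {
                while (i < v.length() && v.charAt(i) == ' ') i++;
                int start = i;
                while (i < v.length() && v.charAt(i) != ' ') i++;
                if (i > start) out.add(new FieldOffset(f, start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] values = {"ab cd   ", "ef"};
        for (FieldOffset o : index(values)) {
            System.out.println(o.fieldIndex + ":" + o.start + "-" + o.end + " " + o.resolve(values));
        }
    }
}
```

Because each offset is local to its own instance, no running total is carried between instances, so skipped trailing text in one value cannot shift the offsets of the next.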

Re: Incorrect Token Offset when using multiple fieldable instance

2008-06-30 Thread Toph

"offset" became a pair of ints, first being the index of the Field for getFields(name) and the second being the offset in that instance of the field. Christopher -- View this message in context: http://www.nabble.com/Incorrect-Token-Offset-when-using-multiple-fieldable-instance-tp15833468

Re: Incorrect Token Offset when using multiple fieldable instance

2008-03-05 Thread Michael McCandless

Well, first off, sometimes the thing being indexed isn't a string, so you have no stringValue to get its length. It could be a Reader or a TokenStream. Second off, it's conceivable that an analyzer computes its own "interesting" offsets that are not in fact simple indices into the stri

Re: Incorrect Token Offset when using multiple fieldable instance

2008-03-05 Thread Renaud Delbru

Do you know if there would be side effects if, in DocumentWriter$FieldData#invertField, we replaced offset = offsetEnd+1; with offset = stringValue.length();? I still do not understand the reason for incrementing the start offset this way. Regards. Michael McCandless wrote: This is ho

Re: Incorrect Token Offset when using multiple fieldable instance

2008-03-05 Thread Michael McCandless

This is how Lucene has worked for quite some time (since 1.9). When there are multiple fields with the same name in one Document, each field's offset starts from the last offset (offset of the last token) seen in the previous field. If tokens are skipped at the end there's no way IndexWri
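The behavior described above can be reproduced without Lucene. The sketch below is a simplified simulation, not the DocumentWriter source: Token, tokenize() and invert() are hypothetical names, the tokenizer is a toy whitespace splitter, and the rule "next instance starts at the last token's end offset plus one" is taken from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MultiFieldOffsets {
    /** A token with its start and end character offsets. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Toy whitespace tokenizer; offsets are relative to the given value. */
    static List<Token> tokenize(String value) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < value.length()) {
            while (i < value.length() && value.charAt(i) == ' ') i++;
            int start = i;
            while (i < value.length() && value.charAt(i) != ' ') i++;
            if (i > start) tokens.add(new Token(value.substring(start, i), start, i));
        }
        return tokens;
    }

    /** Simulates the rule from the thread for several same-named field values. */
    static List<Token> invert(List<String> values) {
        List<Token> out = new ArrayList<>();
        int base = 0;
        for (String value : values) {
            int lastEnd = -1;
            for (Token t : tokenize(value)) {
                out.add(new Token(t.text, base + t.start, base + t.end));
                lastEnd = t.end;
            }
            // Only the last token seen is known; trailing skipped text is lost.
            if (lastEnd >= 0) base += lastEnd + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : invert(Arrays.asList("ab cd   ", "ef"))) {
            System.out.println(t.text + " [" + t.start + "," + t.end + "]");
        }
    }
}
```

For the values "ab cd   " and "ef", the token "ef" is reported at offsets 6 to 8. Had the two values been concatenated, it would sit at 8 to 10, because the three trailing spaces of the first value produce no token and so never reach the running offset.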

Incorrect Token Offset when using multiple fieldable instance

2008-03-04 Thread Renaud Delbru

Hi, I currently use multiple fieldable instances for indexing the sentences of a document. When there is only a single fieldable instance, the token offset generation performed in DocumentWriter is correct. The problem appears when there are two or more fieldable instances. In DocumentWriter$Fiel

8 matches
