Another example: if you use a stemmer, it might change the termLength
(walking -> walk), but the offsets of the original unstemmed word (walking)
stay the same.
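
To make that concrete, here is a minimal sketch in plain Java. SimpleToken
and the toy stem() below are made up for illustration; they are not the
Lucene Token class or any real stemmer, just a stand-in that keeps a term
string next to the offsets into the original text:

```java
// Hypothetical stand-in for a token; this is NOT the Lucene Token class.
final class SimpleToken {
    final String term;      // contents of the term buffer (filters may rewrite it)
    final int startOffset;  // start of the word in the original text
    final int endOffset;    // end of the word (exclusive) in the original text

    SimpleToken(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    // Length of the (possibly rewritten) term, independent of the offsets.
    int termLength() {
        return term.length();
    }
}

final class TokenOffsetsDemo {
    // Toy "stemmer" (an assumption for this example): strips a trailing "ing".
    // It rewrites the term but deliberately leaves the offsets untouched.
    static SimpleToken stem(SimpleToken t) {
        String term = t.term;
        if (term.endsWith("ing") && term.length() > 3) {
            term = term.substring(0, term.length() - 3);
        }
        return new SimpleToken(term, t.startOffset, t.endOffset);
    }

    public static void main(String[] args) {
        String text = "I was walking home";
        int start = text.indexOf("walking");
        int end = start + "walking".length();

        SimpleToken original = new SimpleToken("walking", start, end);
        SimpleToken stemmed = stem(original);

        // The term got shorter, the offsets still span the original word.
        System.out.println("term=" + stemmed.term
                + " termLength=" + stemmed.termLength()
                + " offsets=" + stemmed.startOffset + ".." + stemmed.endOffset);
        System.out.println("original word: "
                + text.substring(stemmed.startOffset, stemmed.endOffset));
    }
}
```

After stemming, termLength() is 4 while endOffset - startOffset is still 7,
so something like a highlighter can still mark the full word "walking" in the
original text.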

On Fri, Nov 13, 2009 at 6:01 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> The two are not coupled, because:
>
> termLength() is the number of chars in the term buffer, whereas the offsets
> give the positions in the original char stream. If you use a CharFilter to,
> e.g., remove chars, the termLength will get shorter, but the offsets are
> still the original ones. Also, the two are indexed in different ways: the
> termLength and the offsets have no relation and (as said before) do not
> even have to follow a contract like end-start=length.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Babak Farhang [mailto:farh...@gmail.com]
> > Sent: Friday, November 13, 2009 11:50 PM
> > To: java-user@lucene.apache.org
> > Subject: Redundant fields Token class?
> >
> > I'm writing a TokenFilter and am confused about why class Token has
> > both an *endOffset* and a *termLength* field.  It would appear that
> > the following invariant should always hold for a Token instance:
> >
> >     termLength() == endOffset() - startOffset()
> >
> > If so, then
> >
> > 1) Why 2 fields, instead of 1?
> > 2) Why isn't the invariant enforced in the class?
> >
> > -Babak
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org


-- 
Robert Muir
rcm...@gmail.com
