Another example is if you used a stemmer, it might change the termLength: (walking -> walk), but the offsets of the original unstemmed word (walking) stay the same.
On Fri, Nov 13, 2009 at 6:01 PM, Uwe Schindler <u...@thetaphi.de> wrote: > This is not coupled because: > > termLength() is the number of chars in the term buffer, where the offsets > give the offsets in the orginal char stream. If you use a CharFilter to > e.g. > remove chars, the termLength will get shorter, but the offset are still the > original ones. Also both things are indexed in different ways, the > termLength and offsets have no relation and must (as said before) not even > follow a contract like end-start=length. > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -----Original Message----- > > From: Babak Farhang [mailto:farh...@gmail.com] > > Sent: Friday, November 13, 2009 11:50 PM > > To: java-user@lucene.apache.org > > Subject: Redundant fields Token class? > > > > I'm writing a TokenFilter and am confused about why class Token has > > both an *endOffset* and a *termLength* field. It would appear that > > the following invariant should always hold for a Token instance: > > > > termLength() == endOffset() - startOffset() > > > > If so, then > > > > 1) Why 2 fields, instead of 1? > > 2) Why isn't the invariant enforced in the class? > > > > -Babak > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Robert Muir rcm...@gmail.com