Because that would mean checking every token against the abbreviations list, which is a big performance loss. My way, I only check for an abbreviation when I encounter a "." (not even on all end-of-sentence tokens).
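A minimal sketch of that look-back idea, written in plain Java rather than against the Lucene TokenStream API. The class name, the `filter` method, and the `lookups` counter are hypothetical and exist only for illustration; tokens are modeled as plain strings instead of attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Plain-Java model of the look-back idea; this is NOT the Lucene API. */
final class LookBackAbbreviationSketch {
    private final Set<String> abbreviations;
    private int lookups = 0; // counts abbreviation-set lookups, to make the cost visible

    LookBackAbbreviationSketch(Set<String> abbreviations) {
        this.abbreviations = abbreviations;
    }

    /** Drops a "." token when the token right before it is a known abbreviation. */
    List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String previous = null; // the stored previous term (copyTo used to provide this)
        for (String token : tokens) {
            if (".".equals(token) && previous != null) {
                lookups++; // the set is consulted only when a "." is seen
                if (abbreviations.contains(previous)) {
                    previous = token;
                    continue; // discard the "." that follows an abbreviation
                }
            }
            out.add(token);
            previous = token;
        }
        return out;
    }

    int lookups() {
        return lookups;
    }
}
```

With the abbreviation set {"mr"} and the tokens "hello", "mr", ".", "shai", ".", this sketch keeps "hello", "mr", "shai" and the final ".", and consults the set twice (once per "."), not once per token.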
Why can't State offer a "getAttribute" like AttributeSource?

Shai

On Sun, Nov 22, 2009 at 4:34 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> If you just want to lookup if "Mr" is an abbreviation, why not look it up
> when you handle that token and set a boolean variable in the TS
> (lastTokenWasAbbreviation). When you process the ".", remove it if the
> Boolean is set.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Shai Erera [mailto:ser...@gmail.com]
> > Sent: Sunday, November 22, 2009 3:28 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: How to deal with Token in the new TS API
> >
> > What I've done is:
> >
> > State state = in.captureState();
> > ...
> > // Upon new call to incrementToken().
> > State tmp = in.captureState();
> > in.restoreState(state);
> > // check if termAttribute is an abbreviation.
> > If not : in.restoreState(tmp);
> >
> > But seems a lot of capturing/restoring to me ... how expensive is that?
> >
> > Shai
> >
> > On Sun, Nov 22, 2009 at 3:57 PM, Shai Erera <ser...@gmail.com> wrote:
> >
> > > Perhaps I misunderstand something. The current use case I'm trying to solve
> > > is - I have an abbreviations TokenFilter which reads a token and stores it.
> > > If the next token is end-of-sentence, it checks whether the previous one is
> > > in the abbreviations list, and discards the end-of-sentence token. I need to
> > > store the first token somewhere so I can reference it.
> > >
> > > Example: "hello mr. shai"
> > > First token = hello -> store it and return
> > > Second token = mr -> store it and return
> > > Third token = "." -> check if "mr" is an abbreviation, if so don't return ".".
> > > Fourth token = "shai" -> store it and return.
> > > ...
> > >
> > > How do I store "mr" (or any of the others)? It was easy w/ copyTo.
> > > If I captureState, I get a State, but I can't query it for a TermAttribute.
> > > Any ideas?
> > >
> > > Shai
> > >
> > > On Sun, Nov 22, 2009 at 3:33 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >
> > >> Use captureState and save the state somewhere. You can restore the state
> > >> with restoreState to the TokenStream. CachingTokenFilter does this.
> > >>
> > >> So the new API uses the State object to put away tokens for later
> > >> reference.
> > >>
> > >> -----
> > >> Uwe Schindler
> > >> H.-H.-Meier-Allee 63, D-28213 Bremen
> > >> http://www.thetaphi.de
> > >> eMail: u...@thetaphi.de
> > >>
> > >> > -----Original Message-----
> > >> > From: Shai Erera [mailto:ser...@gmail.com]
> > >> > Sent: Sunday, November 22, 2009 2:29 PM
> > >> > To: java-user@lucene.apache.org
> > >> > Subject: Re: How to deal with Token in the new TS API
> > >> >
> > >> > ok so from what I understand, I should stop working w/ Token, and move to
> > >> > working w/ the Attributes.
> > >> >
> > >> > addAttribute indeed does not work. Even though it does not throw an
> > >> > exception, if I call in.addAttribute(Token.class), I get a new instance of
> > >> > Token and not the one that was added by in. So this is even more severe
> > >> > than just not blocking this option.
> > >> >
> > >> > I thought I can move to use addAttributeImpl, but that won't help me,
> > >> > because I won't be able to call getAttribute(Token.class).
> > >> >
> > >> > So this leaves me w/ just working w/ the interfaces.
> > >> >
> > >> > What do I need to do in order to clone an attribute? Previously I used
> > >> > token.copyTo(target). How can I do it now if I don't have copyTo on the
> > >> > interfaces, and/or clone?
> > >> >
> > >> > Shai
> > >> >
> > >> > On Sun, Nov 22, 2009 at 2:58 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >> >
> > >> > > > But I do use addAttribute(Token.class), so I don't understand why you say
> > >> > > > it's not possible. And I completely don't understand why the new API
> > >> > > > allows me to just work w/ interfaces and not impls ... A while ago I got the
> > >> > > > impression that we're trying to get rid of interfaces because they're not
> > >> > > > easy to maintain back-compat with ...
> > >> > >
> > >> > > addAttribute(Token.class) should throw an Exception, but it doesn't (it's a
> > >> > > bug in 3.0). addAttribute should only affect interfaces, it also accepts
> > >> > > Token, because the AttributeFactory accepts it - bang.
> > >> > >
> > >> > > Sorry, but you can only pass attribute class literals to
> > >> > > addAttribute/getAttribute/hasAttribute and so on.
> > >> > >
> > >> > > Sorry.
> > >> > >
> > >> > > Uwe
> > >> > >
> > >> > > ---------------------------------------------------------------------
> > >> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
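
For comparison, Uwe's suggestion from earlier in this thread (look each token up as it is handled, remember the result in a boolean such as lastTokenWasAbbreviation, and drop the "." when the flag is set) can be sketched the same way. This is a plain-Java model under stated assumptions, not the Lucene TokenStream API: the class and method names are hypothetical, and tokens are plain strings rather than attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Plain-Java model of the boolean-flag idea; this is NOT the Lucene API. */
final class FlagAbbreviationSketch {
    /** Drops a "." token when the token right before it was a known abbreviation. */
    static List<String> filter(List<String> tokens, Set<String> abbreviations) {
        List<String> out = new ArrayList<>();
        boolean lastTokenWasAbbreviation = false; // the only state carried between tokens
        for (String token : tokens) {
            if (".".equals(token) && lastTokenWasAbbreviation) {
                lastTokenWasAbbreviation = false;
                continue; // discard the "." that follows an abbreviation
            }
            out.add(token);
            // One lookup per emitted token, but no token copy has to be stored.
            lastTokenWasAbbreviation = abbreviations.contains(token);
        }
        return out;
    }
}
```

The trade-off matches the discussion above: this variant needs no captured state or copied token, at the price of one set lookup per token instead of one per ".".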