The cast to TermAttributeImpl may not work if the factory creates a Token... So declare termBuf as TermAttribute (without impl).
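As a rough illustration of why casting to the concrete impl class is fragile, here is a small plain-Java sketch. It does not use Lucene: TermAttr, AttrImpl, TermAttrImpl, TokenLike and AttrFactory are made-up stand-ins for the roles played by TermAttribute, AttributeImpl, TermAttributeImpl, Token and the AttributeFactory. The only assumption is that the factory is free to return either implementation.

```java
// Hypothetical stand-ins for illustration only; none of these are Lucene classes.
interface TermAttr {
    void setTerm(String term);
    String term();
}

abstract class AttrImpl {
    abstract void clear();
}

// A dedicated implementation of the single attribute interface.
final class TermAttrImpl extends AttrImpl implements TermAttr {
    private String term = "";
    public void setTerm(String term) { this.term = term; }
    public String term() { return term; }
    void clear() { term = ""; }
}

// Models a Token-like class that implements the attribute interface itself.
final class TokenLike extends AttrImpl implements TermAttr {
    private String term = "";
    public void setTerm(String term) { this.term = term; }
    public String term() { return term; }
    void clear() { term = ""; }
}

// The caller does not control which implementation the factory returns.
interface AttrFactory {
    AttrImpl create();
}

final class CastSketch {
    // Casting to the concrete impl class only works for one kind of factory.
    static boolean implCastWorks(AttrFactory factory) {
        try {
            TermAttrImpl impl = (TermAttrImpl) factory.create();
            return impl != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    // Declaring the variable as the interface works for every factory.
    static String setAndRead(AttrFactory factory, String term) {
        TermAttr termBuf = (TermAttr) factory.create();
        termBuf.setTerm(term);
        return termBuf.term();
    }

    public static void main(String[] args) {
        System.out.println(implCastWorks(TermAttrImpl::new)); // true
        System.out.println(implCastWorks(TokenLike::new));    // false
        System.out.println(setAndRead(TokenLike::new, "mr")); // mr
    }
}
```
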
To clear, you can always downcast the interface to AttributeImpl. Or create a second variable. Alternatively use my second approach.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Shai Erera [mailto:ser...@gmail.com]
> Sent: Sunday, November 22, 2009 8:15 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to deal with Token in the new TS API
>
> Did you mean something like:
>
> TermAttributeImpl termBuf = (TermAttributeImpl)
> input.getAttributeFactory().createAttributeInstance(TermAttribute.class);
>
> I need to use the methods on TermAttributeImpl like clear() ...
>
> Shai
>
> On Sun, Nov 22, 2009 at 9:03 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
> > I said, you *could* if it would be exposed. But the State is a holder class
> > without functionality. Because the internals are impl dependent, maybe we
> > will add such a thing in the future. But: If the state contains a real map, it
> > would be slow, because each captureState call would need to fill the map.
> > And: If you use the Token as AttImpl, the state will only contain one entry.
> > You cannot control which attribute is implemented by what impl, so the map
> > approach would never work correctly.
> >
> > You can allocate a TermAttributeImpl and copyTo, but you should create the
> > instance using the same factory as the tokenstream uses:
> >
> > TermAttribute copy = (TermAttribute)
> > getAttributeFactory().createAttributeInstance(TermAttribute.class);
> >
> > By that you guarantee that both are from the same implementation type.
> >
> > > -----Original Message-----
> > > From: Shai Erera [mailto:ser...@gmail.com]
> > > Sent: Sunday, November 22, 2009 7:53 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: How to deal with Token in the new TS API
> > >
> > > Yes I can clone the term itself by instantiating a TermAttributeImpl, which
> > > is better than storing the String, because the latter always allocates
> > > char[], while the former will reuse the char[] if it's big enough.
> > >
> > > What if State included a HashMap of all attributes, in addition to its
> > > "linked-list" structure?
> > >
> > > Anyway, you mention that I can iterate on all Attributes of a State, but
> > > it's not clear to me how to do it, since I don't see any relevant method in
> > > its API. Am I missing something?
> > >
> > > Shai
> > >
> > > On Sun, Nov 22, 2009 at 4:42 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >
> > > > > Because that'd mean I'll check for abbreviations for every token. Which is
> > > > > a big performance loss. That way, I can just check abbr if I encountered a
> > > > > "." (not even all end-of-sentence tokens).
> > > >
> > > > OK, then simply copy the term to a String and store it. The cost is the same
> > > > as cloning/copying. If you find the ".", use the String and look it up.
> > > >
> > > > > Why can't State offer a "getAttribute" like AttributeSource?
> > > >
> > > > Because State is optimized for fast restore. In previous 2.9 versions State
> > > > was itself an AttributeSource instance, but the capture/store was very, very
> > > > slow.
> > > >
> > > > If you want to check a State, you would need to iterate over all
> > > > attributes and find the correct one, which is also slow. The best is to
> > > > simply clone the term text as a string. You must create new objects in all
> > > > cases, even with clone/copy.
> > > >
> > > > Uwe
> > > >
> > > > > Shai
> > > > >
> > > > > On Sun, Nov 22, 2009 at 4:34 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > > > >
> > > > > > If you just want to look up if "Mr" is an abbreviation, why not look it up
> > > > > > when you handle that token and set a boolean variable in the TS
> > > > > > (lastTokenWasAbbreviation). When you process the ".", remove it if the
> > > > > > boolean is set.
> > > > > >
> > > > > > Uwe
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Shai Erera [mailto:ser...@gmail.com]
> > > > > > > Sent: Sunday, November 22, 2009 3:28 PM
> > > > > > > To: java-user@lucene.apache.org
> > > > > > > Subject: Re: How to deal with Token in the new TS API
> > > > > > >
> > > > > > > What I've done is:
> > > > > > >
> > > > > > > State state = in.captureState();
> > > > > > > ...
> > > > > > > // Upon new call to incrementToken().
> > > > > > > State tmp = in.captureState();
> > > > > > > in.restoreState(state);
> > > > > > > // check if termAttribute is an abbreviation.
> > > > > > > If not : in.restoreState(tmp);
> > > > > > >
> > > > > > > But seems a lot of capturing/restoring to me ... how expensive is that?
> > > > > > >
> > > > > > > Shai
> > > > > > >
> > > > > > > On Sun, Nov 22, 2009 at 3:57 PM, Shai Erera <ser...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Perhaps I misunderstand something. The current use case I'm trying to
> > > > > > > > solve is - I have an abbreviations TokenFilter which reads a token and
> > > > > > > > stores it. If the next token is end-of-sentence, it checks whether the
> > > > > > > > previous one is in the abbreviations list, and discards the
> > > > > > > > end-of-sentence token. I need to store the first token somewhere so I
> > > > > > > > can reference it.
> > > > > > > >
> > > > > > > > Example: "hello mr. shai"
> > > > > > > > First token = hello -> store it and return
> > > > > > > > Second token = mr -> store it and return
> > > > > > > > Third token = "." -> check if "mr" is an abbreviation, if so don't return ".".
> > > > > > > > Fourth token = "shai" -> store it and return.
> > > > > > > > ...
> > > > > > > >
> > > > > > > > How do I store "mr" (or any of the others)? It was easy w/ copyTo. If I
> > > > > > > > captureState, I get a State, but I can't query it for a TermAttribute.
> > > > > > > > Any ideas?
> > > > > > > >
> > > > > > > > Shai
> > > > > > > >
> > > > > > > > On Sun, Nov 22, 2009 at 3:33 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > > > > > > >
> > > > > > > > > Use captureState and save the state somewhere. You can restore the
> > > > > > > > > state with restoreState to the TokenStream. CachingTokenFilter does
> > > > > > > > > this.
> > > > > > > > >
> > > > > > > > > So the new API uses the State object to put away tokens for later
> > > > > > > > > reference.
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Shai Erera [mailto:ser...@gmail.com]
> > > > > > > > > > Sent: Sunday, November 22, 2009 2:29 PM
> > > > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > > > Subject: Re: How to deal with Token in the new TS API
> > > > > > > > > >
> > > > > > > > > > ok so from what I understand, I should stop working w/ Token, and
> > > > > > > > > > move to working w/ the Attributes.
> > > > > > > > > >
> > > > > > > > > > addAttribute indeed does not work. Even though it does not throw an
> > > > > > > > > > exception, if I call in.addAttribute(Token.class), I get a new
> > > > > > > > > > instance of Token and not the one that was added by in. So this is
> > > > > > > > > > even more severe than just not blocking this option.
> > > > > > > > > >
> > > > > > > > > > I thought I can move to use addAttributeImpl, but that won't help
> > > > > > > > > > me, because I won't be able to call getAttribute(Token.class).
> > > > > > > > > >
> > > > > > > > > > So this leaves me w/ just working w/ the interfaces.
> > > > > > > > > >
> > > > > > > > > > What do I need to do in order to clone an attribute? Previously I
> > > > > > > > > > used token.copyTo(target). How can I do it now if I don't have
> > > > > > > > > > copyTo on the interfaces, and/or clone?
> > > > > > > > > >
> > > > > > > > > > Shai
> > > > > > > > > >
> > > > > > > > > > On Sun, Nov 22, 2009 at 2:58 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > > > > > > > > >
> > > > > > > > > > > > But I do use addAttribute(Token.class), so I don't understand
> > > > > > > > > > > > why you say it's not possible. And I completely don't understand
> > > > > > > > > > > > why the new API allows me to just work w/ interfaces and not
> > > > > > > > > > > > impls ... A while ago I got the impression that we're trying to
> > > > > > > > > > > > get rid of interfaces because they're not easy to maintain
> > > > > > > > > > > > back-compat with ...
> > > > > > > > > > >
> > > > > > > > > > > addAttribute(Token.class) should throw an Exception, but it
> > > > > > > > > > > doesn't (it's a bug in 3.0). addAttribute should only affect
> > > > > > > > > > > interfaces; it also accepts Token because the AttributeFactory
> > > > > > > > > > > accepts it - bang.
> > > > > > > > > > >
> > > > > > > > > > > Sorry, but you can only pass attribute class literals to
> > > > > > > > > > > addAttribute/getAttribute/hasAttribute and so on.
> > > > > > > > > > >
> > > > > > > > > > > Sorry.
> > > > > > > > > > >
> > > > > > > > > > > Uwe
> > > > > > > > > > >
> > > > > > > > > > > ---------------------------------------------------------------------
> > > > > > > > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > > > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
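
For the abbreviation use case discussed in this thread, the idea of remembering the previous term in a reusable char[] and looking it up only when a "." arrives can be sketched in plain Java. This is only a rough model, not a Lucene TokenFilter: the class name AbbreviationFilterSketch, the keep() method and the use of a Set<String> for the abbreviation list are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the abbreviation filter; hypothetical, not a Lucene class.
final class AbbreviationFilterSketch {
    private final Set<String> abbreviations; // assumed abbreviation list
    private char[] prevBuf = new char[16];   // reused between tokens
    private int prevLen = 0;

    AbbreviationFilterSketch(Set<String> abbreviations) {
        this.abbreviations = abbreviations;
    }

    // Returns true if the token should be emitted, false if it is dropped.
    boolean keep(CharSequence term) {
        if (term.length() == 1 && term.charAt(0) == '.') {
            // Only here do we pay for the lookup (and the String allocation).
            boolean afterAbbreviation = prevLen > 0
                    && abbreviations.contains(new String(prevBuf, 0, prevLen));
            prevLen = 0;
            return !afterAbbreviation;
        }
        remember(term);
        return true;
    }

    // Copies the term into the reusable buffer, growing it only when needed.
    private void remember(CharSequence term) {
        int len = term.length();
        if (len > prevBuf.length) {
            prevBuf = new char[len];
        }
        for (int i = 0; i < len; i++) {
            prevBuf[i] = term.charAt(i);
        }
        prevLen = len;
    }

    static List<String> filter(List<String> tokens, Set<String> abbreviations) {
        AbbreviationFilterSketch f = new AbbreviationFilterSketch(abbreviations);
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (f.keep(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> abbr = new HashSet<>(Arrays.asList("mr"));
        // prints [hello, mr, shai]
        System.out.println(filter(Arrays.asList("hello", "mr", ".", "shai"), abbr));
    }
}
```

In a real filter the buffer copy would presumably be done through the term attribute created by the stream's own factory, as described above; the sketch only shows the control flow.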