That would be awesome, thanks!

Mike McCandless
http://blog.mikemccandless.com

On Sat, Sep 7, 2013 at 3:40 PM, Benson Margulies <ben...@basistech.com> wrote:
> I think I had better build you a test case for this situation, and
> attach it to a JIRA.
>
> On Sat, Sep 7, 2013 at 3:33 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> Something is wrong; I'm not sure what offhand, but calling peekToken
>> 10 times should not stack all tokens @ position 0; it should stack the
>> tokens at the positions where they occurred. Are you sure the posIncr
>> att is sometimes 1 (i.e., the position is in fact moving forward for
>> some tokens)?
>>
>> nextToken() only calls peekToken() once the lookahead buffer is exhausted.
>>
>> afterPosition() should be called within nextToken(), for each
>> position, once all tokens leaving that position are done.
>>
>> Your use case *should* be working: inside your incrementToken() you
>> call peekToken() over and over until you've seen the full sentence
>> (saving away any state in your subclass of Position), then nextToken()
>> to emit the buffered tokens, and to insert your own tokens when
>> afterPosition() is called ...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sat, Sep 7, 2013 at 1:10 PM, Benson Margulies <ben...@basistech.com>
>> wrote:
>>> nextToken() calls peekToken(). That seems to prevent my lookahead
>>> processing from seeing that item later. Am I missing something?
>>>
>>> On Fri, Sep 6, 2013 at 9:15 PM, Benson Margulies <ben...@basistech.com>
>>> wrote:
>>>> I think that the penny just dropped, and I should not be using this class.
>>>>
>>>> If I call peekToken 10 times while sitting at token 0, this class will
>>>> stack up all 10 of these _at token position 0_. That's not really very
>>>> helpful for what I'm doing. I need to borrow code from this class and
>>>> not use it.
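Mike's point about position increments can be made concrete with a small, dependency-free sketch. It is plain Java with no Lucene classes: `PositionBuckets` and `bucket` are hypothetical names, and strings stand in for captured token state. It assumes the running position starts at -1 and each token adds its position increment, so a first token with increment 1 lands at position 0 and only tokens with an increment of 0 share a position.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of how a lookahead buffer groups peeked tokens by position.
// It is not LookaheadTokenFilter's code; strings stand in for captured token state.
final class PositionBuckets {

  private PositionBuckets() {}

  /** Groups terms by the position implied by their position increments. */
  static Map<Integer, List<String>> bucket(String[] terms, int[] posIncs) {
    if (terms.length != posIncs.length) {
      throw new IllegalArgumentException("terms and posIncs must have the same length");
    }
    Map<Integer, List<String>> byPosition = new LinkedHashMap<>();
    int pos = -1; // assumed starting point, so a first increment of 1 lands at position 0
    for (int i = 0; i < terms.length; i++) {
      pos += posIncs[i]; // an increment of 0 stacks this token on the previous position
      byPosition.computeIfAbsent(pos, k -> new ArrayList<>()).add(terms[i]);
    }
    return byPosition;
  }
}
```

With increments {1, 1, 0, 1} for the terms the, quick, fast, fox, the result is {0=[the], 1=[quick, fast], 2=[fox]}. If every token after the first had an increment of 0, everything would pile up at position 0, which matches the symptom Benson describes.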
>>>>
>>>> On Fri, Sep 6, 2013 at 9:10 PM, Benson Margulies <ben...@basistech.com>
>>>> wrote:
>>>>> Michael,
>>>>>
>>>>> I'm apparently not fully deconfused yet.
>>>>>
>>>>> I've got a very simple incrementToken function. It calls peekToken to
>>>>> stack up the tokens.
>>>>>
>>>>> afterPosition is never called; I expected it to be called as each of
>>>>> the peeked tokens gets next-ed back out.
>>>>>
>>>>> I assume that I'm missing something simple.
>>>>>
>>>>>     public boolean incrementToken() throws IOException {
>>>>>         // no positions yet: loop on peekToken() to read ahead one sentence (helper not shown)
>>>>>         if (positions.getMaxPos() < 0) {
>>>>>             peekSentence();
>>>>>         }
>>>>>         // emit the buffered tokens one at a time
>>>>>         return nextToken();
>>>>>     }
>>>>>
>>>>> On Fri, Sep 6, 2013 at 8:13 AM, Benson Margulies <ben...@basistech.com>
>>>>> wrote:
>>>>>> On Fri, Sep 6, 2013 at 7:31 AM, Michael McCandless
>>>>>> <luc...@mikemccandless.com> wrote:
>>>>>>>
>>>>>>> On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies <ben...@basistech.com>
>>>>>>> wrote:
>>>>>>>> I'm trying to work through the logic of reading ahead until I've seen
>>>>>>>> a marker for the end of a sentence, then applying some analysis to all
>>>>>>>> of the tokens of the sentence, and then changing some attributes of
>>>>>>>> each token to reflect the results.
>>>>>>>>
>>>>>>>> The queue of tokens for a position is just a State, so there isn't an
>>>>>>>> API there to set any values.
>>>>>>>>
>>>>>>>> So do I need to subclass Position for myself, store the additional
>>>>>>>> information in there, and set the attributes as each token comes by
>>>>>>>> on the output side?
>>>>>>>
>>>>>>> Yes, that sounds right. Either that or, on emitting the eventual
>>>>>>> Tokens, apply your logic there (because at that point, after
>>>>>>> restoreState, you have access to all the attr values for that token).
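Mike's first suggestion, keeping extra per-position data on a Position subclass and applying it as each token is emitted, can be sketched without Lucene. `LemmaPosition` and `LemmaReplay` are hypothetical names; a plain `String` stands in for the captured attribute `State`, the sketch assumes exactly one token per position, and the sentence analysis is passed in as a function because the thread does not say what it is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a LookaheadTokenFilter.Position subclass: it keeps the
// buffered token (a String here instead of a captured attribute State) plus extra
// per-position data that is only known once the whole sentence has been read.
final class LemmaPosition {
  final String term;
  String lemma; // null means "no lemma found, emit the original term"

  LemmaPosition(String term) {
    this.term = term;
  }
}

final class LemmaReplay {

  private LemmaReplay() {}

  /**
   * Runs a sentence-level analysis over every buffered position, stores each result
   * on its position object, then emits the terms in order with the stored lemma
   * applied, the way a filter could set the term right after restoreState().
   * Assumes exactly one token per position.
   */
  static List<String> emit(List<LemmaPosition> sentence,
                           Function<List<String>, List<String>> analyzeSentence) {
    List<String> surface = new ArrayList<>();
    for (LemmaPosition p : sentence) {
      surface.add(p.term);
    }
    List<String> lemmas = analyzeSentence.apply(surface);
    if (lemmas.size() != sentence.size()) {
      throw new IllegalStateException("analysis must return one result per position");
    }
    for (int i = 0; i < sentence.size(); i++) {
      sentence.get(i).lemma = lemmas.get(i); // attach the result to its position
    }
    List<String> out = new ArrayList<>();
    for (LemmaPosition p : sentence) {
      out.add(p.lemma != null ? p.lemma : p.term); // output side: apply the stored lemma
    }
    return out;
  }
}
```

For example, with a toy analysis that returns `["dog", null]` for the surface forms `["dogs", "bark"]`, `emit` returns `["dog", "bark"]`: the first term is replaced by its stored lemma and the second passes through unchanged.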
>>>>>>>
>>>>>>>> I would be grateful for a bit more explanation of afterPosition versus
>>>>>>>> incrementToken; some of the mock classes call peek from afterPosition,
>>>>>>>> and I expected to see peek called in incrementToken based on the
>>>>>>>> javadoc.
>>>>>>>
>>>>>>> afterPosition is where your subclass can "insert" new tokens.
>>>>>>>
>>>>>>> I think (it's been a while here...) you are allowed to call peekToken
>>>>>>> in afterPosition; this is necessary if your logic about inserting
>>>>>>> additional tokens leaving a given position depends on future tokens.
>>>>>>>
>>>>>>> But: are you doing any new token insertion? Or are you just tweaking
>>>>>>> the attributes of the tokens that pass through the filter? If it's
>>>>>>> the latter then this class may be overkill ... you could make a simple
>>>>>>> TokenFilter.incrementToken that just enumerates & saves all input
>>>>>>> tokens, does its processing, then returns those tokens one by one,
>>>>>>> instead.
>>>>>>
>>>>>> I'm not adding tokens yet, but I will be soon, so all of this isn't
>>>>>> entirely crazy. The underlying capability here includes decompounding.
>>>>>> (I have mixed feelings about just adding all the fragments to the
>>>>>> token stream, as it can reduce precision, but there isn't an obvious
>>>>>> alternative, except perhaps to suppress the super-common ones.)
>>>>>>
>>>>>> So, to summarize, logic might be:
>>>>>>
>>>>>> in incrementToken:
>>>>>>
>>>>>> If positions.getMaxPos() > -1, just return nextToken(). If not, loop
>>>>>> calling peekToken to acquire a sentence, process the sentence, and
>>>>>> attach the lemmas and compound-pieces to the Position subclass
>>>>>> objects.
>>>>>>
>>>>>> in afterPosition, as each token comes 'into focus', splat the lemma
>>>>>> from the Position into the char term attribute, and insert new tokens
>>>>>> as needed for the compound components.
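Mike's simpler alternative (read and save every input token of a sentence, process them, then return them one by one), combined with Benson's plan to add compound components, can be sketched as a plain Java iterator. This is not Lucene's TokenFilter API: `SentenceReplayFilter`, `ReplayToken`, the `"."` end-of-sentence marker and the decompounding map are all hypothetical, and stacking each compound part on the original token's position with an increment of 0 is an assumption about how the fragments should be indexed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// One emitted token: its term plus a position increment
// (1 = new position, 0 = stacked on the previous token's position).
final class ReplayToken {
  final String term;
  final int posInc;

  ReplayToken(String term, int posInc) {
    this.term = term;
    this.posInc = posInc;
  }

  @Override
  public String toString() {
    return term + "/" + posInc;
  }
}

// Hypothetical, Lucene-free model of "read a whole sentence, process it, then
// return the tokens one by one", with compound parts inserted after their compound.
final class SentenceReplayFilter implements Iterator<ReplayToken> {
  private final Iterator<String> input;              // upstream terms
  private final Map<String, List<String>> compounds; // hypothetical decompounding dictionary
  private final Deque<ReplayToken> pending = new ArrayDeque<>();

  SentenceReplayFilter(Iterator<String> input, Map<String, List<String>> compounds) {
    this.input = input;
    this.compounds = compounds;
  }

  // Reads up to and including the next "." (the assumed end-of-sentence marker),
  // then queues the processed output for the whole sentence.
  private void fillSentence() {
    List<String> sentence = new ArrayList<>();
    while (input.hasNext()) {
      String term = input.next();
      sentence.add(term);
      if (term.equals(".")) {
        break;
      }
    }
    // Every token of the sentence is in hand here, so sentence-level analysis could run.
    for (String term : sentence) {
      pending.add(new ReplayToken(term, 1));
      for (String part : compounds.getOrDefault(term, Collections.emptyList())) {
        pending.add(new ReplayToken(part, 0)); // insert the part at the compound's position
      }
    }
  }

  @Override
  public boolean hasNext() {
    if (pending.isEmpty()) {
      fillSentence();
    }
    return !pending.isEmpty();
  }

  @Override
  public ReplayToken next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return pending.poll();
  }
}
```

Given the terms `the doghouse . ok .` and a dictionary mapping `doghouse` to `dog` and `house`, it yields `the/1 doghouse/1 dog/0 house/0 ./1 ok/1 ./1`. In a real TokenFilter the saved items would be `captureState()` snapshots, restored with `restoreState()` before each token is returned.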
>>>>>>
>>>>>> Thanks,
>>>>>> benson
>>>>>>
>>>>>>>
>>>>>>> Mike McCandless
>>>>>>>
>>>>>>> http://blog.mikemccandless.com
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org