nextToken() calls peekToken(). That seems to prevent my lookahead processing from seeing that item later. Am I missing something?
On Fri, Sep 6, 2013 at 9:15 PM, Benson Margulies <ben...@basistech.com> wrote:
> I think that the penny just dropped, and I should not be using this class.
>
> If I call peekToken 10 times while sitting at token 0, this class will
> stack up all 10 of these _at token position 0_. That's not really very
> helpful for what I'm doing. I need to borrow code from this class and
> not use it.
>
> On Fri, Sep 6, 2013 at 9:10 PM, Benson Margulies <ben...@basistech.com> wrote:
>> Michael,
>>
>> I'm apparently not fully deconfused yet.
>>
>> I've got a very simple incrementToken function. It calls peekToken to
>> stack up the tokens.
>>
>> afterPosition is never called; I expected it to be called as each of
>> the peeked tokens gets next-ed back out.
>>
>> I assume that I'm missing something simple.
>>
>> public boolean incrementToken() throws IOException {
>>     if (positions.getMaxPos() < 0) {
>>         peekSentence();
>>     }
>>     return nextToken();
>> }
>>
>> On Fri, Sep 6, 2013 at 8:13 AM, Benson Margulies <ben...@basistech.com>
>> wrote:
>>> On Fri, Sep 6, 2013 at 7:31 AM, Michael McCandless
>>> <luc...@mikemccandless.com> wrote:
>>>>
>>>> On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies <ben...@basistech.com>
>>>> wrote:
>>>> > I'm trying to work through the logic of reading ahead until I've seen
>>>> > a marker for the end of a sentence, then applying some analysis to all
>>>> > of the tokens of the sentence, and then changing some attributes of
>>>> > each token to reflect the results.
>>>> >
>>>> > The queue of tokens for a position is just a State, so there isn't an API
>>>> > there to set any values.
>>>> >
>>>> > So do I need to subclass Position for myself, store the additional
>>>> > information in there, and set the attributes as each token comes by on
>>>> > the output side?
>>>>
>>>> Yes, that sounds right.
>>>> Either that or, on emitting the eventual
>>>> Tokens, apply your logic there (because at that point, after
>>>> restoreState, you have access to all the attr values for that token).
>>>>
>>>> > I would be grateful for a bit more explanation of afterPosition versus
>>>> > incrementToken; some of the mock classes call peek from afterPosition,
>>>> > and I expected to see peek called in incrementToken based on the javadoc.
>>>>
>>>> afterPosition is where your subclass can "insert" new tokens.
>>>>
>>>> I think (it's been a while here...) you are allowed to call peekToken
>>>> in afterPosition; this is necessary if your logic about inserting
>>>> additional tokens leaving a given position depends on future tokens.
>>>>
>>>> But: are you doing any new token insertion? Or are you just tweaking
>>>> the attributes of the tokens that pass through the filter? If it's
>>>> the latter then this class may be overkill ... you could make a simple
>>>> TokenFilter.incrementToken that just enumerates & saves all input
>>>> tokens, does its processing, then returns those tokens one by one,
>>>> instead.
>>>
>>> I'm not adding tokens yet, but I will be soon, so all of this isn't
>>> entirely crazy. The underlying capability here includes decompounding.
>>> (I have mixed feelings about just adding all the fragments to the
>>> token stream, as it can reduce precision, but there isn't an obvious
>>> alternative (except perhaps to suppress the super-common ones)).
>>>
>>> So, to summarize, logic might be:
>>>
>>> in incrementToken:
>>>
>>> If positions.getMaxPos() > -1, just return nextToken(). If not, loop
>>> calling peekToken to acquire a sentence, process the sentence, and
>>> attach the lemmas and compound-pieces to the Position subclass
>>> objects.
>>>
>>> in afterPosition, as each token comes 'into focus', splat the lemma
>>> from the Position into the char term attribute, and insert new tokens
>>> as needed for the compound components.
>>>
>>> Thanks,
>>> benson
>>>
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
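
For what it's worth, the simpler alternative Mike describes in the quoted thread (save all input tokens, do the processing, then return them one by one) can be modelled without any Lucene classes at all. The sketch below is only that: plain Java, with Tok, SentenceBufferingFilter, analyze() and the "." end-of-sentence marker all invented for illustration. Lower-casing stands in for whatever sentence-level analysis actually produces the lemmas.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Plain-Java model of "buffer a sentence, process it, replay it".
// Nothing here is Lucene API; every name is hypothetical.
final class Tok {
    final String term;   // stands in for the term text of a token
    String lemma;        // filled in by the sentence-level analysis

    Tok(String term) { this.term = term; }
}

final class SentenceBufferingFilter {
    private final Iterator<String> input;                   // stands in for the upstream token stream
    private final Deque<Tok> pending = new ArrayDeque<>();  // analyzed tokens waiting to be emitted

    SentenceBufferingFilter(Iterator<String> input) { this.input = input; }

    // Analogue of incrementToken(): returns the next token, or null at end of input.
    Tok next() {
        if (pending.isEmpty()) {
            fillSentence();
        }
        return pending.poll();
    }

    // Read ahead up to and including the end-of-sentence marker, then analyze the batch.
    private void fillSentence() {
        List<Tok> sentence = new ArrayList<>();
        while (input.hasNext()) {
            Tok t = new Tok(input.next());
            sentence.add(t);
            if (".".equals(t.term)) {   // assumed end-of-sentence marker
                break;
            }
        }
        analyze(sentence);
        pending.addAll(sentence);
    }

    // Placeholder analysis: lower-casing stands in for real lemmatization.
    private static void analyze(List<Tok> sentence) {
        for (Tok t : sentence) {
            t.lemma = t.term.toLowerCase(Locale.ROOT);
        }
    }

    // Convenience for trying the sketch: drain everything and collect the lemmas.
    static List<String> lemmas(String... terms) {
        SentenceBufferingFilter f = new SentenceBufferingFilter(Arrays.asList(terms).iterator());
        List<String> out = new ArrayList<>();
        for (Tok t = f.next(); t != null; t = f.next()) {
            out.add(t.lemma);
        }
        return out;
    }
}
```

Each token is buffered exactly once and handed back in its original order, so nothing piles up at a single position. In a real TokenFilter the buffered items would presumably be captured attribute states, restored with restoreState as each one is emitted, rather than Tok objects; inserting extra tokens for compound pieces is not covered by this sketch.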