Hi, I never got a response to this and thought maybe I was too wordy.
Is there a way, given a position in the original text, to retrieve the index of the token nearest to that position using the StandardToken/StandardTokenizer classes?

--JP

On 7/3/07, John Paul Sondag <[EMAIL PROTECTED]> wrote:
Hi, I was wondering if it's possible to get a token's offset (its index in the token stream) based on a character position in the original text.

My problem is that I'm working on my own "Snippet Generator". I'm given a token index (call it t) as input and need to make a snippet of the original text. I want the snippet to be some number of tokens long (call it n tokens). To make the snippet easier to read, I want to check whether the token is close to the end of a paragraph; if it is, I'll put more of the snippet before the token than usual.

So I'm scanning the original text forward some number of characters, looking for a newline or tab. If I find one, I'd like to get the token just before that newline and its offset (call it y). Once I have that offset, I know there are y - t tokens after my token, so I can put n - (y - t) tokens before my token and successfully make my snippet.

Thanks in advance!

--JP
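
To make the arithmetic above concrete, here is a minimal sketch in plain Java. It does not use any Lucene calls. It assumes the start offset of every token has already been collected, in order, into an int array while tokenizing (how that is done depends on the Lucene version and is not shown), and that those offsets are strictly increasing. The class name SnippetWindow, both method names, the maxScan parameter, and the "half of n after the token" default are all hypothetical choices made for illustration, not anything from StandardTokenizer.

```java
import java.util.Arrays;

public class SnippetWindow {

    /**
     * Returns the index of the last token whose start offset is <= charPos,
     * or -1 if charPos lies before the first token.
     * Assumes startOffsets is sorted and strictly increasing.
     */
    public static int tokenIndexAtOrBefore(int[] startOffsets, int charPos) {
        int i = Arrays.binarySearch(startOffsets, charPos);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1,
        // and the token we want sits at insertionPoint - 1.
        return i >= 0 ? i : -i - 2;
    }

    /**
     * Returns {firstToken, lastToken} for a snippet around token t.
     * Scans up to maxScan characters forward from token t for a newline or
     * tab; if one is found at token y, at most y - t tokens go after t and
     * the remaining n - (y - t) go before it.
     */
    public static int[] window(String text, int[] startOffsets, int t, int n, int maxScan) {
        int from = startOffsets[t];
        int to = Math.min(text.length(), from + maxScan);
        int after = n / 2; // assumed default split when no break is nearby
        for (int p = from; p < to; p++) {
            char c = text.charAt(p);
            if (c == '\n' || c == '\t') {
                int y = tokenIndexAtOrBefore(startOffsets, p); // token before the break
                after = Math.min(after, y - t);
                break;
            }
        }
        // Never run past the last token in the document.
        after = Math.min(after, startOffsets.length - 1 - t);
        int before = n - after;
        int first = Math.max(0, t - before); // clamp at the first token
        return new int[] { first, t + after };
    }

    public static void main(String[] args) {
        String text = "alpha beta gamma\ndelta epsilon";
        int[] startOffsets = { 0, 6, 11, 17, 23 }; // start of each word above
        System.out.println(Arrays.toString(window(text, startOffsets, 1, 4, 20)));
    }
}
```

The lookup is a binary search, so finding the token for a character position costs O(log m) for m tokens once the offsets array exists. Whether building that array is acceptable for large documents is a separate question this sketch does not address.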