Hi,

I never got a response to this and thought maybe I was too wordy.

I'm wondering whether, given a character position in the original text,
there is a way to retrieve the index of the token nearest to that position
using the StandardToken/StandardTokenizer classes?
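
To make the question concrete, here is a rough sketch of the lookup I have
in mind. It is not something the tokenizer provides as far as I know:
TokenLocator and tokenAtOrBefore are names I made up, and it assumes I have
already copied each token's start offset (Token.startOffset()) into an int
array, in order, while tokenizing the text.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene.
// Assumes startOffsets[i] is the character offset at which token i begins,
// collected while tokenizing, in strictly ascending order.
final class TokenLocator {

    // Returns the index of the last token that starts at or before charPos,
    // or -1 if charPos lies before the first token.
    static int tokenAtOrBefore(int[] startOffsets, int charPos) {
        int i = Arrays.binarySearch(startOffsets, charPos);
        // On a miss, binarySearch returns -(insertionPoint) - 1, so the
        // token just before the insertion point is at index -i - 2.
        return i >= 0 ? i : -i - 2;
    }
}
```

For my purposes "nearest" really means the token at or before the position,
which is why the sketch rounds down rather than picking the closer neighbor.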



--JP

On 7/3/07, John Paul Sondag <[EMAIL PROTECTED]> wrote:

Hi,

I was wondering if it's possible to get a token's index (its offset in the
token stream) from a character position in the original text.

My problem is that I'm working on my own "Snippet Generator". It takes a
token index (call it t) as input and needs to make a snippet of the original
text.  I want the Snippet to be some number of tokens long (call it n).
But to make the Snippet easier to read, I want to check whether t is close
to the end of a paragraph (if it is, I'll put more of the Snippet before the
token than usual).  So I'm scanning the original text forward some number of
characters, looking for a newline or tab.  If I find one, I'd like to get the
token just before it (and its token index, call it y).  Once I have that
index, I know I have y - t tokens after my token, so I know to put
n-(y-t) tokens before my token and can successfully make my Snippet.
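
In code, the arithmetic I have in mind is roughly the following. This is
only a sketch: SnippetWindow and window are made-up names, it assumes
y - t is no larger than n, and it clamps the start at the first token of the
text.

```java
// Hypothetical helper illustrating the window arithmetic described above.
// t = index of the token of interest
// y = index of the last token before the paragraph break (y >= t)
// n = number of tokens to show around t
final class SnippetWindow {

    // Returns {firstTokenIndex, lastTokenIndex}, both inclusive.
    static int[] window(int t, int y, int n) {
        int after = y - t;                   // tokens after t, up to the break
        int before = Math.max(0, n - after); // the rest of the budget goes before t
        int first = Math.max(0, t - before); // don't run off the start of the text
        return new int[] { first, y };
    }
}
```

For example, with t = 10, y = 13 and n = 8 there are 3 tokens after t and
5 before it, so the Snippet covers tokens 5 through 13.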

Thanks in advance!

--JP
