Re: aligning a set of word substrings to sentence

Steven Bethard Thu, 01 Dec 2005 16:10:56 -0800

Paul McGuire wrote:
> "Steven Bethard" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
> 
>>I've got a list of word substrings (the "tokens") which I need to align
>>to a string of text (the "sentence").  The sentence is basically the
>>concatenation of the token list, with spaces sometimes inserted between
>>tokens.  I need to determine the start and end offsets of each token in
>>the sentence.  For example::
>>
>>py> tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
>>py> text = '''\
>>... She's gonna write
>>... a book?'''
>>py> list(offsets(tokens, text))
>>[(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
> 
> ===================
> from pyparsing import oneOf
> 
> tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
> text = '''\
> She's gonna write
> a book?'''
> 
> tokenlist = oneOf( " ".join(tokens) )
> offsets = [(start,end) for token,start,end in tokenlist.scanString(text) ]
> 
> print(offsets)
> ===================
> [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]


Now that's a pretty solution. Three cheers for pyparsing! :)
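
For comparison, here is a minimal sketch of how offsets() might be written without pyparsing. It assumes every token appears verbatim in the text, in the same order as in the token list. The left-to-right str.find() approach is only an illustration, not necessarily how the original offsets() was implemented.

```python
def offsets(tokens, text):
    """Yield a (start, end) pair for each token, scanning left to right."""
    pos = 0  # search resumes where the previous token ended
    for token in tokens:
        start = text.find(token, pos)
        if start == -1:
            # Assumption: a token missing from the text is treated as an error.
            raise ValueError('token %r not found at or after offset %d'
                             % (token, pos))
        end = start + len(token)
        yield start, end
        pos = end


tokens = ['She', "'s", 'gon', 'na', 'write', 'a', 'book', '?']
text = "She's gonna write\na book?"
print(list(offsets(tokens, text)))
# [(0, 3), (3, 5), (6, 9), (9, 11), (12, 17), (18, 19), (20, 24), (24, 25)]
```

Because each search starts at the end of the previous match, the token 'a' is found at offset 18 rather than inside "gonna", and any whitespace between tokens is skipped over.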

STeVe
-- 
http://mail.python.org/mailman/listinfo/python-list