On Thursday, 19 April 2012 07:11:54 UTC+1, Sania wrote:
> Hi,
> So I am trying to get the number of casualties from a text. The number I
> need appears after 'death toll', as you can see from the variable called
> text. Here is my code.
> I'm pretty sure my regex is correct, I think it's the group part
> that's the problem.
> I am using nltk with Python. group() grabs the string in parentheses and
> stores it in deadnum, and I add deadnum to a list.
>
> text=("accounts put the death toll at 637 and those missing at "
>       "653 , but the total number is likely to be much bigger")
> dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
> deadnum=dead.group(1)
> deaths.append(deadnum)
> print deaths
>
> Any help would be appreciated,
> Thank you,
> Sania
Or just don't rely fully on a regex. For the sake of time, and the little
sanity I believe I have left, I would just do something like:
death_toll = re.search(r'death toll.*?\d+', text).group().rsplit(' ', 1)[1]
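
For what it's worth, the problem in the original pattern looks like greediness rather than the group itself: a greedy ".*" runs to the end of the line and gives back only as much as it must, so the group ends up holding just the last digit of the last number. Below is a rough sketch of the difference, assuming text is a single-line string and deaths is a plain list (the post doesn't show where deaths is defined, so that part is a guess):

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Greedy ".*" swallows as much as possible, so the capture group is left
# with only the final digit of the last number in the string.
greedy = re.match(r".*death toll.*(\d[,\d\.]*)", text)
greedy_result = greedy.group(1)

# Non-greedy ".*?" stops at the first digit after "death toll".
# re.search scans the whole string, so no leading ".*" is needed.
lazy = re.search(r"death toll.*?(\d[\d,.]*)", text)
lazy_result = lazy.group(1) if lazy else None

deaths = []  # assumed container; not defined in the original post
if lazy_result is not None:
    deaths.append(lazy_result)
print(deaths)
```

One caveat: the character class also accepts commas and periods, so a number directly followed by punctuation (say "637," or "637.") would carry that character along, and you may want to strip it afterwards.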
hth,
Jon.
--
http://mail.python.org/mailman/listinfo/python-list