# http://gist.github.com/271661
import lxml.html
import re

src = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id = "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know that PI is roughly 3.1443534534534534534
"""

# Capture the digits that follow "amazon_" in a div's id attribute.
regex = re.compile(r'amazon_(\d+)')
doc = lxml.html.document_fromstring(src)
# The HTML parser handles quoting and spacing around id=; the XPath keeps
# only divs whose id starts with "amazon_", so the regex just splits the value.
for div in doc.xpath('//div[starts-with(@id, "amazon_")]'):
    match = regex.match(div.get('id'))
    if match:
        print(match.group(1))

On Thu, Jan 7, 2010 at 4:42 PM, Aahz <a...@pythoncraft.com> wrote:
> In article
> <19de1d6e-5ba9-42b5-9221-ed7246e39...@u36g2000prn.googlegroups.com>,
> Oltmans <rolf.oltm...@gmail.com> wrote:
>>
>>I've written this regex that's kind of working
>>re.findall("\w+\s*\W+amazon_(\d+)",str)
>>
>>but I was just wondering that there might be a better RegEx to do that
>>same thing. Can you kindly suggest a better/improved Regex. Thank you
>>in advance.
>
> 'Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.'
> --Jamie Zawinski
>
> Take the advice other people gave you and use BeautifulSoup.
> --
> Aahz (a...@pythoncraft.com) <*> http://www.pythoncraft.com/
>
> "If you think it's expensive to hire a professional to do the job, wait
> until you hire an amateur." --Red Adair
> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
Rolando Espinoza La fuente
www.rolandoespinoza.info
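If lxml is not installed, the same parse-then-match idea can be sketched with only the standard library. This is not the gist's code: it swaps in Python 3's html.parser for lxml (and for the BeautifulSoup that Aahz recommends). AmazonIdCollector and amazon_ids are hypothetical names used only for illustration.

```python
# Sketch (assumption: Python 3, stdlib html.parser instead of lxml).
# Collects the digits from every <div id="amazon_NNN"> in a document.
import re
from html.parser import HTMLParser

# Same pattern as the lxml version: digits right after "amazon_".
AMAZON_ID = re.compile(r"amazon_(\d+)")


class AmazonIdCollector(HTMLParser):
    """Hypothetical helper: records the digits from div ids like 'amazon_123'."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already stripped.
        if tag != "div":
            return
        for name, value in attrs:
            if name == "id" and value:
                match = AMAZON_ID.match(value)
                if match:
                    self.ids.append(match.group(1))


def amazon_ids(html_text):
    """Hypothetical wrapper: parse html_text and return the collected ids."""
    parser = AmazonIdCollector()
    parser.feed(html_text)
    parser.close()
    return parser.ids


if __name__ == "__main__":
    src = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id = "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945.
"""
    print(amazon_ids(src))  # prints ['345343', '35343433', '8898']
```

As in the lxml version, the parser copes with the inconsistent quoting and spacing around id=, so the regex only ever sees a clean attribute value.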