problem with regex, how to conclude more than one character
I never know how to make a regexp "conclude the entire word", that is, skip over a whole word rather than a single character, and I have run into this problem again while using Python.

For example, if I want to match "string" in "test a string", re.findall(r"[^a]* (\w+)","test a string") will work. But what if the word is not "a" but "an" ("test an string")? [^an] will fail, because it stops at the first character "a".

I guess people don't usually filter words this way?

Here is the real problem I ran into: I want to extract both the text in each "" block and the title attribute of the "" blocks.

## code #
import re
content=''' valign="middle">LA11/10/2008 valign="middle">1340/1430PF1/5 valign="middle"> class="MouseCursor">Understand valign="middle">CharismaBooked valign="middle">'''

re.findall(r'''([^<]+) valign="middle">([^<]+)([^<]+) valign="middle">([^<]+) title="([^"]*)"''',content)
 code end

As you can see above, I get the results "LA,11/10/2008,1340/1430,PF1/5,Understanding the stock market".

There are two "" blocks, but with this regexp I can only get the "title" attribute of the first one. For the second, which should be "Charisma", I need some kind of [^]* to match 'class="MouseCursor">Understand', and then I can continue on to match the second "" block.

Maybe I didn't describe this clearly; if so, feel free to tell me :) Thanks for any reply!

--
http://mail.python.org/mailman/listinfo/python-list
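A minimal sketch of why [^an] stops early, assuming the second example sentence is "test an string". A character class such as [^an] excludes the single characters "a" and "n", not the word "an". One possible alternative, which is not taken from the thread, is to skip lazily up to the literal word and capture what follows. The variable names are illustrative only.

```python
import re

text = "test an string"

# [^an] is a character class: it rejects the characters 'a' and 'n'
# individually, so the match ends just before the 'a' of "an".
prefix = re.match(r"[^an]*", text).group()
print(repr(prefix))  # 'test '

# Alternative: skip lazily (.*?) up to the whole word "an" (\b marks the
# word boundary), then capture the word that follows it.
words = re.findall(r".*?\ban (\w+)", text)
print(words)  # ['string']
```

The lazy .*? never needs to "exclude" anything: it simply consumes as little as possible until the literal word can match.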
Re: problem with regex, how to conclude more than one character
On Nov 7, 3:06 pm, [EMAIL PROTECTED] wrote:
> I always have no idea about how to express "conclude the entire word"
> with regexp, while using python, I encountered this problem again...
>
> for example, if I want to match the "string" in "test a string",
> re.findall(r"[^a]* (\w+)","test a string") will work, but what if
> there is not "a" but "an"(test a string)? the [^an] will failed
> because it will stop at the first character "a".
>
> I guess people not always use this kind of way to filter words?
> Here comes the real problem I encountered:
> I want to filter the text both in "" block and the ""'s
> title attribute
> ## code #
> import re
> content=''' valign="middle">LA11/10/2008 valign="middle">1340/1430PF1/5 valign="middle"> class="MouseCursor">Understand valign="middle">CharismaBooked valign="middle">'''
>
> re.findall(r'''([^<]+) valign="middle">([^<]+)([^<]+) valign="middle">([^<]+) title="([^"]*)"''',content)
>
>  code end
> As you saw above,
> I get the results with "LA,11/10/2008,1340/1430,PF1/5,Understanding
> the stock market"
> there are two "" block but I can just get the "title" attribute
> of the first "" using regexp.
> for the second, which should be "Charisma" I need to use some kind of
> [^]* to match "class="MouseCursor">Understand",
> then I can continue match the second "" block.
>
> Maybe I didn't describe this clearly, then feel free to tell me:)
> thanks for any further reply!

And by the way, I've tried both (!) and (?:!) in many different ways, but none of them works. So sad...
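For reference, neither (!) nor (?:!) is lookahead syntax in Python's re module; both just match a literal "!". The negative lookahead is written (?!...). The following is a sketch only, assuming the goal is to consume everything up to the whole word "an" in the hypothetical sentence "test an string"; the names pattern, before, and word are illustrative.

```python
import re

text = "test an string"

# (?!an) is a zero-width negative lookahead: it succeeds only at positions
# where the text does NOT continue with "an". Wrapped as (?:(?!an).)* it
# consumes one character at a time and stops right before "an" begins.
pattern = re.compile(r"((?:(?!an).)*)an (\w+)")

match = pattern.match(text)
before, word = match.groups()
print(repr(before))  # 'test '
print(word)          # string
```

The same (?:(?!X).)* idiom works for any literal X, which is the multi-character counterpart of the single-character class [^x].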
Re: problem with regex, how to conclude more than one character
On Nov 7, 3:13 pm, "Chris Rebert" <[EMAIL PROTECTED]> wrote:
> On Thu, Nov 6, 2008 at 11:06 PM, <[EMAIL PROTECTED]> wrote:
> > Here comes the real problem I encountered:
> > I want to filter the text both in "" block and the ""'s
> > title attribute
>
> Is there any particularly good reason why you're using regexps for
> this rather than, say, an actual (X)HTML parser?
>
> Cheers,
> Chris
> --
> Follow the path of the Iguana... http://rebertia.com

Thanks a lot for the quick reply, Chris!

Actually, I tried BeautifulSoup and it's great. But I'm not very familiar with it, and it takes more code to parse the HTML and get the right text. I think a regexp is more convenient if there is a way to extract the list in just one line :) That is the approach I have taken so far, but I got stuck here...
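To give a rough idea of how much code the parser route takes, here is a sketch using the standard-library html.parser module (Python 3) instead of BeautifulSoup. The original markup did not survive in this archive, so the SAMPLE string below is hypothetical: the tr, td, and span tags, and the second title value, are assumptions made for illustration only.

```python
from html.parser import HTMLParser

# Hypothetical markup: tag names and layout are assumed, not from the thread.
SAMPLE = (
    '<tr>'
    '<td valign="middle">LA</td>'
    '<td valign="middle">11/10/2008</td>'
    '<td valign="middle"><span title="Understanding the stock market" '
    'class="MouseCursor">Understand</span></td>'
    '<td valign="middle"><span title="Charisma" '
    'class="MouseCursor">Charisma</span></td>'
    '<td valign="middle">Booked</td>'
    '</tr>'
)


class CellCollector(HTMLParser):
    """Collect visible text and every title attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        title = dict(attrs).get("title")
        if title is not None:
            self.titles.append(title)

    def handle_data(self, data):
        # Ignore whitespace-only runs between tags.
        if data.strip():
            self.texts.append(data.strip())


collector = CellCollector()
collector.feed(SAMPLE)
collector.close()

print(collector.texts)
print(collector.titles)
```

Unlike a single regexp, this collects every title attribute no matter how many such blocks a row contains, at the cost of a small class definition.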