problem with regex, how to conclude more than one character

2008-11-06 Thread tecspring
I never know how to express "exclude an entire word" in a regexp,
and while using Python I ran into this problem again...

For example, if I want to match the "string" in "test a string",
re.findall(r"[^a]* (\w+)","test a string") will work, but what if
there is not "a" but "an" (test an string)? [^an] will fail
because it stops at the first character "a".
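
A note on why this fails: [^an] is a character class, so it excludes the single characters "a" and "n", not the two-letter sequence "an". Below is a minimal sketch of two common workarounds, assuming the goal is to capture the word that follows "an". The sample sentence and the variable names are illustrative and not from the original post.

```python
import re

text = "test an string"  # illustrative sample, assumed for this sketch

# The character class excludes 'a' and 'n' one character at a time,
# so the word "an" itself is also captured.
char_class = re.findall(r"[^an]* (\w+)", text)

# Workaround 1: anchor on the literal word instead of negating characters.
after_an = re.findall(r"\ban (\w+)", text)

# Workaround 2: "anything that is not the sequence 'an'", written as a
# negative lookahead that is checked before every consumed character.
skip_then_an = re.findall(r"(?:(?!an).)*an (\w+)", text)

print(char_class)
print(after_an)
print(skip_then_an)
```

The first workaround is usually enough when the word to skip is known. The lookahead form is the general way to say "no occurrence of this sequence here".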

I guess people don't always filter words this way?
Here is the real problem I ran into:
I want to extract both the text inside each "" block and that ""'s
title attribute
## code #
import re
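content=''' valign="middle">LA11/10/2008 valign="middle">1340/1430PF1/5 valign="middle"> class="MouseCursor">Understand valign="middle">CharismaBooked valign="middle">'''

re.findall(r'''([^<]+) valign="middle">([^<]+)([^<]+) valign="middle">([^<]+) title="([^"]*)"''',content)

## code end #
As you can see above,
I get the result "LA,11/10/2008,1340/1430,PF1/5,Understanding
the stock market".
There are two "" blocks, but I can only get the "title" attribute
of the first "" with this regexp.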
For the second one, which should be "Charisma", I need some kind of
[^]* to skip over "class="MouseCursor">Understand",
so that I can go on to match the second "" block.
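
One way to skip over unknown markup between two blocks is a non-greedy .*? with re.DOTALL rather than a negated class. This is a sketch on a hypothetical fragment: the original markup did not survive in this archive, so the tag names, the attribute order and the second title text below are invented for illustration.

```python
import re

# Hypothetical markup: the real HTML from the post is not recoverable.
content = '''<td valign="middle"><span title="Understanding the stock market"
class="MouseCursor">Understand</span></td>
<td valign="middle"><span title="Hypothetical second title"
class="MouseCursor">Charisma</span></td>'''

# findall returns one tuple per match, so the pattern only has to
# describe a single block; .*? lazily skips the attributes that sit
# between the title and the closing '>' of the tag.
pattern = re.compile(r'title="([^"]*)".*?>([^<]+)<', re.DOTALL)
pairs = pattern.findall(content)

for title, text in pairs:
    print(title, "->", text)
```

Describing one block and letting findall repeat it avoids having to spell out every block in a single long pattern.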

Maybe I didn't describe this clearly; if so, feel free to tell me :)
Thanks for any replies!
--
http://mail.python.org/mailman/listinfo/python-list


Re: problem with regex, how to conclude more than one character

2008-11-06 Thread tecspring
On Nov 7, 3:06 pm, [EMAIL PROTECTED] wrote:
> I always have no idea about how to express "conclude the entire word"
> with regexp,  while using python, I encountered this problem again...
>
> for example, if I want to match the "string" in "test a string",
> re.findall(r"[^a]* (\w+)","test a string") will work, but what if
> there is not "a" but "an"(test a string)? the [^an] will failed
> because it will stop at the first character "a".
>
> I guess people not always use this kind of way to filter words?
> Here comes the real problem I encountered:
> I want to filter the text both in "" block and the ""'s
> title attribute
> ## code #
> import re
> content=''' valign="middle">LA11/10/2008 valign="middle">1340/1430PF1/5 valign="middle"> class="MouseCursor">Understand valign="middle">CharismaBooked valign="middle">'''
>
> re.findall(r'''([^<]+) valign="middle">([^<]+)([^<]+) valign="middle">([^<]+) title="([^"]*)"''',content)
>
>  code end 
> As you saw above,
> I get the results with "LA,11/10/2008,1340/1430,PF1/5,Understanding
> the stock market"
> there are two "" block but I can just get the "title" attribute
> of the first "" using regexp.
> for the second, which should be "Charisma" I need to use some kind of
> [^]* to match "class="MouseCursor">Understand",
> then I can continue match the second "" block.
>
> Maybe I didn't describe this clearly, then feel free to tell me:)
> thanks for any further reply!

By the way, I've tried both (!) and (?:!) in many ways,
but none of them worked, sadly...


Re: problem with regex, how to conclude more than one character

2008-11-07 Thread tecspring
On Nov 7, 3:13 pm, "Chris Rebert" <[EMAIL PROTECTED]> wrote:
> On Thu, Nov 6, 2008 at 11:06 PM,  <[EMAIL PROTECTED]> wrote:
> > I always have no idea about how to express "conclude the entire word"
> > with regexp,  while using python, I encountered this problem again...
>
> > for example, if I want to match the "string" in "test a string",
> > re.findall(r"[^a]* (\w+)","test a string") will work, but what if
> > there is not "a" but "an"(test a string)? the [^an] will failed
> > because it will stop at the first character "a".
>
> > I guess people not always use this kind of way to filter words?
> > Here comes the real problem I encountered:
> > I want to filter the text both in "" block and the ""'s
> > title attribute
>
> Is there any particularly good reason why you're using regexps for
> this rather than, say, an actual (X)HTML parser?
>
> Cheers,
> Chris
> --
> Follow the path of the Iguana...http://rebertia.com
>
>
>
> > ## code #
> > import re
> > content=''' valign="middle">LA11/10/2008 valign="middle">1340/1430PF1/5 valign="middle"> class="MouseCursor">Understand valign="middle">CharismaBooked valign="middle">'''
>
> > re.findall(r'''([^<]+) valign="middle">([^<]+)([^<]+) valign="middle">([^<]+) title="([^"]*)"''',content)
>
> >  code end 
> > As you saw above,
> > I get the results with "LA,11/10/2008,1340/1430,PF1/5,Understanding
> > the stock market"
> > there are two "" block but I can just get the "title" attribute
> > of the first "" using regexp.
> > for the second, which should be "Charisma" I need to use some kind of
> > [^]* to match "class="MouseCursor">Understand",
> > then I can continue match the second "" block.
>
> > Maybe I didn't describe this clearly, then feel free to tell me:)
> > thanks for any further reply!

Thanks a lot for the quick reply, Chris!
Actually, I tried BeautifulSoup and it's great.
But I'm not very familiar with it, and it needs more code to parse the
HTML and get the right text.
I think a regexp is more convenient if there is a way to pull out the
list in just one line :)
I got this far but am stuck here...
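
For comparison, a parser-based version does not have to be long. This sketch uses the standard library's html.parser (Python 3) rather than BeautifulSoup, again on a hypothetical fragment. The class name TitleCollector and the markup are assumptions, not taken from the thread.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect (title attribute, inner text) for every tag that has a title."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._pending = None  # title still waiting for its text

    def handle_starttag(self, tag, attrs):
        title = dict(attrs).get("title")
        if title is not None:
            self._pending = title

    def handle_data(self, data):
        # Pair the first non-blank text after a titled tag with that title.
        if self._pending is not None and data.strip():
            self.results.append((self._pending, data.strip()))
            self._pending = None


# Hypothetical markup: the real HTML from the post is not recoverable.
html = ('<td><span title="Understanding the stock market" '
        'class="MouseCursor">Understand</span></td>'
        '<td><span title="Hypothetical second title" '
        'class="MouseCursor">Charisma</span></td>')

collector = TitleCollector()
collector.feed(html)
collector.close()
print(collector.results)
```

The parser does not depend on attribute order or line breaks inside a tag, which is what makes the one-line regexp fragile.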