Hi, I've run into a problem matching a regular expression in Python. I hope one of you can help. Here are the details:

I have many tags like this:

    xxx<a href="http://xxx.xxx.xxx" xxx>xxx
    xxx<a href="wap://xxx.xxx.xxx" xxx>xxx
    xxx<a href="http://xxx.xxx.xxx" xxx>xxx
    .....

I want to extract every "http://xxx.xxx.xxx", so I do this:

    httpPat = re.compile("(<a )(href=\")(http://.*)(\")")
    result = httpPat.findall(data)

I use this to inspect the output:

    for i in result:
        print i[2]

Surprisingly, I get some output like this:

    http://xxx.xxx.xxx">xxx</a>xxx

It was in fact extracted from this kind of source:

    <a href="http://xxx.xxx.xxx">xxx</a>xxx"

Some of the results are correct, though. How can I get all of the results as clean as "http://xxx.xxx.xxx"?

Thanks for your help.

Regards,
Johnny
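
A likely cause is that ".*" is greedy: it matches as many characters as it can, so it runs on to the last double quote on the line instead of stopping at the one that closes the href value. Below is a minimal sketch of one possible fix, which replaces ".*" with a character class that cannot match a quote. The sample data and the names http_pat and urls are assumptions made up for illustration, not taken from the original program.

```python
import re

# Hypothetical sample input modelled on the tags described above.
data = (
    'xxx<a href="http://xxx.xxx.xxx" xxx>xxx '
    'xxx<a href="wap://xxx.xxx.xxx" xxx>xxx '
    '<a href="http://xxx.xxx.xxx">xxx</a>xxx"'
)

# [^"]* matches any run of characters except a double quote, so the
# match has to end at the quote that closes the href value.
http_pat = re.compile(r'<a\s+href="(http://[^"]*)"')

# With a single capturing group, findall returns a list of strings.
urls = http_pat.findall(data)
for url in urls:
    print(url)
```

A non-greedy quantifier, "(http://.*?)", should behave the same way here. Either pattern still assumes that href is the first attribute of the tag and that its value is double-quoted; for less regular HTML, a real HTML parser is probably the safer choice.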