That's it! Thank you~~

On Apr 7, 2005 11:29 AM, Sidharth Kuruvila <[EMAIL PROTECTED]> wrote:
> Reading the documentation on re might be helpful here :-P
>
> When the pattern contains more than one group, findall returns a tuple
> of all the groups for each match, not the matched text itself.
>
> You might find finditer useful.
>
> for m in re.finditer(url, html):
>     print m.group()
>
> Or you could replace all your parentheses with the non-grouping
> version. That is, replace all brackets (...) with (?:...)
>
>
> On Apr 7, 2005 7:35 AM, could ildg <[EMAIL PROTECTED]> wrote:
> > I want to retrieve all urls in a string. When I use re.findall, I get
> > a list of tuples.
> > My code is like below:
> >
> > [code]
> > url=unicode(r"((http|ftp)://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)")
> > m=re.findall(url,html)
> > for i in m:
> >     print i
> > [/code]
> >
> > html is a string variable that contains many urls.
> > The code prints many tuples, and each tuple does not seem to represent
> > a url. E.g., one of them is as below:
> >
> > (u'http://', u'http', u'image.zhongsou.com/image/netchina.gif', u'',
> > u'', u'', u'', u'image.zhongsou.com', u'.com', u'.com',
> > u'/netchina.gif')
> >
> > Why are there two "http" in it? And why are there so many empty strings
> > in the tuple above? It's obviously not a url. How can I get the urls
> > correctly?
> >
> > Thanks in advance.
> > --
> > Parrots are extremely clever and utterly hilarious, and they are good
> > friends of humankind.
> > Then one day I realized that I am a parrot.
> > I am a parrot who climbs over the wall.
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >
>
> --
> http://blogs.applibase.net/sidharth
>
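
For anyone finding this thread later, here is a minimal sketch of the two
suggestions quoted above. It rests on some assumptions: it uses Python 3
syntax (print as a function, no unicode() call), the sample html string is
made up for illustration, and the pattern is a much simplified stand-in for
the regex in the original question, not a complete URL matcher.

```python
import re

# Hypothetical sample text; the real `html` string is not shown in the thread.
html = "see http://image.zhongsou.com/image/netchina.gif and ftp://example.org/pub"

# Simplified pattern with capturing groups (an assumption, not the poster's regex).
grouped = r"(http|ftp)://(\w+(\.\w+)+)(/[\w.~]*)*"

# With several capturing groups, findall yields one tuple of groups per match.
tuples = re.findall(grouped, html)
print(tuples[0])  # ('http', 'image.zhongsou.com', '.com', '/netchina.gif')

# Suggestion 1: finditer yields match objects; group() is the whole matched text.
whole = [m.group() for m in re.finditer(grouped, html)]
print(whole)

# Suggestion 2: turn every (...) into (?:...) so findall returns plain strings.
non_grouping = r"(?:http|ftp)://(?:\w+(?:\.\w+)+)(?:/[\w.~]*)*"
urls = re.findall(non_grouping, html)
print(urls)
```

Both approaches give the same list of full URLs here. The tuple printed
first also shows why the original output looked odd: each slot is the last
text captured by one pair of parentheses, and groups that took no part in
the match come back as empty strings.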
--
Parrots are extremely clever and utterly hilarious, and they are good
friends of humankind.
Then one day I realized that I am a parrot.
I am a parrot who climbs over the wall.