Philipp Kraus wrote:

> I have created a short script:
>
> ---------
> #!/usr/bin/env python
>
> import re, urllib2
>
>
> def URLReader(url) :
>     f = urllib2.urlopen(url)
>     data = f.read()
>     f.close()
>     return data
>
>
> print re.match( "\<small\ \>.*\<\/small\>",
> URLReader("http://sourceforge.net/projects/boost/") )
> ---------
>
> Within the data the string "<small>boost_1_56_0.tar.gz</small>" should
> be matched, but I always get None from re.match, and re.search also
> returns None.

>>> help(re.match)
Help on function match in module re:

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found.

As the string doesn't start with a match for your regex, re.match() is
clearly the wrong function, but re.search() works for me:

>>> import re, urllib2
>>>
>>>
>>> def URLReader(url) :
...     f = urllib2.urlopen(url)
...     data = f.read()
...     f.close()
...     return data
...
>>> data = URLReader("http://sourceforge.net/projects/boost/")
>>> re.search("\<small\ \>.*\<\/small\>", data)
<_sre.SRE_Match object at 0x7f282dd58718>
>>> _.group()
'<small >boost_1_56_pdf.7z</small>'

> I have tested the regex under http://regex101.com/ with the HTML code
> and on the page the regex is matched.
>
> Can you help me please to fix the problem, I don't understand why the
> match returns None
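
To see the difference between the two functions without fetching the page, here is a minimal sketch. It assumes a hard-coded string (the name html and its contents are made up for illustration) in place of the downloaded data, and it writes the pattern as a raw string without the backslashes, which should not be needed for these characters. The non-greedy .*? is my own change so that the match stops at the first closing tag if several sit on one line.

```python
import re

# Hypothetical stand-in for the downloaded page; the real HTML is far larger.
html = "<html><body><small >boost_1_56_pdf.7z</small></body></html>"

# Same pattern as in the script, as a raw string and without the escapes.
pattern = r"<small >.*?</small>"

# re.match() only tries the pattern at position 0 of the string.
anchored = re.match(pattern, html)

# re.search() scans forward and returns the first match anywhere.
scanned = re.search(pattern, html)

print(anchored)
print(scanned.group())
```

Since the page starts with other markup rather than the small tag, re.match() finds nothing, while re.search() locates the tag further in.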