Hello dear list! I'm very new to programming and am teaching myself. I'm having a problem with a little project.
I'm trying to perform a fetch process, but every time I try it I run into errors. I have read the Python documentation for more than ten hours now, and I have several books here, but they do not help at the moment.

This code runs like a charm:

    import urllib
    import urlparse
    import re

    url = "http://search.cpan.org/author/?W"
    html = urllib.urlopen(url).read()
    for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
        alk = urlparse.urljoin(url, lk)

        data = { 'url':alk, 'name':name, 'cname':capname }

        phtml = urllib.urlopen(alk).read()
        memail = re.search('<a href="mailto:(.*?)">', phtml)
        if memail:
            data['email'] = memail.group(1)

        print data

The code above runs very well. Now I want to apply it to a new target, since I can learn a lot from this. Let us say this Swiss site, educa.ch.

What I am aiming for: I want to adapt it to a new target to learn more about regex and to do some homework (I work as a teacher and am collecting some data about colleagues). How should we fetch the pages? That is the problem. I want to learn while applying the code. What is necessary to apply the example to the target?

The target: http://www.educa.ch/dyn/79362.asp?action=search

But the code (see below) does not run. I tried several things to debug it. Can you help me?

BTW, should I fetch the pages and load them into an array, or should I loop over

http://www.educa.ch/dyn/79376.asp?id=2635
http://www.educa.ch/dyn/79376.asp?id=3493

and so on?

Here is the code that does not work:

    import urllib
    import urlparse
    import re

    url = "http://www.educa.ch/dyn/"
    html = urllib.urlopen("http://www.educa.ch/dyn/79362.asp?action=search").read()
    for capname, lk in re.findall('<a name="\d+"></a><br><img [^>]+>([^<]+).*?<a href="#\d+" onclick="javascript: window.open\(\'(\d+.asp?id=\d+)\'', html):
        alk = urlparse.urljoin(url, lk)

        data = { 'url':alk, 'cname':capname }

        phtml = urllib.urlopen(alk).read()
        memail = re.search('<a href="mailto.*?)">', phtml)
        if memail:
            data['email'] = memail.group(1)

        print data

I look forward to getting some starting points.

Thanks,
matze
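
One possible starting point, offered as a sketch and not as a solution tested against the live site: the failing script has at least two problems in its patterns. In `\d+.asp?id=\d+` the `?` is unescaped, so it makes the preceding `p` optional instead of matching a literal question mark, and in `'<a href="mailto.*?)">'` the opening `:(` is missing, which leaves an unbalanced parenthesis. The sketch below fixes both and adds `re.DOTALL` so that `.*?` can span line breaks. It uses Python 3 module names (`urllib.parse`), and `SAMPLE_HTML` is a hypothetical fragment reconstructed from what the pattern expects, not the real markup of educa.ch; the helper names `extract_entries` and `extract_email` are also made up for illustration.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.educa.ch/dyn/"

# Hypothetical markup, guessed from the regex in the post (an assumption,
# not copied from the real page).
SAMPLE_HTML = """<a name="2635"></a><br><img src="x.gif">Example School A<br>
<a href="#2635" onclick="javascript: window.open('79376.asp?id=2635', 'detail')">more</a>
<a name="3493"></a><br><img src="x.gif">Example School B<br>
<a href="#3493" onclick="javascript: window.open('79376.asp?id=3493', 'detail')">more</a>
"""

# Raw strings keep the backslashes intact; '.' and '?' are escaped so they
# match literally inside the relative link.
ENTRY_RE = re.compile(
    r'<a name="\d+"></a><br><img [^>]+>([^<]+).*?'
    r'<a href="#\d+" onclick="javascript: window\.open\(\'(\d+\.asp\?id=\d+)\'',
    re.DOTALL,
)

# The capture group needs both its opening and closing parenthesis.
MAILTO_RE = re.compile(r'<a href="mailto:(.*?)">')


def extract_entries(html, base_url=BASE_URL):
    """Return one dict per listing entry with an absolute detail URL."""
    entries = []
    for capname, lk in ENTRY_RE.findall(html):
        entries.append({'url': urljoin(base_url, lk), 'cname': capname.strip()})
    return entries


def extract_email(html):
    """Return the first mailto address in a detail page, or None."""
    match = MAILTO_RE.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    for entry in extract_entries(SAMPLE_HTML):
        print(entry)
```

On the array-versus-loop question, looping over the matches and fetching each detail page in turn (as the CPAN script does) keeps memory use small; in Python 3 the fetch itself would go through `urllib.request.urlopen`, whose result has to be decoded to text before the patterns above are applied.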