cut strings and parse for images
Hi,

I used SGMLParser to parse all hrefs in an HTML file. Now I need to cut some strings. For example:

    http://www.example.com/dir/example.html

I would like to cut the string so that only the domain and directory are left. Expected result:

    http://www.example.com/dir/

I know how to do this in bash, but not in Python. How could this be done?

The next problem is to extract not only hrefs, but also images. A href is easy:

    Install

But an image is a little harder:

This is my current example code:

    from sgmllib import SGMLParser

    leach_url = "http://stargus.sourceforge.net/"

    class URLLister(SGMLParser):
        def reset(self):
            SGMLParser.reset(self)
            self.urls = []

        # Called for every <a> tag; attrs is a list of (name, value) pairs.
        def start_a(self, attrs):
            href = [v for k, v in attrs if k == 'href']
            if href:
                self.urls.extend(href)

    if __name__ == "__main__":
        import urllib
        usock = urllib.urlopen(leach_url)
        parser = URLLister()
        parser.feed(usock.read())
        parser.close()
        usock.close()
        for url in parser.urls:
            print url

Perhaps you have some tips on how to solve these problems?

regards
Andreas
--
http://mail.python.org/mailman/listinfo/python-list
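For readers on Python 3, where sgmllib is no longer available, the same idea can be sketched with html.parser.HTMLParser from the standard library. This is a swapped-in technique, not the code from the post: the class name LinkAndImageLister and the sample markup are assumptions made up for illustration, and the sample string stands in for the downloaded page.

```python
from html.parser import HTMLParser


class LinkAndImageLister(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.urls.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


# Hypothetical sample markup, standing in for the fetched page.
sample = '<a href="install.html">Install</a> <img src="images/marine.jpg" alt="marine">'

parser = LinkAndImageLister()
parser.feed(sample)
parser.close()
print(parser.urls)
print(parser.images)
```

Handling both tags in one handle_starttag method keeps the structure close to the start_a approach above; the only change for images is looking at the src attribute instead of href.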
Re: cut strings and parse for images
On Mon, 06 Dec 2004 20:36:36 GMT, Paul McGuire wrote:

> Check out the urlparse module (in std distribution). For images, you
> can provide a default addressing scheme, so you can expand
> "images/marine.jpg" relative to the current location.

OK, this looks good. But I'm a real newbie to Python and not able to create a minimal example. Could you please give me a very small example of how to use urlparse? Or point me to an example on the web?

regards
Andreas
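A minimal sketch of what such an example might look like, written for Python 3, where the urlparse module lives in urllib.parse. The helper name directory_url is an assumption introduced here for illustration; the URLs are the ones already used in this thread.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def directory_url(url):
    """Return the URL with the last path segment, query and fragment removed."""
    parts = urlsplit(url)
    # Keep everything up to and including the final "/" of the path.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


base = directory_url("http://www.example.com/dir/example.html")
print(base)

# Resolve a relative image reference against the page it was found on.
image = urljoin("http://stargus.sourceforge.net/", "images/marine.jpg")
print(image)
```

urljoin leaves absolute references untouched, so it can be applied to every href or src value without first checking whether it is relative.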
regex syntax
Hi,

I'm not good at regex, but I know enough for the basics. Somehow, though, I can't get to grips with the syntax of the re.* functions in Python. Maybe someone can show me using this example:

    string = "bild.jpg"

I simply want to know whether ".jpg" occurs in the string or not, and then make a decision. I have already considered comparing the last four characters of the string against that sequence "by hand" and avoiding regex that way. But I will have to use it at some point anyway ;-)

Regards
Andreas
Re: cut strings and parse for images
On Tue, 07 Dec 2004 00:40:02 GMT, Paul McGuire wrote:

> Is this in the ballpark of where you are trying to go?

Yes, thanks. You helped me a lot.

Andreas
Re: regex syntax
On Mon, 6 Dec 2004 17:24:35 -0800 (PST), [EMAIL PROTECTED] wrote:

> I can't speak German, but:

Ahh! Sorry for this! It was a mistake :-(

regards
Andreas
Re: regex syntax
On 6 Dec 2004 17:43:21 -0800, [EMAIL PROTECTED] wrote:

> much better than the comparable regexp:
>
> >>> re.match('.*\.jpg$', filename)

OK, now I've chosen this regex:

    '.*\.(?i)jpe?g'

to get .jpg, .JPG, .jpeg and .JPEG. It seems to work. Is this correct?

regards
Andreas
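Two caveats on that pattern, offered as a sketch and not a definitive answer. Without a trailing "$", re.match also accepts names that merely start with something ending in .jpg, such as "bild.jpg.txt". An inline (?i) in the middle of a pattern is also rejected by recent Python 3 releases, so the flag is passed to re.compile here instead. The function names and the sample file names below are assumptions for illustration; the code is Python 3.

```python
import re

# The flag is given to compile() instead of an inline (?i) inside the pattern;
# "$" anchors the extension to the end of the name.
JPEG_RE = re.compile(r".*\.jpe?g$", re.IGNORECASE)


def is_jpeg_regex(filename):
    return JPEG_RE.match(filename) is not None


def is_jpeg_plain(filename):
    # str.endswith accepts a tuple of suffixes, so no regex is needed.
    return filename.lower().endswith((".jpg", ".jpeg"))


for name in ["bild.jpg", "BILD.JPEG", "bild.jpg.txt", "bild.png"]:
    print(name, is_jpeg_regex(name), is_jpeg_plain(name))
```

The plain string version is usually easier to read for a fixed set of extensions; the regex form pays off once the matching rules get more involved.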
regex for url parameter
Hi,

I'm trying to extract an HTTP target from a URL that is given as a parameter. urlparse couldn't really help me. I tried it like this:

    import re

    url = "http://www.example.com/example.html?url=http://www.example.org/example.html"
    p = re.compile('.*url=')
    url = p.sub('', url)
    print url
    > http://www.example.org/example.html

This works, but if there are more parameters it doesn't:

    url2 = "http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"
    p = re.compile('.*url=')
    url2 = p.sub('', url2)
    print url2
    > http://www.example.org/example.html&param=1

I played with regexes to find one that also matches the second case with multiple parameters. I think it's easy, but I don't know how to do it. Can you help me?

regards
Andreas
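One possible approach, sketched in Python 3 under the assumption that the target is always passed in a query parameter named url and is not itself followed by unescaped "&" characters that belong to it. Two options are shown: parse_qs from urllib.parse, and a regex that stops at the next "&". The variable names target and target_re are made up for this sketch.

```python
import re
from urllib.parse import parse_qs, urlsplit

url2 = "http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"

# Option 1: let the standard library split the query string into parameters.
query = parse_qs(urlsplit(url2).query)
target = query["url"][0]
print(target)

# Option 2: a regex that captures everything after "url=" up to the next "&".
match = re.search(r"[?&]url=([^&]*)", url2)
target_re = match.group(1) if match else None
print(target_re)
```

The regex uses [^&]* in place of the greedy .* so that the capture ends where the next parameter begins; parse_qs additionally decodes percent-escapes, which matters if the embedded URL was properly encoded.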