"Andreas Volz" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi,
>
> I used SGMLParser to parse all href's in a html file. Now I need to cut
> some strings. For example:
>
> http://www.example.com/dir/example.html
>
> Now I like to cut the string, so that only domain and directory is
> left over. Expected result:
>
> http://www.example.com/dir/
>
> I know how to do this in bash programming, but not in python. How could
> this be done?
>
> The next problem is not only to extract href's, but also images. A href
> is easy:
>
> <a href="install.php">Install</a>
>
> But a image is a little harder:
>
> <img class="bild" src="images/marine.jpg">
>

Check out the urlparse module in the standard library (it is urllib.parse in Python 3). Its urljoin function resolves a relative reference such as "images/marine.jpg" against the URL of the current page, so an image src can be expanded the same way as an href.

-- Paul
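
As a rough illustration of the suggestion above, here is a minimal sketch, assuming Python 3, where the module is named urllib.parse. SGMLParser is not available in Python 3, so the sketch swaps in html.parser.HTMLParser. The names LinkCollector and directory_of are made up for this example and do not come from the thread.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def directory_of(url):
    """Return the URL with its last path segment removed.

    Resolving the relative reference "." against a URL yields the
    directory that contains it; any query or fragment is dropped.
    """
    return urljoin(url, ".")


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs of <a href> and <img src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page being parsed
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Expand the href relative to the page location.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            # Images work the same way; only the attribute name differs.
            self.images.append(urljoin(self.base_url, attrs["src"]))


page = "http://www.example.com/dir/example.html"
collector = LinkCollector(page)
collector.feed('<a href="install.php">Install</a>'
               '<img class="bild" src="images/marine.jpg">')

print(directory_of(page))
print(collector.links)
print(collector.images)
```

The same urljoin call covers both cases from the question: joining with "." cuts the URL back to its directory, and joining with an href or src value turns a relative reference into an absolute one.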