"Andreas Volz" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Hi,
>
> I used SGMLParser to parse all href's in a html file. Now I need to cut
> some strings. For example:
>
> http://www.example.com/dir/example.html
>
> Now I like to cut the string, so that only domain and directory is
> left over. Expected result:
>
> http://www.example.com/dir/
>
> I know how to do this in bash programming, but not in python. How could
> this be done?
>
> The next problem is not only to extract href's, but also images. A href
> is easy:
>
> <a href="install.php">Install</a>
>
> But a image is a little harder:
>
> <img class="bild" src="images/marine.jpg">
>

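Check out the urlparse module (part of the standard distribution; it became
urllib.parse in Python 3).  It splits a URL into its components, which lets
you cut off the file name, and its urljoin function expands a relative
reference such as "images/marine.jpg" against the URL of the current page.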
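
A minimal sketch of both steps, assuming Python 3, where urlparse lives in
urllib.parse and html.parser.HTMLParser stands in for SGMLParser (which was
removed in Python 3).  The names base_directory and LinkCollector are made up
for illustration and are not part of any library:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


def base_directory(url):
    """Drop everything after the last '/' of the path (and any query/fragment)."""
    parts = urlsplit(url)
    # "/dir/example.html" -> "/dir/"; an empty path becomes "/"
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href> and <img src>."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.page_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.page_url, attrs["src"]))


page = "http://www.example.com/dir/example.html"
collector = LinkCollector(page)
collector.feed('<a href="install.php">Install</a>'
               '<img class="bild" src="images/marine.jpg">')

print(base_directory(page))
print(collector.links)
print(collector.images)
```

urljoin treats the page URL as the base, so the relative href and src values
come out under http://www.example.com/dir/ without any string cutting by hand.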

-- Paul


-- 
http://mail.python.org/mailman/listinfo/python-list
