On Jan 18, 8:40 am, Stefan Behnel <stefan...@behnel.de> wrote:
> Simon Forman wrote:
> > I want to take a webpage, find all URLs (links, img src, etc.) and
> > rewrite them in-place, and I'd like to do it in python (pure python
> > preferred.)
>
> lxml.html has functions specifically for this problem.
>
> http://codespeak.net/lxml/lxmlhtml.html#working-with-links
>
> Code would be something like
>
> html_doc = lxml.html.parse(b"http://.../xyz.html")
> html_doc.rewrite_links( ... )
> print( lxml.html.tostring(html_doc) )
>
> It also handles links in CSS or JavaScript, as well as broken HTML
> documents.
>
> Stefan
Thank you so much! This is exactly what I needed.

(For what it's worth, parse() seems to return a "plain" ElementTree
object, but document_fromstring() gave me a special html object with
the rewrite_links() method. I think, but I didn't try it, that

    html_doc = lxml.html.parse(b"http://.../xyz.html")
    lxml.html.rewrite_links(html_doc, ... )

would work too; see the sketch in the P.S. below.)

Thanks again!

Regards,
~Simon
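P.S. In case it helps anyone searching the archives later, here is a
rough, untested sketch of the variants I believe work. The sample HTML,
the repl() function, and the mirror.example.com prefix are made-up
stand-ins, not anything that comes from lxml itself:

    from io import StringIO
    import lxml.html

    def repl(link):
        # toy link_repl_func: point every URL at a hypothetical mirror
        return "http://mirror.example.com/" + link.lstrip("/")

    html = '<html><body><a href="/a.html">a</a><img src="pic.png"></body></html>'

    # 1) document_fromstring() returns an HtmlElement, which has the
    #    rewrite_links() method directly (it modifies the tree in place)
    doc = lxml.html.document_fromstring(html)
    doc.rewrite_links(repl)
    print(lxml.html.tostring(doc))

    # 2) parse() returns an ElementTree; its getroot() is the HtmlElement,
    #    so the same method is available one step down
    tree = lxml.html.parse(StringIO(html))   # a filename or URL works here too
    tree.getroot().rewrite_links(repl)
    print(lxml.html.tostring(tree))

    # 3) the module-level helper accepts an HTML string and returns a new
    #    string with the links rewritten, leaving the input untouched
    print(lxml.html.rewrite_links(html, repl))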