Re: Unescaping URLs in Python

John Nagle Mon, 25 Dec 2006 10:16:02 -0800

Lawrence D'Oliveiro wrote:
> In message <[EMAIL PROTECTED]>, John Nagle
> wrote:
> 
> 
>>Here's a URL from a link on the home page of a major company.
>>
>><a href="/adsk/servlet/index?siteID=123112&amp;id=1860142">About Us</a>
>>
>>What's the appropriate Python function to call to unescape a URL
>>which might contain things like that?
> 
> 
> Just use any HTML-parsing library. I think the standard Python HTMLParser
> will do the trick, provided there aren't any errors in the HTML.


    I'm using BeautifulSoup, because I need to process real-world
HTML.  At least by default, it doesn't unescape URLs like that.
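
    For reference, a minimal sketch of unescaping such an attribute value
by hand, assuming a current Python 3 (html.unescape was added in 3.4, so
it postdates this thread). The names raw_href, href and query are only
illustrative, and this is not something BeautifulSoup does for you here;
it is applied to the string after extraction.

```python
from html import unescape
from urllib.parse import urlsplit, parse_qs

# The href value exactly as it appears in the page source (HTML-escaped).
raw_href = "/adsk/servlet/index?siteID=123112&amp;id=1860142"

# Turn HTML character references such as &amp; back into plain characters.
href = unescape(raw_href)
print(href)    # /adsk/servlet/index?siteID=123112&id=1860142

# The unescaped URL now splits into the intended query parameters.
query = parse_qs(urlsplit(href).query)
print(query)   # {'siteID': ['123112'], 'id': ['1860142']}
```

Note that this undoes HTML escaping only. Percent-encoding (%20 and the
like) is a separate layer, handled by urllib.parse.unquote.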

    Nor, on the output side, does it escape standalone "&" characters,
as in text like "Sales & Advertising Department".
But there are various BeautifulSoup options; more on this later.
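
    On the output side, one way to escape a bare "&" in text before
writing it back out is the standard library's html.escape. This is a
sketch assuming Python 3.2 or later, not a BeautifulSoup option; the
variable names are illustrative.

```python
from html import escape

text = "Sales & Advertising Department"

# quote=False leaves quotation marks alone, which is fine for text nodes;
# keep the default quote=True when the result goes inside an attribute value.
escaped = escape(text, quote=False)
print(escaped)   # Sales &amp; Advertising Department
```

Apply it only to text that is not already escaped, since escaping twice
turns "&amp;" into "&amp;amp;".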

                                John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list
