Lawrence D'Oliveiro wrote:
> In message <[EMAIL PROTECTED]>, John Nagle
> wrote:
>
>>Here's a URL from a link on the home page of a major company.
>>
>><a href="/adsk/servlet/index?siteID=123112&id=1860142">About Us</a>
>>
>>What's the appropriate Python function to call to unescape a URL
>>which might contain things like that?
>
> Just use any HTML-parsing library. I think the standard Python HTMLParser
> will do the trick, provided there aren't any errors in the HTML.
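For readers on a current Python 3 (whose `html` module postdates this thread), here is a minimal sketch of the HTMLParser suggestion. It assumes the raw page source contains the entity form `&amp;` inside the href, which is what makes unescaping necessary. The class name `LinkCollector` is hypothetical. Python 3's `html.parser.HTMLParser` unescapes attribute values before passing them to `handle_starttag`, and `html.unescape` does the same for a bare string. For the reverse direction, `html.escape` turns a standalone "&" into `&amp;`.

```python
import html
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values from <a> tags.

    HTMLParser hands attribute values to handle_starttag already
    unescaped, so "&amp;" in the source arrives here as "&".
    """

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Assumed raw source: the href written with the &amp; entity.
snippet = '<a href="/adsk/servlet/index?siteID=123112&amp;id=1860142">About Us</a>'

parser = LinkCollector()
parser.feed(snippet)
parser.close()
print(parser.hrefs)  # ['/adsk/servlet/index?siteID=123112&id=1860142']

# Unescaping a raw attribute string directly, without a parser.
print(html.unescape("/adsk/servlet/index?siteID=123112&amp;id=1860142"))

# Output direction: escape a standalone "&" in text.
print(html.escape("Sales & Advertising Department"))  # Sales &amp; Advertising Department
```

On malformed real-world HTML, this parser may do less well than a more forgiving library.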
I'm using BeautifulSoup, because I need to process real-world HTML. At least by default, it doesn't unescape URLs like that. Nor, on the output side, does it escape standalone "&" characters, as in text like "Sales & Advertising Department". But there are various BeautifulSoup options; more on this later.

John Nagle