Steven D'Aprano wrote:
> I'm using urllib.urlretrieve() to download HTML pages, and I've hit a
> snag with URLs containing ampersands:
>
> http://www.example.com/parrot.php?x=1&y=2
>
> Somewhere in the process, urls like the above are escaped to:
>
> http://www.example.com/parrot.php?x=1&amp;y=2
>
> which naturally fails to exist.
>
> I could just do a string replace, but is there a "right" way to escape
> and unescape URLs? I've looked through the standard lib, but I can't find
> anything helpful.

I don't believe there is a concept of "escaping a URL" as such. How you escape or unescape a URL depends on the context you're embedding it in or extracting it from.

In this case, it looks like you have URLs which have been escaped to go into an HTML CDATA attribute value (such as <a href="...">). I believe there is no documented function in the Python standard library which reverses this escaping (short of putting your string into a larger document and parsing that with a full HTML or XML parser).

-M-
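
For readers on current Python 3: since version 3.4 the standard library does provide html.unescape(), which reverses this kind of HTML escaping. What follows is a minimal sketch, assuming Python 3.4 or later. The HrefCollector class is a hypothetical helper written for this illustration; it shows the "parse a larger document" route, relying on html.parser.HTMLParser passing attribute values to handle_starttag() with character references already replaced.

```python
from html import unescape
from html.parser import HTMLParser

# The URL as it appears inside an HTML attribute value.
escaped = "http://www.example.com/parrot.php?x=1&amp;y=2"

# Route 1: undo the HTML escaping directly (Python 3.4+).
url = unescape(escaped)


# Route 2: embed the string in a document and let a parser extract it.
class HrefCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive unescaped.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<a href="%s">parrot</a>' % escaped)
collector.close()

print(url)
print(collector.hrefs)
```

Both routes should give back the original x=1&y=2 form, which can then be passed to whatever does the download.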