I'm trying to split a URL into components. For example:

URL = 'http://steve:[EMAIL PROTECTED]:82/dir' + \
      'ectory/file.html;params?query#fragment'

(Joining the strings above with plus has no significance; it's just to
avoid word-wrapping.)

If I split the URL, I would like to get the following components:

scheme = 'http'
netloc = 'steve:[EMAIL PROTECTED]:82'
username = 'steve'
password = 'secret'
hostname = 'www.domain.com.au'
port = 82
path = '/directory/file.html'
parameters = 'params'
query = 'query'
fragment = 'fragment'

I can get *most* of the way with urlparse.urlparse: it will split the
URL into a tuple:

('http', 'steve:[EMAIL PROTECTED]:82', '/directory/file.html',
 'params', 'query', 'fragment')

If I'm using Python 2.5, I can split the netloc field further with
named attributes. Unfortunately, I can't rely on Python 2.5 (for my
sins I have to support 2.4).

Before I write code to split the netloc field by hand (a nuisance, but
doable), I thought I'd ask if there was a function somewhere in the
standard library I had missed.

This second question isn't specifically Python related, but I'm asking
it anyway...

I'd also like to split the domain part of an HTTP netloc into top level
domain (.au), second level (.com), etc. I don't need to validate the
TLD, I just need to split it. Is splitting on dots sufficient, or will
that miss some odd corner case of the HTTP specification? (If it does,
I might decide to live with the lack... it depends on how odd the
corner is, and how much work it takes to fix.)

-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list
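
For reference, the "split the netloc by hand" fallback mentioned above might look roughly like the sketch below. It is only an illustration, not a standard-library function: the names split_netloc and split_domain are made up for this example, the sample netloc 'user:pw@example.com:8080' is hypothetical, and the code sticks to string methods that should also be available in Python 2.4. It assumes the netloc has the general form user:password@host:port with any part except the host optional.

```python
def split_netloc(netloc):
    """Split 'user:password@host:port' into (username, password, hostname, port).

    Hypothetical helper for illustration. Missing parts come back as None
    and the port is converted to an int.
    """
    username = password = port = None
    hostname = netloc

    # Everything before the last '@' is treated as user information.
    at = netloc.rfind('@')
    if at != -1:
        userinfo = netloc[:at]
        hostname = netloc[at + 1:]
        if ':' in userinfo:
            username, password = userinfo.split(':', 1)
        else:
            username = userinfo

    # A trailing ':digits' is treated as the port. Checking isdigit() keeps
    # a bracketed IPv6 literal such as '[::1]' from being cut in half.
    colon = hostname.rfind(':')
    if colon != -1 and hostname[colon + 1:].isdigit():
        port = int(hostname[colon + 1:])
        hostname = hostname[:colon]

    return username, password, hostname, port


def split_domain(hostname):
    """Return the dot-separated labels of a hostname, top level first.

    Hypothetical helper for illustration. A single trailing dot (as in a
    fully qualified name) is ignored. Numeric IP addresses are not
    detected, so they would be split into meaningless labels.
    """
    labels = hostname.rstrip('.').split('.')
    labels.reverse()
    return labels


print(split_netloc('user:pw@example.com:8080'))
# ('user', 'pw', 'example.com', 8080)
print(split_domain('www.domain.com.au'))
# ['au', 'com', 'domain', 'www']
```

Splitting on dots covers ordinary host names; the cases this sketch does not try to handle are hosts given as IP addresses and anything that needs knowledge of which suffixes are registrable.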