On Mon, Jan 12, 2009 at 11:46 PM, S.Selvam Siva <s.selvams...@gmail.com> wrote:
> Hi all,
>
> I need to extract the domain name from a given URL (without sub-domains).
> With urlparse, I am only able to fetch the domain name including the
> sub-domain.
>
> e.g.:
> http://feeds.huffingtonpost.com/posts/ , http://www.huffingtonpost.de/,
> .... all must lead to huffingtonpost.com or huffingtonpost.de
>
> Please suggest some ideas regarding this problem.
That would require (pardon the pun) domain-specific logic. For most TLDs (e.g. .com, .org) the domain name is just blah.com, blah.org, etc. But for ccTLDs, registrations are often only allowed under a second-level domain such as .co.uk, so for www.bbc.co.uk the main domain name would be bbc.co.uk. I think a few TLDs have even more complicated rules. I doubt anyone's created a general ready-made solution for this; you'd have to code it yourself.

To handle the common case, you can cheat and just .split() at the periods, then slice and rejoin the list of domain parts, where domain is the host name you got from urlparse, e.g.:

    '.'.join(domain.split('.')[-2:])

Note that this gives co.uk rather than bbc.co.uk for the example above.

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com
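A minimal sketch of the idea above, in Python 3 (where the urlparse module became urllib.parse). It combines the split-and-slice trick with a small exception list for second-level suffixes like co.uk. The SECOND_LEVEL_SUFFIXES set and the registered_domain() name are hypothetical and for illustration only; the set is nowhere near complete, so treat this as a heuristic rather than a general solution.

```python
from urllib.parse import urlparse  # Python 3; in Python 2 this was the urlparse module

# Hypothetical, deliberately tiny set of two-label suffixes under which
# registrations are made (e.g. bbc.co.uk). A real list would be far longer.
SECOND_LEVEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "com.au", "co.jp"}


def registered_domain(url):
    """Return the main domain of url with sub-domains dropped (heuristic)."""
    # .hostname is lower-cased and excludes any port or user:password part.
    host = urlparse(url).hostname or ""
    labels = host.rstrip(".").split(".")
    # If the last two labels form a known second-level suffix, keep three
    # labels; otherwise fall back to the common case of keeping the last two.
    if len(labels) >= 3 and ".".join(labels[-2:]) in SECOND_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


for url in ["http://feeds.huffingtonpost.com/posts/",
            "http://www.huffingtonpost.de/",
            "http://www.bbc.co.uk/news"]:
    print(registered_domain(url))
```

With the URLs from the question plus the BBC example, this prints huffingtonpost.com, huffingtonpost.de and bbc.co.uk. IP-address hosts and the more complicated TLD rules mentioned above are not handled.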