Re: Extracting real-domain-name (without sub-domains) from a given URL

Chris Rebert Tue, 13 Jan 2009 00:21:28 -0800

On Mon, Jan 12, 2009 at 11:46 PM, S.Selvam Siva <[email protected]> wrote:
> Hi all,
>
>   I need to extract the domain name from a given URL (without sub-domains).
> With urlparse, I am able to fetch only the domain name (which also
> includes the sub-domain).
>
> eg:
>   http://feeds.huffingtonpost.com/posts/ , http://www.huffingtonpost.de/,
> .... all must lead to huffingtonpost.com or huffingtonpost.de
>
> Please suggest some ideas regarding this problem.


That would require (pardon the pun) domain-specific logic. For most
TLDs (e.g. .com, .org) the domain name is just blah.com, blah.org,
etc. But under many ccTLDs, registrations are often only allowed
beneath second-level names such as co.uk, so for www.bbc.co.uk the
main domain name would be bbc.co.uk. I think a few TLDs have even more
complicated rules.
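
As a rough sketch of what that logic might look like: the set of
two-level suffixes below is a tiny hand-picked sample I made up for
illustration, not a complete or authoritative list, and the names
TWO_LEVEL_SUFFIXES and registered_domain are hypothetical.

```python
# Hypothetical, deliberately incomplete sample of suffixes under which
# names are registered at the third level. A real list would be far longer.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}

def registered_domain(hostname):
    """Return the main domain of a hostname, given the sample suffix set."""
    labels = hostname.lower().rstrip(".").split(".")
    # Keep three labels if the last two form a known two-level suffix,
    # otherwise keep the usual two (e.g. blah.com).
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])
```

Any suffix missing from the set falls back to the two-label rule, so
the result is only as good as the list you maintain.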

I doubt anyone's created a general ready-made solution for this; you'd
have to code it yourself.
To handle the common case, you can cheat: .split() the hostname at the
periods, then slice and rejoin the last two parts, e.g.:
'.'.join(domain.split('.')[-2:])
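
Put together with URL parsing, the shortcut might look like the sketch
below. It assumes Python 3, where the parser lives in urllib.parse (in
Python 2 it is the urlparse module); naive_domain is just a name chosen
for the example.

```python
from urllib.parse import urlparse

def naive_domain(url):
    """Return the last two dot-separated labels of the URL's hostname."""
    # .hostname drops any port and userinfo, and is None if the URL has no host.
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])
```

This gives huffingtonpost.com and huffingtonpost.de for the two URLs
in the question, but only co.uk for http://www.bbc.co.uk/, which is
exactly the case the shortcut does not cover.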

Cheers,
Chris

-- 
Follow the path of the Iguana...
http://rebertia.com
--
http://mail.python.org/mailman/listinfo/python-list
