Re: Automate extract domain

Jon Ribbens via Python-list Sun, 12 May 2019 09:13:43 -0700

On 2019-05-12, Birdep <[email protected]> wrote:
> I am trying to extract the domain name from an adblock rule, so what
> pattern should I use to extract the domain name only?
>
> import re
> domains = ['ru', ' fr' ,'eu', 'com']
> with open('easylist.txt', 'r') as f:
>     a = f.read()
> result = re.findall(r'[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+', a)
> unique_result = list(set(result))
> for r in unique_result:
>     # Extract domain name out of url
>     domain_name = r.split('.')[1]
>     # Check if domain name is in list of domains, only then add it
>     if domain_name in domains:
>         print(r)
>
> This is a laborious process, because I have to find the extensions
> of all domains and then add them to the domains list. So I want
> something that could automate extracting the domain only.


What do you mean by "domain name"? Do you mean just the top level?
In which case you can just do fullname.rsplit(".", 1)[-1]. If you
mean "the registrable domain" (such as example.com, example.co.uk,
etc) then you will need to look at https://publicsuffix.org/
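
A minimal sketch of the two interpretations, in case it helps. The
names top_level, registrable_domain and SAMPLE_SUFFIXES are made up
for illustration, and the suffix set is a tiny hand-picked sample, not
the real Public Suffix List (which is far longer and also has wildcard
and exception rules that this sketch ignores):

```python
# Hypothetical, hand-picked sample of public suffixes, for illustration only.
# A real solution should load the full list from https://publicsuffix.org/.
SAMPLE_SUFFIXES = {"com", "ru", "fr", "eu", "uk", "co.uk"}


def top_level(fullname):
    """Return only the last label, e.g. 'uk' for 'www.example.co.uk'."""
    return fullname.rsplit(".", 1)[-1]


def registrable_domain(fullname, suffixes=SAMPLE_SUFFIXES):
    """Return the matched suffix plus one more label, or None if no match.

    Candidates are tried from longest to shortest, so 'co.uk' wins
    over 'uk' when both are in the suffix set.
    """
    labels = fullname.lower().strip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                # The name is itself a public suffix; nothing registrable.
                return None
            return ".".join(labels[i - 1:])
    return None


if __name__ == "__main__":
    for name in ("ads.example.com", "www.example.co.uk", "co.uk"):
        print(name, top_level(name), registrable_domain(name))
```

With the full list loaded into the suffix set, there would be no need
to maintain a list of extensions by hand.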
-- 
https://mail.python.org/mailman/listinfo/python-list
