On 2019-05-12, Birdep <bir...@free.net> wrote:
> I am trying to extract the domain name from an adblock rule, so what
> pattern should I use to extract the domain name only?
>
> import re
> domains = ['ru', ' fr' ,'eu', 'com']
> with open('easylist.txt', 'r') as f:
>     a=f.read()
> result=re.findall(r'[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+',a)
> unique_result = list(set(result))
> for r in unique_result:
>     #Extract domain name out of url
>     domain_name = r.split('.')[1]
>     #Check if domain name is in list of domains, only then add it
>     if domain_name in domains:
>         print(r)
>
> This is a laborious process, because I have to find the extension of
> every domain and then add it to the domains list. So I want something
> that could automatically extract the domain only.

What do you mean by "domain name"? Do you mean just the top-level
domain? In that case you can just do fullname.rsplit(".", 1)[-1].

If you mean "the registrable domain" (such as example.com,
example.co.uk, etc.), then you will need to look at the Public Suffix
List at https://publicsuffix.org/
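
To make the difference between the two readings concrete, here is a rough sketch. The names top_level, registrable_domain and SAMPLE_SUFFIXES are made up for illustration, and the suffix set is a tiny hand-written sample, not the real Public Suffix List. It also ignores the wildcard and exception rules that the real list contains, so treat it as a demonstration of the idea rather than something to run against easylist.txt as is.

```python
# Hypothetical, hand-picked sample of suffixes for illustration only.
# The real data is the Public Suffix List from https://publicsuffix.org/
SAMPLE_SUFFIXES = {"com", "net", "org", "ru", "fr", "eu", "uk", "co.uk"}


def top_level(fullname):
    """Return only the last label, e.g. 'uk' for 'www.example.co.uk'."""
    return fullname.rsplit(".", 1)[-1]


def registrable_domain(fullname, suffixes=SAMPLE_SUFFIXES):
    """Return the matching suffix plus one label to its left, or None.

    Candidates are tried from longest to shortest, so 'co.uk' is
    preferred over 'uk' when both are in the suffix set.
    """
    labels = fullname.lower().strip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                # The whole name is itself a suffix; nothing registrable.
                return None
            return ".".join(labels[i - 1:])
    # No known suffix matched.
    return None


if __name__ == "__main__":
    for name in ("ads.example.com", "www.example.co.uk", "co.uk"):
        print(name, top_level(name), registrable_domain(name))
```

With a complete suffix list loaded in place of SAMPLE_SUFFIXES, there is no need to maintain the list of extensions by hand, which seems to be what the original code was struggling with.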