On Apr 15, 2007, at 4:24 PM, [EMAIL PROTECTED] wrote:

> On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>> In <[EMAIL PROTECTED]>, Marko.Cain.23 wrote:
>>
>>> On Apr 14, 10:36 am, [EMAIL PROTECTED] wrote:
>>>> On Apr 14, 12:02 am, Michael Bentley <[EMAIL PROTECTED]> wrote:
>>
>>>>> On Apr 13, 2007, at 11:49 PM, [EMAIL PROTECTED] wrote:
>>
>>>>>> Hi,
>>
>>>>>> I have a list of url names like this, and I am trying to strip
>>>>>> out the domain name using the following code:
>>
>>>>>> http://www.cnn.com
>>>>>> www.yahoo.com
>>>>>> http://www.ebay.co.uk
>>
>>>>>> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
>>>>>> match = re.findall(pattern, line)
>>
>>>>>> if (match):
>>>>>>     s1, s2 = match[0]
>>
>>>>>>     print s2
>>
>>>>>> but none of the site matched, can you please tell me what am i
>>>>>> missing?
>>
>>>>> change re.compile("http:\\\\(.*)\.(.*)", re.S) to
>>>>> re.compile("http:\/\/(.*)\.(.*)", re.S)
>>
>>>> Thanks. I try this:
>>
>>>> but when the 'line' is http://www.cnn.com, I get 's2' com,
>>>> but i want 'cnn.com' (everything after the first '.'), how can I do
>>>> that?
>>
>>>> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
>>
>>>> match = re.findall(pattern, line)
>>
>>>> if (match):
>>
>>>>     s1, s2 = match[0]
>>
>>>>     print s2
>>
>>> Can anyone please help me with my problem? I still can't solve it.
>>
>>> Basically, I want to strip out the text after the first '.' in url
>>> address:
>>
>>> http://www.cnn.com -> cnn.com
>>
>> from urlparse import urlsplit
>>
>> def get_domain(url):
>>     net_location = urlsplit(url)[1]
>>     return '.'.join(net_location.rsplit('.', 2)[-2:])
>>
>> def main():
>>     print get_domain('http://www.cnn.com')
>>
>> Ciao,
>>         Marc 'BlackJack' Rintsch
>
> Thanks for your help.
>
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
>
> how can I change it so that it works for both www.ebay.co.uk and
> www.cnn.com?

from urlparse import urlsplit

def get_domain(url):
    # Use the network location when the URL has a scheme; without one,
    # urlsplit() leaves the host name in the path component (index 2).
    net_location = (
        urlsplit(url)[1]
        and urlsplit(url)[1].split('.')
        or urlsplit(url)[2].split('.')
    )  # tricksy way to get long line into email
    # Drop a leading 'www' label, whatever its case.
    if net_location[0].lower() == 'www':
        net_location = net_location[1:]
    return '.'.join(net_location)

def main():
    testItems = ['http://www.cnn.com', 'www.yahoo.com',
                 'http://www.ebay.co.uk']
    for testItem in testItems:
        print get_domain(testItem)

if __name__ == '__main__':
    main()

-- 
http://mail.python.org/mailman/listinfo/python-list
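
For anyone reading this on Python 3, where the urlparse module became urllib.parse and print is a function, roughly the same idea might look like the sketch below. It is only an illustration under that assumption, not a tested drop-in replacement: it takes the host from the network location, falls back to the first path segment when the URL has no scheme, and strips a leading 'www' label. It does not try to work out the registrable domain in general (a host such as news.bbc.co.uk comes back unchanged), which would need something like the Public Suffix List.

```python
from urllib.parse import urlsplit


def get_domain(url):
    """Return the host part of url with a leading 'www' label removed."""
    parts = urlsplit(url)
    # Without a scheme, urlsplit() puts 'www.yahoo.com' in .path, not .netloc,
    # so fall back to the first path segment in that case.
    host = parts.netloc or parts.path.split('/', 1)[0]
    labels = host.split('.')
    if labels[0].lower() == 'www':
        labels = labels[1:]
    return '.'.join(labels)


if __name__ == '__main__':
    for item in ['http://www.cnn.com', 'www.yahoo.com',
                 'http://www.ebay.co.uk/']:
        print(get_domain(item))
```

Run against the three URLs from the original post, this should give the same answers as the version above.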