[EMAIL PROTECTED] wrote:
> On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>> In <[EMAIL PROTECTED]>, Marko.Cain.23 wrote:
>>
>>> On Apr 14, 10:36 am, [EMAIL PROTECTED] wrote:
>>>> On Apr 14, 12:02 am, Michael Bentley <[EMAIL PROTECTED]> wrote:
>>>>> On Apr 13, 2007, at 11:49 PM, [EMAIL PROTECTED] wrote:
>>>>>> Hi,
>>>>>> I have a list of url names like this, and I am trying to strip out the
>>>>>> domain name using the following code:
>>>>>>
>>>>>> http://www.cnn.com
>>>>>> www.yahoo.com
>>>>>> http://www.ebay.co.uk
>>>>>>
>>>>>> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
>>>>>> match = re.findall(pattern, line)
>>>>>>
>>>>>> if (match):
>>>>>>     s1, s2 = match[0]
>>>>>>     print s2
>>>>>>
>>>>>> but none of the site matched, can you please tell me what am i
>>>>>> missing?
>>>>>
>>>>> change re.compile("http:\\\\(.*)\.(.*)", re.S) to
>>>>> re.compile("http:\/\/(.*)\.(.*)", re.S)
>>>>
>>>> Thanks. I try this:
>>>> but when the 'line' is http://www.cnn.com, I get 's2' com,
>>>> but i want 'cnn.com' (everything after the first '.'), how can I do
>>>> that?
>>>>
>>>> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
>>>> match = re.findall(pattern, line)
>>>>
>>>> if (match):
>>>>     s1, s2 = match[0]
>>>>     print s2
>>>
>>> Can anyone please help me with my problem? I still can't solve it.
>>> Basically, I want to strip out the text after the first '.' in url
>>> address:
>>>
>>> http://www.cnn.com -> cnn.com
>>
>> from urlparse import urlsplit
>>
>> def get_domain(url):
>>     net_location = urlsplit(url)[1]
>>     return '.'.join(net_location.rsplit('.', 2)[-2:])
>>
>> def main():
>>     print get_domain('http://www.cnn.com')
>>
>> Ciao,
>> Marc 'BlackJack' Rintsch
>
> Thanks for your help.
>
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
>
> how can I change it so that it works for both www.ebay.co.uk and www.cnn.com?

 >>> def get_domain(url):
 ...     net_location = urlsplit(url)[1]
 ...     return net_location.split(".", 1)[1]
 ...
 >>> print get_domain('http://www.cnn.com')
 cnn.com
 >>> print get_domain('http://www.ebay.co.uk')
 ebay.co.uk
 >>>
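
For readers on a newer interpreter, here is a minimal sketch of the same idea, assuming Python 3 (where urlsplit lives in urllib.parse rather than urlparse). Two things are additions not taken from the thread and should be treated as assumptions: the "//" prefix trick, which lets scheme-less entries such as www.yahoo.com from the original list parse at all, and the helper name get_domain_re, which is hypothetical. The regex variant is included only to show why the earlier pattern returned 'com': the first (.*) is greedy and swallows everything up to the last dot, so the fix is to stop the first group at the first dot. Both versions assume every host starts with a label to discard (such as "www"); a bare "cnn.com" would come back as "com".

```python
import re
from urllib.parse import urlsplit  # Python 3 home of urlsplit (assumption: not Python 2)


def get_domain(url):
    """Return everything after the first '.' of the host name."""
    # urlsplit only fills in the network location when '//' is present,
    # so prefix it for scheme-less input like 'www.yahoo.com'.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    # Split once at the first dot; keep the remainder if there was a dot.
    head, sep, tail = host.partition(".")
    return tail if sep else host


# Hypothetical regex alternative: '[^./]+' cannot cross a dot or a slash,
# so it stops at the first label instead of being greedy up to the last dot.
_HOST_RE = re.compile(r"(?:https?://)?[^./]+\.([^/]+)")


def get_domain_re(url):
    """Regex version; returns None when the input does not match."""
    match = _HOST_RE.match(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in ("http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk/"):
        print(line, "->", get_domain(line), "/", get_domain_re(line))
```

Neither version knows which suffixes (such as co.uk) are registry-controlled; they simply drop the first label, which is what the question asked for.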

regards
 Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb     http://del.icio.us/steve.holden
Recent Ramblings       http://holdenweb.blogspot.com