On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> In <[EMAIL PROTECTED]>, Marko.Cain.23 wrote:
>
> > On Apr 14, 10:36 am, [EMAIL PROTECTED] wrote:
> >> On Apr 14, 12:02 am, Michael Bentley <[EMAIL PROTECTED]> wrote:
> >>
> >> > On Apr 13, 2007, at 11:49 PM, [EMAIL PROTECTED] wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I have a list of url names like this, and I am trying to strip out the
> >> > > domain name using the following code:
> >> > >
> >> > > http://www.cnn.com
> >> > > www.yahoo.com
> >> > > http://www.ebay.co.uk
> >> > >
> >> > > pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> >> > > match = re.findall(pattern, line)
> >> > >
> >> > > if (match):
> >> > >     s1, s2 = match[0]
> >> > >
> >> > >     print s2
> >> > >
> >> > > but none of the site matched, can you please tell me what am i
> >> > > missing?
> >> >
> >> > change re.compile("http:\\\\(.*)\.(.*)", re.S) to
> >> > re.compile("http:\/\/(.*)\.(.*)", re.S)
> >>
> >> Thanks. I try this:
> >>
> >> but when the 'line' is http://www.cnn.com, I get 's2' com,
> >> but i want 'cnn.com' (everything after the first '.'), how can I do
> >> that?
> >>
> >> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
> >>
> >> match = re.findall(pattern, line)
> >>
> >> if (match):
> >>
> >>     s1, s2 = match[0]
> >>
> >>     print s2
> >
> > Can anyone please help me with my problem? I still can't solve it.
> >
> > Basically, I want to strip out the text after the first '.' in url
> > address:
> >
> > http://www.cnn.com -> cnn.com
>
> from urlparse import urlsplit
>
> def get_domain(url):
>     net_location = urlsplit(url)[1]
>     return '.'.join(net_location.rsplit('.', 2)[-2:])
>
> def main():
>     print get_domain('http://www.cnn.com')
>
> Ciao,
>         Marc 'BlackJack' Rintsch

Thanks for your help. But if the input string is "http://www.ebay.co.uk/",
I only get "co.uk". How can I change it so that it works for both
www.ebay.co.uk and www.cnn.com?
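
For what it's worth, here is a rough sketch of one way both cases could be handled. It is not a definitive answer, and it rests on a few assumptions: it uses Python 3's urllib.parse (the quoted code uses Python 2's urlparse), and TWO_LABEL_SUFFIXES is a hypothetical, hand-written and deliberately incomplete set made up for illustration. The number of labels in a registrable domain cannot be worked out from the string alone, so a robust version would need a complete suffix list (the Public Suffix List is the usual source) rather than a handful of hard-coded entries.

```python
from urllib.parse import urlsplit

# Hypothetical, incomplete set of two-label suffixes, for illustration only.
# A real solution would load a complete list instead of hard-coding entries.
TWO_LABEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk", "com.au", "co.jp"}


def get_domain(url):
    """Return the host with leading labels such as 'www' dropped."""
    # urlsplit() only fills in the network location when the URL has a
    # scheme separator, so inputs like 'www.yahoo.com' get a '//' prefix.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Keep three labels when the last two form a known two-label suffix,
    # otherwise keep two.
    keep = 3 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


if __name__ == "__main__":
    for line in ("http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk/"):
        print(get_domain(line))
```

With the three inputs from the original question this prints cnn.com, yahoo.com and ebay.co.uk. Any suffix missing from the set falls back to the last two labels, which is the same behaviour as the rsplit version quoted above.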