In <[EMAIL PROTECTED]>, Marko.Cain.23 wrote:

> On Apr 14, 10:36 am, [EMAIL PROTECTED] wrote:
>> On Apr 14, 12:02 am, Michael Bentley <[EMAIL PROTECTED]>
>> wrote:
>>
>> > On Apr 13, 2007, at 11:49 PM, [EMAIL PROTECTED] wrote:
>> >
>> > > Hi,
>> > >
>> > > I have a list of url names like this, and I am trying to strip out the
>> > > domain name using the following code:
>> > >
>> > > http://www.cnn.com
>> > > www.yahoo.com
>> > > http://www.ebay.co.uk
>> > >
>> > > pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
>> > > match = re.findall(pattern, line)
>> > >
>> > > if (match):
>> > >     s1, s2 = match[0]
>> > >
>> > >     print s2
>> > >
>> > > but none of the site matched, can you please tell me what am i
>> > > missing?
>> >
>> > change re.compile("http:\\\\(.*)\.(.*)", re.S) to
>> > re.compile("http:\/\/(.*)\.(.*)", re.S)
>>
>> Thanks. I try this:
>>
>> but when the 'line' is http://www.cnn.com, I get 's2' com,
>> but i want 'cnn.com' (everything after the first '.'), how can I do
>> that?
>>
>> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
>>
>> match = re.findall(pattern, line)
>>
>> if (match):
>>
>>     s1, s2 = match[0]
>>
>>     print s2
>
> Can anyone please help me with my problem? I still can't solve it.
>
> Basically, I want to strip out the text after the first '.' in url
> address:
>
> http://www.cnn.com -> cnn.com
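Why the corrected pattern still yields 'com': `(.*)` is greedy, so the first group swallows as much as it can and only gives back enough for `\.` to match at the LAST dot, leaving 'com' for the second group. A minimal Python 3 sketch of two fixes follows, assuming the goal is literally "the host text after its first '.'" and that some inputs (like www.yahoo.com) have no scheme. The names GREEDY, HOST_AFTER_FIRST_DOT, after_first_dot_regex and after_first_dot_urlsplit are hypothetical; `urllib.parse.urlsplit` is the Python 3 location of `urlparse.urlsplit`.

```python
import re
from urllib.parse import urlsplit

# The pattern from the thread: the greedy first group stops at the LAST '.'.
GREEDY = re.compile(r"http://(.*)\.(.*)")

# Hypothetical fix: forbid '.' and '/' in the first label so the match stops
# at the FIRST '.'; the scheme is optional and the host ends at the next '/'.
HOST_AFTER_FIRST_DOT = re.compile(r"^(?:https?://)?[^./]*\.([^/]*)")


def after_first_dot_regex(line):
    """Hypothetical helper: host text after its first '.', or None."""
    match = HOST_AFTER_FIRST_DOT.match(line)
    return match.group(1) if match else None


def after_first_dot_urlsplit(url):
    """Hypothetical helper: same result via urllib.parse.urlsplit."""
    # urlsplit only fills in the network location when it sees '//',
    # so prefix scheme-less input such as 'www.yahoo.com'.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    # partition() splits on the FIRST '.' only.
    _, dot, rest = host.partition(".")
    return rest if dot else host


if __name__ == "__main__":
    print(GREEDY.findall("http://www.cnn.com"))
    for u in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
        print(after_first_dot_regex(u), after_first_dot_urlsplit(u))
```

The first line printed is [('www.cnn', 'com')], which is exactly the symptom described above. Both helpers give cnn.com, yahoo.com and ebay.co.uk for the three sample lines. Note the difference from keeping the last two labels with rsplit('.', 2)[-2:]: that rule turns www.ebay.co.uk into 'co.uk'.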
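from urlparse import urlsplit

def get_domain(url):
    # urlsplit() returns (scheme, netloc, path, query, fragment);
    # index 1 is the network location, e.g. 'www.cnn.com'.
    net_location = urlsplit(url)[1]
    # Keep the last two dot-separated labels: 'www.cnn.com' -> 'cnn.com'.
    return '.'.join(net_location.rsplit('.', 2)[-2:])

def main():
    print get_domain('http://www.cnn.com')

if __name__ == '__main__':
    main()

Ciao,
        Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list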