On Apr 14, 10:36 am, [EMAIL PROTECTED] wrote:
> On Apr 14, 12:02 am, Michael Bentley <[EMAIL PROTECTED]>
> wrote:
>
> > On Apr 13, 2007, at 11:49 PM, [EMAIL PROTECTED] wrote:
>
> > > Hi,
>
> > > I have a list of url names like this, and I am trying to strip out the
> > > domain name using the following code:
>
> > > http://www.cnn.com
> > > www.yahoo.com
> > > http://www.ebay.co.uk
>
> > > pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> > > match = re.findall(pattern, line)
>
> > > if (match):
> > >     s1, s2 = match[0]
>
> > >     print s2
>
> > > but none of the site matched, can you please tell me what am i
> > > missing?
>
> > change re.compile("http:\\\\(.*)\.(.*)", re.S) to
> > re.compile("http:\/\/(.*)\.(.*)", re.S)
>
> Thanks. I try this:
>
> but when the 'line' is http://www.cnn.com, I get 's2' com,
> but i want 'cnn.com' (everything after the first '.'), how can I do
> that?
>
> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
>
> match = re.findall(pattern, line)
>
> if (match):
>
>     s1, s2 = match[0]
>
>     print s2
Can anyone please help me with my problem? I still can't solve it.

Basically, I want to keep only the text after the first '.' in a URL:

http://www.cnn.com -> cnn.com

-- 
http://mail.python.org/mailman/listinfo/python-list
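
One possible approach, offered as a sketch rather than a definitive answer: the quoted pattern returns 'com' because the first (.*) is greedy and consumes everything up to the last '.'. Restricting the first group to characters that are not a dot makes the match stop at the first '.' instead. The names PATTERN and after_first_dot below are made up for illustration, and the sketch assumes Python 3 and that every line starts with an optional http:// or https:// followed by a host name.

```python
import re

# Optional scheme, then the first label (no dots or slashes), then the
# first '.', then everything up to the next slash or whitespace.
# PATTERN and after_first_dot are hypothetical names for this sketch.
PATTERN = re.compile(r"^(?:https?://)?[^./]+\.([^/\s]+)")


def after_first_dot(line):
    """Return the host text after the first '.', or None if nothing matches."""
    match = PATTERN.match(line.strip())
    return match.group(1) if match else None


for line in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(after_first_dot(line))
```

Note that this only cuts at the first dot; it does not know what a registered domain is, so a line such as cnn.com (with no www. prefix) would yield 'com'.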