[EMAIL PROTECTED] wrote:
Generalized, you'll want to add some 'try' test to verify that it is not returning just root domains ('com', 'edu', 'net', etc.).

On Apr 14, 10:36 am, [EMAIL PROTECTED] wrote:
On Apr 14, 12:02 am, Michael Bentley <[EMAIL PROTECTED]> wrote:
On Apr 13, 2007, at 11:49 PM, [EMAIL PROTECTED] wrote:

Hi,
I have a list of url names like this, and I am trying to strip out the domain name using the following code:

    http://www.cnn.com
    www.yahoo.com
    http://www.ebay.co.uk

    pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
    match = re.findall(pattern, line)
    if (match):
        s1, s2 = match[0]
        print s2

but none of the sites matched. Can you please tell me what I am missing?

Change re.compile("http:\\\\(.*)\.(.*)", re.S) to re.compile("http:\/\/(.*)\.(.*)", re.S)

Thanks. I tried this, but when the 'line' is http://www.cnn.com, I get 's2' as 'com', and I want 'cnn.com' (everything after the first '.'). How can I do that?

    pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
    match = re.findall(pattern, line)
    if (match):
        s1, s2 = match[0]
        print s2

Can anyone please help me with my problem? I still can't solve it. Basically, I want to strip out the text after the first '.' in a url address:

    http://www.cnn.com -> cnn.com

Start with splitting?

    >>> from string import split, find
    >>> url = 'http://www.cnn.com'
    >>> url.split('//')
    ['http:', 'www.cnn.com']
    >>> site = url.split('//')[1:][0]
    >>> site
    'www.cnn.com'
    >>> site.find('.')
    3
    >>> site[site.find('.')+1:]
    'cnn.com'
    >>> domain = site[site.find('.')+1:]

    from string import split, find

    def getDomain(url=''):
        site = url.split('//')[1:][0]
        domain = site[site.find('.')+1:]
        return domain
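
A rough sketch of the root-domain check suggested at the top of this message might look like the following. It is written for Python 3 and swaps the split/find approach for urllib.parse.urlsplit plus a plain membership test rather than a literal 'try' block. The names get_domain and ROOT_DOMAINS are made up for illustration, and the set of root domains is a tiny assumed sample, not a complete list.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny sample of root domains (an assumption);
# a real list would be far longer.
ROOT_DOMAINS = {"com", "edu", "net", "org", "co.uk"}


def get_domain(url):
    """Return the host with its first label removed, unless that would
    leave only a root domain, in which case return the host unchanged."""
    # urlsplit only recognizes the host when '//' is present, so add it
    # for bare names such as 'www.yahoo.com'.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    # Split at the first '.': 'www.cnn.com' -> ('www', '.', 'cnn.com')
    first, sep, rest = host.partition(".")
    # No dot at all, or only a root domain would remain: keep the host.
    if not sep or rest in ROOT_DOMAINS:
        return host
    return rest


if __name__ == "__main__":
    for line in ("http://www.cnn.com", "www.yahoo.com",
                 "http://www.ebay.co.uk", "http://cnn.com"):
        print(line, "->", get_domain(line))
```

With this check, a url that has no 'www.' prefix (http://cnn.com) comes back as 'cnn.com' instead of just 'com', and the bare 'www.yahoo.com' line from the original list is handled as well.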