On 17/08/07, Beema shafreen <[EMAIL PROTECTED]> wrote:
> hi everybody,
> i have a file with data separated by tab
> mydata:
> fhl1    fkh2
> dfp1    chk1
> mal3    alp14
> mal3    moe1
> mal3    spi1
> mal3    bub1
> mal3    bub3
> mal3    mph1
> mal3    mad3
> hob1    nak1
> hob1    wsp1
> hob1    rad3
> cdr2    cdc13
> cdr2    cdc2
> shows these two are separated by tab represented as columns
> i have to check the common data between these two coloumn1 an coloumn2
> my code:
> data = []
> data1 = []
> result = []
> fh = open('sheet1','r')
> for line in fh.readlines():
>         splitted = line.strip().split('\t')
>         data.append(splitted[0])
>         data1.append(splitted[1])
>         for k in data:
>                 if k in data1:
>                         result.append(k)
>                         print result
> fh.close()
>
> can you tell me problem with my script and what should is do for this

For a start, you are iterating k in data *every time* you read a line from fh, which slows things down and puts duplicates in the result. The following is probably what you intended to do:

> for line in fh.readlines():
>         do stuff
> for k in data:
>         do stuff

.split() with no arguments splits on any run of whitespace (spaces, newlines AND tabs), so you don't need strip() or '\t'. You just need

> splitted = line.split()

eg

>>> ln = 'fhl1\tfkh2\r\n'
>>> ln.split()
['fhl1', 'fkh2']

I think I would have done something like this (not tested):

Input = open('sheet1').read().split()
data = set(Input[::2])
data1 = set(Input[1::2])
result = data.intersection(data1)

or even this (if you don't need data and data1 later in the code):

Input = open('sheet1').read().split()
result = set(Input[::2]).intersection(set(Input[1::2]))

Both versions rely on every line holding exactly two fields with no spaces inside them; otherwise the even/odd slicing no longer lines up with the columns.

HTH :)
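
If you would rather keep the line-by-line loop, here is a minimal sketch of the same set idea. It assumes each useful line has exactly two whitespace-separated fields and quietly skips anything else; the name common_values and the sample rows are made up for illustration and are not from the original post.

```python
def common_values(lines):
    """Return the set of values that appear in both columns."""
    first, second = set(), set()
    for line in lines:
        fields = line.split()      # splits on tabs, spaces and newlines
        if len(fields) != 2:
            continue               # assumption: ignore blank or malformed lines
        first.add(fields[0])
        second.add(fields[1])
    # Intersect once, after the whole input has been read
    return first & second


# Hypothetical sample rows standing in for the contents of 'sheet1'
sample = ["mal3\tmoe1\n", "moe1\tspi1\n", "hob1\tnak1\n", "\n"]
print(sorted(common_values(sample)))
```

With a real file you could pass the open file object directly, for example `with open('sheet1') as fh: result = common_values(fh)`, which also closes the file for you. Because sets are used, each common value shows up only once no matter how often it repeats in a column.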