Maybe this code will be faster? (If it even does the same thing: largely untested)
    filehandle = open("data", 'r', buffering=1000)
    fileIter = iter(filehandle)

    # next(fileIter) works on Python 2.6+ and 3; fileIter.next() is Python 2 only
    lastLine = next(fileIter)
    lastTokens = lastLine.strip().split(delimiter)
    lastGeno = extract(lastTokens[0])

    for currentLine in fileIter:
        currentTokens = currentLine.strip().split(delimiter)
        currentGeno = extract(currentTokens[0])

        # only adjacent lines are compared
        if lastGeno == currentGeno:
            table.markEquivalent(int(lastTokens[1]), int(currentTokens[1]))

        # prepare for next iteration
        lastLine = currentLine
        lastTokens = currentTokens
        lastGeno = currentGeno

I'd be tempted to try a bigger file buffer too, personally.

-- Ben Sizer
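
For anyone who wants to run the loop above in isolation, here is a minimal self-contained sketch. The `EquivalenceTable` class, the pass-through `extract`, the tab delimiter and the sample data are all hypothetical stand-ins, since the real `table`, `extract` and `delimiter` are not shown in this post. It also returns early on an empty input instead of raising StopIteration.

```python
import io


class EquivalenceTable:
    """Hypothetical stand-in for `table`: just records the marked pairs."""

    def __init__(self):
        self.pairs = []

    def markEquivalent(self, a, b):
        self.pairs.append((a, b))


def extract(token):
    # Hypothetical: assume the genotype is simply the first token itself.
    return token


def mark_adjacent_equivalents(lines, table, delimiter="\t"):
    """Compare each line with the previous one, as in the loop above."""
    lineIter = iter(lines)
    try:
        lastTokens = next(lineIter).strip().split(delimiter)
    except StopIteration:
        return table  # empty input: nothing to compare
    lastGeno = extract(lastTokens[0])

    for currentLine in lineIter:
        currentTokens = currentLine.strip().split(delimiter)
        currentGeno = extract(currentTokens[0])
        if lastGeno == currentGeno:
            table.markEquivalent(int(lastTokens[1]), int(currentTokens[1]))
        # prepare for next iteration
        lastTokens = currentTokens
        lastGeno = currentGeno
    return table


# Made-up sample data: genotype, then an integer id, tab-separated.
sample = io.StringIO("AA\t1\nAA\t2\nAB\t3\nAB\t4\nAA\t5\n")
result = mark_adjacent_equivalents(sample, EquivalenceTable())
print(result.pairs)
```

Because only neighbouring lines are compared, the final "AA" line is not paired with the first two; the input would need to be sorted by genotype for that.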