Dennis Lee Bieber wrote:
> On Thu, 29 Apr 2010 11:38:28 +0200, "Karin Lagesen"
> <karin.lage...@bio.uio.no> declaimed the following in comp.lang.python:
>
>> Hello.
>>
>> I have approx 83 million strings, all 14 characters long. I need to be
>> able to take another string and find out whether this one is present
>> within the 83 million strings.
>>
>>
> So don't load them into memory... First use a file-based (not memory
>
>
> That lets you do a binary search on the file. Much faster than a
> linear search (linear search will average out to 41.5M read operations;
> binary should be around 10000 reads)

Don't you mean 27 reads instead of 10000 reads?

>>> from math import log
>>> log(83e6)/log(2)
26.306608000671101
>>>

N
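For what it's worth, here is a minimal sketch of the file-based binary search being discussed. It is not anyone's actual code from this thread. It assumes Python 3, and it assumes the strings are stored sorted in byte order, back to back, as fixed-width 14-byte ASCII records with no newline separators. The names contains(), max_reads() and RECORD_LEN are made up for illustration.

```python
import os

RECORD_LEN = 14  # assumption: fixed-width records, no separators


def contains(path, key, record_len=RECORD_LEN):
    """Return True if key is in the sorted fixed-width record file at path."""
    target = key.encode("ascii")
    if len(target) != record_len:
        return False
    with open(path, "rb") as f:
        lo = 0
        hi = os.path.getsize(path) // record_len  # number of records
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * record_len)      # jump straight to record number mid
            record = f.read(record_len)   # one read per probe
            if record == target:
                return True
            if record < target:           # bytes compare lexicographically
                lo = mid + 1
            else:
                hi = mid
    return False


def max_reads(n):
    """Worst-case number of probes for n records: floor(log2(n)) + 1."""
    return n.bit_length()
```

With 83 million records, max_reads(83000000) gives 27, which matches the log calculation above. If the records had a trailing newline, record_len would have to count it and the key would need the newline appended before comparing.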