MRAB <pyt...@mrabarnett.plus.com> wrote: > Karin Lagesen wrote: >> Hello. >> >> I have approx 83 million strings, all 14 characters long. I need to >> be able to take another string and find out whether this one is >> present within the 83 million strings. >> >> Now, I have tried storing these strings as a list, a set and a >> dictionary. I know that finding things in a set and a dictionary is >> very much faster than working with a list, so I tried those first. >> However, I run out of memory building both the set and the >> dictionary, so what I seem to be left with is the list, and using the >> in method. >> >> I imagine that there should be a faster/better way than this? >> > You could sort the list and then use a binary chop (Google can tell > you what that is if you don't know already). >
... and Python comes batteries included. You can use bisect.bisect_left to do the binary chop and then you just have to check whether it found the string. -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list