Dennis Lee Bieber wrote:
> On Thu, 29 Apr 2010 11:38:28 +0200, "Karin Lagesen"
> declaimed the following in comp.lang.python:
>
>> Hello.
>>
>> I have approx 83 million strings, all 14 characters long. I need to be
>> able to take another string and find out whether this one is present
>> within
Karin Lagesen wrote:
> I have approx 83 million strings, all 14 characters long. I need to be
> able to take another string and find out whether this one is present
> within the 83 million strings.
[...]
> I run out of memory building both the set and the dictionary, so
> what I seem to be left wit
In article <877hnpjtdw@rudin.co.uk>,
Paul Rudin wrote:
>"Karin Lagesen" writes:
>
>> Hello.
>>
>> I have approx 83 million strings, all 14 characters long. I need to be
>> able to take another string and find out whether this one is present
>> within the 83 million strings.
>>
>> Now, I have
Dennis Lee Bieber wrote:
> On Sat, 01 May 2010 13:48:02 +0200, News123 declaimed
> the following in gmane.comp.python.general:
>
>> Dennis Lee Bieber wrote:
>>> That lets you do a binary search on the file. Much faster than a
>>> linear search (linear search will average out to 41.5M read ope
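The binary search described above could look roughly like the sketch below. It assumes the strings have been sorted and written back-to-back as 14-byte records with no separators; the names `RECORD_SIZE` and `contains` are made up for illustration and do not come from the thread. With about 83 million records, the loop needs at most 27 seek-and-read steps per lookup, since 2**27 is larger than 83 million.

```python
import io
import os

RECORD_SIZE = 14  # assumption: fixed-width records, no newline between them


def contains(f, key, record_size=RECORD_SIZE):
    """Return True if key is one of the sorted fixed-width records in f."""
    if len(key) != record_size:
        return False
    f.seek(0, os.SEEK_END)
    lo, hi = 0, f.tell() // record_size  # search window, in record numbers
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * record_size)
        record = f.read(record_size)
        if record == key:
            return True
        if record < key:
            lo = mid + 1  # key can only be in the upper half
        else:
            hi = mid      # key can only be in the lower half
    return False


# Tiny in-memory stand-in for the real file of sorted records.
records = sorted([b"A" * 14, b"C" * 14, b"G" * 14])
demo = io.BytesIO(b"".join(records))
print(contains(demo, b"C" * 14))
print(contains(demo, b"T" * 14))
```

For a real file, open it in binary mode (`open(path, "rb")`) so that offsets are exact byte positions.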
Duncan Booth, 30.04.2010 10:20:
So more than 3GB just for the strings (and that's for Python 2.x; on
Python 3.x you'll need nearly 5GB).
Running on a 64 bit version of Python should be fine, but for a 32 bit
system a naive approach just isn't going to work.
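As a rough check of that estimate, here is the arithmetic written out. The 38 bytes per string is the sys.getsizeof figure reported elsewhere in this thread; treating it as the per-string cost here is an assumption, and the overhead of the set or dict itself is not counted.

```python
N_STRINGS = 83_000_000
BYTES_PER_STRING = 38   # assumption: per-object size quoted elsewhere in the thread
RAW_BYTES_PER_STRING = 14

object_bytes = N_STRINGS * BYTES_PER_STRING
raw_bytes = N_STRINGS * RAW_BYTES_PER_STRING

print(object_bytes)  # 3154000000, i.e. a little over 3 GB as string objects
print(raw_bytes)     # 1162000000, i.e. about 1.2 GB as raw 14-byte records

# A 32-bit process can address at most 2**32 bytes in total,
# and the operating system usually reserves part of that.
print(object_bytes < 2 ** 32)
```

The gap between the two totals is why the replies in this thread lean towards flat files and other compact layouts instead of one Python object per string.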
Option 1: use a trie. That should redu
Helmut Jarausch wrote:
I think one could apply an external hashing technique which would require only
very few disk accesses per lookup.
Unfortunately, I'm not aware of an implementation in Python.
Does anybody know about a Python implementation of external hashing?
Thanks,
Helmut.
That's wh
On 04/30/2010 12:51 PM, Helmut Jarausch wrote:
I think one could apply an external hashing technique which would require only
very few disk accesses per lookup.
Unfortunately, I'm not aware of an implementation in Python.
Does anybody know about a Python implementation of external hashing?
Whil
I think one could apply an external hashing technique which would require only
very few disk accesses per lookup.
Unfortunately, I'm not aware of an implementation in Python.
Does anybody know about a Python implementation of external hashing?
Thanks,
Helmut.
--
Helmut Jarausch
Lehrstuhl fuer N
>>> s = "12345678901234"
>>> assert len(s) == 14
>>> import sys
>>> sys.getsizeof(s)
38
So a single 14 char string takes 38 bytes.
Make that at least 40 bytes. You have to take memory alignment into account.
So a set with 83000 such strings takes approximately 1 MB. So far fairly
trivial. But that's just th
Duncan Booth writes:
> Paul Rudin wrote:
>
>> Shouldn't a set with 83 million 14 character strings be fine in memory
>> on a stock PC these days? I suppose if it's low on ram you might start
>> swapping which will kill performance. Perhaps the method you're using
>> to build the data structures
On Fri, 30 Apr 2010 08:23:39 +0100, Paul Rudin wrote:
> "Karin Lagesen" writes:
>
>> Hello.
>>
>> I have approx 83 million strings, all 14 characters long. I need to be
>> able to take another string and find out whether this one is present
>> within the 83 million strings.
>>
>> Now, I have tri
Paul Rudin wrote:
> Shouldn't a set with 83 million 14 character strings be fine in memory
> on a stock PC these days? I suppose if it's low on ram you might start
> swapping which will kill performance. Perhaps the method you're using
> to build the data structures creates lots of garbage? How m
"Karin Lagesen" writes:
> Hello.
>
> I have approx 83 million strings, all 14 characters long. I need to be
> able to take another string and find out whether this one is present
> within the 83 million strings.
>
> Now, I have tried storing these strings as a list, a set and a dictionary.
> I kn
On 4/29/2010 5:38 AM, Karin Lagesen wrote:
Hello.
I have approx 83 million strings, all 14 characters long. I need to be
able to take another string and find out whether this one is present
within the 83 million strings.
If the 'other string' is also 14 chars, so that you are looking for
exac
> I have approx 83 million strings, all 14 characters long. I need to be
> able to take another string and find out whether this one is present
> within the 83 million strings.
Have a look at the shelve module.
If you want to write the algorithm yourself, I suggest
http://en.wikipedia.org/wiki/Tr
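A minimal sketch of the shelve suggestion above: the strings become keys of an on-disk shelf, so membership tests do not need the whole collection in memory. The helper names `build_shelf` and `in_shelf` and the demo path are made up for this example, and the stored value `True` is just a placeholder. How large and how fast the resulting file is depends on which dbm backend Python picks on the machine, so this would need measuring before relying on it for 83 million keys.

```python
import os
import shelve
import tempfile


def build_shelf(path, strings):
    """Create a new shelf at path with every string stored as a key."""
    with shelve.open(path, flag="n") as db:
        for s in strings:
            db[s] = True  # placeholder value; only the key matters


def in_shelf(path, s):
    """Return True if s is a key in the shelf at path."""
    with shelve.open(path, flag="r") as db:
        return s in db


tmp_dir = tempfile.mkdtemp()
shelf_path = os.path.join(tmp_dir, "strings_demo")
build_shelf(shelf_path, ["12345678901234", "abcdefghijklmn"])
print(in_shelf(shelf_path, "12345678901234"))
print(in_shelf(shelf_path, "x" * 14))
```

In a real program the shelf would be opened once and kept open across lookups, instead of being reopened for every query as in this sketch.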
MRAB wrote:
> Karin Lagesen wrote:
>> Hello.
>>
>> I have approx 83 million strings, all 14 characters long. I need to
>> be able to take another string and find out whether this one is
>> present within the 83 million strings.
>>
>> Now, I have tried storing these strings as a list, a set and
Karin Lagesen wrote:
Hello.
I have approx 83 million strings, all 14 characters long. I need to be
able to take another string and find out whether this one is present
within the 83 million strings.
Now, I have tried storing these strings as a list, a set and a dictionary.
I know that finding t
"Karin Lagesen" wrote in message
news:416f727c6f5b0edb932b425db9579808.squir...@webmail.uio.no...
Hello.
I have approx 83 million strings, all 14 characters long. I need to be
able to take another string and find out whether this one is present
within the 83 million strings.
Now, I have trie
Karin Lagesen, 29.04.2010 11:38:
I have approx 83 million strings, all 14 characters long. I need to be
able to take another string and find out whether this one is present
within the 83 million strings.
Now, I have tried storing these strings as a list, a set and a dictionary.
I know that findi
Karin Lagesen wrote:
> I have approx 83 million strings, all 14 characters long. I need to be
> able to take another string and find out whether this one is present
> within the 83 million strings.
>
> Now, I have tried storing these strings as a list, a set and a dictionary.
> I know that findin
Hello.
I have approx 83 million strings, all 14 characters long. I need to be
able to take another string and find out whether this one is present
within the 83 million strings.
Now, I have tried storing these strings as a list, a set and a dictionary.
I know that finding things in a set and a di
23 matches