Paul McGuire wrote:
> "Claudio Grondi" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> Chris Foote wrote:
>>> Hi all.
>>>
>>> I have the need to store a large (10M) number of keys in a hash table,
>>> based on a tuple of (long_integer, integer).  The standard Python
>>> dictionary works well for small numbers of keys, but starts to
>>> perform badly for me when inserting roughly 5M keys:
>>>
>>> # keys   dictionary   metakit   (both using psyco)
>>> ------   ----------   -------
>>> 1M            8.8s      22.2s
>>> 2M           24.0s      43.7s
>>> 5M          115.3s     105.4s
>>>
>>> Has anyone written a fast hash module better suited to
>>> large datasets?
>>>
>>> p.s. Disk-based DBs are out of the question because most
>>> key lookups will result in a miss, and lookup time is
>>> critical for this application.
>>>
>> The Python bindings (\Python24\Lib\bsddb vers. 4.3.0) and the DLL for
>> Berkeley DB (\Python24\DLLs\_bsddb.pyd vers. 4.2.52) are included in
>> the standard Python 2.4 distribution.
>>
>> "Berkeley DB was 20 times faster than other databases. It has the
>> operational speed of a main memory database, the startup and shut down
>> speed of a disk-resident database, and does not have the overhead of
>> a client-server inter-process communication."
>>   -- Ray Van Tassle, Senior Staff Engineer, Motorola
>>
>> Please let me/us know if it is what you are looking for.
>
> sqlite also supports an in-memory database - use pysqlite
> (http://initd.org/tracker/pysqlite/wiki) to access this from Python.
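For concreteness, a minimal sketch of the in-memory Berkeley DB route
suggested above, using the bsddb module bundled with Python 2.4. Two
assumptions: passing None as the filename creates an in-memory table
(as the underlying DB.open() does), and the struct-based packing of the
(long_integer, integer) key fits the long part into 64 bits (bsddb keys
and values must be strings):

    import struct
    import bsddb        # bundled with Python 2.4, wrapping Berkeley DB

    # A filename of None asks Berkeley DB for a pure in-memory
    # hash table -- no disk file is ever created.
    db = bsddb.hashopen(None)

    def make_key(long_part, int_part):
        # bsddb keys and values must be strings, so pack the
        # (long_integer, integer) tuple into a fixed 12-byte string.
        # (Assumes the long part fits in 64 bits.)
        return struct.pack('>QI', long_part, int_part)

    db[make_key(123456789012345L, 42)] = 'payload'

    k = make_key(123456789012345L, 42)
    if db.has_key(k):               # a miss costs one hash probe
        print db[k]

Because the table never touches disk, a lookup miss is a single hash
probe rather than any I/O, which matters given that misses are the
common case here.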
Hi Paul.

I tried that, but the overhead of parsing SQL queries is too high:

              dictionary  metakit  sqlite[1]
              ----------  -------  ---------
1M numbers       8.8s      22.2s      89.6s
2M numbers      24.0s      43.7s     190.0s
5M numbers     115.3s     105.4s        N/A

Thanks for the suggestion, but no go.

Cheers,
Chris

[1] pysqlite V1 & sqlite V3.

-- 
http://mail.python.org/mailman/listinfo/python-list
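For reference, a rough sketch of the kind of in-memory SQLite setup
being compared in the table above. It is written against the stdlib
sqlite3 module rather than the pysqlite V1 used in the benchmark, and
the schema, key generation, and batching are all assumptions:

    import sqlite3   # pysqlite became the stdlib sqlite3 module in 2.5

    conn = sqlite3.connect(':memory:')   # in-memory database, no disk I/O

    conn.execute('''CREATE TABLE kv (
                        hi  INTEGER,     -- long_integer part of the key
                        lo  INTEGER,     -- integer part of the key
                        val TEXT,
                        PRIMARY KEY (hi, lo))''')

    # executemany() parses the INSERT statement only once; per-row
    # parameter binding and B-tree maintenance still add cost over a
    # plain dictionary insert.
    rows = ((i * 1000003, i % 4096, 'payload') for i in xrange(1000000))
    conn.executemany('INSERT INTO kv VALUES (?, ?, ?)', rows)

    def lookup(hi, lo):
        cur = conn.execute('SELECT val FROM kv WHERE hi = ? AND lo = ?',
                           (hi, lo))
        row = cur.fetchone()
        if row is None:
            return None                  # a miss, the common case here
        return row[0]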