Hi, I'm using a bash script (bash 4.1) to parse Debian package indices (it's essentially a mirroring script). The script stores all package fields of every package in one big hash table (associative array).
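For context, the storage pattern looks roughly like this (a minimal sketch, not my actual script; the key scheme "<package>,<field>", the array name PKG and the input filename are just illustrative):

  #!/bin/bash
  # Sketch: read an RFC-822-style "Sources" stanza stream and store every
  # field of every package into one associative array, keyed "<package>,<field>".
  declare -A PKG
  pkg=''
  while IFS= read -r line; do
      if [[ -z $line ]]; then                  # blank line ends a package stanza
          pkg=''
          continue
      fi
      [[ $line == [[:space:]]* ]] && continue  # ignore continuation lines here
      if [[ $line == Package:* ]]; then
          pkg=${line#Package: }
      fi
      if [[ -n $pkg && $line == *:* ]]; then
          field=${line%%:*}
          value=${line#*:}
          PKG["$pkg,$field"]=${value# }        # one hash element per package field
      fi
  done < "${1:-Sources}"
  echo "stored ${#PKG[@]} field entries"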
That's about 300,000 hash array elements for Debian sid's "Sources" index file. Unfortunately, performance doesn't scale at all: parsing that Sources file takes about 2m30s, and it takes exactly the same time after I completely rewrote the parser to use a sed preprocessing script.

Looking at the hashlib.c source code, it's somewhat apparent why it's so slow: bash's hash tables implement open hashing with only 64 hash chains. So performance for search and insert is linear in the number N of stored elements as soon as N is much larger than 64. Insertion in particular is extremely slow, as it has to scan the full chain from start to end.

Are there any plans to change that? It would also be helpful if the man page stated that bash arrays have O(N) performance for large N. Had I known that, I wouldn't have relied on bash at all for this kind of script.
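For what it's worth, the behaviour is easy to reproduce without any parsing at all. Here is a minimal timing sketch (batch size and key names are arbitrary); if each insert scans a hash chain that grows with N, successive batches of the same size should take progressively longer:

  #!/bin/bash
  # Insert keys into an associative array in equal-sized batches and print
  # the wall-clock seconds per batch (SECONDS has whole-second granularity).
  declare -A table
  batch=50000
  for round in 1 2 3 4 5; do
      start=$SECONDS
      for ((i = 0; i < batch; i++)); do
          table["key_${round}_${i}"]=x
      done
      echo "batch $round (${#table[@]} elements total): $(( SECONDS - start ))s"
  done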
cheers, David

--
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205 D016 7DEF 5323 C174 7D40