For the insertion stage, the Dict is likely ideal. For the second stage, a Dict that is based on hashing is a good data structure. Another data structure worth examining would be a sorted vector of `(Int64, Float64)` pairs (sorted by the `Int64` keys). Interpolation search <https://en.wikipedia.org/wiki/Interpolation_search> can then have a complexity of `O(log log N)`. My personal guess is that a well-optimized Dict (i.e. with both hash function and hash table size "optimized") will be faster if we assume a close to random access, as it will have fewer memory accesses.
Julia's Dict implementation (see base/dict.jl) has constants that you could tune (read: play with). There is also a function `Base.rehash!` that you can call to increase the size of the hash table, which might increase performance by avoiding hash collisions, if you have sufficient memory. -erik On Wed, Aug 17, 2016 at 7:58 PM, 'Greg Plowman' via julia-users < [email protected]> wrote: > I need fast random access to a Dict{Int64,Float64}. > My application has a first phase in which the Dict is populated, and a > second phase where it accessed randomly (with no further additions or > deletions). > There are about 50,000 to 100,000 entries, with keys in the range 10^9 to > 10^14. > > Firstly is a Dict the most optimal data structure? (presumably Dict is > faster than SparseVector for random lookup?) > > Secondly, is there any "preconditioning" I can do after creating the Dict > in phase 1 but before phase 2, to optimise the Dict for random access. > (e.g. sizehint, sorting keys) > > I do appreciate access might already be fast and optimal and I should just > benchmark, but I'm just looking for some theoretical > insight before benchmarking. > > > -- Erik Schnetter <[email protected]> http://www.perimeterinstitute.ca/personal/eschnetter/
