... and please consider
Bug 6812862 <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6812862>
- provide customizable hash() algorithm in HashMap for speed tuning
again and too for HashSet.
-Ulf
Am 18.03.2010 14:54, schrieb Ulf Zibis:
+1
-Ulf
Am 18.03.2010 14:22, schrieb Osvaldo Doederlein:
The oldest collection classes were designed for the needs of J2SE
1.2, a full decade ago. This was discussed before, IIRC there was
some reply from Josh agreeing that some speeed-vs-size tradeoffs made
last decade should be revisited today. The extra runtime size/bloat
that a specialized HashSet implementation would cost, was reasonably
significant in 1999 but completely irrelevant in 2010. I mean,
HashSet is a HUGELY important collection, and the benefit of any
optimization of its implementation would spread to many APIs and
applications.
And the problem is not only the extra value field, there is also
overhead from the extra indirection (plus extra polymorphic call)
from the HashSet object to the internal HashMap object. This overhead
may sometimes be sufficient to block inlining and devirtualization,
so it's a potentially bigger cost than just a single extra memory
load (which is easily hoisted out of loops etc.). Look at this code
inside HashSet for a much worse cost:
public Iterator<E> iterator() {
return map.keySet().iterator();
}
Yeah we pay the cost of building the internal HashMap's key-set
(which is lazily-built), just to iterate the freaking HashSet.
(Notice that differently from HashMap, a Set is a true Collection
that we can iterate directly without any view-collection of
keys/values/entries.)
IMHO all this adds evidence that the current HashSet implementation
is a significant performance bug. We need a brand-new impl that does
the hashing internally, without relying on HashMap, without any
unused fields, extra indirections, or surprising costs like that for
iterator(). I guess it would be relatively simple to copy-paste
HashMap's code, cut stuff until just a Set of keys is left, and merge
in the most specific pieces of HashSet (basically just
readObject()/writeObject()).