2010/3/18 Osvaldo Doederlein <[email protected]>

> Hi,
>
> I've read the google-groups thread; it seems you didn't test on a 64-bit
> VM. Could you do that test, with and without CompressedOops, and using the
> latest HotSpot (7b85 or 6u20ea)? I guess we should see advantages in both
> memory savings and speed, at least with CompressedOops.
>
> It is too easy to dismiss an optimization on the basis of "doesn't deliver
> a benefit on a particular VM". It may be good on a different implementation,
> or a different architecture like 32 vs. 64 bits. Object headers, field
> layouts, alignments etc. are not portable, and the best rule of thumb is
> that any removed field _will_ reduce memory usage in at least some
> implementations.
>
> The oldest collection classes were designed for the needs of J2SE 1.2, a
> full decade ago. This was discussed before; IIRC there was some reply from
> Josh agreeing that some speed-vs-size tradeoffs made last decade should be
> revisited today. The extra runtime size/bloat that a specialized HashSet
> implementation would cost was reasonably significant in 1999 but completely
> irrelevant in 2010. I mean, HashSet is a HUGELY important collection, and
> the benefit of any optimization of its implementation would spread to many
> APIs and applications.
>
> And the problem is not only the extra value field; there is also overhead
> from the extra indirection (plus an extra polymorphic call) from the HashSet
> object to the internal HashMap object. This overhead may sometimes be
> sufficient to block inlining and devirtualization, so it's a potentially
> bigger cost than just a single extra memory load (which is easily hoisted
> out of loops etc.). Look at this code inside HashSet for a much worse cost:
>
>     public Iterator<E> iterator() {
>         return map.keySet().iterator();
>     }
>
> Yeah, we pay the cost of building the internal HashMap's key set (which is
> lazily built), just to iterate the freaking HashSet.
> (Notice that, unlike HashMap, a Set is a true Collection that we can
> iterate directly, without any view collection of keys/values/entries.)
>
> IMHO all this adds evidence that the current HashSet implementation is a
> significant performance bug. We need a brand-new impl that does the hashing
> internally, without relying on HashMap, without any unused fields, extra
> indirections, or surprising costs like that of iterator(). I guess it
> would be relatively simple to copy-paste HashMap's code, cut stuff until
> just a Set of keys is left, and merge in the most specific pieces of
> HashSet (basically just readObject()/writeObject()).
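[Editorial note: the "copy HashMap, snip the value field" idea discussed above might look roughly like the minimal sketch below. The class and method names are illustrative, not from any JDK proposal; resizing and concurrency details are simplified, and remove() is omitted for brevity.]

```java
// Minimal sketch of a chained hash set in the spirit of "HashMap minus the
// value field": each entry stores only hash, key, and next. Not JDK code.
class SlimHashSet<E> {
    static final class Entry<E> {
        final int hash;
        final E key;
        Entry<E> next;
        Entry(int hash, E key, Entry<E> next) {
            this.hash = hash; this.key = key; this.next = next;
        }
    }

    private Entry<E>[] table = newTable(16);
    private int size;

    @SuppressWarnings("unchecked")
    private static <E> Entry<E>[] newTable(int n) {
        return (Entry<E>[]) new Entry[n];
    }

    boolean add(E e) {
        int h = e.hashCode();
        int i = h & (table.length - 1);
        for (Entry<E> x = table[i]; x != null; x = x.next)
            if (x.hash == h && x.key.equals(e)) return false;  // already present
        table[i] = new Entry<>(h, e, table[i]);
        if (++size > table.length * 3 / 4) resize();           // load factor 0.75
        return true;
    }

    boolean contains(Object o) {
        int h = o.hashCode();
        for (Entry<E> x = table[h & (table.length - 1)]; x != null; x = x.next)
            if (x.hash == h && x.key.equals(o)) return true;
        return false;
    }

    int size() { return size; }

    private void resize() {
        Entry<E>[] old = table;
        table = newTable(old.length * 2);
        for (Entry<E> head : old) {
            while (head != null) {                  // relink each chain into the new table
                Entry<E> next = head.next;
                int i = head.hash & (table.length - 1);
                head.next = table[i];
                table[i] = head;
                head = next;
            }
        }
    }
}
```

Note that iterator() in such a class would walk the table directly, with no keySet() view to materialize first, which is exactly the overhead complained about above.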
Hi,

Sorry, I was disappointed by the result and sent the code to /dev/null, so I
can't readily test that, but yes, it is a relatively simple exercise.

In my opinion, if someone is going to undertake the task of creating a new
HashSet, he'd better start from a blank page rather than go down the
"HashMap-->snip-->snip-->HashSet" path. Even if that path yielded some gains
on some platforms, not reducing the memory footprint on a large number of
32-bit platforms would be quite a pity. (As for runtime performance, given
the amount of "magic" in the JVM, I dare say even less.)

I find what Martin suggests on that thread (his second-to-last post) a quite
promising alternative: open addressing plus two parallel arrays, one for the
keys and one for the hashes. Just my 2c. I would love to know Doug's opinion
on the matter in more detail.

Dimitris

> A+
> Osvaldo
>
> 2010/3/17 Dimitris Andreou <[email protected]>
>
>> 2010/3/18 Rémi Forax <[email protected]>
>>
>>> On 18/03/2010 00:59, Paulo Levi wrote:
>>>
>>>> My understanding is that set implementations are implemented by using
>>>> Maps internally + a marker object, and that since Maps are implemented
>>>> using arrays of entries this is at least n*3 references more than what
>>>> is needed, since there are never multiple values.
>>>>
>>>> Any plans to change this? I suspect it would be a boon for programs
>>>> that use the correct data structure.
>>>
>>> You have to test it. My guess is that there will be no difference.
>>> As far as I remember, an object needs to be aligned on a valid 64-bit
>>> address even in 32-bit mode. HotSpot uses a 64-bit header and the
>>> internal hash map entry contains 4 ints; if you remove the reference
>>> corresponding to the value, the empty place will be considered as
>>> garbage and not used.
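[Editorial note: the parallel-array, open-addressing layout Dimitris refers to might look roughly like the sketch below. This is an illustrative reconstruction, not Martin's actual code; linear probing is assumed, and removal (which needs tombstones in open addressing) is omitted for brevity.]

```java
// Sketch of an open-addressing set backed by two parallel arrays: one for
// the keys and one for their cached hashes. No per-element Entry objects.
class OpenHashSet<E> {
    private Object[] keys = new Object[16];  // null marks an empty slot
    private int[] hashes = new int[16];      // cached hash of each occupied slot
    private int size;

    boolean add(E e) {
        int h = spread(e.hashCode());
        int mask = keys.length - 1;
        int i = h & mask;
        while (keys[i] != null) {
            if (hashes[i] == h && keys[i].equals(e)) return false;  // present
            i = (i + 1) & mask;                                     // linear probe
        }
        keys[i] = e;
        hashes[i] = h;
        if (++size > keys.length / 2) resize();  // keep load factor <= 0.5
        return true;
    }

    boolean contains(Object o) {
        int h = spread(o.hashCode());
        int mask = keys.length - 1;
        int i = h & mask;
        while (keys[i] != null) {
            // The cached hash lets us skip most equals() calls during probing.
            if (hashes[i] == h && keys[i].equals(o)) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    int size() { return size; }

    private void resize() {
        Object[] oldKeys = keys;
        int[] oldHashes = hashes;
        keys = new Object[oldKeys.length * 2];
        hashes = new int[oldHashes.length * 2];
        int mask = keys.length - 1;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] == null) continue;
            int i = oldHashes[j] & mask;            // rehash from the cached value,
            while (keys[i] != null)                 // no hashCode() calls needed
                i = (i + 1) & mask;
            keys[i] = oldKeys[j];
            hashes[i] = oldHashes[j];
        }
    }

    // Mix high bits down so that masking with (length - 1) behaves well.
    private static int spread(int h) {
        return h ^ (h >>> 16);
    }
}
```

Per element this costs one array slot for the reference plus one int for the hash, with no entry-object header, next pointer, or value field at all, which is where the footprint win over the chained design would come from.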
>>> Else, you can try to remove the internal entry object, but in that case
>>> the hashcode of the element will not be stored anymore and you will
>>> have a slowdown for all objects that don't cache their hashcode
>>> themselves.
>>
>> See my second-to-last post in this thread:
>>
>> http://groups.google.com/group/guava-discuss/browse_thread/thread/23bc8fa5ae479698
>>
>> In short, I tested removing the "value" field of a HashMap's entry object,
>> and indeed (through Instrumentation#getObjectSize) I observed no reduction
>> in memory. I had to remove one further field (e.g. "hash") to make a
>> reduction (of 8 bytes per entry).
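[Editorial note: the alignment effect Rémi predicts and Dimitris measures can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 32-bit HotSpot layout: 8-byte object header, 4 bytes per int/reference field, and objects padded to a multiple of 8 bytes; these numbers are assumptions about that era's VM, not guaranteed by any spec.]

```java
// Rough model of a 32-bit HotSpot object size: 8-byte header plus 4 bytes
// per int/reference field, rounded up to an 8-byte boundary.
class EntrySizeMath {
    static int paddedSize(int fieldCount) {
        int raw = 8 + 4 * fieldCount;  // header + fields
        return (raw + 7) & ~7;         // round up to a multiple of 8
    }
}
```

Under this model, HashMap.Entry (hash, key, value, next) is paddedSize(4) = 24 bytes; dropping only the value field gives paddedSize(3) = 24 bytes again, because the freed 4 bytes disappear into padding; dropping both value and hash gives paddedSize(2) = 16 bytes, the 8-bytes-per-entry saving Dimitris observed.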
