Re: RFR: 8324573: HashMap::putAll should resize to sum of both map sizes

Hannes Greule Wed, 24 Jan 2024 13:57:50 -0800

On Wed, 24 Jan 2024 20:54:43 GMT, Joshua Cao <d...@openjdk.org> wrote:


> > The current benchmark and the change don't really cover the case where many 
> > keys exist in _both_ maps. Could you add a benchmark for that?
> 
> I added a benchmark that assumes the worse case. Please see the top post. 
> Yes, this change is not always beneficial. I'd like to think the original 
> benchmark, which uses random keys, is closer to real life scenarios. There 
> may be some duplicate keys, but most keys are unique. But this is just my 
> intuition, I do not have data to back this up.

Thanks, I don't have any data on in either, I just think it's worth to consider 
such cases. But the regression is probably acceptable here.

> One solution would be to loop through both maps and count number of unique 
> keys. My guess is that this would mostly add unnecessary computations. We can 
> compromise with something like `size() / 2 + m.size()`. My guess is the 
> current aggressive approach is the best most of the time, but I don't have 
> strong opinions and am happy to take suggestions.

Something like that might make sense, but I doubt it will be possible to find a 
sweet spot that works *always*. Doing more complex calculation likely doesn't 
help.

> > Also `int s = size() + m.size();` can overflow now, leading to different 
> > behavior. It might make sense to just cap the value?
> 
> Is it necessary? We can go with @simonis 's suggestion. I think if we move 
> forward with this, it would make sense to change all instances of `size` to 
> long type, which would then suggest we should change the `Map::size()` api to 
> return a long. The reason I am not concerned about overflow is that the Map 
> API implies that we should only care about maps with at most 2^32 items.

With the current implementation it is necessary, especially when considering 
duplicated keys. The current implementation seems to just skip inserting 
anything if `s <= 0`, so an overflow would cause different behavior now. Maybe 
it makes sense to add a early return at the top of the method if `m` is empty 
and remove the `if (s > 0)`?

Using `double` would make sense because the value is used in double contexts 
afterwards anyway.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1908977135

Re: RFR: 8324573: HashMap::putAll should resize to sum of both map sizes

Reply via email to