On Wed, 24 Jan 2024 20:54:43 GMT, Joshua Cao <d...@openjdk.org> wrote:
> > The current benchmark and the change don't really cover the case where many > > keys exist in _both_ maps. Could you add a benchmark for that? > > I added a benchmark that assumes the worse case. Please see the top post. > Yes, this change is not always beneficial. I'd like to think the original > benchmark, which uses random keys, is closer to real life scenarios. There > may be some duplicate keys, but most keys are unique. But this is just my > intuition, I do not have data to back this up. Thanks, I don't have any data on in either, I just think it's worth to consider such cases. But the regression is probably acceptable here. > One solution would be to loop through both maps and count number of unique > keys. My guess is that this would mostly add unnecessary computations. We can > compromise with something like `size() / 2 + m.size()`. My guess is the > current aggressive approach is the best most of the time, but I don't have > strong opinions and am happy to take suggestions. Something like that might make sense, but I doubt it will be possible to find a sweet spot that works *always*. Doing more complex calculation likely doesn't help. > > Also `int s = size() + m.size();` can overflow now, leading to different > > behavior. It might make sense to just cap the value? > > Is it necessary? We can go with @simonis 's suggestion. I think if we move > forward with this, it would make sense to change all instances of `size` to > long type, which would then suggest we should change the `Map::size()` api to > return a long. The reason I am not concerned about overflow is that the Map > API implies that we should only care about maps with at most 2^32 items. With the current implementation it is necessary, especially when considering duplicated keys. The current implementation seems to just skip inserting anything if `s <= 0`, so an overflow would cause different behavior now. Maybe it makes sense to add a early return at the top of the method if `m` is empty and remove the `if (s > 0)`? Using `double` would make sense because the value is used in double contexts afterwards anyway. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17544#issuecomment-1908977135