On Fri, 7 Apr 2023 12:22:03 GMT, Andy-Tatman <d...@openjdk.org> wrote:

> See https://bugs.java.com/bugdatabase/view_bug?bug_id=8305734 and 
> https://bugs.java.com/bugdatabase/view_bug?bug_id=JDK-8311905

Hi, thanks for the background and the discussion of the alternatives. I'm not 
sure that drafting the CSR is the right next step; there are a number of 
foibles about editing the specifications that come into play when preparing the 
CSR. However, what you've written above is a good design discussion about the 
different alternatives and how they affect the specification and the 
implementation.

One thing that we need to consider is compatibility. From an excessively 
pedantic standpoint, any change in behavior is a potential incompatibility 
(somebody might be relying on that bug!) but I think that fixing obviously 
incorrect behavior is reasonable. As much as possible I'd like to make sure 
that things that worked, at least partially, continue to work.

With this in mind I'm leaning toward allowing a BitSet to contain bits 
including the Integer.MAX_VALUE bit, and adjusting various specs accordingly. I 
think the best way to specify this is to say that length() returns an int in 
the range [0, 2^31] as an unsigned value. In practice, of course, this means it 
can return any Java `int` value in the range [0, MAX_VALUE] and also 
Integer.MIN_VALUE. A sufficiently careful programmer can use such values 
correctly, as long as they avoid doing certain things such as comparing against 
zero. (We might want to have a note in the spec to that effect.)
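To make the kind of care required concrete, here's a small sketch (not part of any proposed API) of how a caller could handle a length that uses `Integer.MIN_VALUE` to represent 2^31 under the "unsigned" reading:

```java
public class UnsignedLengthDemo {
    public static void main(String[] args) {
        // Suppose length() returned Integer.MIN_VALUE to mean 2^31,
        // per the "unsigned" reading discussed above.
        int length = Integer.MIN_VALUE;

        // A naive signed comparison misclassifies the value:
        System.out.println(length > 0);                             // false

        // Widening to an unsigned long recovers the intended meaning:
        long unsignedLength = Integer.toUnsignedLong(length);
        System.out.println(unsignedLength);                         // 2147483648

        // Integer.compareUnsigned also orders the values correctly:
        System.out.println(Integer.compareUnsigned(length, 0) > 0); // true
    }
}
```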

It's good that you analyzed the `valueOf` methods. It looks to me like the 
implementation will store an actual array potentially containing bits beyond 
the MAX_VALUE bit, and this will affect things like length() and size() and 
such bits won't be accessible via get() or set(). So, on the one hand, this 
behavior seems clearly broken and ought to be fixed by limiting the input array 
along the lines suggested by your three options.

On the other hand, from looking at the code, it seems possible to 
create an "over-size" BitSet with valueOf(), and perform bulk bit manipulations 
on it and other BitSets using methods like and(), or(), and xor(). It also 
appears possible to read out the bits successfully using toByteArray() or 
toLongArray(). Thus an application might be able to manipulate bit arrays of up 
to about 2^37 bits long by using BitSet with long arrays or LongBuffers. 
Restricting the input of valueOf() to 2^31 bits would break such applications.
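Here's a small-scale illustration of why this works: valueOf(), the bulk operations, and toLongArray() all operate a word at a time rather than through per-bit int indexes, so bits round-trip through a BitSet without ever being addressed by index. (This sketch uses tiny arrays; the over-size case would behave the same way word-for-word.)

```java
import java.util.BitSet;

public class BulkOpsDemo {
    public static void main(String[] args) {
        long[] a = { 0b1010L, 0b1100L };
        long[] b = { 0b0110L, 0b1010L };

        BitSet x = BitSet.valueOf(a);
        BitSet y = BitSet.valueOf(b);
        x.xor(y);                         // word-wise bulk operation; no bit indexes involved

        long[] result = x.toLongArray();  // reads the words back out directly
        System.out.println(Long.toBinaryString(result[0])); // 1100
        System.out.println(Long.toBinaryString(result[1])); // 110
    }
}
```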

Also note that the specification says the bits are numbered by nonnegative 
integers (that is, zero to 2^31 - 1) which would seem to preclude longer bit 
arrays. However, if somebody does have an application that works with longer 
bit arrays, it seems unreasonable for us to break that application, unless we 
have good justification for doing so. I think it's a bit unlikely that somebody 
is actually relying on the feature of over-size arrays, but it seems possible, 
and I don't see any evidence otherwise (but I haven't looked too hard yet).

If we were to restrict BitSet to 2^31 bits, I'd rule out (2) from the three 
options. It doesn't seem sensible to check for and throw an exception only if 
certain bits are set. If the system did that, it might mean that code sometimes 
works but sometimes fails, depending on the _values_ contained in the array or 
buffer. Seems like it would make things hard to diagnose. Of the remaining 
options, I'd prefer (1), rejecting long arrays, over (3), ignoring the 
extra values in the array. Option (1) is fail-fast but arguably not very 
friendly. However, (3) could result in silent data loss, which 
definitely seems like a bad idea.
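For the sake of discussion, option (1) might look something like the following (the helper name and the exact check are this sketch's assumptions, not a proposed API): fail fast on any array long enough to carry bits past index Integer.MAX_VALUE, independent of which bits are actually set.

```java
import java.util.BitSet;

public class CheckedValueOf {
    // 2^31 bits fit in exactly 2^25 long words (2^31 / 64).
    private static final int MAX_WORDS = 1 << 25;

    // Hypothetical sketch of option (1): reject the array based on its
    // length alone, so behavior never depends on the *values* it contains.
    static BitSet checkedValueOf(long[] longs) {
        if (longs.length > MAX_WORDS)
            throw new IllegalArgumentException(
                "array of " + longs.length + " words exceeds " + MAX_WORDS);
        return BitSet.valueOf(longs);
    }

    public static void main(String[] args) {
        System.out.println(checkedValueOf(new long[] { 1L }).get(0)); // true
    }
}
```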

However, I'm still uncomfortable with restricting BitSet to 2^31 bits and thus 
changing the behavior of the various valueOf() methods. I'd welcome further 
discussion or evidence regarding whether applications might or might not be 
relying on the over-sized behavior. I did a quick survey of open source code 
but I haven't found anything conclusive yet.

Another approach would be to adjust the specifications to allow for over-size 
arrays. Methods that operate on bit indexes would for the most part not be able 
to read or write to the areas of the array beyond 2^31, but the bulk operations 
(and/or/xor) would work. Methods such as cardinality(), length(), nextSetBit(), 
and similar would be more difficult. We can accommodate values up to 2^31 
using the "unsigned" carve-out mentioned previously, but that doesn't work for 
larger values.
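To make the gap concrete, here's the arithmetic: the largest long[] the VM could in principle accept holds about 2^31 words, i.e. just under 2^37 bits, which is far beyond what even an unsigned int can represent.

```java
public class OverSizeArithmetic {
    public static void main(String[] args) {
        // Maximum bits representable by a long[] of the largest possible
        // array length (Integer.MAX_VALUE words, 64 bits each):
        long maxBits = (long) Integer.MAX_VALUE * Long.SIZE;
        System.out.println(maxBits);                      // 137438953408, just under 2^37

        // The unsigned-int carve-out tops out at 2^31, so lengths of
        // over-size sets can't be represented even as unsigned ints:
        long unsignedIntCeiling = 1L << 31;
        System.out.println(maxBits > unsignedIntCeiling); // true
    }
}
```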

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13388#issuecomment-1646454706