On Fri, 14 Apr 2023 15:53:41 GMT, Alan Bateman <al...@openjdk.org> wrote:

>> @AlanBateman 
>> It is a known issue that size() may return a negative integer, see 
>> [JDK-8230557](https://bugs.openjdk.org/browse/JDK-8230557), and the accepted 
>> workaround is to interpret the returned integer as an unsigned value and 
>> convert the output to a long. This same workaround works if a user would 
>> call length() with Integer.MAX_VALUE set. Changing the specification to 
>> reject setting Integer.MAX_VALUE may end up breaking the implementation of 
>> clients who rely on setting Integer.MAX_VALUE and use this workaround.
>> 
>> Furthermore, the other methods (including ones that use length()) still 
>> function correctly whether or not the Integer.MAX_VALUE bit is set, except 
>> for get(int,int) as reported here. For example, clear(int, int) works as 
>> expected if Integer.MAX_VALUE is set as length() then is not called.
>> Changing the specification to reject setting Integer.MAX_VALUE may break 
>> user code that use this bit and/or users that rely on the above workaround.
>> 
>> So while changing the specifications is possible, it can potentially break 
>> existing clients. The change suggested in this pull request avoids this and 
>> instead fixes the internal bug of the get function locally, without 
>> affecting the other methods and without affecting existing clients.
>
>> So while changing the specifications is possible, it can potentially break 
>> existing clients. The change suggested in this pull request avoids this and 
>> instead fixes the internal bug of the get function locally, without 
>> affecting the other methods and without affecting existing clients.
> 
> I think it will require re-visting the spec, maybe deprecating and/or 
> introducing new methods, it's unfortunate that this wasn't recognised in 
> JDK-8230557.
> 
> Update: @stuart-marks has added a comment to JDK-8230557 on the workaround 
> that someone added to that issue in 2019.

Hi everyone @AlanBateman @liach @stuart-marks  ,

Based on the article's two main options (either banning or still allowing the 
Integer.MAX_VALUE bit), I have prepared two different CSRs.
These address the changes in specification that are required for each choice, 
as well as the change in implementation needed in `valueOf(..)`.
I have lumped Solution and Specification together here. Once we decide what we 
want to actually submit as the CSR, then we can look at making the proper git 
patch files.

### Permit Integer.MAX_VALUE
**Summary**
While still permitting setting the Integer.MAX VALUE bit in a BitSet, change 
the specification of `length()` to clarify that it can return Integer.MIN 
VALUE. 
For the `valueOf(..)` methods, change the specification to prevent the internal 
value of wordsInUse in BitSet from overflowing and to prevent the bitset 
storing bits it cannot access

**Problem**
When the Integer.MAX_VALUE bit is set in a BitSet, `length()` returns 
Integer.MAX_VALUE+1 which overflows to Integer.MIN_VALUE. The specification of 
the method does not make this clear: The "logical size" of the BitSet is not 
logically a negative number. 
The `valueOf(..)` methods allow any long[], LongBuffer, byte[] or ByteBuffer to 
make a bitset. In doing this, wordsInUse gets set to the largest element of 
words in the new BitSet with at least 1 bit set. However, the internal BitSet 
code relies on wordsInUse <= Integer.MAX_VALUE / 64 + 1 (we abbreviate this to 
a constant MAX_WIU), and the bitset class cannot access bits stored in 
words[MAX_WIU] and above.  
If this is not the case, then the return value of `length()` is no longer 
reliable: it is determined by the element stored in the long[] or LongBuffer or 
..., but these are bits that are not accessible by the BitSet itself. 
Technically, the expression BITS_PER_WORD * (wordsInUse - 1) will overflow, and 
`length()` will refer to a bit (with negative index or with incorrect positive 
index) that is not accessible by the other methods of the BitSet class. 

**Solution/Specification** 
`length()`: Specify that the method can return Integer.MIN_VALUE, and this can 
be interpreted as the UNsigned integer Integer.MAX_VALUE+1.
`valueOf(..)`: There are multiple options. We use `valueOf(long[] longs)` as an 
example, but equivalent would apply to the other 3 `valueOf(..)` methods.
Option 1: Raise an exception if longs.length is greater than MAX_WIU.
Option 2: Raise an exception if longs.length is greater than MAX_WIU AND there 
is at least one non-zero element in longs[MAX_WIU]-longs[ length-1 ]
Option 3: No exception, but only take the first MAX_WIU of longs for the 
bitset. 

**Risk**
Low.
The change in the `length()` specification matches existing behaviour.
The change in the `valueOf(..)` only involves exceptionally large objects. If 
these were used regularly, it is likely that the `valueOf(..)` bug would have 
been discovered earlier. 


### Forbid Int.MAX
*For the `valueOf(..)` methods, the same applies here as discussed in the other 
CSR. The only difference is mentioned in the Solution/Specification section.

**Summary**
Forbid interacting with the Integer.MAX_VALUE bit, preventing an overflow from 
occurring in `length()`. This involves single-parameter methods in the BitSet 
class raising an exception if the class tries to access Integer.MAX_VALUE.


**Problem**
When the Integer.MAX_VALUE bit is set, length() overflows to Integer.MIN_VALUE 
(Integer.MAX_VALUE+1). Moreover, a method such as `set(int bitIndex)` can set 
the Integer.MAX_VALUE bit, while `set(int fromIndex, int toIndex)` cannot, 
because the method sets up to but not including toIndex. 


**Solution/Specification** 
Ban accessing the Integer.MAX_VALUE bit. This includes the following changes to 
the specification (and implementation):
`set(int)`, `set(int, bool)`, `clear(int)`, `get(int)`, `flip(int)`: raise an 
exception if bitIndex = Integer.MAX_VALUE.

`valueOf(..)`: In addition to the changes described in the **'Permit 
Integer.MAX_VALUE'** CSR, the `valueOf(..)` methods need to raise an exception 
if the Integer.MAX_VALUE bit is set (Option 1 and 2) or ignore the bit stored 
at index Integer.MAX_VALUE, and ensure it is 0 (option 3).


**Risk**
Medium / high?
Forbidding access to the Integer.MAX_VALUE bit is a change to the specification 
of the essential methods of BitSet, and potentially affects actual, existing 
use cases. 
This does however require people to be using Integer.MAX_VALUE bits, which is 
an exceptionally large amount of bits.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13388#issuecomment-1642101764

Reply via email to