On Fri, 14 Apr 2023 15:53:41 GMT, Alan Bateman <al...@openjdk.org> wrote:
>> @AlanBateman >> It is a known issue that size() may return a negative integer, see >> [JDK-8230557](https://bugs.openjdk.org/browse/JDK-8230557), and the accepted >> workaround is to interpret the returned integer as an unsigned value and >> convert the output to a long. This same workaround works if a user would >> call length() with Integer.MAX_VALUE set. Changing the specification to >> reject setting Integer.MAX_VALUE may end up breaking the implementation of >> clients who rely on setting Integer.MAX_VALUE and use this workaround. >> >> Furthermore, the other methods (including ones that use length()) still >> function correctly whether or not the Integer.MAX_VALUE bit is set, except >> for get(int,int) as reported here. For example, clear(int, int) works as >> expected if Integer.MAX_VALUE is set as length() then is not called. >> Changing the specification to reject setting Integer.MAX_VALUE may break >> user code that use this bit and/or users that rely on the above workaround. >> >> So while changing the specifications is possible, it can potentially break >> existing clients. The change suggested in this pull request avoids this and >> instead fixes the internal bug of the get function locally, without >> affecting the other methods and without affecting existing clients. > >> So while changing the specifications is possible, it can potentially break >> existing clients. The change suggested in this pull request avoids this and >> instead fixes the internal bug of the get function locally, without >> affecting the other methods and without affecting existing clients. > > I think it will require re-visting the spec, maybe deprecating and/or > introducing new methods, it's unfortunate that this wasn't recognised in > JDK-8230557. > > Update: @stuart-marks has added a comment to JDK-8230557 on the workaround > that someone added to that issue in 2019. Hi everyone @AlanBateman @liach @stuart-marks , Based on the article's two main options (either banning or still allowing the Integer.MAX_VALUE bit), I have prepared two different CSRs. These address the changes in specification that are required for each choice, as well as the change in implementation needed in `valueOf(..)`. I have lumped Solution and Specification together here. Once we decide what we want to actually submit as the CSR, then we can look at making the proper git patch files. ### Permit Integer.MAX_VALUE **Summary** While still permitting setting the Integer.MAX VALUE bit in a BitSet, change the specification of `length()` to clarify that it can return Integer.MIN VALUE. For the `valueOf(..)` methods, change the specification to prevent the internal value of wordsInUse in BitSet from overflowing and to prevent the bitset storing bits it cannot access **Problem** When the Integer.MAX_VALUE bit is set in a BitSet, `length()` returns Integer.MAX_VALUE+1 which overflows to Integer.MIN_VALUE. The specification of the method does not make this clear: The "logical size" of the BitSet is not logically a negative number. The `valueOf(..)` methods allow any long[], LongBuffer, byte[] or ByteBuffer to make a bitset. In doing this, wordsInUse gets set to the largest element of words in the new BitSet with at least 1 bit set. However, the internal BitSet code relies on wordsInUse <= Integer.MAX_VALUE / 64 + 1 (we abbreviate this to a constant MAX_WIU), and the bitset class cannot access bits stored in words[MAX_WIU] and above. If this is not the case, then the return value of `length()` is no longer reliable: it is determined by the element stored in the long[] or LongBuffer or ..., but these are bits that are not accessible by the BitSet itself. Technically, the expression BITS_PER_WORD * (wordsInUse - 1) will overflow, and `length()` will refer to a bit (with negative index or with incorrect positive index) that is not accessible by the other methods of the BitSet class. **Solution/Specification** `length()`: Specify that the method can return Integer.MIN_VALUE, and this can be interpreted as the UNsigned integer Integer.MAX_VALUE+1. `valueOf(..)`: There are multiple options. We use `valueOf(long[] longs)` as an example, but equivalent would apply to the other 3 `valueOf(..)` methods. Option 1: Raise an exception if longs.length is greater than MAX_WIU. Option 2: Raise an exception if longs.length is greater than MAX_WIU AND there is at least one non-zero element in longs[MAX_WIU]-longs[ length-1 ] Option 3: No exception, but only take the first MAX_WIU of longs for the bitset. **Risk** Low. The change in the `length()` specification matches existing behaviour. The change in the `valueOf(..)` only involves exceptionally large objects. If these were used regularly, it is likely that the `valueOf(..)` bug would have been discovered earlier. ### Forbid Int.MAX *For the `valueOf(..)` methods, the same applies here as discussed in the other CSR. The only difference is mentioned in the Solution/Specification section. **Summary** Forbid interacting with the Integer.MAX_VALUE bit, preventing an overflow from occurring in `length()`. This involves single-parameter methods in the BitSet class raising an exception if the class tries to access Integer.MAX_VALUE. **Problem** When the Integer.MAX_VALUE bit is set, length() overflows to Integer.MIN_VALUE (Integer.MAX_VALUE+1). Moreover, a method such as `set(int bitIndex)` can set the Integer.MAX_VALUE bit, while `set(int fromIndex, int toIndex)` cannot, because the method sets up to but not including toIndex. **Solution/Specification** Ban accessing the Integer.MAX_VALUE bit. This includes the following changes to the specification (and implementation): `set(int)`, `set(int, bool)`, `clear(int)`, `get(int)`, `flip(int)`: raise an exception if bitIndex = Integer.MAX_VALUE. `valueOf(..)`: In addition to the changes described in the **'Permit Integer.MAX_VALUE'** CSR, the `valueOf(..)` methods need to raise an exception if the Integer.MAX_VALUE bit is set (Option 1 and 2) or ignore the bit stored at index Integer.MAX_VALUE, and ensure it is 0 (option 3). **Risk** Medium / high? Forbidding access to the Integer.MAX_VALUE bit is a change to the specification of the essential methods of BitSet, and potentially affects actual, existing use cases. This does however require people to be using Integer.MAX_VALUE bits, which is an exceptionally large amount of bits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13388#issuecomment-1642101764