I think Java was usually raised as the odd child out when this has come up 
before. Since Java 8 there are standard library methods to manipulate signed 
integers as if they were unsigned, so in principle Java shouldn't be a blocker 
anymore.

That said, ByteBuffer is still indexed by int so in practice Java wouldn't be 
able to handle more than 2 GB in a single buffer, at least until we can use the 
Java 21+ APIs (MemorySegment is finally indexed by (signed) long).

On Tue, Sep 12, 2023, at 11:40, Benjamin Kietzman wrote:
> Hello all,
>
> Utf8View was recently accepted [1] and I've opened a PR to add the
> spec/schema changes [2]. In review [3], it was requested that signed 32 bit
> integers be used for the fields of view structs instead of 32 bit unsigned.
>
> This divergence has been discussed on the ML previously [4], but in light
> of my reviewer's request for a change it should be raised again for focused
> discussion. (At this stage, I don't *think* the change would require
> another vote.) I'll enumerate the motivations for signed and unsigned as I
> understand them.
>
> Signed:
> - signed integers are conventional in the arrow format
> - unsigned integers may cause some difficulty of implementation in
> languages which don't natively support them
>
> Unsigned:
> - unsigned integers are used by engines which already implement Utf8View
>
> My own bias is toward compatibility with existing implementers, but using
> signed integers will only affect the case of arrays which include data
> buffers larger than 2GB. For reference, the default buffer size in velox is
> 32KB so such a massive data buffer would only occur when a single slot of a
> string array has 2.1GB of characters. This seems sufficiently unlikely that
> I wouldn't consider it a blocker.
>
> Sincerely,
> Ben Kietzman
>
> [1] https://lists.apache.org/thread/wt9j3q7qd59cz44kyh1zkts8s6wo1dn6
> [2] https://github.com/apache/arrow/pull/37526
> [3] https://github.com/apache/arrow/pull/37526#discussion_r1323029022
> [4] https://lists.apache.org/thread/w88tpz76ox8h3rxkjl4so6rg3f1rv7wt
> [5]
> https://github.com/facebookincubator/velox/blob/947d98c99a7cf05bfa4e409b1542abc89a28cb29/velox/vector/FlatVector.h#L46-L50

Reply via email to