Re: [LAST CALL][DISCUSS] Unsigned integers in Utf8View

2023-09-20 Thread Benjamin Kietzman
Thanks for the feedback, everyone! I'm calling this in favor of signed integers. That's the last change requested on the format PR, so I'll probably merge shortly.
On Wed, Sep 20, 2023, 17:43 Matt Topol wrote:
> Just to chime in (and add yet another voice into the mix here), I'd have a
> prefe

Re: [LAST CALL][DISCUSS] Unsigned integers in Utf8View

2023-09-20 Thread Matt Topol
Just to chime in (and add yet another voice into the mix here), I'd have a preference for it being signed integers for the same reasons as most everyone else: consistency with everything else in the spec. Since we use signed integers everywhere, I'd prefer to keep it consistent rather than introduc

Re: [LAST CALL][DISCUSS] Unsigned integers in Utf8View

2023-09-20 Thread Benjamin Kietzman
Hello all,
Thanks for the input!
@Will
> Could we name which ones it would be compatible with?
UmbraDB [1], Velox [2], and DuckDB [3] all use unsigned integers for size.
> As I understood it originally, the current implementations use raw pointers
Yes, these implementations use raw pointers e

Re: [LAST CALL][DISCUSS] Unsigned integers in Utf8View

2023-09-19 Thread Will Jones
Hi Ben,
I'm open to the idea of using unsigned if it allows compatibility with an existing implementation or two. Could we name which ones it would be compatible with? Links to implementation code would be very welcome, if available.
As I understood it originally, the current implementations use

Re: [LAST CALL][DISCUSS] Unsigned integers in Utf8View

2023-09-19 Thread Dewey Dunnington
Hi all, Sorry for the late reply! I would lean towards signed integers because we don't use unsigned integers anywhere in the existing specification (other than as a data type). While they are allowed as dictionary index values, the spec specifically discourages their use [1]. If the times have c

Re: [LAST CALL][DISCUSS] Unsigned integers in Utf8View

2023-09-19 Thread Benjamin Kietzman
Hello again all,
It seems there hasn't been much interest in this point so I'm leaning toward keeping unsigned integers. If anyone has a concern please respond here and/or on the PR [1].
Sincerely,
Ben Kietzman
[1] https://github.com/apache/arrow/pull/37526#discussion_r1323029022
On Thu, Sep 14