I think there are a couple of embedded / entangled questions here:

* Should Arrow be usable to *transport* narrow decimals, for the
(now very abundant) use cases where Arrow serves as an internal
wire protocol or client/server interface?

* Should *compute engines* that are Arrow-native or Arrow-compatible
provide guarantees about when and how decimals will be widened, or
about whether narrow decimal inputs whose results can technically
(from a pedantic mathematical standpoint) stay narrow will actually
yield narrow output?

I think supporting the serialization-free transport case is pretty
important: without narrow decimals in the format, systems that have
them must pre-widen values before sending them over Arrow. ClickHouse,
for example, has Decimal32 through Decimal256 [1]. Result sets
returned from it would have to be widened to decimal128, or else
carried as extension types, which could have compatibility issues.
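
To make the cost concrete, here is a rough pyarrow sketch of the
widening a sender would have to do today (the column, values,
precision, and scale are all made up for illustration):

    import decimal

    import pyarrow as pa

    # Hypothetical source column: DECIMAL(9, 2) stored as 32-bit scaled
    # integers (value * 10**2), as in ClickHouse's Decimal32.
    scale = 2
    raw_int32_values = [123456, -999999999, 4200]

    # With no decimal32 in the Arrow format, each 4-byte value must be
    # widened to a 16-byte decimal128 before it goes on the wire.
    widened = pa.array(
        [decimal.Decimal(v).scaleb(-scale) for v in raw_int32_values],
        type=pa.decimal128(9, scale),
    )
    print(widened.type)    # decimal128(9, 2)
    print(widened.nbytes)  # 4x the bytes of the native 32-bit encoding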

On the latter question, I think that no query engine for Arrow should
be compelled to offer pedantically consistent support for narrow
decimals. If a query engine received decimal32 or decimal64, it could
define an implicit cast to decimal128 and implement all kernels and
algorithms for decimal128 only. I note the comment from the ClickHouse
link that "Because modern CPUs do not support 128-bit integers
natively, operations on Decimal128 are emulated. Because of this
Decimal128 works significantly slower than Decimal32/Decimal64." Not
affording query engines for Arrow the option to optimize some
frequently used calculations on narrow decimals (even though the
implementation is burdensome) seems unfortunate.
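
For illustration, the implicit-cast boundary could be as simple as the
following sketch (hypothetical helper; assumes a pyarrow build where
narrow decimal types exist and pa.types.is_decimal matches them):

    import pyarrow as pa
    import pyarrow.compute as pc

    def widen_narrow_decimals(batch: pa.RecordBatch) -> pa.RecordBatch:
        # Cast any decimal column narrower than 128 bits up to
        # decimal128(precision, scale), so kernels and algorithms only
        # ever need a decimal128 implementation.
        arrays = []
        for column, field in zip(batch.columns, batch.schema):
            t = field.type
            if pa.types.is_decimal(t) and t.bit_width < 128:
                column = pc.cast(column, pa.decimal128(t.precision, t.scale))
            arrays.append(column)
        return pa.RecordBatch.from_arrays(arrays, names=batch.schema.names)

An engine that normalizes inputs this way stays free to special-case
hot kernels on the narrow representation later, which is exactly the
optimization the ClickHouse comment motivates.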

[1]: https://clickhouse.com/docs/en/sql-reference/data-types/decimal/

On Sat, Apr 23, 2022 at 9:15 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> I'm generally -0.01 against narrow decimals. My experience in practice has
> been that widening happens so quickly that they are little used and add
> unnecessary complexity. For reference, the original Arrow code actually
> implemented Decimal9 [1] and Decimal18 [2] but we removed both because of
> this experience of complexity. (Good to note that we worked with them
> for several years, before the model was even in the Arrow project,
> before we came to this conclusion.)
>
> One of the other commenters here spoke of the benefit to things like
> TPC-H. I doubt this would be meaningful, as I believe most (if not
> all) decimal operations in TPC-H would typically immediately widen to
> DECIMAL38.
>
> Another possible approach here might be to add DECIMAL18 to the spec
> and see how much it is used (and how much value it really adds)
> before adding DECIMAL9.
>
> It's easy to add types to the spec, hard to remove them.
>
> [1]
> https://github.com/apache/arrow/blob/fa5f0299f046c46e1b2f671e5e3b4f1956522711/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L66
> [2]
> https://github.com/apache/arrow/blob/fa5f0299f046c46e1b2f671e5e3b4f1956522711/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L81
