I think there are a couple of embedded / entangled questions here:

* Should Arrow be usable to *transport* narrow decimals, for the (now
very abundant) use cases where Arrow serves as an internal wire protocol
or client/server interface?

* Should *compute engines* that are Arrow-native or Arrow-compatible
provide guarantees about when and how decimals will be widened, or about
whether narrow decimal inputs whose results can technically (from a
pedantic mathematical standpoint) stay narrow will actually yield narrow
outputs?

I think supporting the serialization-free transport case is pretty
important, since systems with narrow decimals currently have to pre-widen
them before sending them over Arrow. ClickHouse, for example, has
Decimal32 through Decimal256 [1]. Result sets returned from it would have
to be serialized to decimal128, or else define extension types, which
could have compatibility issues.
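As a rough illustration of that pre-widening cost, here is a minimal
sketch using pyarrow (the column contents and scale are made up): a
ClickHouse-style Decimal32 column, stored as unscaled 32-bit integers
plus a scale, has to be rescaled and rebuilt as decimal128 just to cross
the wire.

    from decimal import Decimal

    import pyarrow as pa

    # Hypothetical Decimal32(9, 2) column: unscaled int32 values plus a scale.
    scale = 2
    unscaled = [12345, -6789, 1]

    # Rescale into Python Decimals and build a decimal128 array; every
    # 4-byte value is shipped as a 16-byte value.
    widened = pa.array(
        [Decimal(v).scaleb(-scale) for v in unscaled],
        type=pa.decimal128(9, scale),
    )
    print(widened)  # 123.45, -67.89, 0.01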
On the latter question, I think no query engine for Arrow should be
compelled to offer pedantically consistent support for narrow decimals:
if a query engine received decimal32 or decimal64, it could define an
implicit cast to decimal128 and implement all kernels and algorithms for
decimal128 only (a sketch of this option follows the quoted message
below). I note the comment from the ClickHouse link that "Because modern
CPUs do not support 128-bit integers natively, operations on Decimal128
are emulated. Because of this Decimal128 works significantly slower than
Decimal32/Decimal64." Not affording Arrow query engines the option to
optimize some frequently used calculations on narrow decimals (even
though the implementation is burdensome) seems unfortunate.

[1]: https://clickhouse.com/docs/en/sql-reference/data-types/decimal/

On Sat, Apr 23, 2022 at 9:15 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> I'm generally -0.01 against narrow decimals. My experience in practice
> has been that widening happens so quickly that they are little used and
> add unnecessary complexity. For reference, the original Arrow code
> actually implemented Decimal9 [1] and Decimal18 [2], but we removed both
> because of this experience of complexity. (Good to note that we worked
> with them for several years, before the model was in the Arrow project,
> before we came to this conclusion.)
>
> One of the other commenters here spoke of the benefit to things like
> TPC-H. I doubt this would be meaningful, as I believe most (if not all)
> decimal operations in TPC-H would typically immediately widen to
> DECIMAL38.
>
> Another possible approach here might be to add DECIMAL18 to the spec and
> see the usage with it (and how much value it really added) before
> adding DECIMAL9.
>
> It's easy to add types to the spec, hard to remove them.
>
> [1] https://github.com/apache/arrow/blob/fa5f0299f046c46e1b2f671e5e3b4f1956522711/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L66
> [2] https://github.com/apache/arrow/blob/fa5f0299f046c46e1b2f671e5e3b4f1956522711/java/vector/src/main/codegen/data/ValueVectorTypes.tdd#L81
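Returning to the implicit-cast idea above, here is a minimal sketch of
such a widen-on-ingest policy, again in pyarrow. It assumes an Arrow
implementation in which the narrow decimal types exist, and
widen_on_ingest is an illustrative name, not a real engine API.

    import pyarrow as pa
    import pyarrow.compute as pc

    def widen_on_ingest(column: pa.Array) -> pa.Array:
        """Widen any decimal narrower than 128 bits to decimal128."""
        t = column.type
        if pa.types.is_decimal(t) and t.bit_width < 128:
            # Precision 38 is the decimal128 maximum; the scale is kept.
            return pc.cast(column, pa.decimal128(38, t.scale))
        return column

With this policy, narrow decimals cost the engine one cast per input
batch rather than a whole extra set of kernels, while engines that want
the Decimal32/Decimal64 speedups remain free to specialize hot kernels
instead.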