On the 1.0.0 protocol discussion, one item that we've skirted for some
time is support for other decimal bit widths:

https://issues.apache.org/jira/browse/ARROW-2009

I understand this is a loaded subject, since a deliberate decision was
made to remove types from the initial Java implementation of Arrow
that was forked from Apache Drill. However, it's a friction point that
has come up in a number of scenarios, as many database and storage
systems have 32- and 64-bit variants for low-precision decimal data.
As an example, Apache Kudu [1] has all three widths, and the Parquet
columnar format allows not only 32- and 64-bit storage but also
fixed-size binary (with size a function of precision) and
variable-length binary encodings [2].

One of the arguments against using these types in a computational
setting is that many mathematical operations will necessarily trigger
an up-promotion to a larger type (for example, multiplying two 32-bit
decimals can produce a result that no longer fits in 32 bits). It's
hard for us to predict how people will use the Arrow format, though,
and the current situation forces an up-promotion regardless of how the
format is being used, even for simple data transport.

In anticipation of long-term needs, I would suggest the following
possible solution:

* Adding a bitWidth field to the Decimal table in Schema.fbs [3] with
a default value of 128 (see the sketch after this list)
* Constraining bit widths to 32, 64, and 128 bits for the time being
* Permitting storage of smaller-precision decimals in larger storage,
as we do now
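
To make the first item concrete, here is a rough sketch of what the
Decimal table in Schema.fbs might look like with such a field (the
comments and exact wording are illustrative rather than a final
proposal):

  table Decimal {
    /// Total number of decimal digits
    precision: int;

    /// Number of digits after the decimal point "."
    scale: int;

    /// Number of bits per value; constrained to 32, 64, or 128 for
    /// now. Defaults to 128 so that existing serialized schemas keep
    /// their current meaning.
    bitWidth: int = 128;
  }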

If this isn't deemed desirable by the community, decimal extension
types could be employed to get serialization-free transport of smaller
decimals, but I view this as suboptimal.

Interested in the thoughts of others.

thanks
Wes

[1]: https://github.com/apache/kudu/blob/master/src/kudu/common/common.proto#L55
[2]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
[3]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L121
