Note that if we do make this change as described, it will probably need to be accompanied by a bump in the MetadataVersion (for forward-compatibility reasons; otherwise old clients won't be able to distinguish one decimal type from another). That seems prudent regardless, to force an upgrade to the stable 1.x.x series of releases.
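
To make the forward-compatibility concern concrete, here is a minimal Python sketch. It is not Arrow code: the dict-based metadata, the version numbers, and all function names are hypothetical, and it assumes a reader refuses messages whose metadata version is newer than the one it understands.

```python
# Hypothetical sketch, not the Arrow API: Decimal metadata is modeled as a dict.
SUPPORTED_BIT_WIDTHS = (32, 64, 128)


def byte_width_old_reader(decimal_meta):
    """Reader written before bitWidth existed: it always assumes 128 bits."""
    return 16


def byte_width_new_reader(decimal_meta):
    """Reader aware of bitWidth; a missing field falls back to the default of 128."""
    bit_width = decimal_meta.get("bitWidth", 128)
    if bit_width not in SUPPORTED_BIT_WIDTHS:
        raise ValueError(f"unsupported decimal bitWidth: {bit_width}")
    return bit_width // 8


def check_version(message_version, reader_max_version):
    """Assumed behavior: reject metadata newer than the reader understands."""
    if message_version > reader_max_version:
        raise ValueError("metadata version is too new for this reader")


# Metadata written before and after the proposed change (values are made up).
old_meta = {"precision": 10, "scale": 2}
new_meta = {"precision": 7, "scale": 2, "bitWidth": 32}
```

Without a version bump, the old reader silently treats `new_meta` as 16-byte values even though the writer stored 4-byte values. With a bump, a check like `check_version` lets it fail loudly instead.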
Are there any other opinions about this? I can bring a vote about it and we can decide when to actually commit a patch based on the rest of the 1.0.0 timeline.

On Tue, Jun 11, 2019 at 11:29 AM Ravindra Pindikura <ravin...@dremio.com> wrote:
>
> On Tue, Jun 11, 2019 at 2:48 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > On the 1.0.0 protocol discussion, one item that we've skirted for some
> > time is other decimal sizes:
> >
> > https://issues.apache.org/jira/browse/ARROW-2009
> >
> > I understand this is a loaded subject since a deliberate decision was
> > made to remove types from the initial Java implementation of Arrow
> > that was forked from Apache Drill. However, it's a friction point that
> > has come up in a number of scenarios as many database and storage
> > systems have 32- and 64-bit variants for low precision decimal data.
> > As an example Apache Kudu [1] has all three types, and the Parquet
> > columnar format allows not only 32/64 bit storage but fixed size
> > binary (size a function of precision) and variable-length binary
> > encoding [2].
> >
> > One of the arguments against using these types in a computational
> > setting is that many mathematical operations will necessarily trigger
> > an up-promotion to a larger type. It's hard for us to predict how
> > people will use the Arrow format, though, and the current situation is
> > forcing an up-promotion regardless of how the format is being used,
> > even for simple data transport
> >
> > In anticipation of long-term needs, I would suggest a possible solution of:
> >
> > * Adding bitWidth field to Decimal table in Schema.fbs [3] with
> > default value of 128
> >
>
> +1
>
>
> > * Constraining bit widths to 32, 64, and 128 bits for the time being
> > * Permit storage of smaller precision decimals in larger storage like
> > we have now
> >
> > If this isn't deemed desirable by the community, decimal extension
> > types could be employed for serialization-free transport for smaller
> > decimals, but I view this as suboptimal.
> >
> > Interested in the thoughts of others.
> >
> > thanks
> > Wes
> >
> > [1]: https://github.com/apache/kudu/blob/master/src/kudu/common/common.proto#L55
> > [2]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
> > [3]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L121
> >
>
>
> --
> Thanks and regards,
> Ravindra.
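
For reference, the relationship between decimal precision and the three proposed storage widths can be sketched as below. This only illustrates the arithmetic; the helper names are hypothetical and nothing here is an Arrow function. It assumes the unscaled value is stored as a signed two's-complement integer of the given width.

```python
# Illustrative only: helper names are hypothetical, not part of Arrow.
ALLOWED_BIT_WIDTHS = (32, 64, 128)


def max_precision(bit_width):
    """Most decimal digits that always fit in a signed integer of this width."""
    # A signed n-bit integer holds magnitudes up to 2**(n-1) - 1. That bound
    # has k digits but is never all nines, so only k - 1 digits always fit.
    return len(str(2 ** (bit_width - 1) - 1)) - 1


def smallest_bit_width(precision):
    """Narrowest allowed width for a precision; wider storage also works."""
    for width in ALLOWED_BIT_WIDTHS:
        if precision <= max_precision(width):
            return width
    raise ValueError(f"precision {precision} does not fit in 128 bits")


def fits(precision, bit_width):
    """True if a decimal of this precision can be stored at this width."""
    return bit_width in ALLOWED_BIT_WIDTHS and precision <= max_precision(bit_width)
```

Under these assumptions a precision of 9 or less fits in 32 bits, 18 or less in 64 bits, and 38 or less in 128 bits, and `fits(5, 128)` remains true, which corresponds to keeping smaller-precision decimals in larger storage as is done now.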