Hi Wes,

Just a question, and I'm OK going either way on this: why not add a new variable-width decimal type and deprecate the old one, instead of breaking forward compatibility?

Thanks,
Micah

On Tuesday, July 2, 2019, Wes McKinney <wesmck...@gmail.com> wrote:

> Note that if we do make this change as described, it will probably
> need to accompany a bump in the MetadataVersion (for
> forward-compatibility reasons, otherwise old clients won't be able to
> distinguish one decimal type from another). But that seems prudent
> regardless to force an upgrade to the stable 1.x.x series of releases.
>
> Are there any other opinions about this? I can bring a vote about it
> and we can decide when to actually commit a patch based on the rest of
> the 1.0.0 timeline.
>
> On Tue, Jun 11, 2019 at 11:29 AM Ravindra Pindikura <ravin...@dremio.com>
> wrote:
> >
> > On Tue, Jun 11, 2019 at 2:48 AM Wes McKinney <wesmck...@gmail.com>
> > wrote:
> >
> > > On the 1.0.0 protocol discussion, one item that we've skirted for some
> > > time is other decimal sizes:
> > >
> > > https://issues.apache.org/jira/browse/ARROW-2009
> > >
> > > I understand this is a loaded subject since a deliberate decision was
> > > made to remove types from the initial Java implementation of Arrow
> > > that was forked from Apache Drill. However, it's a friction point that
> > > has come up in a number of scenarios as many database and storage
> > > systems have 32- and 64-bit variants for low precision decimal data.
> > > As an example Apache Kudu [1] has all three types, and the Parquet
> > > columnar format allows not only 32/64 bit storage but fixed size
> > > binary (size a function of precision) and variable-length binary
> > > encoding [2].
> > >
> > > One of the arguments against using these types in a computational
> > > setting is that many mathematical operations will necessarily trigger
> > > an up-promotion to a larger type. It's hard for us to predict how
> > > people will use the Arrow format, though, and the current situation is
> > > forcing an up-promotion regardless of how the format is being used,
> > > even for simple data transport
> > >
> > > In anticipation of long-term needs, I would suggest a possible
> > > solution of:
> > >
> > > * Adding bitWidth field to Decimal table in Schema.fbs [3] with
> > > default value of 128
> > >
> >
> > +1
> >
> >
> > > * Constraining bit widths to 32, 64, and 128 bits for the time being
> > > * Permit storage of smaller precision decimals in larger storage like
> > > we have now
> > >
> > > If this isn't deemed desirable by the community, decimal extension
> > > types could be employed for serialization-free transport for smaller
> > > decimals, but I view this as suboptimal.
> > >
> > > Interested in the thoughts of others.
> > >
> > > thanks
> > > Wes
> > >
> > > [1]: https://github.com/apache/kudu/blob/master/src/kudu/common/common.proto#L55
> > > [2]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal
> > > [3]: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L121
> > >
> >
> >
> > --
> > Thanks and regards,
> > Ravindra.
>
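
For readers following the bit-width proposal quoted above, here is a rough sketch of how precision might map onto the three proposed storage widths. It assumes decimals are stored as signed two's-complement unscaled integers. The names `ALLOWED_BIT_WIDTHS`, `max_precision`, `min_bit_width`, and `is_valid_storage` are hypothetical and are not part of any Arrow API; this only illustrates the constraints listed in the thread.

```python
# Hypothetical helpers illustrating the proposal; not Arrow API.

# The thread proposes constraining bit widths to these values for now.
ALLOWED_BIT_WIDTHS = (32, 64, 128)


def max_precision(bit_width: int) -> int:
    """Largest digit count p such that every p-digit unscaled value
    fits in a signed two's-complement integer of the given width."""
    limit = 2 ** (bit_width - 1) - 1  # largest positive signed value
    # A d-digit limit that is not all nines can hold any (d - 1)-digit value.
    return len(str(limit)) - 1


def min_bit_width(precision: int) -> int:
    """Smallest allowed width able to hold a decimal of this precision."""
    if precision < 1:
        raise ValueError("precision must be positive")
    for width in ALLOWED_BIT_WIDTHS:
        if precision <= max_precision(width):
            return width
    raise ValueError(f"precision {precision} exceeds 128-bit storage")


def is_valid_storage(precision: int, bit_width: int) -> bool:
    """Smaller-precision decimals may live in larger storage, as the
    thread suggests, but never in storage that is too small."""
    return bit_width in ALLOWED_BIT_WIDTHS and precision <= max_precision(bit_width)


if __name__ == "__main__":
    for width in ALLOWED_BIT_WIDTHS:
        print(width, max_precision(width))  # 32 -> 9, 64 -> 18, 128 -> 38
```

Under these assumptions a precision-9 decimal fits in 32 bits, precision 10 through 18 needs 64 bits, and anything up to 38 needs 128 bits. Without a `bitWidth` field, every such value is carried in 128-bit storage.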