On 07/03/2022 at 20:26, Micah Kornfield wrote:

Relaxing from {128,256} to {32,64,128,256} seems low-risk
from an integration perspective, as implementations already need to read
the bitwidth to select the appropriate physical representation (if they
support it).

I think there are two reasons for having implementations first.
1. Lower risk of bugs in the implementation/spec.
2. A mechanism to ensure that there is some bootstrapped coverage in
commonly used reference implementations.

That sounds reasonable.

Another question that came to my mind: traditionally, we've mandated that new types be implemented in the two reference Arrow implementations (C++ and Java). However, our implementation landscape is now much richer than it used to be (for example, there is tremendous activity on the Rust side). Do we want to keep the historical "C++ and Java" requirement, or do we want to relax it to a more flexible "two independent official implementations", which could be, for example, C++ and Rust, Rust and Java, etc.?

(by "independent" I mean that one should not be based on the other, for example it should not be "C++ and Python" :-))

Regards

Antoine.



I agree it is fairly low-risk.

On Mon, Mar 7, 2022 at 11:11 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

+1 adding 32- and 64-bit decimals.

+0 to release it without integration tests: both IPC and the C data
interface use a variable bit width to declare the appropriate size for
decimal types. Relaxing from {128,256} to {32,64,128,256} seems low-risk
from an integration perspective, as implementations already need to read
the bitwidth to select the appropriate physical representation (if they
support it).
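
To make this concrete, here is a minimal sketch (hypothetical helper,
not actual Arrow code) of the dispatch a reader already performs on the
declared bit width; the relaxation only adds two cases:

    #include <cstdint>
    #include <stdexcept>

    enum class DecimalStorage { Int32, Int64, Int128, Int256 };

    // Map the bit width declared in the schema to a physical representation.
    DecimalStorage SelectDecimalStorage(int32_t bit_width) {
      switch (bit_width) {
        case 32:  return DecimalStorage::Int32;   // newly allowed
        case 64:  return DecimalStorage::Int64;   // newly allowed
        case 128: return DecimalStorage::Int128;  // already supported
        case 256: return DecimalStorage::Int256;  // already supported
        default:  throw std::invalid_argument("unsupported decimal bit width");
      }
    }

    int main() {
      // Example: a schema declaring bitWidth = 64 selects 64-bit storage.
      return SelectDecimalStorage(64) == DecimalStorage::Int64 ? 0 : 1;
    }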

Best,
Jorge




On Mon, Mar 7, 2022, 11:41 Antoine Pitrou <anto...@python.org> wrote:


On 03/03/2022 at 18:05, Micah Kornfield wrote:
I think it makes sense to add these. Typically, when adding new types,
we've waited on the official vote until there are two reference
implementations demonstrating compatibility.

You are right, I had forgotten about that.  Though in this case, it
might be argued we are just relaxing the constraints on an existing type.

What do others think?

Regards

Antoine.



On Thu, Mar 3, 2022 at 6:55 AM Antoine Pitrou <anto...@python.org> wrote:


Hello,

Currently, the Arrow format specification restricts the bitwidth of
decimal numbers to either 128 or 256 bits.

However, there is interest in allowing other bitwidths, at least 32 and
64 bits for this proposal. A 64-bit (respectively 32-bit) decimal
datatype would allow for precisions of up to 18 digits (respectively 9
digits), which is sufficient for applications that mainly need exact
computation rather than sheer precision. Obviously, smaller datatypes
are cheaper to store in memory and cheaper to run computations on.
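
For reference, these bounds follow directly from the range of the
underlying signed integer. A quick sketch of the arithmetic (not Arrow
code; the helper name is made up):

    #include <cmath>
    #include <cstdio>

    // Largest decimal precision an N-bit signed (two's-complement) integer
    // can hold in full: floor(log10(2^(N-1) - 1)) ~= (N-1) * log10(2).
    int MaxDecimalPrecision(int bit_width) {
      return static_cast<int>(std::floor((bit_width - 1) * std::log10(2.0)));
    }

    int main() {
      std::printf("32-bit:  %d digits\n", MaxDecimalPrecision(32));   // 9
      std::printf("64-bit:  %d digits\n", MaxDecimalPrecision(64));   // 18
      std::printf("128-bit: %d digits\n", MaxDecimalPrecision(128));  // 38
      std::printf("256-bit: %d digits\n", MaxDecimalPrecision(256));  // 76
      return 0;
    }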

For example, the Spark documentation mentions that some decimal types
may fit in a Java int (32 bits) or long (64 bits):

https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DecimalType.html

... and a draft PR had even been filed for initial support in the C++
implementation (https://github.com/apache/arrow/pull/8578).
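
(As a hypothetical illustration of the underlying representation, with
made-up numbers: a decimal value is stored as its unscaled integer, so
any value with precision <= 9 fits in a 32-bit int, much like Spark's
int-backed decimals.)

    #include <cstdint>
    #include <cstdio>

    int main() {
      // A decimal(9, 2) value such as 1234567.89 is stored as the unscaled
      // two's-complement integer 123456789, which fits in 32 bits.
      int32_t unscaled = 123456789;
      std::printf("%d.%02d\n", unscaled / 100, unscaled % 100);  // 1234567.89
      return 0;
    }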

I am therefore proposing that we relax the wording in the Arrow format
specification to also allow 32- and 64-bit decimal types.

This is a preliminary discussion to gather opinions and potential
counter-arguments against this proposal. If no strong counter-argument
emerges, we will probably run a vote in a week or two.

Best regards

Antoine.




