>
> I’d also like to chime in in favor of 32- and 64-bit decimals because
> it’ll help achieve better performance on TPC-H (and maybe other
> benchmarks). The decimal columns need only 12 digits of precision, for
> which a 64-bit decimal is sufficient. It’s currently wasteful to use a
> 128-bit decimal. You can technically use a float too, but I expect 64-bit
> decimal to be faster.


We should be careful here. If this assumes loading from Parquet or other
file formats currently in the library, arbitrarily changing the type to
load the minimum data length possible could break users, so this should
probably be a configuration option. This also reminds me that I think there
is some technical debt with decimals and Parquet [1].
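
To make the concern concrete, here is a rough sketch (not the actual Arrow
API) of what an opt-in narrowing rule could look like. The function names
and the `narrow_decimals` flag are hypothetical, made up for illustration;
the only facts assumed are that a decimal is stored as a signed
two's-complement integer and that the candidate widths are 32, 64, 128 and
256 bits.

```python
def max_decimal_precision(bit_width: int) -> int:
    """Most decimal digits p such that every p-digit value fits in a
    signed two's-complement integer of the given bit width."""
    max_value = 2 ** (bit_width - 1) - 1
    # max_value has d digits, but only d - 1 digits are fully representable.
    return len(str(max_value)) - 1


def min_decimal_bit_width(precision: int, allowed=(32, 64, 128, 256)) -> int:
    """Smallest allowed bit width that can hold the requested precision."""
    for width in allowed:
        if max_decimal_precision(width) >= precision:
            return width
    raise ValueError(f"precision {precision} does not fit in any allowed width")


def choose_bit_width(precision: int, narrow_decimals: bool = False) -> int:
    """Hypothetical reader policy: keep today's 128/256-bit behaviour unless
    the user explicitly opts in to narrower decimals."""
    if narrow_decimals:
        return min_decimal_bit_width(precision)
    return min_decimal_bit_width(precision, allowed=(128, 256))
```

With this rule, a 12-digit TPC-H column would still load as a 128-bit
decimal by default and would only become a 64-bit decimal when the
(hypothetical) option is turned on, so existing users see no type change.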

[1] https://issues.apache.org/jira/browse/ARROW-12022

On Tue, Mar 8, 2022 at 11:05 AM Sasha Krassovsky <krassovskysa...@gmail.com>
wrote:

> I’d also like to chime in in favor of 32- and 64-bit decimals because
> it’ll help achieve better performance on TPC-H (and maybe other
> benchmarks). The decimal columns need only 12 digits of precision, for
> which a 64-bit decimal is sufficient. It’s currently wasteful to use a
> 128-bit decimal. You can technically use a float too, but I expect 64-bit
> decimal to be faster.
>
> Sasha Krassovsky
>
> > On March 8, 2022, at 09:01, Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >
> > 
> >>
> >>
> >> Do we want to keep the historical "C++ and Java" requirement or
> >> do we want to make it a more flexible "two independent official
> >> implementations", which could be for example C++ and Rust, Rust and
> >> Java, etc.
> >
> >
> > I think flexibility here is a good idea, I'd like to hear other opinions.
> >
> > For this particular case if there aren't volunteers to help out in
> another
> > implementation I'm willing to help with Java (I don't have bandwidth to
> > do both C++ and Java).
> >
> > Cheers,
> > -Micah
> >
> >> On Tue, Mar 8, 2022 at 8:23 AM Antoine Pitrou <anto...@python.org>
> wrote:
> >>
> >>
> >> On 7 March 2022 at 20:26, Micah Kornfield wrote:
> >>>>
> >>>> Relaxing from {128,256} to {32,64,128,256} seems a low risk
> >>>> from an integration perspective, as implementations already need to
> read
> >>>> the bitwidth to select the appropriate physical representation (if
> they
> >>>> support it).
> >>>
> >>> I think there are two reasons for having implementations first.
> >>> 1.  Lower risk of bugs in the implementation/spec.
> >>> 2.  A mechanism to ensure that there is some boot-strapped coverage in
> >>> commonly used reference implementations.
> >>
> >> That sounds reasonable.
> >>
> >> Another question that came to my mind is: traditionally, we've mandated
> >> implementations in the two reference Arrow implementations (C++ and
> >> Java).  However, our implementation landscape is now much richer than it
> >> used to be (for example, there is a tremendous activity on the Rust
> >> side).  Do we want to keep the historical "C++ and Java" requirement or
> >> do we want to make it a more flexible "two independent official
> >> implementations", which could be for example C++ and Rust, Rust and
> >> Java, etc.
> >>
> >> (by "independent" I mean that one should not be based on the other, for
> >> example it should not be "C++ and Python" :-))
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >>>
> >>> I agree that 1 is fairly low-risk.
> >>>
> >>> On Mon, Mar 7, 2022 at 11:11 AM Jorge Cardoso Leitão <
> >>> jorgecarlei...@gmail.com> wrote:
> >>>
> >>>> +1 adding 32 and 64 bit decimals.
> >>>>
> >>>> +0 to release it without integration tests - both IPC and the C data
> >>>> interface use a variable bit width to declare the appropriate size for
> >>>> decimal types. Relaxing from {128,256} to {32,64,128,256} seems a low
> >> risk
> >>>> from an integration perspective, as implementations already need to
> read
> >>>> the bitwidth to select the appropriate physical representation (if
> they
> >>>> support it).
> >>>>
> >>>> Best,
> >>>> Jorge
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Mar 7, 2022, 11:41 Antoine Pitrou <anto...@python.org> wrote:
> >>>>
> >>>>>
> >>>>> On 3 March 2022 at 18:05, Micah Kornfield wrote:
> >>>>>> I think this makes sense to add these.  Typically when adding new
> >>>> types,
> >>>>>> we've waited  on the official vote until there are two reference
> >>>>>> implementations demonstrating compatibility.
> >>>>>
> >>>>> You are right, I had forgotten about that.  Though in this case, it
> >>>>> might be argued we are just relaxing the constraints on an existing
> >> type.
> >>>>>
> >>>>> What do others think?
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> Antoine.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> On Thu, Mar 3, 2022 at 6:55 AM Antoine Pitrou <anto...@python.org>
> >>>>> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> Currently, the Arrow format specification restricts the bitwidth of
> >>>>>>> decimal numbers to either 128 or 256 bits.
> >>>>>>>
> >>>>>>> However, there is interest in allowing other bitwidths, at least 32
> >>>> and
> >>>>>>> 64 bits for this proposal. A 64-bit (respectively 32-bit) decimal
> >>>>>>> datatype would allow for precisions of up to 18 digits
> (respectively
> >> 9
> >>>>>>> digits), which are sufficient for some applications which are
> mainly
> >>>>>>> looking for exact computations rather than sheer precision.
> >> Obviously,
> >>>>>>> smaller datatypes are cheaper to store in memory and cheaper to run
> >>>>>>> computations on.
> >>>>>>>
> >>>>>>> For example, the Spark documentation mentions that some decimal
> types
> >>>>>>> may fit in a Java int (32 bits) or long (64 bits):
> >>>>>>>
> >>>>>>>
> >>>>>>> https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/DecimalType.html
> >>>>>>>
> >>>>>>> ... and a draft PR had even been filed for initial support in the
> C++
> >>>>>>> implementation (https://github.com/apache/arrow/pull/8578).
> >>>>>>>
> >>>>>>> I am therefore proposing that we relax the wording in the Arrow
> >> format
> >>>>>>> specification to also allow 32- and 64-bit decimal types.
> >>>>>>>
> >>>>>>> This is a preliminary discussion to gather opinions and potential
> >>>>>>> counter-arguments against this proposal. If no strong
> >> counter-argument
> >>>>>>> emerges, we will probably run a vote in a week or two.
> >>>>>>>
> >>>>>>> Best regards
> >>>>>>>
> >>>>>>> Antoine.
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
