Re: [ARROW-3329] Re: Decimal casting or scaling

Wes McKinney Wed, 12 Feb 2020 14:08:13 -0800

On Wed, Feb 12, 2020 at 2:37 PM Jacek Pliszka <[email protected]> wrote:
>
> Actually these options still make some sense - but not as much as before.
>
> The use case: unit conversion
>
> Price data is exported from SQL as Decimal(38,10), which takes 128 bits
> per value, but the numbers are actually prices which, expressed in
> cents, fit perfectly in uint32.
>
> Scaling would reduce bandwidth/disk usage by a factor of 4.


You'd need to implement a separate function for this since you're
changing the semantics of the cast. I don't think it makes sense to
convert from 123.45 (decimal) to 12345 (uint32) in Cast.
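
As a rough illustration of what such a separate function might do, here is a minimal sketch in plain Python using the standard `decimal` module, not Arrow's C++ kernels. The name `decimal_to_scaled_uint32` and its `shift` parameter are hypothetical, not an existing Arrow API; the error-on-remainder behavior is also an assumption.

```python
from decimal import Decimal, localcontext

UINT32_MAX = 2**32 - 1


def decimal_to_scaled_uint32(values, shift=2):
    """Multiply each Decimal by 10**shift and return ints that fit in uint32.

    Hypothetical helper: raises ValueError if a value has digits left after
    scaling or does not fit in uint32. None entries (nulls) pass through.
    """
    out = []
    with localcontext() as ctx:
        ctx.prec = 60  # enough digits for Decimal(38,10) inputs without rounding
        factor = Decimal(10) ** shift
        for v in values:
            if v is None:
                out.append(None)
                continue
            scaled = v * factor
            if scaled != scaled.to_integral_value():
                raise ValueError(f"{v} is not a whole number of 10**-{shift} units")
            as_int = int(scaled)
            if not 0 <= as_int <= UINT32_MAX:
                raise ValueError(f"{v} does not fit in uint32 after scaling")
            out.append(as_int)
    return out


# Prices as they might arrive from a Decimal(38,10) column.
prices = [Decimal("123.4500000000"), Decimal("0.9900000000"), None]
cents = decimal_to_scaled_uint32(prices)  # [12345, 99, None]
```

A Decimal128 value occupies 16 bytes and a uint32 occupies 4, which is where the factor of 4 quoted above comes from.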

> What would be the best approach to such use case?
>
> Would a decimal_scale CastOption be OK, or should it rather be a compute
> 'multiply' kernel?
>
> BR,
>
> Jacek
>
>
> On Wed, 12 Feb 2020 at 19:32, Jacek Pliszka <[email protected]> wrote:
> >
> > OK, then what I proposed does not make sense and I can just copy the
> > solution you pointed out.
> >
> > Thank you,
> >
> > Jacek
> >
> > On Wed, 12 Feb 2020 at 19:27, Wes McKinney <[email protected]> wrote:
> > >
> > > On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka <[email protected]> wrote:
> > > >
> > > > Hi!
> > > >
> > > > ARROW-3329 - we can discuss there.
> > > >
> > > > > It seems like it makes sense to implement both lossless safe casts
> > > > > (when all zeros after the decimal point) and lossy casts (fractional
> > > > > part discarded) from decimal to integer, do I have that right?
> > > >
> > > > Yes, though if I understood correctly, your examples are the same
> > > > case: in both cases the fractional part is discarded, it is just
> > > > all 0s in the first case.
> > > >
> > > > The key question is whether CastFunctor in cast.cc has access to the
> > > > scale of the decimal. If yes, how?
> > >
> > > Yes, it's in the type of the input array. Here's a kernel
> > > implementation that uses the TimestampType metadata of the input:
> > >
> > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
> > >
> > > >
> > > > If not - these are the options I've come up with:
> > > >
> > > > Let's assume the Decimal128Type value is n
> > > >
> > > > Then I expect that the base call
> > > > .cast('int64') will return overflow for n beyond int64 values,
> > > > value otherwise
> > > >
> > > > Option 1:
> > > >
> > > > .cast('int64', decimal_scale=s) would calculate n/10**s and return
> > > > overflow if it is beyond int64, value otherwise
> > > >
> > > > Option 2:
> > > >
> > > > .cast('int64', bytes_group=0) would return n & 0x00000000FFFFFFFF
> > > > .cast('int64', bytes_group=1) would return (n >> 64) & 0x00000000FFFFFFFF
> > > > .cast('int64') would have default value bytes_group=0
> > > >
> > > > Option 3:
> > > >
> > > > cast has no CastOptions, but we add a multiply compute kernel and
> > > > have something like this instead:
> > > >
> > > > .compute('multiply', 10**-s).cast('int64')
> > > >
> > > > BR,
> > > >
> > > > Jacek
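
The cast semantics discussed in the quoted thread (a safe cast that requires an all-zero fractional part, a lossy cast that discards it, and Option 1's overflow check) can be sketched on the unscaled integer n as follows. This is plain Python for illustration only; the function names and the `allow_truncate` flag are hypothetical and do not correspond to an existing Arrow API. The word-splitting helper assumes Option 2 intends 64-bit groups, so it uses a full 64-bit mask.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
MASK64 = 0xFFFFFFFFFFFFFFFF


def cast_unscaled_to_int64(n, scale, allow_truncate=False):
    """Cast a decimal given as unscaled integer n (value = n / 10**scale).

    Hypothetical sketch: assumes scale >= 0. The safe mode rejects a
    non-zero fractional part; the lossy mode truncates toward zero.
    """
    quotient, remainder = divmod(abs(n), 10 ** scale)
    if remainder and not allow_truncate:
        raise ValueError("fractional part is non-zero; cast would lose data")
    if n < 0:
        quotient = -quotient
    if not INT64_MIN <= quotient <= INT64_MAX:
        raise OverflowError("value does not fit in int64")
    return quotient


def split_int128_words(n):
    """Return the (low, high) 64-bit words of a 128-bit two's complement n."""
    return n & MASK64, (n >> 64) & MASK64


# 123.0000000000 stored with scale 10 casts losslessly.
exact = cast_unscaled_to_int64(1230000000000, 10)
# -123.4500000000 needs the lossy mode; the fraction is dropped.
lossy = cast_unscaled_to_int64(-1234500000000, 10, allow_truncate=True)
```

Truncation toward zero is one possible choice for the lossy mode; rounding would be an equally valid design and would need a different remainder check.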
