On Wed, Feb 12, 2020 at 2:37 PM Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>
> Actually these options still make some sense - but not as much as before.
>
> The use case: unit conversion
>
> Data about prices exported from SQL in Decimal(38,10) which uses 128
> bits, but the numbers are actually prices which, expressed in cents, fit
> perfectly in uint32
>
> Having scaling would reduce bandwidth/disk usage by a factor of 4.
You'd need to implement a separate function for this since you're
changing the semantics of the cast. I don't think it makes sense to
convert from 123.45 (decimal) to 12345 (uint32) in Cast.

> What would be the best approach to such a use case?
>
> Would a decimal_scale CastOption be OK or should it rather be a compute
> 'multiply' kernel?
>
> BR,
>
> Jacek
>
>
> On Wed, 12 Feb 2020 at 19:32, Jacek Pliszka <jacek.plis...@gmail.com> wrote:
> >
> > OK, then what I proposed does not make sense and I can just copy the
> > solution you pointed out.
> >
> > Thank you,
> >
> > Jacek
> >
> > On Wed, 12 Feb 2020 at 19:27, Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > On Wed, Feb 12, 2020 at 12:09 PM Jacek Pliszka <jacek.plis...@gmail.com>
> > > wrote:
> > > >
> > > > Hi!
> > > >
> > > > ARROW-3329 - we can discuss there.
> > > >
> > > > > It seems like it makes sense to implement both lossless safe casts
> > > > > (when all zeros after the decimal point) and lossy casts (fractional
> > > > > part discarded) from decimal to integer, do I have that right?
> > > >
> > > > Yes, though if I understood correctly, your examples are the same
> > > > case - in both cases the fractional part is discarded - it is just
> > > > all 0s in the first case.
> > > >
> > > > The key question is whether CastFunctor in cast.cc has access to the
> > > > scale of the decimal. If yes, how?
> > >
> > > Yes, it's in the type of the input array.
> > > Here's a kernel implementation that uses the TimestampType metadata
> > > of the input
> > >
> > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/cast.cc#L521
> > >
> > > >
> > > > If not - these are the options I've come up with:
> > > >
> > > > Let's assume the Decimal128Type value is n
> > > >
> > > > Then I expect that the base call
> > > > .cast('int64') will return overflow for n beyond int64 values, value
> > > > otherwise
> > > >
> > > > Option 1:
> > > >
> > > > .cast('int64', decimal_scale=s) would calculate n/10**s and return
> > > > overflow if it is beyond int64, value otherwise
> > > >
> > > > Option 2:
> > > >
> > > > .cast('int64', bytes_group=0) would return n & 0x00000000FFFFFFFF
> > > > .cast('int64', bytes_group=1) would return (n >> 64) &
> > > > 0x00000000FFFFFFFF
> > > > .cast('int64') would have default value bytes_group=0
> > > >
> > > > Option 3:
> > > >
> > > > cast has no CastOptions but we add a multiply compute kernel and have
> > > > something like this instead:
> > > >
> > > > .compute('multiply', 10**-s).cast('int64')
> > > >
> > > > BR,
> > > >
> > > > Jacek
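
The semantics discussed in this thread can be sketched in plain Python. This is only an illustration, not Arrow code: it models a Decimal128 value as an unscaled integer n plus a scale s (so the value is n / 10**s), and the names decimal_to_int64, allow_truncate and price_to_cents are hypothetical. The first function covers the safe and lossy decimal-to-integer casts with an int64 overflow check, as in Option 1; the second is the separate unit-conversion step suggested in the reply.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT32_MAX = 2**32 - 1


def decimal_to_int64(unscaled: int, scale: int, allow_truncate: bool = False) -> int:
    """Cast a decimal stored as unscaled / 10**scale to an int64 value.

    Hypothetical helper: with allow_truncate=False a non-zero fractional
    part is an error (safe cast); with allow_truncate=True it is discarded.
    """
    divisor = 10 ** scale
    # Divide the magnitude so that negatives truncate toward zero.
    quotient, remainder = divmod(abs(unscaled), divisor)
    if remainder and not allow_truncate:
        raise ValueError("cast would discard a non-zero fractional part")
    result = -quotient if unscaled < 0 else quotient
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError("value does not fit in int64")
    return result


def price_to_cents(unscaled: int, scale: int) -> int:
    """Unit conversion (not a cast): express a price in cents as a uint32."""
    if scale < 2:
        raise ValueError("scale must be at least 2 to express cents")
    # Going from `scale` fractional digits to 2 divides by 10**(scale - 2).
    cents, remainder = divmod(unscaled, 10 ** (scale - 2))
    if remainder:
        raise ValueError("price has sub-cent precision")
    if not 0 <= cents <= UINT32_MAX:
        raise OverflowError("value does not fit in uint32")
    return cents


# 123.45 in Decimal(38,10) is stored as the unscaled integer 1234500000000.
price = 12345 * 10**8

print(decimal_to_int64(price, 10, allow_truncate=True))  # 123
print(price_to_cents(price, 10))                         # 12345
```

Keeping the two functions separate mirrors the point made above: the cast keeps the numeric value (up to truncation), while the conversion to cents changes the unit and therefore the stored number.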