Re: [DISCUSS] Linear Formula Types

2024-01-08 Thread Andrew Lamb
ch "application" (pandas, DataFusion, etc.) have their own implementation? > > Best regards, > Elliot Morrison-Reed > > -Original Message- > From: Andrew Lamb > Sent: Saturday, January 6, 2024 8:22 AM > To: dev@arrow.apache.org > Subject: Re: [DISCUSS] Lin

RE: [DISCUSS] Linear Formula Types

2024-01-08 Thread Morrison-Reed Elliot (BEG/PJ-EDS-NA)
AM To: dev@arrow.apache.org Subject: Re: [DISCUSS] Linear Formula Types Hi Elliot, Given your description, I agree extension types sound like they may be a good idea, similar to geoarrow[1] for Geospatial data where there is extra metadata[2] needed to interpret underlying types (e.g. factor

Re: [DISCUSS] Linear Formula Types

2024-01-07 Thread Julian Hyde
If the DB layer above Arrow supports it, I would define a (non-stored) calculated column. Given celsius_percent between 0 and 1, I would define fahrenheit as (32 + celsius_percent * 180). A good query optimizer would convert the condition 'where fahrenheit > 122' into 'where celsius_percent > 0.5'.
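
The predicate rewrite described in this message can be sketched in a few lines of Python. This is only an illustration, not code from any Arrow implementation or query optimizer: the class `LinearFormula` and its methods are hypothetical names. It assumes celsius_percent is the Celsius temperature divided by 100, so the slope is taken as 180 (1.8 per degree Celsius times 100), which is what makes 122 °F correspond to 0.5 as in the example above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LinearFormula:
    """Hypothetical helper: physical = offset + factor * stored."""

    factor: float
    offset: float

    def apply(self, stored: float) -> float:
        # Compute the calculated (non-stored) value from the stored one.
        return self.offset + self.factor * stored

    def rewrite_greater_than(self, threshold: float) -> Tuple[str, float]:
        """Rewrite `physical > threshold` as a predicate on the stored value."""
        if self.factor == 0:
            raise ValueError("factor must be non-zero to invert the formula")
        bound = (threshold - self.offset) / self.factor
        # Dividing by a negative factor flips the direction of the inequality.
        return (">" if self.factor > 0 else "<", bound)


# Assumption: celsius_percent = celsius / 100, hence a slope of 180.
fahrenheit = LinearFormula(factor=180.0, offset=32.0)
op, bound = fahrenheit.rewrite_greater_than(122.0)
print(f"where celsius_percent {op} {bound}")
```

Because the formula is linear and invertible, the filter can be evaluated directly on the stored column without materializing the calculated one.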

Re: [DISCUSS] Linear Formula Types

2024-01-06 Thread Andrew Lamb
Hi Elliot, Given your description, I agree extension types sound like they may be a good idea, similar to geoarrow[1] for geospatial data, where there is extra metadata[2] needed to interpret underlying types (e.g. factor and offset). Andrew [1] https://github.com/geoarrow/geoarrow [2] https://arr
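
As a rough illustration of the idea of carrying a factor and offset as metadata next to the underlying storage values, here is a plain-Python sketch. It deliberately does not use the Arrow extension type API; the functions `serialize_params` and `decode`, the JSON layout, and the sample values are all assumptions made for this example.

```python
import json
from array import array
from typing import List


def serialize_params(factor: float, offset: float) -> bytes:
    # Hypothetical metadata layout: a small JSON document stored as bytes.
    return json.dumps({"factor": factor, "offset": offset}).encode("utf-8")


def decode(raw: array, metadata: bytes) -> List[float]:
    # Interpret raw storage values using the linear formula from the metadata.
    params = json.loads(metadata.decode("utf-8"))
    return [params["offset"] + params["factor"] * value for value in raw]


raw = array("h", [0, 100, 250])       # underlying 16-bit integer storage
meta = serialize_params(0.5, -40.0)   # made-up factor and offset
physical = decode(raw, meta)
print(physical)
```

The storage stays a compact integer array, and a consumer that understands the metadata can recover the physical values on demand.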