Re: [DISCUSS] Linear Formula Types

2024-01-08 Thread Andrew Lamb
…ch "application" (pandas, DataFusion, etc.) have their own implementation? > > Best regards, > Elliot Morrison-Reed > > -----Original Message----- > From: Andrew Lamb > Sent: Saturday, January 6, 2024 8:22 AM > To: dev@arrow.apache.org > Subject: Re: [DISCUSS] Lin…

RE: [DISCUSS] Linear Formula Types

2024-01-08 Thread Morrison-Reed Elliot (BEG/PJ-EDS-NA)
…AM To: dev@arrow.apache.org Subject: Re: [DISCUSS] Linear Formula Types Hi Elliot, Given your description, I agree extension types sound like they may be a good idea, similar to geoarrow[1] for Geospatial data where there is extra metadata[2] needed to interpret underlying types (e.g. factor…

Re: [DISCUSS] Linear Formula Types

2024-01-07 Thread Julian Hyde
If the DB layer above Arrow supports it, I would define a (non-stored) calculated column. Given celsius_percent between 0 and 1, I would define fahrenheit as (32 + celsius_percent * 180). A good query optimizer would convert the condition 'where fahrenheit > 122' into 'where celsius_percent > 0.5'.
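The rewrite described above works because a linear formula is monotonic and invertible whenever its factor is nonzero. A minimal sketch in plain Python, assuming the derived column is defined as `offset + factor * stored`; `rewrite_predicate` is a hypothetical helper, not an API of any particular query engine. The example uses a factor of 180, which is the value that maps celsius_percent = 0.5 to 122 °F.

```python
import operator

# Map comparison symbols to Python's comparison functions.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}

# Dividing both sides by a negative factor reverses the inequality.
FLIPPED = {">": "<", ">=": "<=", "<": ">", "<=": ">=", "==": "=="}


def rewrite_predicate(op: str, threshold: float, factor: float, offset: float):
    """Rewrite `offset + factor * x <op> threshold` as `x <new_op> bound`.

    Hypothetical helper: assumes a linear derived column with factor != 0.
    Returns (new_op, bound), to be evaluated against the stored column x.
    """
    if factor == 0:
        raise ValueError("factor must be nonzero for the formula to be invertible")
    bound = (threshold - offset) / factor
    new_op = op if factor > 0 else FLIPPED[op]
    return new_op, bound


# Assumed formula: fahrenheit = 32 + celsius_percent * 180
new_op, bound = rewrite_predicate(">", 122.0, factor=180.0, offset=32.0)
print(new_op, bound)  # > 0.5

# Both predicates select the same rows of the stored column.
stored = [0.25, 0.5, 0.75]
derived_hits = [32 + x * 180 > 122 for x in stored]
stored_hits = [OPS[new_op](x, bound) for x in stored]
print(derived_hits == stored_hits)  # True
```

With factors that are not exactly representable in binary floating point (such as 0.1), rounding can make the two predicates disagree for values right at the boundary, so a production optimizer would need to handle that edge case explicitly.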

Re: [DISCUSS] Linear Formula Types

2024-01-06 Thread Andrew Lamb
Hi Elliot, Given your description, I agree extension types sound like they may be a good idea, similar to geoarrow[1] for Geospatial data where there is extra metadata[2] needed to interpret underlying types (e.g. factor and offset) Andrew [1] https://github.com/geoarrow/geoarrow [2] https://arr…
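Arrow extension types record their name and parameters in field metadata under the reserved keys `ARROW:extension:name` and `ARROW:extension:metadata`; the format of the metadata string is left to each extension. A minimal sketch of how factor and offset could be carried that way, assuming JSON serialization and a hypothetical extension name `example.linear_formula` (not a registered Arrow extension):

```python
import json

# Hypothetical extension name, for illustration only.
EXTENSION_NAME = "example.linear_formula"


def linear_formula_field_metadata(factor: float, offset: float) -> dict:
    """Build Arrow-style field metadata describing a linear formula type.

    The two keys are the reserved Arrow extension keys; JSON encoding of the
    parameters is an assumption of this sketch.
    """
    params = {"factor": factor, "offset": offset}
    return {
        "ARROW:extension:name": EXTENSION_NAME,
        "ARROW:extension:metadata": json.dumps(params, sort_keys=True),
    }


def parse_linear_formula(metadata: dict) -> tuple:
    """Recover (factor, offset) from field metadata, or raise if absent."""
    if metadata.get("ARROW:extension:name") != EXTENSION_NAME:
        raise ValueError("field is not annotated as a linear formula")
    params = json.loads(metadata["ARROW:extension:metadata"])
    return params["factor"], params["offset"]


md = linear_formula_field_metadata(0.5, -40.0)
print(parse_linear_formula(md))  # (0.5, -40.0)
```

A reader that does not recognize the extension name can still fall back to the plain storage type (for example an integer column), which is how Arrow implementations treat unknown extension types.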

[DISCUSS] Linear Formula Types

2024-01-06 Thread Morrison-Reed Elliot (BEG/PJ-EDS-NA)
Background: I have been looking into using parquet files for storing and working with automotive data. One interesting thing about automotive data is that most communication happens on the CAN bus where we have extremely limited bandwidth. In order to encode "physical" values in a very space effici…
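The linear formula at the heart of this thread maps a compact raw integer, as transmitted on the bus, to a physical value. The usual CAN convention is `physical = raw * factor + offset`. A minimal sketch with a hypothetical 8-bit temperature signal (factor 0.5, offset -40); the signal and its parameters are illustrative only:

```python
def decode(raw: list, factor: float, offset: float) -> list:
    """Convert raw bus integers to physical values: raw * factor + offset."""
    return [r * factor + offset for r in raw]


def encode(physical: list, factor: float, offset: float) -> list:
    """Invert the formula, rounding to the nearest raw integer."""
    return [round((p - offset) / factor) for p in physical]


# Hypothetical signal: 0.5 degC per bit, offset -40 degC, stored in one byte.
FACTOR, OFFSET = 0.5, -40.0
raw = [0, 80, 255]
phys = decode(raw, FACTOR, OFFSET)
print(phys)  # [-40.0, 0.0, 87.5]
print(encode(phys, FACTOR, OFFSET) == raw)  # True
```

Keeping `raw` as an 8-bit integer column plus two scalars of metadata preserves the compactness of the bus encoding; materializing `phys` as float64 would take eight times the space per value.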