In the application I'm working on, I'm reading a parquet file and creating a table to keep the records in memory.
This gist has the idea of it:
https://gist.github.com/elferherrera/a2a796ae83a7203f58de704c178c44ef

I would like to keep it as pure Arrow because I have found that it is super fast to create references to the data and create HashMaps with the information read from the parquet. The limitation I have is that I have to change the type on the column in the code every time I want to extract data from a column that is not a StringArray, either with an iterator or using a value method.

I will go through the scalar example you are using in datafusion to implement something similar.

Thanks

On Thu, Jan 28, 2021 at 12:06 PM Andrew Lamb <al...@influxdata.com> wrote:

> I think this approach would work (and we have something similar in
> DataFusion (ScalarValue)
> https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L46
> -- though it is an enum rather than a Trait, I think the idea is basically
> the same)
>
> I think this API would be reasonable to implement (and I think would be
> worth considering adding to Arrow for usability), but I fear it will be
> quite slow as now the program would have to do some sort of type dispatch
> on each element in an array rather than once for the entire array.
>
> On Thu, Jan 28, 2021 at 5:50 AM Fernando Herrera
> <fernando.j.herr...@gmail.com> wrote:
>
> > Hi Jorge,
> >
> > What about making the Array::value return a &dyn ValueTrait. This new
> > ValueTrait would have to be implemented for all the possible values that
> > can be returned from the arrays
> >
> > Fernando
> >
> > On Thu, 28 Jan 2021, 08:42 Jorge Cardoso Leitão,
> > <jorgecarlei...@gmail.com> wrote:
> >
> > > Hi Fernando,
> > >
> > > I tried that some time ago, but I was unable to do so. The reason is
> > > that Array is a trait that needs to support also being a trait object
> > > (i.e. support `&dyn Array`).
> > >
> > > Let's try here: what type should `Array::value` return? One option is
> > > to make Array a generic. But if Array is a generic, we can't support
> > > `dyn Array` without declaring its type (e.g. `dyn Array<i32>`), which
> > > goes against the requirement that we can use `Array` without knowing
> > > its compile-time type.
> > >
> > > If we make the function `value<T>()` a generic without constraints,
> > > then all concrete arrays (e.g. PrimitiveArray) will need to implement
> > > that, which is not possible because e.g. `StringArray` does not know
> > > how to yield a value of e.g. `f32`.
> > >
> > > I also tried a softer version recently: use ListArray<T: Array>, i.e.
> > > try to change `ListArray` to be a generic over Array and have
> > > `values(i)` return the concrete type. However, even that does not work
> > > because it is impossible to tell how nested a ListArray will be until
> > > we read the data (i.e. after the program was compiled), which means
> > > that the compiler will be unable to compile all (potentially nested)
> > > possible variations of the generic.
> > >
> > > So, overall, this exercise convinced me that what we have is already
> > > the simplest (but no simpler) API that we can offer under the
> > > requirements we have (But I would love to be proven wrong, as I share
> > > your concerns)
> > >
> > > Best,
> > > Jorge
> > >
> > > On Wed, Jan 27, 2021 at 12:27 PM Fernando Herrera
> > > <fernando.j.herr...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm wondering if it has been considered to move the value function
> > > > that is implemented in all the arrays (StringArray, BooleanArray,
> > > > ListArray, etc) as part of the Array trait?
> > > >
> > > > This would help when extracting values from generic arrays that
> > > > implement dyn Array without having to manually downcast the array
> > > > all the time to read a value from the array.
> > > >
> > > > Thanks,
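
For reference, the trade-off discussed in this thread (downcasting once per array versus dispatching on the type of every element) can be sketched roughly as follows. This is only an illustration built on the Rust standard library; it does not use the arrow crate. `Column`, `Scalar`, `Int32Column`, `Utf8Column`, `value_at`, `sum_downcast` and `sum_scalar` are all hypothetical names standing in for the `Array` trait, a ScalarValue-like enum, and the concrete arrays, so the real Arrow and DataFusion signatures will differ.

```rust
use std::any::Any;

// Hypothetical ScalarValue-like enum: one variant per supported value type.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int32(i32),
    Utf8(String),
}

// Hypothetical stand-in for the `Array` trait. It stays object-safe because
// `value_at` returns a single enum type instead of a generic `T`.
trait Column {
    fn len(&self) -> usize;
    // Allows callers to downcast `&dyn Column` to a concrete column type.
    fn as_any(&self) -> &dyn Any;
    // Type-erased element access (the idea proposed in the thread).
    fn value_at(&self, i: usize) -> Scalar;
}

struct Int32Column(Vec<i32>);
struct Utf8Column(Vec<String>);

impl Column for Int32Column {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn value_at(&self, i: usize) -> Scalar {
        Scalar::Int32(self.0[i])
    }
}

impl Column for Utf8Column {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn value_at(&self, i: usize) -> Scalar {
        Scalar::Utf8(self.0[i].clone())
    }
}

// Per-array dispatch: the type check happens once, then the loop runs over
// the concrete `Vec<i32>`. Returns None if the column is not an Int32Column.
fn sum_downcast(col: &dyn Column) -> Option<i64> {
    let ints = col.as_any().downcast_ref::<Int32Column>()?;
    Some(ints.0.iter().map(|&v| i64::from(v)).sum())
}

// Per-element dispatch: no downcast at the call site, but every element
// goes through a virtual call and a `match` on the enum variant.
fn sum_scalar(col: &dyn Column) -> Option<i64> {
    let mut total = 0i64;
    for i in 0..col.len() {
        match col.value_at(i) {
            Scalar::Int32(v) => total += i64::from(v),
            _ => return None,
        }
    }
    Some(total)
}
```

Both functions accept a `&dyn Column` without knowing its concrete type at compile time. The second is more convenient for the caller, while the first keeps the inner loop free of per-element type checks, which is the performance concern raised above. No timings are claimed here; the sketch only shows where the dispatch happens in each style.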