Thanks, Andrew and Jorge, for the help. I think the ScalarValue enum is precisely what I want. I was worried that downcasting the column every time I need a value would be slow, but I can see that this is what you are doing with the ScalarValue enum (https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L83). That's great.
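To check that I understood the idea, here is a rough sketch of the pattern using only the standard library. The `Array` trait, `Int64Array`, `StringArray`, `Scalar` and `scalar_at` below are hypothetical stand-ins written for illustration; they are not the arrow or DataFusion APIs. Note that the downcast is attempted on every call, which is the per-element dispatch cost Andrew mentioned.

```rust
use std::any::Any;

// Hypothetical stand-in for a type-erased Arrow array (not the arrow crate's trait).
trait Array {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical concrete arrays backed by plain vectors.
struct Int64Array(Vec<i64>);
struct StringArray(Vec<String>);

impl Array for Int64Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// One enum variant per supported value type, in the spirit of ScalarValue.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(i64),
    Utf8(String),
}

// Reads element `i` of a type-erased column as a `Scalar`.
// Returns `None` for an unsupported array type or an out-of-bounds index.
fn scalar_at(column: &dyn Array, i: usize) -> Option<Scalar> {
    let any = column.as_any();
    if let Some(a) = any.downcast_ref::<Int64Array>() {
        a.0.get(i).copied().map(Scalar::Int64)
    } else if let Some(a) = any.downcast_ref::<StringArray>() {
        a.0.get(i).cloned().map(Scalar::Utf8)
    } else {
        None
    }
}

fn main() {
    let columns: Vec<Box<dyn Array>> = vec![
        Box::new(Int64Array(vec![1, 2, 3])),
        Box::new(StringArray(vec!["a".to_string(), "b".to_string(), "c".to_string()])),
    ];
    // Build one "row" by reading index 1 from every column, whatever its type.
    let row: Vec<Option<Scalar>> = columns.iter().map(|c| scalar_at(c.as_ref(), 1)).collect();
    println!("{:?}", row);
}
```

The caller never names a concrete array type, which is what I was after; the price is a type check per value read.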
On Thu, Jan 28, 2021 at 12:21 PM Fernando Herrera <fernando.j.herr...@gmail.com> wrote:

> In the application I'm working on I'm reading a parquet file and creating
> a table to keep the records in memory.
>
> This gist has the idea of it
> https://gist.github.com/elferherrera/a2a796ae83a7203f58de704c178c44ef
>
> I would like to keep it as pure Arrow because I have found that it is
> super fast to create references to the data and create HashMaps with the
> information read from the parquet. The limitation I have is that I have to
> change the type on the column in the code every time I want to extract data
> from a column that is not a StringArray, either with an iterator or using a
> value method.
>
> I will go through the scalar example you are using in datafusion to
> implement something similar.
>
> Thanks
>
>
> On Thu, Jan 28, 2021 at 12:06 PM Andrew Lamb <al...@influxdata.com> wrote:
>
>> I think this approach would work (and we have something similar in
>> DataFusion (ScalarValue)
>> https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L46
>> -- though it is an enum rather than a Trait, I think the idea is basically
>> the same)
>>
>> I think this API would be reasonable to implement (and I think would be
>> worth considering adding to Arrow for usability), but I fear it will be
>> quite slow as now the program would have to do some sort of type dispatch
>> on each element in an array rather than once for the entire array.
>>
>> On Thu, Jan 28, 2021 at 5:50 AM Fernando Herrera
>> <fernando.j.herr...@gmail.com> wrote:
>>
>> > Hi Jorge,
>> >
>> > What about making the Array::value return a &dyn ValueTrait. This new
>> > ValueTrait would have to be implemented for all the possible values that
>> > can be returned from the arrays
>> >
>> > Fernando
>> >
>> > On Thu, 28 Jan 2021, 08:42 Jorge Cardoso Leitão,
>> > <jorgecarlei...@gmail.com> wrote:
>> >
>> > > Hi Fernando,
>> > >
>> > > I tried that some time ago, but I was unable to do so. The reason is
>> > > that Array is a trait that needs to support also being a trait object
>> > > (i.e. support `&dyn Array`).
>> > >
>> > > Let's try here: what type should `Array::value` return? One option is
>> > > to make Array a generic. But if Array is a generic, we can't support
>> > > `dyn Array` without declaring its type (e.g. `dyn Array<i32>`), which
>> > > goes against the requirement that we can use `Array` without knowing
>> > > its compile-time type.
>> > >
>> > > If we make the function `value<T>()` a generic without constraints,
>> > > then all concrete arrays (e.g. PrimitiveArray) will need to implement
>> > > that, which is not possible because e.g. `StringArray` does not know
>> > > how to yield a value of e.g. `f32`.
>> > >
>> > > I also tried a softer version recently: use ListArray<T: Array>, i.e.
>> > > try to change `ListArray` to be a generic over Array and have
>> > > `values(i)` return the concrete type. However, even that does not work
>> > > because it is impossible to tell how nested a ListArray will be until
>> > > we read the data (i.e. after the program was compiled), which means
>> > > that the compiler will be unable to compile all (potentially nested)
>> > > possible variations of the generic.
>> > >
>> > > So, overall, this exercise convinced me that what we have is already
>> > > the simplest (but no simpler) API that we can offer under the
>> > > requirements we have (But I would love to be proven wrong, as I share
>> > > your concerns)
>> > >
>> > > Best,
>> > > Jorge
>> > >
>> > >
>> > > On Wed, Jan 27, 2021 at 12:27 PM Fernando Herrera
>> > > <fernando.j.herr...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'm wondering if it has been considered to move the value function
>> > > > that is implemented in all the arrays (StringArray, BooleanArray,
>> > > > ListArray, etc) as part of the Array trait?
>> > > >
>> > > > This would help when extracting values from generic arrays that
>> > > > implement dyn Array without having to manually downcast the array
>> > > > all the time to read a value from the array.
>> > > >
>> > > > Thanks,
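For reference, the alternative raised in the thread (type dispatch once for the entire array rather than on each element) might look roughly like the sketch below. It uses only the standard library; the `Array` trait, `Int64Array`, `Float64Array` and `sum_column` are hypothetical names made up for illustration and are not the arrow crate's API.

```rust
use std::any::Any;

// Hypothetical stand-in for a type-erased Arrow array (not the arrow crate's trait).
trait Array {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical concrete arrays backed by plain vectors.
struct Int64Array(Vec<i64>);
struct Float64Array(Vec<f64>);

impl Array for Int64Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for Float64Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Sums a type-erased numeric column as f64.
// The downcast happens once per column; the loop that follows is fully typed.
// Returns `None` for an unsupported array type.
fn sum_column(column: &dyn Array) -> Option<f64> {
    let any = column.as_any();
    if let Some(a) = any.downcast_ref::<Int64Array>() {
        Some(a.0.iter().map(|&v| v as f64).sum::<f64>())
    } else if let Some(a) = any.downcast_ref::<Float64Array>() {
        Some(a.0.iter().sum::<f64>())
    } else {
        None
    }
}

fn main() {
    let columns: Vec<Box<dyn Array>> = vec![
        Box::new(Int64Array(vec![1, 2, 3])),
        Box::new(Float64Array(vec![0.5, 1.5])),
    ];
    for column in &columns {
        println!("{:?}", sum_column(column.as_ref()));
    }
}
```

The trade-off is that every operation has to enumerate the array types it supports, which is the manual downcasting the original question wanted to avoid.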