Thanks Andrew and Jorge for the help.

I think the ScalarValue enum is precisely what I want. I was worried that
downcasting the column every time you need to get a value would be slow,
but I can see that you are doing exactly that with the ScalarValue enum (
https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L83).
That's great.
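
For reference, here is a rough sketch of the idea as I understand it. It does
not use the arrow crate: `Array`, `Int64Array`, `StringArray`, `ScalarValue`,
`scalar_at` and `sum_int64` are simplified, hypothetical stand-ins that I
wrote just to illustrate the pattern. `scalar_at` downcasts on every call
(the per-element dispatch Andrew was concerned about), while `sum_int64`
downcasts once for the whole array.

```rust
use std::any::Any;

// Simplified, hypothetical stand-ins for Arrow arrays (not the arrow crate).
trait Array {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

struct Int64Array(Vec<i64>);
struct StringArray(Vec<String>);

impl Array for Int64Array {
    fn as_any(&self) -> &dyn Any { self }
    fn len(&self) -> usize { self.0.len() }
}

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any { self }
    fn len(&self) -> usize { self.0.len() }
}

// One variant per supported element type, in the spirit of ScalarValue.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int64(i64),
    Utf8(String),
}

// Per-element access: the type dispatch (downcast) happens on every call.
fn scalar_at(array: &dyn Array, index: usize) -> Option<ScalarValue> {
    if index >= array.len() {
        return None;
    }
    let any = array.as_any();
    if let Some(a) = any.downcast_ref::<Int64Array>() {
        Some(ScalarValue::Int64(a.0[index]))
    } else if let Some(a) = any.downcast_ref::<StringArray>() {
        Some(ScalarValue::Utf8(a.0[index].clone()))
    } else {
        None // array type not covered by this sketch
    }
}

// Per-array access: downcast once, then work on the concrete values.
fn sum_int64(array: &dyn Array) -> Option<i64> {
    let a = array.as_any().downcast_ref::<Int64Array>()?;
    Some(a.0.iter().sum())
}
```

The enum keeps the calling code free of manual downcasts, at the cost of one
type check per value, so I would still downcast once when scanning a whole
column.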


On Thu, Jan 28, 2021 at 12:21 PM Fernando Herrera <
fernando.j.herr...@gmail.com> wrote:

> In the application I'm working on I'm reading a parquet file and creating
> a table to keep the records in memory.
>
> This gist has the idea of it
> https://gist.github.com/elferherrera/a2a796ae83a7203f58de704c178c44ef
>
> I would like to keep it as pure Arrow because I have found that it is
> super fast to create references to the data and create HashMaps with the
> information read from the parquet. The limitation I have is that I have to
> change the type on the column in the code every time I want to extract data
> from a column that is not a StringArray, either with an iterator or using a
> value method.
>
> I will go through the scalar example you are using in datafusion to
> implement something similar.
>
> Thanks
>
>
> On Thu, Jan 28, 2021 at 12:06 PM Andrew Lamb <al...@influxdata.com> wrote:
>
>> I think this approach would work (and we have something similar in
>> DataFusion (ScalarValue)
>>
>> https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L46
>> -- though it is an enum rather than a Trait, I think the idea is basically
>> the same)
>>
>> I think this API would be reasonable to implement (and I think would be
>> worth considering adding to Arrow for usability), but I fear it will be
>> quite slow as now the program would have to do some sort of type dispatch
>> on each element in an array rather than once for the entire array.
>>
>> On Thu, Jan 28, 2021 at 5:50 AM Fernando Herrera <
>> fernando.j.herr...@gmail.com> wrote:
>>
>> > Hi Jorge,
>> >
>> > What about making Array::value return a &dyn ValueTrait? This new
>> > ValueTrait would have to be implemented for all the possible values that
>> > can be returned from the arrays.
>> >
>> > Fernando
>> >
>> > On Thu, 28 Jan 2021, 08:42 Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
>> > wrote:
>> >
>> > > Hi Fernando,
>> > >
>> > > I tried that some time ago, but I was unable to do so. The reason is that
>> > > Array is a trait that also needs to work as a trait object (i.e.
>> > > support `&dyn Array`).
>> > >
>> > > Let's try here: what type should `Array::value` return? One option is to
>> > > make Array a generic. But if Array is a generic, we can't support `dyn
>> > > Array` without declaring its type (e.g. `dyn Array<i32>`), which goes
>> > > against the requirement that we can use `Array` without knowing its
>> > > compile-time type.
>> > >
>> > > If we make the function `value<T>()` a generic without constraints, then
>> > > all concrete arrays (e.g. PrimitiveArray) will need to implement that,
>> > > which is not possible because e.g. `StringArray` does not know how to
>> > > yield a value of e.g. `f32`.
>> > >
>> > > I also tried a softer version recently: use ListArray<T: Array>, i.e. try
>> > > to change `ListArray` to be a generic over Array and have `values(i)`
>> > > return the concrete type. However, even that does not work because it is
>> > > impossible to tell how nested a ListArray will be until we read the data
>> > > (i.e. after the program was compiled), which means that the compiler will
>> > > be unable to compile all (potentially nested) possible variations of the
>> > > generic.
>> > >
>> > > So, overall, this exercise convinced me that what we have is already the
>> > > simplest (but no simpler) API that we can offer under the requirements we
>> > > have. (But I would love to be proven wrong, as I share your concerns.)
>> > >
>> > > Best,
>> > > Jorge
>> > >
>> > >
>> > > On Wed, Jan 27, 2021 at 12:27 PM Fernando Herrera <
>> > > fernando.j.herr...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'm wondering if it has been considered to move the value function that
>> > > > is implemented in all the arrays (StringArray, BooleanArray, ListArray,
>> > > > etc.) into the Array trait?
>> > > >
>> > > > This would help when extracting values from arrays held as dyn Array,
>> > > > without having to manually downcast the array every time to read a
>> > > > value from it.
>> > > >
>> > > > Thanks,
>> > > >
>> > >
>> >
>>
>
