In the application I'm working on I'm reading a parquet file and creating a
table to keep the records in memory.

This gist shows the idea:
https://gist.github.com/elferherrera/a2a796ae83a7203f58de704c178c44ef

I would like to keep it as pure Arrow because I have found that it is super
fast to create references to the data and to build HashMaps from the
information read from the parquet file. The limitation I have is that I have
to change the column type in the code every time I want to extract data from
a column that is not a StringArray, either with an iterator or with the
value method.

I will go through the ScalarValue example you pointed to in DataFusion to
implement something similar.
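
To make the trade-off concrete, here is a minimal sketch of the two access
patterns discussed in this thread. It deliberately does not use the arrow
crate: `Array`, `Int32Array` and `StringArray` below are simplified stand-ins
defined inline, and `Scalar`, `value_at` and `sum_i32` are hypothetical names
chosen only for illustration (the enum is in the spirit of DataFusion's
ScalarValue, not a copy of it).

```rust
use std::any::Any;

// Simplified stand-ins for Arrow's array types (NOT the arrow crate).
trait Array {
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

struct Int32Array(Vec<i32>);
struct StringArray(Vec<String>);

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any { self }
    fn len(&self) -> usize { self.0.len() }
}

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any { self }
    fn len(&self) -> usize { self.0.len() }
}

// Hypothetical enum that can hold one value of any supported type.
#[derive(Debug, PartialEq)]
enum Scalar {
    Int32(i32),
    Utf8(String),
}

// Per-element access: the type dispatch (the downcast chain) runs on
// every call, i.e. once per element that is read.
fn value_at(array: &dyn Array, i: usize) -> Option<Scalar> {
    if i >= array.len() {
        return None;
    }
    let any = array.as_any();
    if let Some(a) = any.downcast_ref::<Int32Array>() {
        Some(Scalar::Int32(a.0[i]))
    } else if let Some(a) = any.downcast_ref::<StringArray>() {
        Some(Scalar::Utf8(a.0[i].clone()))
    } else {
        None
    }
}

// Per-array access: downcast once, then loop over the typed values.
// Returns None if the array is not an Int32Array.
fn sum_i32(array: &dyn Array) -> Option<i64> {
    let a = array.as_any().downcast_ref::<Int32Array>()?;
    Some(a.0.iter().map(|&v| i64::from(v)).sum())
}
```

`value_at` is the convenient form, since the caller never names the concrete
array type, but it repeats the dispatch for every element. `sum_i32` is the
form the current API pushes you towards: one downcast, then a tight loop.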

Thanks


On Thu, Jan 28, 2021 at 12:06 PM Andrew Lamb <al...@influxdata.com> wrote:

> I think this approach would work, and we have something similar in
> DataFusion (ScalarValue):
>
> https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L46
>
> Though it is an enum rather than a trait, I think the idea is basically
> the same.
>
> I think this API would be reasonable to implement (and I think would be
> worth considering adding to Arrow for usability), but I fear it will be
> quite slow as now the program would have to do some sort of type dispatch
> on each element in an array rather than once for the entire array.
>
> On Thu, Jan 28, 2021 at 5:50 AM Fernando Herrera <
> fernando.j.herr...@gmail.com> wrote:
>
> > Hi Jorge,
> >
> > What about making Array::value return a &dyn ValueTrait? This new
> > ValueTrait would have to be implemented for all the possible value types
> > that the arrays can return.
> >
> > Fernando
> >
> > On Thu, 28 Jan 2021, 08:42 Jorge Cardoso Leitão, <jorgecarlei...@gmail.com>
> > wrote:
> >
> > > Hi Fernando,
> > >
> > > I tried that some time ago, but I was unable to do so. The reason is that
> > > Array is a trait that needs to support also being a trait object (i.e.
> > > support `&dyn Array`).
> > >
> > > Let's try here: what type should `Array::value` return? One option is to
> > > make Array a generic. But if Array is a generic, we can't support `dyn
> > > Array` without declaring its type (e.g. `dyn Array<i32>`), which goes
> > > against the requirement that we can use `Array` without knowing its
> > > compile-time type.
> > >
> > > If we make the function `value<T>()` a generic without constraints, then
> > > all concrete arrays (e.g. PrimitiveArray) will need to implement that,
> > > which is not possible because e.g. `StringArray` does not know how to
> > > yield a value of e.g. `f32`.
> > >
> > > I also tried a softer version recently: use ListArray<T: Array>, i.e. try
> > > to change `ListArray` to be a generic over Array and have `values(i)`
> > > return the concrete type. However, even that does not work because it is
> > > impossible to tell how nested a ListArray will be until we read the data
> > > (i.e. after the program was compiled), which means that the compiler will
> > > be unable to compile all (potentially nested) possible variations of the
> > > generic.
> > >
> > > So, overall, this exercise convinced me that what we have is already the
> > > simplest (but no simpler) API that we can offer under the requirements we
> > > have. (But I would love to be proven wrong, as I share your concerns.)
> > >
> > > Best,
> > > Jorge
> > >
> > >
> > > On Wed, Jan 27, 2021 at 12:27 PM Fernando Herrera <
> > > fernando.j.herr...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm wondering if it has been considered to make the value function
> > > > that is implemented in all the arrays (StringArray, BooleanArray,
> > > > ListArray, etc) part of the Array trait?
> > > >
> > > > This would help when extracting values from generic arrays held as
> > > > dyn Array, without having to manually downcast the array every time
> > > > just to read a value from it.
> > > >
> > > > Thanks,
> > > >
> > >
> >
>