Thanks. That makes sense. I have now refactored my code so that I have a
compute layer above the arrow types and expressions. My expressions can now
evaluate to arrays or scalars (which makes sense anyway if I want to
compute array.len() and aggregate functions like min, max, which I haven't
gotten to yet.

Thanks,

Andy.

On Sun, Mar 25, 2018 at 9:16 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> Hi Andy,
>
> Broadcasting is really a computational concept rather than one related to
> memory representation / data structures -- we haven't developed significant
> computational libraries in Apache Arrow yet, but it is within scope for the
> project.
>
> In C++ I have anticipated adding an object model for scalars, and we have
> the arrow::compute::Datum union object with a placeholder for scalars
> values once we have some objects to deal with them. There isn't anything
> specific about the columnar format that helps with this.
>
> It might be worth looking at data processing engines like Dremio that use
> Arrow to see how they handle scalar values.
>
> Wes
>
> On Sun, Mar 25, 2018, 10:49 PM Andy Grove <andygrov...@gmail.com> wrote:
>
> > Numpy uses the concept of broadcasting to perform math operations on
> arrays
> > of different sizes:
> > https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
> >
> > A good example is multiplying an array by a single literal value or
> > comparing an array to a single value.
> >
> > I'm in the process of converting my project to use Arrow types and this
> is
> > one use case where my design differs from Arrow and I am wondering what
> the
> > best solution is.
> >
> > My execution engine applies operations to arrays. I need  way to
> represent
> > literal values so I can evaluate "a < 5" where 'a' is an array. The
> result
> > of this evaluation would be an array of booleans.
> >
> > My current design has a special type of array to represent a single
> value:
> >
> > pub enum ArrayData {
> >     BroadcastVariable(ScalarValue),
> >     Boolean(Vec<bool>),
> >     Float32(Vec<f32>),
> >     Float64(Vec<f64>),
> >     ....
> > }
> >
> > I could implement some new type that can contain either an Arrow array
> or a
> > scalar value, but wondered how others are dealing with this.
> >
> > Thanks,
> >
> > Andy.
> >
>

Reply via email to