Thanks. That makes sense. I have now refactored my code so that I have a compute layer above the Arrow types and expressions. My expressions can now evaluate to arrays or scalars (which makes sense anyway if I want to compute array.len() and aggregate functions like min and max, which I haven't gotten to yet).
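To make the idea concrete, here is a minimal sketch of a compute-layer value that is either an array or a scalar, with a less-than kernel that broadcasts a scalar across an array. The names Value and lt are hypothetical and not part of Arrow, and plain Vec<f64> / Vec<bool> stand in for Arrow arrays to keep the example self-contained.

```rust
// Hypothetical compute-layer value: either a whole column or a single scalar.
// Plain Vecs stand in for Arrow arrays in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Float64Array(Vec<f64>),
    BooleanArray(Vec<bool>),
    Float64Scalar(f64),
    BooleanScalar(bool),
}

/// Evaluates `left < right`, broadcasting a scalar operand across an array.
/// Array/array comparison requires equal lengths; anything else is an error.
pub fn lt(left: &Value, right: &Value) -> Result<Value, String> {
    use Value::*;
    match (left, right) {
        // array < scalar: compare every element against the one scalar
        (Float64Array(a), Float64Scalar(s)) => {
            Ok(BooleanArray(a.iter().map(|x| x < s).collect()))
        }
        // scalar < array: same broadcast, operands in the other order
        (Float64Scalar(s), Float64Array(a)) => {
            Ok(BooleanArray(a.iter().map(|x| s < x).collect()))
        }
        // array < array: element-wise, lengths must match
        (Float64Array(a), Float64Array(b)) if a.len() == b.len() => {
            Ok(BooleanArray(a.iter().zip(b).map(|(x, y)| x < y).collect()))
        }
        // scalar < scalar: the result is itself a scalar
        (Float64Scalar(a), Float64Scalar(b)) => Ok(BooleanScalar(a < b)),
        _ => Err(format!("unsupported operands for lt: {:?}, {:?}", left, right)),
    }
}
```

With something like this, "a < 5" would be evaluated as lt(&a, &Value::Float64Scalar(5.0)) and yield a boolean array, while the storage types never need a special broadcast variant.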
Thanks,

Andy.

On Sun, Mar 25, 2018 at 9:16 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> Hi Andy,
>
> Broadcasting is really a computational concept rather than one related to
> memory representation / data structures -- we haven't developed significant
> computational libraries in Apache Arrow yet, but it is within scope for the
> project.
>
> In C++ I have anticipated adding an object model for scalars, and we have
> the arrow::compute::Datum union object with a placeholder for scalar
> values once we have some objects to deal with them. There isn't anything
> specific about the columnar format that helps with this.
>
> It might be worth looking at data processing engines like Dremio that use
> Arrow to see how they handle scalar values.
>
> Wes
>
> On Sun, Mar 25, 2018, 10:49 PM Andy Grove <andygrov...@gmail.com> wrote:
>
> > Numpy uses the concept of broadcasting to perform math operations on
> > arrays of different sizes:
> >
> > https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
> >
> > A good example is multiplying an array by a single literal value or
> > comparing an array to a single value.
> >
> > I'm in the process of converting my project to use Arrow types and this
> > is one use case where my design differs from Arrow and I am wondering
> > what the best solution is.
> >
> > My execution engine applies operations to arrays. I need a way to
> > represent literal values so I can evaluate "a < 5" where 'a' is an
> > array. The result of this evaluation would be an array of booleans.
> >
> > My current design has a special type of array to represent a single
> > value:
> >
> > pub enum ArrayData {
> >     BroadcastVariable(ScalarValue),
> >     Boolean(Vec<bool>),
> >     Float32(Vec<f32>),
> >     Float64(Vec<f64>),
> >     ....
> > }
> >
> > I could implement some new type that can contain either an Arrow array
> > or a scalar value, but wondered how others are dealing with this.
> >
> > Thanks,
> >
> > Andy.