Hi Andy,

Broadcasting is really a computational concept rather than one related to memory representation / data structures -- we haven't developed significant computational libraries in Apache Arrow yet, but it is within scope for the project.
In C++ I have anticipated adding an object model for scalars, and we have the arrow::compute::Datum union object with a placeholder for scalar values once we have some objects to deal with them. There isn't anything specific about the columnar format that helps with this.

It might be worth looking at data processing engines like Dremio that use Arrow to see how they handle scalar values.

Wes

On Sun, Mar 25, 2018, 10:49 PM Andy Grove <andygrov...@gmail.com> wrote:

> Numpy uses the concept of broadcasting to perform math operations on arrays
> of different sizes:
> https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
>
> A good example is multiplying an array by a single literal value or
> comparing an array to a single value.
>
> I'm in the process of converting my project to use Arrow types and this is
> one use case where my design differs from Arrow and I am wondering what the
> best solution is.
>
> My execution engine applies operations to arrays. I need a way to represent
> literal values so I can evaluate "a < 5" where 'a' is an array. The result
> of this evaluation would be an array of booleans.
>
> My current design has a special type of array to represent a single value:
>
> pub enum ArrayData {
>     BroadcastVariable(ScalarValue),
>     Boolean(Vec<bool>),
>     Float32(Vec<f32>),
>     Float64(Vec<f64>),
>     ....
> }
>
> I could implement some new type that can contain either an Arrow array or a
> scalar value, but wondered how others are dealing with this.
>
> Thanks,
>
> Andy.
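
For illustration, the "type that can contain either an array or a scalar value" idea from the quoted message could be sketched in plain Rust roughly as follows. This is only a sketch under stated assumptions: `Value`, `less_than`, and the use of `Vec<f64>` as a stand-in for an Arrow array are all hypothetical names and choices made for this example, and none of it is part of the Arrow API or of Andy's project.

```rust
// Hypothetical types for illustration only; not part of any Arrow API.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    /// A full column of values (a stand-in for an Arrow array).
    Array(Vec<f64>),
    /// A single literal that is broadcast against an array.
    Scalar(f64),
}

impl Value {
    /// Number of rows, or `None` for a scalar, which fits any length.
    fn len(&self) -> Option<usize> {
        match self {
            Value::Array(v) => Some(v.len()),
            Value::Scalar(_) => None,
        }
    }

    /// Value at row `i`; a scalar yields the same value for every row.
    fn get(&self, i: usize) -> f64 {
        match self {
            Value::Array(v) => v[i],
            Value::Scalar(s) => *s,
        }
    }
}

/// Element-wise `left < right`, broadcasting a scalar on either side.
fn less_than(left: &Value, right: &Value) -> Result<Vec<bool>, String> {
    let n = match (left.len(), right.len()) {
        (Some(a), Some(b)) if a != b => {
            return Err(format!("length mismatch: {} vs {}", a, b));
        }
        (Some(a), _) => a,
        (None, Some(b)) => b,
        // Two scalars compare to a single boolean.
        (None, None) => 1,
    };
    Ok((0..n).map(|i| left.get(i) < right.get(i)).collect())
}

fn main() {
    // Evaluate "a < 5" where `a` is an array.
    let a = Value::Array(vec![1.0, 7.0, 3.0]);
    let five = Value::Scalar(5.0);
    println!("{:?}", less_than(&a, &five)); // prints Ok([true, false, true])
}
```

Keeping the scalar as its own variant, rather than materializing it into a full-length array, means the literal is stored once and the broadcast happens only inside the kernel that consumes it.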