Hi Song,

Wes proposed a couple of new array types a few months ago in [1], and they are documented in [2]. The proposal suggested a constant array type in addition to a run-length encoded (RLE) array type. During the discussion it was pointed out that a constant array might just be a special case of a run-length encoded array. So there has been some discussion about adding support for this. However, these ideas have not been implemented yet and I'm not aware of any PRs, so it is difficult to say if or when something might happen.
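To illustrate why a constant array can be seen as a special case of an RLE array, here is a minimal sketch in plain C++. It is not Arrow code: the RleInt64 struct, its run_ends layout, and MakeConstant are hypothetical names made up for this example, and the proposal in [2] may lay things out differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical run-length encoded column of int64 values (not an Arrow type).
struct RleInt64 {
  std::vector<int64_t> values;    // one value per run
  std::vector<int64_t> run_ends;  // exclusive logical end index of each run

  // Logical number of rows represented by the runs.
  int64_t length() const { return run_ends.empty() ? 0 : run_ends.back(); }

  // Value at logical index i, found by binary search over the run ends.
  int64_t at(int64_t i) const {
    assert(i >= 0 && i < length());
    auto it = std::upper_bound(run_ends.begin(), run_ends.end(), i);
    return values[it - run_ends.begin()];
  }
};

// A constant column of `length` copies of `value` is a single run.
RleInt64 MakeConstant(int64_t value, int64_t length) {
  if (length == 0) return {};
  return RleInt64{{value}, {length}};
}
```

With this layout, MakeConstant(7, 50) stores one value and one run end, no matter how many rows it represents.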

For now, you might be able to use arrow::compute::ExecBatch, which is what we use in the streaming execution engine to work around this problem. An ExecBatch is a vector of Datums, so each column can be either a scalar or an array. The batch itself has a length, so a scalar column in a batch of length 50 implies a constant array of 50 items. However, this does complicate the logic (you constantly need to check whether a column is a scalar or an array), and I do hope the RLE array is added, as it could simplify a lot of this.

-Weston

[1] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
[2] https://docs.google.com/document/d/12aZi8Inez9L_JCtZ6gi2XDbQpCsHICNy9_EUxj4ILeE/edit#heading=h.j2x776n0ymmp

On Thu, May 5, 2022 at 4:28 PM Dongxiao Song <songdongx...@hashdata.cn> wrote:
>
> Hello,
>
> I’m using Arrow C++ as the storage and compute structure of my own project,
> which is a database based on PostgreSQL.
>
> But when computing with a batch that contains a constant-value column, the
> constant value has to be expanded into an array to be stored in the batch,
> which is a waste of time and memory.
>
> Arrow::scalar can be used as a parameter for Arrow functions, but it cannot
> represent a column in a batch. So if we want to compute on a batch containing
> a constant-value column, expanding the value is inevitable.
>
> This occurs mainly before batch serialization, and in functions like
> FilterBatch.
>
> A constant-type array may solve this problem. It looks like an Arrow array,
> but only stores a single constant value and the number of rows. In functions
> like Arrow::Sum, the result can even be computed by multiplication.
>
> Another solution is to allow a batch to contain an Arrow::Scalar.
>
> All this is just a suggestion from an Arrow user. I’m not sure whether it is
> helpful for the Arrow project.
>
> Thanks,
> Song
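
P.S. To make the scalar-or-array batch idea from my reply concrete, here is a rough sketch in plain C++. It deliberately does not use the Arrow API: Column, Batch, and SumColumn are hypothetical stand-ins for Datum, ExecBatch, and a sum kernel, and the columns are assumed to hold int64 values only. It shows both the benefit (a constant column is summed by multiplication, as Song suggested) and the cost (the branch every function has to carry).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <variant>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Arrow classes.
using Int64Scalar = int64_t;
using Int64Array = std::vector<int64_t>;
using Column = std::variant<Int64Scalar, Int64Array>;

struct Batch {
  std::vector<Column> columns;
  int64_t length;  // row count; a scalar column implies `length` repeats
};

// Sum one column of the batch. The scalar-vs-array check below is the kind
// of branch that every function operating on such a batch has to repeat.
int64_t SumColumn(const Batch& batch, std::size_t index) {
  const Column& col = batch.columns[index];
  if (const auto* scalar = std::get_if<Int64Scalar>(&col)) {
    return *scalar * batch.length;  // constant column: multiply, no expansion
  }
  const auto& array = std::get<Int64Array>(col);
  assert(static_cast<int64_t>(array.size()) == batch.length);
  return std::accumulate(array.begin(), array.end(), int64_t{0});
}
```

For a batch of length 4 whose first column is the scalar 5, SumColumn computes 5 * 4 without ever materializing four copies of the value.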