For what it is worth, I think kernels that avoid materializing temporary buffers are quite interesting.
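
To make that concrete, here is a minimal sketch of a fused `A && !B` over bit-packed bitmaps. It assumes plain `&[u8]` slices of equal length rather than Arrow's `Buffer` type, and `bitmap_and_not` is a hypothetical name, not an existing function in the arrow crate. The point is only that the negation and the conjunction happen in one pass, so no temporary `!B` buffer is allocated.

```rust
/// Computes `a AND (NOT b)` byte by byte over two bit-packed bitmaps.
/// Hypothetical helper for illustration; not part of the arrow crate.
/// The intermediate `NOT b` bitmap is never materialized: each output
/// byte is produced directly from the two corresponding input bytes.
fn bitmap_and_not(a: &[u8], b: &[u8]) -> Vec<u8> {
    assert_eq!(a.len(), b.len(), "bitmaps must have the same byte length");
    a.iter()
        .zip(b.iter())
        // Padding bits in the last byte stay 0 as long as they are 0 in `a`.
        .map(|(&x, &y)| x & !y)
        .collect()
}
```

For example, `bitmap_and_not(&[0b1111_0000], &[0b1100_1100])` yields `[0b0011_0000]`. A real kernel would presumably also have to handle bit offsets and work on wider words, which this sketch ignores.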

On Wed, Feb 10, 2021 at 3:42 PM Ben Chambers <bchamb...@apache.org> wrote:

> Aha. I was looking in the wrong place. Buffer does have `bitand`, `bitor`
> and `not` methods on it that seem to wrap the underlying `buffer_bin_and`
> and `buffer_bin_or`, etc.
>
> I'm still curious on whether it would make sense to offer some more
> variants of those (`A && !B`, for instance) to avoid materializing
> temporary buffers and/or creating a variant of filter that treats `null` as
> `false`. If either of those make sense I'm happy to take a stab at them
> sometime if there are any thoughts on the direction to take.
>
> Thanks, and sorry for the spam!
>
> -- Ben
>
> On Wed, Feb 10, 2021 at 11:13 AM Ben Chambers <bchamb...@apache.org>
> wrote:
>
> > Oh, another aspect of the issue that I forgot to mention is that
> > `filter`(which I'm trying to use with these booleans) has this warning:
> >
> > "WARNING: the nulls of filter are ignored and the value on its slot is
> > considered. Therefore, it is considered undefined behavior to pass filter
> > with null values."
> >
> > So, I guess a third option would be a variant of `filter` which treated
> > `null` as `false`.
> >
> > On Wed, Feb 10, 2021 at 10:50 AM Ben Chambers <bchamb...@apache.org>
> > wrote:
> >
> >> I'm trying to implement something along the lines of "X if Y > Z", but
> >> treating the case of Y or Z as null as "false". Interestingly, this is
> >> difficult with the way the kernels are created:
> >>
> >> 1. `Y > Z` will treat `null > ???` as null.
> >> "Perform left > right operation on two arrays. Non-null values are
> >> greater than null values."
> >>
> >> 2. Ok, so maybe we write that as `(Y > Z) && not_null(Y)`.
> >> "If either left or right value is null then the result is also null."
> >>
> >> Oh. So if the LHS is null, there is *no way* to get a boolean array with
> >> a non-null value.
> >>
> >> 3. Ok, I'll go write my own operator to do this (`null_to_false` or
> >> something like that).
> >> I can do this, but it requires iterating over the booleans and combining
> >> them. It seems like it would be easy to do using `buffer_bin_and`, but that
> >> is only visible within the Arrow crate.
> >>
> >> First, am I missing something with the above analysis? Is there some way
> >> to provide non-null values for a boolean array that has nulls?
> >>
> >> Second, if not any thoughts on a solution? The two options I see (without
> >> changing behavior of existing kernels) would be:
> >> 1. Add kernel(s) that provide a value in place of `null` (a general case
> >> of the `null_to_false`). These could be specialized in the boolean case to
> >> use the `buffer_bin_and` as appropriate.
> >> 2. Expose the `buffer_bin_and` and `buffer_bin_or` methods so that I (as
> >> a user) can write the kernel myself.
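
On the `null_to_false` idea (option 1 in the quoted message), a rough sketch of the boolean special case follows. It assumes the array is represented by a bit-packed values bitmap plus an optional validity bitmap in which a set bit means non-null, and it uses plain byte slices with no bit offset. The function name comes from the thread, but the signature is hypothetical and not an arrow crate API. The whole operation reduces to a single AND of the two bitmaps.

```rust
/// Returns a values bitmap in which every null slot reads as `false`.
/// Hypothetical signature for illustration; not part of the arrow crate.
/// `validity` is `None` when the array has no nulls; otherwise a set bit
/// marks a non-null slot. The result needs no validity bitmap of its own.
fn null_to_false(values: &[u8], validity: Option<&[u8]>) -> Vec<u8> {
    match validity {
        // No nulls: the values are already the answer.
        None => values.to_vec(),
        Some(valid) => {
            assert_eq!(values.len(), valid.len(), "bitmaps must have the same byte length");
            // A slot is true only if it is both non-null and true.
            values.iter().zip(valid.iter()).map(|(&v, &n)| v & n).collect()
        }
    }
}
```

With values `0b0000_1101` and validity `0b0000_0111`, the slot at bit 3 is null, so the result is `0b0000_0101`. The output could then be handed to `filter` without tripping the warning about null values, since it carries no nulls.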