Re: [Format] Array/RowBatch filters

2020-01-27, Micah Kornfield
Thanks for all the input:

> I think having support for this in some way in the IPC
> protocol makes sense (it seems slightly less important for the C API
> but worth thinking about).

The way I read Jacques's e-mail, it seems like the opposite might be true (at least for Dremio). For IPC I think th

Re: [Format] Array/RowBatch filters

2020-01-27, Wes McKinney
hi Micah -- I think having support for this in some way in the IPC protocol makes sense (it seems slightly less important for the C API but worth thinking about). It's helpful to know that Dremio (a big Arrow user) already employs various filters / selection vectors. The question is how mechanical

Re: [Format] Array/RowBatch filters

2020-01-26, Jacques Nadeau
At Dremio, we use four main types of selection vectors/bitmaps:

Dense format (record valid or not, no ordering)
- single bit (bitmap)

Sparse formats (identify valid records as well as their order)
- 2-byte (for record batches of up to 2^16 records)
- 4-byte (for 2^16 batches of 2^16 records)
- 6
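The dense and sparse encodings listed above can be sketched in Python roughly as follows. This is a minimal illustration, not Dremio's implementation. The LSB-first bit order, the little-endian packing, and the 4-byte layout (upper 16 bits = batch index, lower 16 bits = record index) are all assumptions, and the truncated 6-byte format is not modeled.

```python
# Sketch of the selection-vector encodings described above (not Dremio's code).
# Assumptions: LSB-first bitmap, little-endian integers, and a 4-byte form whose
# upper 16 bits are the batch index and lower 16 bits the record index.
import struct
from typing import List, Sequence, Tuple

MAX_RECORDS = 1 << 16  # 16-bit indices address at most 2^16 records per batch


def to_bitmap(valid: Sequence[bool]) -> bytes:
    """Dense form: one bit per record (LSB-first); carries no ordering."""
    out = bytearray((len(valid) + 7) // 8)
    for i, v in enumerate(valid):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def to_sv2(indices: Sequence[int]) -> bytes:
    """Sparse 2-byte form: ordered uint16 indices into a single batch."""
    if any(not 0 <= i < MAX_RECORDS for i in indices):
        raise ValueError("2-byte selection vector indices must be < 2^16")
    return struct.pack(f"<{len(indices)}H", *indices)


def to_sv4(pairs: Sequence[Tuple[int, int]]) -> bytes:
    """Sparse 4-byte form: ordered (batch, record) pairs packed into uint32."""
    packed = []
    for batch, record in pairs:
        if not (0 <= batch < MAX_RECORDS and 0 <= record < MAX_RECORDS):
            raise ValueError("batch and record must each be < 2^16")
        packed.append((batch << 16) | record)
    return struct.pack(f"<{len(packed)}I", *packed)


def decode_sv4(buf: bytes) -> List[Tuple[int, int]]:
    """Inverse of to_sv4: recover the ordered (batch, record) pairs."""
    values = struct.unpack(f"<{len(buf) // 4}I", buf)
    return [(v >> 16, v & 0xFFFF) for v in values]
```

The trade-off the sketch makes visible: the bitmap costs one bit per record regardless of selectivity but cannot express order, while the sparse forms cost 2 or 4 bytes per *selected* record and can both skip and reorder rows, with the wider form reaching across many batches.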

Re: [Format] Array/RowBatch filters

2020-01-24, Micah Kornfield
I was thinking selection vector/bitmap (possibly with different encodings), but really nothing for now. Ordinarily, I'd lean towards YAGNI, but there isn't a good way to add this easily in a forward-compatible way unless we add a placeholder enum/table for 1.0 (the default option would be no fil
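To make the forward-compatibility argument concrete, here is a hypothetical Python sketch of a placeholder whose only 1.0 value is "no filter". `FilterKind`, `RecordBatchHeader`, and `effective_filter` are invented names for illustration; they are not part of the Arrow format or any Arrow library. The idea it shows: a reader built against the placeholder can treat a missing field as unfiltered and refuse a filter kind it does not recognize, instead of silently reading every row.

```python
# Hypothetical sketch: these names are illustrative, not Arrow's format or API.
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class FilterKind(IntEnum):
    NONE = 0  # the only value defined up front: the batch is unfiltered
    # Later versions could add values here (e.g. a bitmap or selection vector).


@dataclass
class RecordBatchHeader:
    length: int
    filter_kind: Optional[int] = None  # absent when written by an older writer


def effective_filter(header: RecordBatchHeader) -> FilterKind:
    """Treat a missing field as NONE; reject kinds this reader doesn't know."""
    raw = FilterKind.NONE if header.filter_kind is None else header.filter_kind
    try:
        return FilterKind(raw)
    except ValueError:
        # Failing loudly is the point: a filtered batch must not be read as
        # if every row were selected.
        raise NotImplementedError(f"unsupported filter kind {raw}") from None
```

Without such a placeholder, an older reader has no field to inspect, so a newer writer could not add filters without old readers misinterpreting the data.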

Re: [Format] Array/RowBatch filters

2020-01-24, Francois Saint-Jacques
By filter, do you mean a filter expression, or a selection vector/bitmap?

On Thu, Jan 23, 2020 at 11:38 PM Micah Kornfield wrote:
>
> One of the things that I think got overlooked in the conversation on having
> a slice offset in the C API was a suggestion from Jacques of perhaps
> generalizing the

[Format] Array/RowBatch filters

2020-01-23, Micah Kornfield
One of the things that I think got overlooked in the conversation on having a slice offset in the C API was a suggestion from Jacques of perhaps generalizing the concept to an arbitrary "filter" for arrays/record batches. I believe this point was also discussed in the past. I'm not advoca
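The suggestion to generalize a slice offset into an arbitrary filter can be illustrated with a small sketch. It uses plain Python lists in place of Arrow arrays, and `apply_selection` and `slice_as_selection` are hypothetical helpers, not part of any Arrow API. An (offset, length) slice is just a contiguous, ascending selection vector, whereas a general selection vector can also skip or reorder rows.

```python
# Illustrative sketch with plain lists; the helper names are hypothetical.
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_selection(values: Sequence[T], selection: Sequence[int]) -> List[T]:
    """Materialize the rows named by a selection vector, in its order."""
    return [values[i] for i in selection]


def slice_as_selection(offset: int, length: int) -> List[int]:
    """Express an (offset, length) slice as a contiguous selection vector."""
    return list(range(offset, offset + length))
```

For example, with `values = ["a", "b", "c", "d", "e"]`, `apply_selection(values, slice_as_selection(1, 3))` gives the same rows as `values[1:4]`, while `apply_selection(values, [4, 0])` selects rows in an order no single slice offset can express.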