andygrove commented on PR #835:
URL: https://github.com/apache/datafusion-comet/pull/835#issuecomment-2294105508
After addressing the first round of feedback, we now have:
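```rust
use arrow::array::{make_array, Array, ArrayRef, BooleanArray, MutableArrayData};
use arrow::compute::filter_record_batch;
use arrow::error::ArrowError;
use arrow::record_batch::{RecordBatch, RecordBatchOptions};

pub fn comet_filter_record_batch(
    record_batch: &RecordBatch,
    predicate: &BooleanArray,
) -> std::result::Result<RecordBatch, ArrowError> {
    if predicate.true_count() == record_batch.num_rows() {
        // Special case: every row passes the predicate, so make an exact copy
        // of each column into newly allocated buffers instead of delegating
        // to `filter_record_batch`.
        let arrays: Vec<ArrayRef> = record_batch
            .columns()
            .iter()
            .map(|array| {
                let capacity = array.len();
                let data = array.to_data();
                let mut mutable = MutableArrayData::new(vec![&data], false, capacity);
                // Copy rows [0, capacity) from source array 0.
                mutable.extend(0, 0, capacity);
                make_array(mutable.freeze())
            })
            .collect();
        // An explicit row count keeps the batch valid even if it has no columns.
        let options =
            RecordBatchOptions::new().with_row_count(Some(record_batch.num_rows()));
        RecordBatch::try_new_with_options(record_batch.schema(), arrays, &options)
    } else {
        filter_record_batch(record_batch, predicate)
    }
}
```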
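To illustrate the control flow without depending on the `arrow` crate, here is a minimal std-only sketch. It models a record batch as plain `Vec<i32>` columns and the predicate as a `&[bool]`; the names `Batch`, `Column` and `filter_always_copy` are hypothetical and are not part of Comet or arrow-rs. It mirrors the shape of `comet_filter_record_batch`: when every predicate bit is set, copy each column wholesale; otherwise keep only the selected rows. In both branches the result owns fresh allocations.

```rust
/// Hypothetical stand-in for an Arrow `ArrayRef`: a plain vector of values.
type Column = Vec<i32>;

/// Hypothetical simplified record batch: equal-length columns plus a row count.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<Column>,
    num_rows: usize,
}

/// Filter `batch` by `predicate`, always returning newly allocated columns.
fn filter_always_copy(batch: &Batch, predicate: &[bool]) -> Result<Batch, String> {
    if predicate.len() != batch.num_rows {
        return Err(format!(
            "predicate has {} entries but batch has {} rows",
            predicate.len(),
            batch.num_rows
        ));
    }
    let true_count = predicate.iter().filter(|&&keep| keep).count();
    let columns: Vec<Column> = if true_count == batch.num_rows {
        // Fast path: every row survives, so copy each column in one bulk
        // operation instead of testing rows one at a time.
        batch.columns.iter().map(|col| col.to_vec()).collect()
    } else {
        // General path: keep only the rows whose predicate bit is set.
        batch
            .columns
            .iter()
            .map(|col| {
                col.iter()
                    .zip(predicate.iter())
                    .filter_map(|(&v, &keep)| if keep { Some(v) } else { None })
                    .collect::<Column>()
            })
            .collect()
    };
    Ok(Batch { columns, num_rows: true_count })
}
```

In this model `to_vec()` plays the role of `MutableArrayData::extend(0, 0, len)` followed by `freeze()`: one bulk copy per column, with no per-row predicate checks.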
New benchmark results:
```
filter/comet_filter - few
                        time:   [14.650 µs 14.727 µs 14.831 µs]
                        change: [-36.702% -35.128% -33.681%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
filter/comet_filter - many
                        time:   [75.962 µs 76.172 µs 76.381 µs]
                        change: [-48.681% -48.501% -48.303%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
filter/comet_filter - all
                        time:   [34.497 µs 34.628 µs 34.764 µs]
                        change: [-80.854% -80.527% -80.256%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
```
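Criterion reports each improvement as a relative change, which can be converted into a speedup factor via `old / new = 1 / (1 + change)`. The sketch below applies that arithmetic to the point estimates from the benchmark output; the helper names are hypothetical, and because Criterion estimates the time and the change intervals separately, the implied previous times are only rough approximations.

```rust
/// Speedup factor (old_time / new_time) implied by a relative change,
/// e.g. -0.80527 for a reported -80.527%.
fn speedup_from_change(change: f64) -> f64 {
    1.0 / (1.0 + change)
}

/// Approximate previous time implied by the new time and the relative change.
fn implied_previous_time(new_time_us: f64, change: f64) -> f64 {
    new_time_us / (1.0 + change)
}

/// Point estimates (new time in µs, relative change) from the benchmark output.
const RESULTS: [(&str, f64, f64); 3] = [
    ("few", 14.727, -0.35128),
    ("many", 76.172, -0.48501),
    ("all", 34.628, -0.80527),
];

/// Returns (benchmark, speedup factor, approximate previous time in µs).
fn summarize() -> Vec<(&'static str, f64, f64)> {
    RESULTS
        .iter()
        .map(|&(name, new_us, change)| {
            (name, speedup_from_change(change), implied_previous_time(new_us, change))
        })
        .collect()
}
```

By this arithmetic the "all" case, which is the one the new fast path targets, runs roughly 5× faster than before, while "few" and "many" run roughly 1.5× and 1.9× faster.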
This certainly looks a lot better. I am running TPC-DS again to make sure
this code path really always copies. I had tried an approach like this in the
past but ran into data corruption issues.