andygrove opened a new issue, #757:
URL: https://github.com/apache/datafusion-comet/issues/757
### What is the problem the feature request solves?
Because `FilterExec` can sometimes return its input vectors without copying
them (when the predicate evaluates to true for every row in the batch), we
have to wrap it in a `CopyExec` when it is used as the input to a join:
```rust
// DataFusion Join operators keep the input batch internally. We need
// to copy the input batch to avoid the data corruption from reusing the
// input batch.
let left = if can_reuse_input_batch(&left) {
    Arc::new(CopyExec::new(left))
} else {
    left
};
```
In the case where the filter does not select all rows in the batch, it will
make a copy of the selected rows, and then we copy them again in `CopyExec`.
Perhaps we could avoid this redundant copy.
### Describe the potential solution
One idea would be to modify `FilterExec` to add some metadata to the
returned batch to indicate whether it contains any of the original input
vectors, and then have `CopyExec` skip the copy when the metadata shows that
no original vectors were returned.
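
As a rough illustration of the idea above, here is a minimal, self-contained sketch. It does not use Comet's or DataFusion's real types: `Column`, `Batch`, `reuses_input`, `filter`, and `copy_if_needed` are all hypothetical names, and an Arrow vector is modeled as an `Arc<Vec<i32>>`. It only shows how a flag set by the filter could let the copy step skip work.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow vector: a shared, immutable buffer.
type Column = Arc<Vec<i32>>;

/// Hypothetical batch carrying the proposed metadata flag.
struct Batch {
    column: Column,
    /// True when `column` is the input buffer passed through unchanged.
    reuses_input: bool,
}

/// Filter sketch: returns the input buffer as-is when every row matches,
/// otherwise materializes the selected rows into a fresh buffer.
fn filter(input: &Column, predicate: impl Fn(i32) -> bool) -> Batch {
    if input.iter().all(|v| predicate(*v)) {
        // No copy: the output aliases the input buffer.
        Batch { column: Arc::clone(input), reuses_input: true }
    } else {
        // The selected rows are copied into a new buffer.
        let selected: Vec<i32> = input
            .iter()
            .copied()
            .filter(|v| predicate(*v))
            .collect();
        Batch { column: Arc::new(selected), reuses_input: false }
    }
}

/// Copy sketch: deep-copies only when the batch still aliases the input.
fn copy_if_needed(batch: Batch) -> Column {
    if batch.reuses_input {
        Arc::new(batch.column.as_ref().clone())
    } else {
        // The filter already produced a fresh buffer, so a second copy
        // would be redundant.
        batch.column
    }
}
```

In this sketch, a partially selective filter followed by `copy_if_needed` copies the rows once rather than twice, while a filter that selects every row still gets the defensive copy the join needs.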
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]