Re: [I] Improve the performance of early exit evaluation in binary_expr [datafusion]

via GitHub Tue, 08 Apr 2025 04:45:21 -0700


acking-you commented on issue #15631:
URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2786166445

This might require manual SIMD, but that would make the code harder to port
([as DuckDB
says](https://duckdb.org/faq.html#does-duckdb-use-simd)). An alternative is
to restructure the code so that the compiler can vectorize it more easily.
If that works, the same approach could probably also speed up the related
calls in the arrow-rs library.

## Some exploration
ClickHouse's filter implementation is a classic example of the manual SIMD
approach:
[code](https://github.com/ClickHouse/ClickHouse/blob/master/src/Columns/ColumnsCommon.cpp#L237-L275)

The function uses SIMD instructions to load multiple boolean values at once,
which increases the loop step.
The best cases are:
- No row in the block matches the filter, so the whole block is skipped.
- Every row in the block matches, so all of its rows are copied at once.

In all other cases, performance falls back to roughly that of the non-SIMD
path (the only extra overhead is preparing the SIMD variables).
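
To make the three cases concrete, here is a minimal Rust sketch of the same control flow. It is not the ClickHouse code and not anything from DataFusion or arrow-rs: the function name `filter_chunked`, the byte-per-row filter layout, the `i32` element type, and the step of 64 are all assumptions chosen for illustration, and it uses no intrinsics, leaving the mask-building loop to the compiler.

```rust
/// Hypothetical helper: keeps `data[i]` wherever `filter[i] != 0`,
/// looking at 64 filter bytes per loop step.
fn filter_chunked(data: &[i32], filter: &[u8]) -> Vec<i32> {
    assert_eq!(data.len(), filter.len());
    const STEP: usize = 64;
    let mut out = Vec::new();

    let mut data_chunks = data.chunks_exact(STEP);
    let mut filter_chunks = filter.chunks_exact(STEP);

    for (d, f) in data_chunks.by_ref().zip(filter_chunks.by_ref()) {
        // Pack the 64 filter bytes into one 64-bit mask (bit i = row i kept).
        let mut mask: u64 = 0;
        for (i, &b) in f.iter().enumerate() {
            mask |= ((b != 0) as u64) << i;
        }

        if mask == 0 {
            // Best case 1: nothing matches, skip the whole block.
            continue;
        } else if mask == u64::MAX {
            // Best case 2: everything matches, copy the block in one go.
            out.extend_from_slice(d);
        } else {
            // Mixed block: fall back to visiting the set bits one by one.
            let mut m = mask;
            while m != 0 {
                let i = m.trailing_zeros() as usize;
                out.push(d[i]);
                m &= m - 1; // clear the lowest set bit
            }
        }
    }

    // Tail shorter than one block: plain scalar loop.
    for (&v, &b) in data_chunks
        .remainder()
        .iter()
        .zip(filter_chunks.remainder())
    {
        if b != 0 {
            out.push(v);
        }
    }
    out
}
```

Whether the inner mask-building loop actually gets vectorized depends on the compiler and target, so this only shows the shape of the fast paths, not a measured speedup.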

If this approach is applied to checking whether the bits are all 1 or all 0,
it should incur almost no overhead: each loaded word only needs to be
compared with `0` or with an all-ones value such as `ffff`.
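
As a rough illustration of that check, the sketch below classifies a bit-packed boolean mask as all-false, all-true, or mixed by OR-ing and AND-ing its 64-bit words and comparing the results with `0` and `u64::MAX`. The names `MaskKind` and `classify_mask` are hypothetical, the `&[u64]` layout is an assumption rather than the arrow-rs API, and an empty mask is arbitrarily reported as `AllFalse`.

```rust
/// Hypothetical result type for the early-exit check.
#[derive(Debug, PartialEq)]
enum MaskKind {
    AllFalse,
    AllTrue,
    Mixed,
}

/// Classifies the first `len` bits of a bit-packed mask
/// (least significant bit of `words[0]` is row 0).
fn classify_mask(words: &[u64], len: usize) -> MaskKind {
    let full = len / 64; // number of completely used words
    let rem = len % 64; // valid bits in the trailing partial word
    assert!(words.len() >= full + (rem != 0) as usize);

    let mut any = 0u64; // OR of all words: stays 0 only if every bit is 0
    let mut all = u64::MAX; // AND of all words: stays all-ones only if every bit is 1

    // Plain reduction loop with no intrinsics.
    for &w in &words[..full] {
        any |= w;
        all &= w;
    }

    if rem != 0 {
        let valid = (1u64 << rem) - 1;
        let w = words[full];
        any |= w & valid; // ignore bits past `len`
        all &= w | !valid; // treat bits past `len` as set
    }

    if any == 0 {
        MaskKind::AllFalse
    } else if all == u64::MAX {
        MaskKind::AllTrue
    } else {
        MaskKind::Mixed
    }
}
```

For the early exit discussed in this issue, an `AllFalse` left-hand side of an `AND` (or an `AllTrue` left-hand side of an `OR`) would be the case where the right-hand side need not be evaluated.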

At the same time, could DataFusion's filter process also be optimized using
this method?

Alternatively, could we find another form of vectorization that does not
involve manual unrolling?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
