acking-you commented on issue #15631: URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2786166445
This might require manual SIMD for optimization, but that would make porting harder ([as DuckDB notes](https://duckdb.org/faq.html#does-duckdb-use-simd)). However, an alternative approach might make it easier for the compiler to optimize. If feasible, it could also improve the performance of related calls in the arrow-rs library.

## Some exploration

In ClickHouse's filter implementation, there is a classic manual SIMD approach: [code](https://github.com/ClickHouse/ClickHouse/blob/master/src/Columns/ColumnsCommon.cpp#L237-L275)

The function loads multiple boolean values at once using SIMD instructions, which increases the loop step. The best-case scenarios are:
- The filter matches none of the rows in the chunk, so the loop skips to the next iteration.
- The filter matches every row in the chunk, so multiple rows are copied at once.

For other cases, performance degrades to roughly that of the non-SIMD path (the additional overhead being the preparation of SIMD variables).

If this approach is applied to checking whether bits are 1 or 0, it should incur almost no overhead (only requiring a comparison with `0` or `ffff`).

Could DataFusion's filter process also be optimized using this method? Alternatively, could we find another form of vectorization that does not involve manual unrolling?
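
To make the idea concrete, here is a minimal sketch of the chunked fast-path approach without explicit SIMD intrinsics. It is not ClickHouse's or arrow-rs's actual code: the function name `filter_by_bitmask` is hypothetical, and it assumes the filter is a bit-packed mask stored as `u64` words (least-significant bit first, one bit per row). Each word is compared against `0` and `u64::MAX` so that fully unselected chunks are skipped and fully selected chunks are copied in bulk.

```rust
/// Hypothetical helper (not part of DataFusion or arrow-rs).
/// Assumes `mask_words` holds one bit per row, LSB first, 64 rows per word.
fn filter_by_bitmask(values: &[i32], mask_words: &[u64]) -> Vec<i32> {
    let mut out = Vec::with_capacity(values.len());

    for (word_idx, &word) in mask_words.iter().enumerate() {
        let base = word_idx * 64;
        if base >= values.len() {
            break;
        }
        let end = (base + 64).min(values.len());
        let chunk = &values[base..end];

        // Best case 1: no row in this chunk matches, skip it entirely.
        if word == 0 {
            continue;
        }

        // Best case 2: every row matches, copy the whole chunk at once.
        if word == u64::MAX && chunk.len() == 64 {
            out.extend_from_slice(chunk);
            continue;
        }

        // Mixed case: visit only the set bits, lowest first.
        let mut bits = word;
        while bits != 0 {
            let offset = bits.trailing_zeros() as usize;
            if offset >= chunk.len() {
                break; // remaining bits lie past the end of `values`
            }
            out.push(chunk[offset]);
            bits &= bits - 1; // clear the lowest set bit
        }
    }

    out
}
```

For example, calling `filter_by_bitmask(&[10, 20, 30, 40], &[0b0101])` selects rows 0 and 2. Whether this shape actually helps depends on the selectivity of the filter; the mixed case still does per-row work, so the gain comes only from chunks that are entirely selected or entirely rejected.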