acking-you commented on issue #15631:
URL: https://github.com/apache/datafusion/issues/15631#issuecomment-2786166445

   This might require manual SIMD to optimize, but that would make porting 
harder ([as DuckDB 
says](https://duckdb.org/faq.html#does-duckdb-use-simd)). Alternatively, we 
could try restructuring the code so that the compiler can optimize it more 
easily. If that works, it could also improve the performance of the related 
calls in the arrow-rs library.
   
   ## Some exploration
   ClickHouse's filter implementation is a classic example of the manual SIMD 
approach: 
[code](https://github.com/ClickHouse/ClickHouse/blob/master/src/Columns/ColumnsCommon.cpp#L237-L275)
   
   The function uses SIMD instructions to load multiple boolean values at 
once, which increases the loop step.  
   The best cases are:  
   - No row in the block matches the filter, so the block is skipped.  
   - Every row in the block matches, so multiple rows are copied at once.  
   
   In all other cases, performance falls back to roughly that of the non-SIMD 
path (the only extra overhead is preparing the SIMD variables).  
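
   A minimal Rust sketch of that block-wise strategy, written without 
intrinsics so the compiler is free to vectorize the per-block count. The 
function name `filter_chunked`, the block size of 16, and the byte-per-row 
mask layout are assumptions for illustration; this is not the ClickHouse or 
DataFusion code.

```rust
/// Hypothetical helper: keeps `values[i]` where `mask[i] != 0`.
/// The mask holds one byte per row (an assumption of this sketch).
pub fn filter_chunked(values: &[i32], mask: &[u8]) -> Vec<i32> {
    assert_eq!(values.len(), mask.len());
    const BLOCK: usize = 16; // assumed block size, roughly one 128-bit register of bytes

    let mut out = Vec::with_capacity(values.len());
    let mut value_blocks = values.chunks_exact(BLOCK);
    let mut mask_blocks = mask.chunks_exact(BLOCK);

    for (v, m) in (&mut value_blocks).zip(&mut mask_blocks) {
        // Counting non-zero bytes is a simple reduction over a fixed-size block.
        let selected = m.iter().filter(|&&b| b != 0).count();
        if selected == 0 {
            // Best case 1: nothing matches, skip the whole block.
            continue;
        } else if selected == BLOCK {
            // Best case 2: everything matches, copy the block in one call.
            out.extend_from_slice(v);
        } else {
            // Mixed block: fall back to the row-by-row path.
            for (&x, &keep) in v.iter().zip(m) {
                if keep != 0 {
                    out.push(x);
                }
            }
        }
    }

    // Rows that do not fill a whole block are handled one at a time.
    let tail_values = value_blocks.remainder();
    let tail_mask = mask_blocks.remainder();
    for (&x, &keep) in tail_values.iter().zip(tail_mask) {
        if keep != 0 {
            out.push(x);
        }
    }
    out
}
```

   Whether the count actually compiles to SIMD instructions depends on the 
compiler version and target, so this would need to be confirmed with a 
benchmark.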
   
   If this approach is applied to a bit-packed mask, where the check is 
whether the bits of a word are all 1 or all 0, it should incur almost no 
overhead (it only requires comparing against `0` or an all-ones value such as 
`ffff`).  
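
   As a rough sketch of the bit-level variant, assume the mask is bit-packed 
into `u64` words with the least significant bit first, so one word covers 64 
rows. The name `filter_bitmap` and the `&[u64]` representation are 
hypothetical and are not the arrow-rs API.

```rust
/// Hypothetical helper: bit `i % 64` of `words[i / 64]` says whether row `i` is kept.
pub fn filter_bitmap(values: &[i32], words: &[u64]) -> Vec<i32> {
    assert!(words.len() >= (values.len() + 63) / 64);
    let mut out = Vec::new();

    for (chunk, &word) in values.chunks(64).zip(words) {
        // Ignore bits past the end of a short final chunk.
        let valid = if chunk.len() == 64 {
            u64::MAX
        } else {
            (1u64 << chunk.len()) - 1
        };
        let mut bits = word & valid;

        if bits == 0 {
            // All 64 rows rejected: a single comparison with 0.
            continue;
        }
        if bits == u64::MAX {
            // All 64 rows selected: a single comparison with all ones.
            out.extend_from_slice(chunk);
            continue;
        }
        // Mixed word: visit only the set bits, lowest first.
        while bits != 0 {
            let i = bits.trailing_zeros() as usize;
            out.push(chunk[i]);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    out
}
```

   The two fast paths cost one integer comparison per 64 rows, which is the 
"almost no overhead" case described above.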
   
   Could DataFusion's filter also be optimized using this method?  
   
   Alternatively, could we find another form of vectorization that does not 
involve manual unrolling?
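
   One candidate to benchmark, sketched under the assumption of a 
`&[bool]` mask: a branch-free compaction loop that always writes the current 
value and advances the output index only when the row is selected. The name 
`filter_branchless` is hypothetical. This removes the unpredictable branch, 
but it is not guaranteed to be auto-vectorized.

```rust
/// Hypothetical helper: branch-free compaction of `values` by `mask`.
pub fn filter_branchless(values: &[i32], mask: &[bool]) -> Vec<i32> {
    assert_eq!(values.len(), mask.len());
    // Pre-size the output so every unconditional write below is in bounds.
    let mut out = vec![0i32; values.len()];
    let mut n = 0usize;

    for (&v, &keep) in values.iter().zip(mask) {
        out[n] = v; // always write; overwritten later if this row is not kept
        n += keep as usize; // advance only when the row is selected
    }

    out.truncate(n);
    out
}
```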
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

