CuteChuanChuan opened a new pull request, #20498: URL: https://github.com/apache/datafusion/pull/20498
Apply the same specialization strategy used in arrow-rs filter.rs to the scatter function used in CASE expression evaluation.

Key changes:
- Add selectivity-based iteration strategy (set_slices vs set_indices)
- Add type dispatch infrastructure (scatter_array)
- Implement core helpers: scatter_native, scatter_bits, scatter_null_mask
- Add fast paths for all-true and all-false masks

Still TODO:
- scatter_primitive (currently todo!())
- scatter_boolean, scatter_bytes, scatter_byte_view
- scatter_fixed_size_binary, scatter_dict
- scatter_fallback
- Additional tests

Ref: https://github.com/apache/datafusion/issues/11570

## Which issue does this PR close?

Related to #11570 (scatter optimization suggested in https://github.com/apache/datafusion/pull/19994#issuecomment-3860528711)

## Rationale for this change

Profiling shows scatter consumes 50%+ of elapsed time in the "10% zeroes" divide-by-zero protection benchmark. The current implementation uses the generic MutableArrayData path for all array types.

## What changes are included in this PR?

Apply the same type-specific specialization strategy used in arrow-rs filter.rs to scatter:
- Selectivity-based iteration: set_slices() for high selectivity, set_indices() for low selectivity
- Type-specific dispatch via downcast_primitive_array! for primitives, boolean, bytes, byte views, dictionary, etc.
- Fast paths for all-true and all-false masks
- Fallback to MutableArrayData for unsupported types

WIP - core infrastructure is in place, type-specific implementations are being added.

## Are these changes tested?

Existing 4 scatter tests preserved. Additional tests will be added.

## Are there any user-facing changes?

No. Public API signature is unchanged.
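
## Illustrative sketch of the iteration strategy

The following is a minimal, dependency-free sketch of the selectivity-based strategy and the two fast paths described above. It is not the code in this PR: it uses plain slices and `Vec<bool>` instead of Arrow buffers, the function bodies are hypothetical stand-ins that only borrow the names `scatter_native`, `set_slices`, and `set_indices`, and the `0.8` cutoff is an assumed value chosen for illustration.

```rust
/// Assumed cutoff for illustration only; the value used in the PR may differ.
const SLICES_SELECTIVITY_THRESHOLD: f64 = 0.8;

/// Contiguous runs of `true` in `mask`, as half-open `(start, end)` ranges.
fn set_slices(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start = None;
    for (i, &bit) in mask.iter().enumerate() {
        match (bit, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        runs.push((s, mask.len()));
    }
    runs
}

/// Positions of every `true` bit in `mask`.
fn set_indices(mask: &[bool]) -> impl Iterator<Item = usize> + '_ {
    mask.iter()
        .enumerate()
        .filter_map(|(i, &bit)| if bit { Some(i) } else { None })
}

/// Scatter `truthy` into the `true` positions of `mask`.
/// Returns `(values, validity)`; slots where the mask is `false` are null
/// (validity `false`, value `T::default()`).
fn scatter_native<T: Copy + Default>(mask: &[bool], truthy: &[T]) -> (Vec<T>, Vec<bool>) {
    let true_count = mask.iter().filter(|&&bit| bit).count();
    assert!(truthy.len() >= true_count, "not enough truthy values");

    // Fast path: all-false mask, every output slot is null.
    if true_count == 0 {
        return (vec![T::default(); mask.len()], vec![false; mask.len()]);
    }
    // Fast path: all-true mask, the output is just the input values.
    if true_count == mask.len() {
        return (truthy[..true_count].to_vec(), vec![true; mask.len()]);
    }

    let mut values = vec![T::default(); mask.len()];
    let mut validity = vec![false; mask.len()];
    let selectivity = true_count as f64 / mask.len() as f64;

    if selectivity > SLICES_SELECTIVITY_THRESHOLD {
        // High selectivity: copy whole runs at once.
        let mut taken = 0;
        for (start, end) in set_slices(mask) {
            let len = end - start;
            values[start..end].copy_from_slice(&truthy[taken..taken + len]);
            validity[start..end].fill(true);
            taken += len;
        }
    } else {
        // Low selectivity: place values one index at a time.
        for (src, dst) in set_indices(mask).enumerate() {
            values[dst] = truthy[src];
            validity[dst] = true;
        }
    }
    (values, validity)
}
```

The intent of the split is that a mostly-true mask yields a few long runs, so bulk copies amortize the per-run bookkeeping, while a sparse mask is cheaper to walk index by index.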
