Dandandan commented on PR #19639: URL: https://github.com/apache/datafusion/pull/19639#issuecomment-3722233173
So I guess the main factor is expressions like this being super expensive to evaluate:

```
predicate=DynamicFilter [ l_partkey@1 >= 3 AND l_partkey@1 <= 199962 AND hash_lookup ] AND DynamicFilter [ l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND hash_lookup ] AND DynamicFilter [ CASE hash_repartition % 10 WHEN 0 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 7 AND l_partkey@1 <= 199998 AND hash_lookup WHEN 1 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 2 AND l_partkey@1 <= 199996 AND hash_lookup WHEN 2 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 3 AND l_partkey@1 <= 200000 AND hash_lookup WHEN 3 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 1 AND l_partkey@1 <= 200000 AND hash_lookup WHEN 4 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 3 AND l_partkey@1 <= 199998 AND hash_lookup WHEN 5 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 3 AND l_partkey@1 <= 199998 AND hash_lookup WHEN 6 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 9 AND l_partkey@1 <= 200000 AND hash_lookup WHEN 7 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 1 AND l_partkey@1 <= 200000 AND hash_lookup WHEN 8 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 2 AND l_partkey@1 <= 199998 AND hash_lookup WHEN 9 THEN l_suppkey@2 >= 1 AND l_suppkey@2 <= 10000 AND l_partkey@1 >= 8 AND l_partkey@1 <= 199999 AND hash_lookup ELSE false END ] AND DynamicFilter [ CASE hash_repartition % 10 WHEN 0 THEN l_orderkey@0 >= 5 AND l_orderkey@0 <= 5999847 AND hash_lookup WHEN 1 THEN l_orderkey@0 >= 6 AND l_orderkey@0 <= 5999970 AND hash_lookup WHEN 2 THEN l_orderkey@0 >= 37 AND l_orderkey@0 <= 5999975 AND hash_lookup WHEN 3 THEN l_orderkey@0 >= 1 AND l_orderkey@0 <= 5999971 AND hash_lookup WHEN 4 THEN l_orderkey@0 >= 131 AND l_orderkey@0 <= 5999969 AND hash_lookup WHEN 5 THEN l_orderkey@0 >= 66 AND l_orderkey@0 <= 5999941 AND hash_lookup WHEN 6 THEN l_orderkey@0 >= 34 AND l_orderkey@0 <= 5999974 AND hash_lookup WHEN 7 THEN l_orderkey@0 >= 4 AND l_orderkey@0 <= 5999940 AND hash_lookup WHEN 8 THEN l_orderkey@0 >= 3 AND l_orderkey@0 <= 5999879 AND hash_lookup WHEN 9 THEN l_orderkey@0 >= 71 AND l_orderkey@0 <= 6000000 AND hash_lookup ELSE false END ], pruning_predicate=l_partkey_null_count@1 != row_count@2 AND l_partkey_max@0 >= 3 AND l_partkey_null_count@1 != row_count@2 AND l_partkey_min@3 <= 199962 AND l_suppkey_null_count@5 != row_count@2 AND l_suppkey_max@4 >= 1 AND l_suppkey_null_count@5 != row_count@2 AND l_suppkey_min@6 <= 10000, required_guarantees=[], metrics=[output_rows=319.4 K, elapsed_compute=10ns, output_bytes=21.9 MB, output_batches=733, files_ranges_pruned_statistics=10 total → 10 matched, row_groups_pruned_statistics=6 total → 6 matched, row_groups_pruned_bloom_filter=6 total → 6 matched, page_index_rows_pruned=6.00 M total → 6.00 M matched, batches_split=0, bytes_scanned=66.32 M, file_open_errors=0, file_scan_errors=0, num_predicate_creation_errors=0, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, bloom_filter_eval_time=451.85µs, filter_apply_time=1.13s, metadata_load_time=246.88ms, page_index_eval_time=602.59µs, row_pushdown_eval_time=20ns, statistics_eval_time=737.84µs, time_elapsed_opening=422.26ms, time_elapsed_processing=21.54s, time_elapsed_scanning_total=21.33s
```
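To make the per-row work in the `CASE hash_repartition % 10 WHEN ... THEN <bounds> AND hash_lookup ... ELSE false END` shape concrete, here is a minimal Python sketch of that kind of partitioned filter. It is only an illustration of the expression's structure, not DataFusion's implementation: the names `PartitionFilter`, `build_filters`, and `passes` are hypothetical, Python's built-in `hash` stands in for the repartition hash, and treating an empty partition as "reject" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionFilter:
    """Bounds plus key set for one build-side partition (hypothetical model)."""
    lo: int
    hi: int
    keys: frozenset


def build_filters(build_keys, num_partitions: int) -> List[Optional[PartitionFilter]]:
    """Group build-side keys by hash % num_partitions and record min/max per group."""
    buckets = [[] for _ in range(num_partitions)]
    for key in build_keys:
        buckets[hash(key) % num_partitions].append(key)
    return [
        PartitionFilter(min(b), max(b), frozenset(b)) if b else None
        for b in buckets
    ]


def passes(filters: List[Optional[PartitionFilter]], key: int) -> bool:
    """Evaluate the CASE-shaped filter for one probe-side key.

    Every row pays for a hash, a modulo, a branch selection, two bound
    comparisons, and a set lookup.
    """
    part = filters[hash(key) % len(filters)]  # CASE hash_repartition % N
    if part is None:                          # assumption: empty partition rejects
        return False
    return part.lo <= key <= part.hi and key in part.keys  # bounds AND hash_lookup


if __name__ == "__main__":
    filters = build_filters([3, 7, 12, 25], num_partitions=10)
    for probe in (7, 17, 4):
        print(probe, passes(filters, probe))
```

In this model every probe row goes through the full hash, branch, bounds, and lookup sequence, and the plan above stacks four such filters on the same scan, so the cost adds up even when few rows end up being rejected.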
