Dandandan commented on PR #20481:
URL: https://github.com/apache/datafusion/pull/20481#issuecomment-3954788413

I checked both on this branch and on the earlier version: extended query 6 returns "sharecount" with a count of 0 in both cases.
It looks like the morsel-driven path is lazier, so it already prunes out everything at the `DataSourceExec` level:
   
```
|                   |           DataSourceExec:
    file_groups={10 groups: [
        [Users/danielheres/Code/arrow-datafusion-personal/benchmarks/data/hits.parquet:0..1477997645],
        [Users/danielheres/Code/arrow-datafusion-personal/benchmarks/data/hits.parquet:1477997645..2955995290],
        [Users/danielheres/Code/arrow-datafusion-personal/benchmarks/data/hits.parquet:2955995290..4433992935],
        [Users/danielheres/Code/arrow-datafusion-personal/benchmarks/data/hits.parquet:4433992935..5911990580],
        [Users/danielheres/Code/arrow-datafusion-personal/benchmarks/data/hits.parquet:5911990580..7389988225],
        ...]},
    projection=[URL, Referer, IsMobile, MobilePhoneModel, ClientTimeZone, SocialAction, SocialSourceNetworkID, UTMSource, UTMCampaign],
    file_type=parquet,
    predicate=IsMobile@32 = 1
        AND MobilePhoneModel@34 LIKE iPhone%
        AND SocialAction@77 = share
        AND (SocialSourceNetworkID@85 = 5 OR SocialSourceNetworkID@85 = 12)
        AND ClientTimeZone@44 >= -5
        AND ClientTimeZone@44 <= 5
        AND regexp_match(Referer@14, \/campaign\/(spring|summer)_promo) IS NOT NULL
        AND CASE WHEN split_part(split_part(URL@13, resolution=, 2), &, 1) ~ ^\d+$ THEN CAST(CAST(split_part(split_part(URL@13, resolution=, 2), &, 1) AS Int32) AS Int64) ELSE 0 END > 1920
        AND levenshtein(UTMSource@95, UTMCampaign@97) < 3,
    pruning_predicate=IsMobile_null_count@2 != row_count@3 AND IsMobile_min@0 <= 1 AND 1 <= IsMobile_max@1
        AND MobilePhoneModel_null_count@6 != row_count@3 AND MobilePhoneModel_min@4 <= iPhonf AND iPhone <= MobilePhoneModel_max@5
        AND SocialAction_null_count@9 != row_count@3 AND SocialAction_min@7 <= share AND share <= SocialAction_max@8
        AND (SocialSourceNetworkID_null_count@12 != row_count@3 AND SocialSourceNetworkID_min@10 <= 5 AND 5 <= SocialSourceNetworkID_max@11
             OR SocialSourceNetworkID_null_count@12 != row_count@3 AND SocialSourceNetworkID_min@10 <= 12 AND 12 <= SocialSourceNetworkID_max@11)
        AND ClientTimeZone_null_count@14 != row_count@3 AND ClientTimeZone_max@13 >= -5
        AND ClientTimeZone_null_count@14 != row_count@3 AND ClientTimeZone_min@15 <= 5,
    required_guarantees=[IsMobile in (1), SocialAction in (share), SocialSourceNetworkID in (12, 5)],
    metrics=[output_rows=0, elapsed_compute=10ns, output_bytes=0.0 B, output_batches=0,
        files_ranges_pruned_statistics=0 total → 0 matched,
        row_groups_pruned_statistics=226 total → 0 matched,
        row_groups_pruned_bloom_filter=0 total → 0 matched,
        page_index_pages_pruned=0 total → 0 matched,
        page_index_rows_pruned=0 total → 0 matched,
        limit_pruned_row_groups=0 total → 0 matched,
        batches_split=0, bytes_scanned=0, file_open_errors=0, file_scan_errors=0,
        num_predicate_creation_errors=10, predicate_evaluation_errors=0,
        pushdown_rows_matched=0, pushdown_rows_pruned=0,
        predicate_cache_inner_records=0, predicate_cache_records=0,
        bloom_filter_eval_time=20ns, metadata_load_time=4.22ms,
        page_index_eval_time=20ns, row_pushdown_eval_time=20ns,
        statistics_eval_time=544.05µs, time_elapsed_opening=7.29ms,
        time_elapsed_processing=7.40ms, time_elapsed_scanning_total=10ns,
        time_elapsed_scanning_until_data=10ns, scan_efficiency_ratio=0% (0/14.78 B)] |
```
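For context on the metrics: `row_groups_pruned_statistics=226 total → 0 matched` together with `bytes_scanned=0` means every row group was ruled out from its Parquet min/max and null-count statistics before any column data was read. The sketch below is a simplified, hypothetical model of how a `pruning_predicate` like the one in the plan can be evaluated per row group. `ColumnStats`, `may_contain_eq`, `may_overlap_range`, `may_contain_prefix` and `prefix_upper_bound` are illustrative names, not DataFusion's actual API. The `iPhonf` bound in the plan appears to be the `LIKE iPhone%` prefix with its last character incremented, which is what `prefix_upper_bound` assumes.

```rust
// Hypothetical, simplified model of statistics-based row-group pruning.
// Not DataFusion's API: names and types here are illustrative only.

/// Per-column statistics for one row group (as stored in a Parquet footer).
#[derive(Debug, Clone)]
struct ColumnStats<T> {
    min: T,
    max: T,
    null_count: u64,
}

/// `col = v` can only match if the column has at least one non-null value
/// and `v` lies within [min, max]. Mirrors e.g.
/// `IsMobile_null_count != row_count AND IsMobile_min <= 1 AND 1 <= IsMobile_max`.
fn may_contain_eq<T: PartialOrd>(s: &ColumnStats<T>, row_count: u64, v: &T) -> bool {
    s.null_count != row_count && s.min <= *v && *v <= s.max
}

/// `lo <= col AND col <= hi` can only match if [min, max] overlaps [lo, hi].
fn may_overlap_range<T: PartialOrd>(s: &ColumnStats<T>, row_count: u64, lo: &T, hi: &T) -> bool {
    s.null_count != row_count && s.max >= *lo && s.min <= *hi
}

/// Exclusive upper bound for all strings starting with `prefix`: drop trailing
/// characters that cannot be incremented, then bump the last remaining one
/// ("iPhone" -> "iPhonf"). Returns None if no bound exists (e.g. empty prefix).
fn prefix_upper_bound(prefix: &str) -> Option<String> {
    let mut chars: Vec<char> = prefix.chars().collect();
    while let Some(last) = chars.pop() {
        // from_u32 returns None for surrogates and values past char::MAX.
        if let Some(next) = char::from_u32(last as u32 + 1) {
            chars.push(next);
            return Some(chars.into_iter().collect());
        }
    }
    None
}

/// `col LIKE 'prefix%'` can only match if `prefix <= max` and
/// `min <= upper_bound` (conservative `<=`, like `MobilePhoneModel_min <= iPhonf`).
fn may_contain_prefix(s: &ColumnStats<String>, row_count: u64, prefix: &str) -> bool {
    if s.null_count == row_count {
        return false;
    }
    let below_max = prefix <= s.max.as_str();
    match prefix_upper_bound(prefix) {
        Some(upper) => below_max && s.min.as_str() <= upper.as_str(),
        None => below_max,
    }
}

fn main() {
    let row_count = 1_000;
    // A hypothetical row group in which no row is mobile: IsMobile spans [0, 0].
    let is_mobile = ColumnStats { min: 0i16, max: 0i16, null_count: 0 };
    let model = ColumnStats { min: "Galaxy".to_string(), max: "iPad".to_string(), null_count: 0 };
    let tz = ColumnStats { min: -3i16, max: 2i16, null_count: 0 };

    // All conjuncts must be possibly-true for the row group to be scanned.
    let keep = may_contain_eq(&is_mobile, row_count, &1)
        && may_contain_prefix(&model, row_count, "iPhone")
        && may_overlap_range(&tz, row_count, &-5, &5);
    // The first conjunct is already false, so this row group is pruned.
    println!("keep row group: {keep}"); // prints: keep row group: false
}
```

These checks are conservative: `true` only means a row group *may* contain matches and must still be read and filtered row by row. The `regexp_match`, `CASE ... split_part` and `levenshtein` conjuncts have no statistics-based counterpart in the plan's `pruning_predicate`, so they can only be applied after data is read.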


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
