adriangb commented on code in PR #19639:
URL: https://github.com/apache/datafusion/pull/19639#discussion_r2660387539


##########
datafusion/datasource-parquet/src/opener.rs:
##########
@@ -1576,13 +1858,16 @@ mod test {
         assert_eq!(num_batches, 1);
         assert_eq!(num_rows, 1);
 
-        // Filter should not match the partition value or the data value
+        // Filter should not match the partition value or the data value.
+        // With adaptive selectivity tracking, unknown filters start in post_scan
+        // to learn their effectiveness. So the file is read and then filtered,
+        // resulting in 1 batch with 0 rows (rather than pruning the file entirely).

Review Comment:
   TODO: check this, maybe set the selectivity high for this test?
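
   A minimal sketch of the idea behind this TODO, not the PR's actual code: every name here (`SelectivityTracker`, `Placement`, `min_pruned_fraction`) is hypothetical, and it assumes "selectivity" means the fraction of rows a filter removes. A filter with no recorded statistics starts post-scan, so a test that wants the old pruning behaviour could seed the tracker before opening the file.

```rust
use std::collections::HashMap;

/// Where a filter is evaluated (hypothetical names, not the PR's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Placement {
    /// Evaluated during the scan, so non-matching data can be skipped.
    Pushdown,
    /// Evaluated on decoded batches after the scan.
    PostScan,
}

/// Tracks, per filter, how many rows were seen and how many passed.
struct SelectivityTracker {
    /// Assumed threshold: push down once a filter removes at least this fraction of rows.
    min_pruned_fraction: f64,
    /// filter id -> (rows_in, rows_passed)
    stats: HashMap<String, (u64, u64)>,
}

impl SelectivityTracker {
    fn new(min_pruned_fraction: f64) -> Self {
        Self { min_pruned_fraction, stats: HashMap::new() }
    }

    /// Accumulate observations for a filter.
    fn record(&mut self, filter: &str, rows_in: u64, rows_passed: u64) {
        let entry = self.stats.entry(filter.to_string()).or_insert((0, 0));
        entry.0 += rows_in;
        entry.1 += rows_passed;
    }

    /// Unknown filters (no rows observed yet) start post-scan.
    fn placement(&self, filter: &str) -> Placement {
        match self.stats.get(filter) {
            Some(&(rows_in, rows_passed)) if rows_in > 0 => {
                let pruned = 1.0 - rows_passed as f64 / rows_in as f64;
                if pruned >= self.min_pruned_fraction {
                    Placement::Pushdown
                } else {
                    Placement::PostScan
                }
            }
            _ => Placement::PostScan,
        }
    }
}
```

   Under these assumptions, a test would call something like `record("a = 1", 100, 0)` up front so the filter is treated as highly effective from the first file, instead of asserting on 1 batch with 0 rows.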



##########
datafusion/sqllogictest/test_files/parquet.slt:
##########
@@ -457,10 +457,7 @@ EXPLAIN
 logical_plan
 01)Filter: CAST(binary_as_string_default.binary_col AS Utf8View) LIKE Utf8View("%a%") AND CAST(binary_as_string_default.largebinary_col AS Utf8View) LIKE Utf8View("%a%") AND CAST(binary_as_string_default.binaryview_col AS Utf8View) LIKE Utf8View("%a%")
 02)--TableScan: binary_as_string_default projection=[binary_col, largebinary_col, binaryview_col], partial_filters=[CAST(binary_as_string_default.binary_col AS Utf8View) LIKE Utf8View("%a%"), CAST(binary_as_string_default.largebinary_col AS Utf8View) LIKE Utf8View("%a%"), CAST(binary_as_string_default.binaryview_col AS Utf8View) LIKE Utf8View("%a%")]
-physical_plan
-01)FilterExec: CAST(binary_col@0 AS Utf8View) LIKE %a% AND CAST(largebinary_col@1 AS Utf8View) LIKE %a% AND CAST(binaryview_col@2 AS Utf8View) LIKE %a%
-02)--RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
-03)----DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/binary_as_string.parquet]]}, projection=[binary_col, largebinary_col, binaryview_col], file_type=parquet, predicate=CAST(binary_col@0 AS Utf8View) LIKE %a% AND CAST(largebinary_col@1 AS Utf8View) LIKE %a% AND CAST(binaryview_col@2 AS Utf8View) LIKE %a%
+physical_plan DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/parquet/binary_as_string.parquet]]}, projection=[binary_col, largebinary_col, binaryview_col], file_type=parquet, predicate=CAST(binary_col@0 AS Utf8View) LIKE %a% AND CAST(largebinary_col@1 AS Utf8View) LIKE %a% AND CAST(binaryview_col@2 AS Utf8View) LIKE %a%

Review Comment:
   Right, I'm wondering if the perf degradation we're seeing is just a loss of
   parallelism: the new plan no longer has the
   RepartitionExec (RoundRobinBatch(2)) above the scan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

