adriangb commented on code in PR #20444:
URL: https://github.com/apache/datafusion/pull/20444#discussion_r2832276223


##########
datafusion/physical-expr/benches/in_list.rs:
##########
@@ -50,7 +51,9 @@ fn random_string(rng: &mut StdRng, len: usize) -> String {
 }
 
 const IN_LIST_LENGTHS: [usize; 4] = [3, 8, 28, 100];
+const DYNAMIC_LIST_LENGTHS: [usize; 3] = [3, 8, 28];

Review Comment:
   Does `('a', 1, 123.24)` also force this "dynamic" path? If so would use the 
term "heterogeneous" for that. If not and it's only columns that trigger this 
code path I would use the term "`LIST_WITH_COLUMNS_LENGTHS`.



##########
datafusion/physical-expr/benches/in_list.rs:
##########
@@ -219,6 +222,144 @@ fn bench_realistic_mixed_strings<A>(
     }
 }
 
+/// Benchmarks the dynamic evaluation path (no static filter) by including
+/// a column reference in the IN list, which prevents static filter creation.

Review Comment:
   It would be nice to show an example of how the arguments to this function 
map to the equivalent SQL being benchmarked.



##########
datafusion/physical-expr/benches/in_list.rs:
##########
@@ -219,6 +222,144 @@ fn bench_realistic_mixed_strings<A>(
     }
 }
 
+/// Benchmarks the dynamic evaluation path (no static filter) by including
+/// a column reference in the IN list, which prevents static filter creation.
+fn do_bench_dynamic(
+    c: &mut Criterion,
+    name: &str,
+    values: ArrayRef,
+    list_cols: &[ArrayRef],
+) {
+    let mut fields = vec![Field::new("a", values.data_type().clone(), true)];
+    let mut columns: Vec<ArrayRef> = vec![values];
+
+    // Build list expressions: mix of column refs (forces dynamic path)
+    let schema_fields: Vec<Field> = list_cols
+        .iter()
+        .enumerate()
+        .map(|(i, col_arr)| {
+            let name = format!("b{i}");
+            fields.push(Field::new(&name, col_arr.data_type().clone(), true));
+            columns.push(Arc::clone(col_arr));
+            Field::new(&name, col_arr.data_type().clone(), true)
+        })
+        .collect();
+
+    let schema = Schema::new(fields);
+    let list_exprs: Vec<Arc<dyn PhysicalExpr>> = schema_fields
+        .iter()
+        .map(|f| col(f.name(), &schema).unwrap())
+        .collect();
+
+    let expr = in_list(col("a", &schema).unwrap(), list_exprs, &false, 
&schema).unwrap();
+    let batch = RecordBatch::try_new(Arc::new(schema), columns).unwrap();
+
+    c.bench_function(name, |b| {
+        b.iter(|| black_box(expr.evaluate(black_box(&batch)).unwrap()))
+    });
+}
+
+/// Benchmarks the dynamic IN list path for Int32 arrays with column 
references.

Review Comment:
   It would be nice to see examples in this docstring of what the SQL being 
benchmarked is, e.g.:
   
   ```sql
   // select 1 in x from t;
   // where t:
   // create table t ...
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to