Jefffrey commented on issue #4030:
URL: https://github.com/apache/datafusion/issues/4030#issuecomment-3315002759

   I think to achieve the expected behaviour you would need to mark the UDF as 
volatile. An updated example using latest main:
   
   ```rust
   use datafusion::arrow::array::{ArrayRef, BooleanArray};
   use datafusion::arrow::datatypes::DataType;
   use datafusion::common::cast::as_float32_array;
   use datafusion::error::Result;
   use datafusion::logical_expr::{ColumnarValue, Volatility};
   use datafusion::prelude::*;
   use std::sync::Arc;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();
   
       ctx.register_csv(
           "csv",
           "/Users/jeffrey/Downloads/test.csv",
           CsvReadOptions::new(),
       )
       .await
       .unwrap();
   
       let udf = {
           create_udf(
               "rand_bool",
               vec![DataType::Float32],
               DataType::Boolean,
               Volatility::Volatile, // From Stable to Volatile
               Arc::new(|args: &[ColumnarValue]| {
                   let ColumnarValue::Array(l) = &args[0] else {
                       panic!("should be array")
                   };
                   const BOOLS: [bool; 4] = [true, true, false, false];
   
                   let x = as_float32_array(l)?;
                   println!("udf in: {x:?}");
   
                   Ok(ColumnarValue::from(Arc::new(BooleanArray::from(Vec::from(
                       &BOOLS[..x.len()],
                   ))) as ArrayRef))
               }),
           )
       };
   
       ctx.register_udf(udf.clone());
   
       let query = ctx
           .sql("SELECT * FROM (SELECT *, rand_bool(num) AS rand FROM csv) WHERE NOT rand")
           .await?;
   
       query.clone().show_limit(10).await.unwrap();
       query.explain(false, false).unwrap().show().await.unwrap();
   
       Ok(())
   }
   ```
   
   With `Volatile`, this gives the output below. Switching back to `Stable`
   gives the same output as described in the issue, with two evaluations.
   
   ```sh
   udf in: PrimitiveArray<Float32>
   [
     100.0,
     200.0,
     150.0,
     300.0,
   ]
   +--------+-----+-------+
   | name_1 | num | rand  |
   +--------+-----+-------+
   | andy   | 150 | false |
   | paul   | 300 | false |
   +--------+-----+-------+
   
   +---------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                          |
   +---------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Filter: NOT rand                                                                                                                              |
   |               |   Projection: csv.name_1, csv.num, rand_bool(CAST(csv.num AS Float32)) AS rand                                                                |
   |               |     TableScan: csv projection=[name_1, num]                                                                                                   |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                                                                   |
   |               |   FilterExec: NOT rand@2                                                                                                                      |
   |               |     ProjectionExec: expr=[name_1@0 as name_1, num@1 as num, rand_bool(CAST(num@1 AS Float32)) as rand]                                        |
   |               |       RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                                                                   |
   |               |         DataSourceExec: file_groups={1 group: [[Users/jeffrey/Downloads/test.csv]]}, projection=[name_1, num], file_type=csv, has_header=true |
   |               |                                                                                                                                               |
   +---------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
   ```
   
   You can see it's evaluated only once. I guess we could try updating the
   docs around `Volatility` to make this clearer 🤔
   
   
https://github.com/apache/datafusion/blob/1488e1010a670ee5973fc621af1ec73fd92c9b71/datafusion/expr-common/src/signature.rs#L46-L86
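
To illustrate why the volatility setting changes the number of evaluations, here is a minimal std-only sketch. It is a toy model, not DataFusion's optimizer: the `Volatility` enum, `run_query`, and the position-based stand-in UDF are all hypothetical names made up for this example. It assumes that a non-volatile expression may be inlined into the filter and evaluated again in the projection, while a volatile one is computed once and then filtered on.

```rust
use std::cell::Cell;

/// Toy stand-in for a volatility marker (hypothetical, not DataFusion's enum).
#[derive(Clone, Copy, Debug)]
enum Volatility {
    Stable,
    Volatile,
}

/// Toy version of
/// `SELECT * FROM (SELECT *, f(num) AS rand FROM t) WHERE NOT rand`.
/// Returns the kept rows and how many times `f` was invoked on a batch.
fn run_query(volatility: Volatility, nums: &[f32]) -> (Vec<(f32, bool)>, usize) {
    let calls = Cell::new(0usize);

    // Stand-in UDF whose result depends on row position, not on the value:
    // the first two rows of whatever batch it sees are `true`, the rest `false`.
    let udf = |batch: &[f32]| -> Vec<bool> {
        calls.set(calls.get() + 1);
        (0..batch.len()).map(|i| i < 2).collect()
    };

    let rows: Vec<(f32, bool)> = match volatility {
        // Volatile: the expression is not duplicated. It is computed once in
        // the projection and the filter reads the resulting column.
        Volatility::Volatile => {
            let rand = udf(nums);
            nums.iter().copied().zip(rand).filter(|&(_, r)| !r).collect()
        }
        // Stable (assumed behaviour of this toy): the expression is inlined
        // into the filter, then evaluated again on the surviving rows.
        Volatility::Stable => {
            let mask = udf(nums);
            let kept: Vec<f32> = nums
                .iter()
                .copied()
                .zip(mask)
                .filter(|&(_, m)| !m)
                .map(|(n, _)| n)
                .collect();
            let rand = udf(&kept);
            kept.into_iter().zip(rand).collect()
        }
    };

    (rows, calls.get())
}

fn main() {
    let nums = [100.0, 200.0, 150.0, 300.0];
    for v in [Volatility::Volatile, Volatility::Stable] {
        let (rows, calls) = run_query(v, &nums);
        println!("{v:?}: {calls} evaluation(s), rows = {rows:?}");
    }
}
```

In this toy, the `Volatile` path calls the function once and keeps the two rows whose `rand` is `false`. The `Stable` path calls it twice, and the second call sees a different batch, so the `rand` values that come out no longer match the ones that were filtered on.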


