kylebarron opened a new issue, #13762:
URL: https://github.com/apache/datafusion/issues/13762

   ### Describe the bug
   
   Returning any dense union from a `ScalarUDF` currently fails.
   
   ### To Reproduce
   
   ```rs
   use std::any::Any;
   use std::sync::{Arc, OnceLock};
   
   use arrow::array::UnionBuilder;
   use arrow::datatypes::{Float64Type, Int32Type};
   use arrow_array::Array;
   use arrow_schema::{DataType, Field, UnionFields, UnionMode};
   use datafusion::logical_expr::{
       ColumnarValue, Documentation, ScalarUDFImpl, Signature, Volatility,
   };
   
   #[derive(Debug)]
   pub(super) struct UnionExample {
       signature: Signature,
   }
   
   impl UnionExample {
       pub fn new() -> Self {
           Self {
               signature: Signature::any(0, Volatility::Immutable),
           }
       }
   }
   
   static DOC: OnceLock<Documentation> = OnceLock::new();
   
   impl ScalarUDFImpl for UnionExample {
       fn as_any(&self) -> &dyn Any {
           self
       }
   
       fn name(&self) -> &str {
           "example_union"
       }
   
       fn signature(&self) -> &Signature {
           &self.signature
       }
   
       fn return_type(&self, _arg_types: &[DataType]) -> datafusion::error::Result<DataType> {
           let fields = UnionFields::new(
               vec![0, 1],
               vec![
                   Arc::new(Field::new("a", DataType::Int32, false)),
                   Arc::new(Field::new("b", DataType::Float64, false)),
               ],
           );
           Ok(DataType::Union(fields, UnionMode::Dense))
       }
   
       fn invoke(&self, _args: &[ColumnarValue]) -> datafusion::error::Result<ColumnarValue> {
           // Not exercised here: the query below calls the UDF with no arguments.
           todo!()
       }
   
       fn invoke_no_args(&self, _number_rows: usize) -> datafusion::error::Result<ColumnarValue> {
           let mut builder = UnionBuilder::new_dense();
           builder.append::<Int32Type>("a", 1).unwrap();
           builder.append::<Float64Type>("b", 3.0).unwrap();
           builder.append::<Int32Type>("a", 4).unwrap();
           let arr = builder.build().unwrap();
   
           assert_eq!(arr.type_id(0), 0);
           assert_eq!(arr.type_id(1), 1);
           assert_eq!(arr.type_id(2), 0);
   
           assert_eq!(arr.value_offset(0), 0);
           assert_eq!(arr.value_offset(1), 0);
           assert_eq!(arr.value_offset(2), 1);
   
           let arr = arr.slice(0, 1);
   
           assert!(matches!(
               arr.data_type(),
               DataType::Union(_, UnionMode::Dense)
           ));
   
           Ok(ColumnarValue::Array(Arc::new(arr)))
       }
   
       fn documentation(&self) -> Option<&Documentation> {
           Some(DOC.get_or_init(|| Documentation::builder().build().unwrap()))
       }
   }
   
   #[cfg(test)]
   mod test {
       use super::*;
       use datafusion::prelude::*;
   
       #[tokio::test]
       async fn test() {
           let ctx = SessionContext::new();
           ctx.register_udf(UnionExample::new().into());
   
           let out = ctx.sql("SELECT example_union();").await.unwrap();
           out.show().await.unwrap();
       }
   }
   ```
   
   Gives
   
   ```
   called `Result::unwrap()` on an `Err` value: ArrowError(InvalidArgumentError("column types must match schema types, expected Union([(0, Field { name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"b\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Dense) but found Union([(0, Field { name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"b\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) at column index 0"), None)
   ```
   
   The only difference is that "expected" has a union mode of `Dense` while "found" has a union mode of `Sparse`. I'm returning dense array data from `invoke_no_args`, and `return_type()` also returns a dense union, so it seems that the union array is being cast from dense to sparse somewhere internally.
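
For context on why the mode matters, here is a minimal sketch of how the two layouts differ for the same three rows used in the reproduction (`a=1`, `b=3.0`, `a=4`). The `Value`, `DenseUnion`, and `SparseUnion` types and the `to_dense`/`to_sparse` helpers are hypothetical stand-ins written for illustration only; they are not the arrow-rs API, and the placeholder values in the sparse children are an arbitrary choice.

```rust
// Simplified model of Arrow's two union layouts. These types are
// illustrative stand-ins, NOT the arrow-rs `UnionArray` API.

/// One logical row: an Int32 ("a", type id 0) or a Float64 ("b", type id 1).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    A(i32),
    B(f64),
}

/// Dense layout: children hold only their own values, and an offsets
/// buffer maps each row to a slot in the child selected by its type id.
#[derive(Debug, Default, PartialEq)]
pub struct DenseUnion {
    pub type_ids: Vec<i8>,
    pub offsets: Vec<i32>,
    pub a: Vec<i32>,
    pub b: Vec<f64>,
}

/// Sparse layout: no offsets buffer; every child is as long as the union,
/// and row i is read from slot i of the child selected by its type id.
#[derive(Debug, Default, PartialEq)]
pub struct SparseUnion {
    pub type_ids: Vec<i8>,
    pub a: Vec<i32>,
    pub b: Vec<f64>,
}

pub fn to_dense(rows: &[Value]) -> DenseUnion {
    let mut out = DenseUnion::default();
    for row in rows {
        match *row {
            Value::A(x) => {
                out.type_ids.push(0);
                // The offset is the slot the value is about to occupy.
                out.offsets.push(out.a.len() as i32);
                out.a.push(x);
            }
            Value::B(x) => {
                out.type_ids.push(1);
                out.offsets.push(out.b.len() as i32);
                out.b.push(x);
            }
        }
    }
    out
}

pub fn to_sparse(rows: &[Value]) -> SparseUnion {
    let mut out = SparseUnion::default();
    for row in rows {
        match *row {
            Value::A(x) => {
                out.type_ids.push(0);
                out.a.push(x);
                out.b.push(0.0); // placeholder slot, never read
            }
            Value::B(x) => {
                out.type_ids.push(1);
                out.a.push(0); // placeholder slot, never read
                out.b.push(x);
            }
        }
    }
    out
}
```

Calling `to_dense` on the three rows yields type ids `[0, 1, 0]` and offsets `[0, 0, 1]`, matching the assertions in the reproduction, while `to_sparse` yields the same type ids but children of length 3 and no offsets. The buffers differ, so a dense array cannot simply be relabelled as sparse, which is why the schema check rejects the mismatch.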
   
   ### Expected behavior
   
   Returning a dense union from a `ScalarUDF` should not error.
   
   ### Additional context
   
   I need to use a dense union to represent geospatial vector data of unknown geometry type and coordinate dimension: https://github.com/geoarrow/geoarrow/pull/43


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

