kylebarron opened a new issue, #13762: URL: https://github.com/apache/datafusion/issues/13762
### Describe the bug

Returning any dense union from `ScalarUDF` currently fails.

### To Reproduce

```rs
use std::any::Any;
use std::sync::{Arc, OnceLock};

use arrow::array::UnionBuilder;
use arrow::datatypes::{Float64Type, Int32Type};
use arrow_array::Array;
use arrow_schema::{DataType, Field, UnionFields, UnionMode};
use datafusion::logical_expr::{
    ColumnarValue, Documentation, ScalarUDFImpl, Signature, Volatility,
};

#[derive(Debug)]
pub(super) struct UnionExample {
    signature: Signature,
}

impl UnionExample {
    pub fn new() -> Self {
        Self {
            signature: Signature::any(0, Volatility::Immutable),
        }
    }
}

static DOC: OnceLock<Documentation> = OnceLock::new();

impl ScalarUDFImpl for UnionExample {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn name(&self) -> &str {
        "example_union"
    }

    fn signature(&self) -> &Signature {
        &self.signature
    }

    fn return_type(&self, _arg_types: &[DataType]) -> datafusion::error::Result<DataType> {
        let fields = UnionFields::new(
            vec![0, 1],
            vec![
                Arc::new(Field::new("a", DataType::Int32, false)),
                Arc::new(Field::new("b", DataType::Float64, false)),
            ],
        );
        Ok(DataType::Union(fields, UnionMode::Dense))
    }

    fn invoke(&self, args: &[ColumnarValue]) -> datafusion::error::Result<ColumnarValue> {
        todo!()
    }

    fn invoke_no_args(&self, _number_rows: usize) -> datafusion::error::Result<ColumnarValue> {
        let mut builder = UnionBuilder::new_dense();
        builder.append::<Int32Type>("a", 1).unwrap();
        builder.append::<Float64Type>("b", 3.0).unwrap();
        builder.append::<Int32Type>("a", 4).unwrap();
        let arr = builder.build().unwrap();

        assert_eq!(arr.type_id(0), 0);
        assert_eq!(arr.type_id(1), 1);
        assert_eq!(arr.type_id(2), 0);

        assert_eq!(arr.value_offset(0), 0);
        assert_eq!(arr.value_offset(1), 0);
        assert_eq!(arr.value_offset(2), 1);

        let arr = arr.slice(0, 1);

        assert!(matches!(
            arr.data_type(),
            DataType::Union(_, UnionMode::Dense)
        ));

        Ok(ColumnarValue::Array(Arc::new(arr)))
    }

    fn documentation(&self) -> Option<&Documentation> {
        Some(DOC.get_or_init(|| Documentation::builder().build().unwrap()))
    }
}

#[cfg(test)]
mod test {
    use super::*;

    use datafusion::prelude::*;

    #[tokio::test]
    async fn test() {
        let ctx = SessionContext::new();
        ctx.register_udf(UnionExample::new().into());

        let out = ctx.sql("SELECT example_union();").await.unwrap();
        out.show().await.unwrap();
    }
}
```

Gives

```
called `Result::unwrap()` on an `Err` value: ArrowError(InvalidArgumentError("column types must match schema types, expected Union([(0, Field { name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"b\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Dense) but found Union([(0, Field { name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), (1, Field { name: \"b\", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} })], Sparse) at column index 0"), None)
```

The only difference there is that "expected" has a Union type of `Dense` while "found" has a union type of `Sparse`. I'm returning dense array data from `invoke_no_args` and `return_type()` also returns a dense union. So it seems that internally the union array is being cast from dense to sparse somehow.

### Expected behavior

Does not error with dense unions.

### Additional context

I need to use a dense union to represent geospatial vector data of unknown geometry type and coordinate dimension. https://github.com/geoarrow/geoarrow/pull/43
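For readers less familiar with the `Dense`/`Sparse` distinction in the error message, here is a minimal std-only sketch of the two layouts, using the same three values as the reproduction (`1`, `3.0`, `4`). The `Value`, `DenseUnion`, `SparseUnion`, `build_dense`, and `build_sparse` names are hypothetical and exist only for illustration; this is not the arrow-rs API and says nothing about where DataFusion performs the conversion.

```rust
// Illustrative model of the two Arrow union layouts.
// All types here are hypothetical; they are NOT arrow-rs types.

/// One logical row: either field "a" (type id 0) or field "b" (type id 1).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    A(i32),
    B(f64),
}

/// Dense layout: each child stores only its own values, and a per-row
/// offset records where that row's value sits inside the child.
#[derive(Debug, PartialEq)]
struct DenseUnion {
    type_ids: Vec<i8>,
    offsets: Vec<i32>,
    a: Vec<i32>,
    b: Vec<f64>,
}

/// Sparse layout: every child is as long as the union itself, so no
/// offsets are needed; unselected slots hold placeholder values.
#[derive(Debug, PartialEq)]
struct SparseUnion {
    type_ids: Vec<i8>,
    a: Vec<i32>,
    b: Vec<f64>,
}

fn build_dense(values: &[Value]) -> DenseUnion {
    let mut u = DenseUnion { type_ids: vec![], offsets: vec![], a: vec![], b: vec![] };
    for v in values {
        match *v {
            Value::A(x) => {
                u.type_ids.push(0);
                u.offsets.push(u.a.len() as i32); // index into child "a"
                u.a.push(x);
            }
            Value::B(x) => {
                u.type_ids.push(1);
                u.offsets.push(u.b.len() as i32); // index into child "b"
                u.b.push(x);
            }
        }
    }
    u
}

fn build_sparse(values: &[Value]) -> SparseUnion {
    let mut u = SparseUnion { type_ids: vec![], a: vec![], b: vec![] };
    for v in values {
        match *v {
            Value::A(x) => {
                u.type_ids.push(0);
                u.a.push(x);
                u.b.push(0.0); // placeholder keeps child "b" row-aligned
            }
            Value::B(x) => {
                u.type_ids.push(1);
                u.a.push(0); // placeholder keeps child "a" row-aligned
                u.b.push(x);
            }
        }
    }
    u
}

fn main() {
    let values = [Value::A(1), Value::B(3.0), Value::A(4)];
    println!("{:?}", build_dense(&values));
    println!("{:?}", build_sparse(&values));
}
```

In this model the dense offsets come out as `[0, 0, 1]`, matching the `value_offset` assertions in the reproduction. The two layouts carry the same logical values but are different physical types, which is why a schema declaring one and a column carrying the other fails the type check.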