xinlifoobar commented on issue #11413:
URL: https://github.com/apache/datafusion/issues/11413#issuecomment-2230502588
Sorry, it took longer than I expected to get this working end-to-end.
From my perspective:
Good points:
- Provides a uniform way to implement functions against record batches.
- Reduces the amount of code.
Bad points:
- Because of the macro implementation, the `global_registry` feature needs to
be enabled in the crate that references `arrow-udf`; otherwise it does not
work.
- It is difficult to leverage Arrow infrastructure crates such as
`arrow-string` or `arrow-ord`.
- There is no support for operations between an array and a scalar.
- By default all UDFs are private, and there is no way to reference a UDF so
that it could be used in, e.g., `ExprPlanner`.
Neutral:
- The `arrow-udf` interfaces target `RecordBatch` and `Field`, while
`DataFusion` uses `ColumnarValue` and `DataType`. I'd vote for supporting
both, but I think `RecordBatch` is the more natural abstraction when taking
advantage of `arrow`.
- It lacks support for some Arrow types that DataFusion needs, e.g.,
`Decimal128`.
I think we could use `arrow-udf` to replace some string functions that are
not supported by `arrow-string`, which would let us get rid of macros like
`compute_utf8_op`. An example would be:
```rust
// declare concat
#[function("concat(string, string) -> string")]
#[function("concat(largestring, largestring) -> largestring")]
fn concat(lhs: &str, rhs: &str) -> String {
    format!("{}{}", lhs, rhs)
}

// reference concat
apply_udf(
    &ColumnarValue::Array(left),
    &ColumnarValue::Array(right),
    &Field::new("", DataType::Utf8, true),
    "concat",
)
```
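To make the array-versus-scalar point above concrete, here is a minimal, std-only sketch of the dispatch a wrapper would have to do when a row-level function such as `concat` receives a mix of array and scalar arguments. `StrColumn` and `apply_binary` are hypothetical names invented for this illustration; `StrColumn` is a simplified stand-in for `ColumnarValue`, not the real DataFusion type, and none of this is the `arrow-udf` or `apply_udf` implementation. Null propagation (any null input gives a null output) is also an assumption of the sketch.

```rust
/// Hypothetical, simplified stand-in for `ColumnarValue`, limited to
/// nullable strings. This is NOT the real DataFusion type.
#[derive(Debug, Clone, PartialEq)]
enum StrColumn {
    Array(Vec<Option<String>>),
    Scalar(Option<String>),
}

impl StrColumn {
    /// Value at `row`; a scalar yields the same value for every row.
    fn get(&self, row: usize) -> Option<&str> {
        match self {
            StrColumn::Array(values) => values[row].as_deref(),
            StrColumn::Scalar(value) => value.as_deref(),
        }
    }

    /// Number of rows for an array, `None` for a scalar.
    fn len(&self) -> Option<usize> {
        match self {
            StrColumn::Array(values) => Some(values.len()),
            StrColumn::Scalar(_) => None,
        }
    }
}

/// Row-level function with the same body as `concat` in the example above.
fn concat(lhs: &str, rhs: &str) -> String {
    format!("{}{}", lhs, rhs)
}

/// Hypothetical helper: applies a row-level binary function to two columns,
/// broadcasting scalars against arrays. Assumption: a null on either side
/// produces a null output row.
fn apply_binary<F>(lhs: &StrColumn, rhs: &StrColumn, f: F) -> Result<StrColumn, String>
where
    F: Fn(&str, &str) -> String,
{
    let rows = match (lhs.len(), rhs.len()) {
        // Two arrays must agree on length.
        (Some(l), Some(r)) if l != r => {
            return Err(format!("length mismatch: {} vs {}", l, r));
        }
        // At least one array: its length decides the output length.
        (Some(n), _) | (None, Some(n)) => n,
        // Two scalars: evaluate once and stay scalar.
        (None, None) => {
            let out = match (lhs.get(0), rhs.get(0)) {
                (Some(a), Some(b)) => Some(f(a, b)),
                _ => None,
            };
            return Ok(StrColumn::Scalar(out));
        }
    };

    let out = (0..rows)
        .map(|row| match (lhs.get(row), rhs.get(row)) {
            (Some(a), Some(b)) => Some(f(a, b)),
            _ => None,
        })
        .collect();
    Ok(StrColumn::Array(out))
}
```

Calling `apply_binary(&array, &scalar, concat)` appends the scalar to every non-null row of the array, and two scalar inputs yield a scalar result instead of being expanded to arrays first.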
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]