Re: [I] Prototype implementing DataFusion functions / operators using `arrow-udf` library [datafusion]

via GitHub Tue, 16 Jul 2024 03:00:51 -0700


xinlifoobar commented on issue #11413:
URL: https://github.com/apache/datafusion/issues/11413#issuecomment-2230502588


   Sorry, it took longer than I expected to make this work end-to-end.
   
   From my perspective,
   
   Good points:
   - Provides a uniform way to implement functions against record batches.
   - Reduces the amount of code.
   
   Bad points:
   - Due to the macro implementation, the `global_registry` feature needs to be enabled in the crate that references `arrow-udf`; otherwise, it does not work.
   - It is difficult to leverage Arrow infrastructure crates like `arrow-string` or `arrow-ord`.
   - Lacks support for operations between an array and a scalar.
   - By default all UDFs are private, and there is no way to reference a UDF so that it could be used in, e.g., `ExprPlanner`.
   
   Neutral:
   - The `arrow-udf` interfaces target `RecordBatch` and `Field`, while DataFusion uses `ColumnarValue` and `DataType`. I'd vote for supporting both, but I think `RecordBatch` is the more natural abstraction when taking advantage of `arrow`.
   - Lacks support for some Arrow types that DataFusion needs, e.g., `Decimal128`.
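
To make the array/scalar gap concrete, here is a rough std-only sketch of the kind of adapter that would be needed between the two abstractions. It is a toy model, not DataFusion or `arrow-udf` code: `Column` stands in for `ColumnarValue`, and `to_columns` is a hypothetical helper that broadcasts scalars so a batch-oriented function only sees equal-length columns.

```rust
// Toy stand-in for DataFusion's `ColumnarValue`, using plain std types.
// `Column` and `to_columns` are hypothetical names for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Array(Vec<Option<String>>),
    Scalar(Option<String>),
}

/// Expand every argument to a full-length column so that a batch-oriented
/// function only ever has to deal with equal-length arrays.
fn to_columns(args: &[Column]) -> Result<Vec<Vec<Option<String>>>, String> {
    // The row count comes from the array arguments, which must all agree.
    let mut num_rows: Option<usize> = None;
    for arg in args {
        if let Column::Array(values) = arg {
            if let Some(n) = num_rows {
                if n != values.len() {
                    return Err(format!("length mismatch: {} vs {}", n, values.len()));
                }
            }
            num_rows = Some(values.len());
        }
    }
    // If every argument is a scalar, treat the input as a single row.
    let num_rows = num_rows.unwrap_or(1);

    Ok(args
        .iter()
        .map(|arg| match arg {
            Column::Array(values) => values.clone(),
            // Broadcasting: repeat the scalar once per row.
            Column::Scalar(value) => vec![value.clone(); num_rows],
        })
        .collect())
}

fn main() {
    let left = Column::Array(vec![Some("a".to_string()), None, Some("c".to_string())]);
    let right = Column::Scalar(Some("!".to_string()));
    let columns = to_columns(&[left, right]).unwrap();
    println!("{:?}", columns);
}
```

Materializing the scalar like this is simple but costs a copy per row, which is one reason native array-vs-scalar support would be preferable.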
   
   I think we could replace some string functions that are not supported by `arrow-string` with `arrow-udf` implementations, to get rid of macros like `compute_utf8_op`. An example would be:
   
   ```rust
   // declare concat
   #[function("concat(string, string) -> string")]
   #[function("concat(largestring, largestring) -> largestring")]
   fn concat(lhs: &str, rhs: &str) -> String {
       format!("{}{}", lhs, rhs)
   }
   
   // reference concat
   apply_udf(
       &ColumnarValue::Array(left),
       &ColumnarValue::Array(right),
       &Field::new("", DataType::Utf8, true),
       "concat",
   )
   ```
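
For readers unfamiliar with the idea of referencing a function by name, the lookup can be pictured with the std-only sketch below. This is only a toy model under my own assumptions: `Registry`, `BinaryStrFn` and `apply` are hypothetical names, it is not how the `arrow-udf` `global_registry` feature is implemented, and the "null in, null out" behaviour is an assumption of the sketch.

```rust
use std::collections::HashMap;

// Hypothetical function type: a row-wise binary string kernel.
type BinaryStrFn = fn(&str, &str) -> String;

fn concat(lhs: &str, rhs: &str) -> String {
    format!("{}{}", lhs, rhs)
}

/// Toy registry keyed by function name (not the real `arrow-udf` registry).
struct Registry {
    functions: HashMap<&'static str, BinaryStrFn>,
}

impl Registry {
    fn new() -> Self {
        let mut functions: HashMap<&'static str, BinaryStrFn> = HashMap::new();
        functions.insert("concat", concat as BinaryStrFn);
        Registry { functions }
    }

    /// Look up `name` and apply it row by row.
    /// Assumption of this sketch: a null on either side yields a null.
    fn apply(
        &self,
        name: &str,
        left: &[Option<&str>],
        right: &[Option<&str>],
    ) -> Result<Vec<Option<String>>, String> {
        let f = self
            .functions
            .get(name)
            .ok_or_else(|| format!("unknown function: {}", name))?;
        if left.len() != right.len() {
            return Err(format!("length mismatch: {} vs {}", left.len(), right.len()));
        }
        Ok(left
            .iter()
            .zip(right)
            .map(|(l, r)| match (*l, *r) {
                (Some(l), Some(r)) => Some(f(l, r)),
                _ => None,
            })
            .collect())
    }
}

fn main() {
    let registry = Registry::new();
    let out = registry
        .apply("concat", &[Some("data"), None], &[Some("fusion"), Some("x")])
        .unwrap();
    println!("{:?}", out);
}
```

The point of the sketch is the visibility problem mentioned above: a caller such as `ExprPlanner` can only reach `concat` through the string key, not through a typed handle.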
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]