Hi,

I ran into a limitation and would like to propose a way to resolve it.
TL;DR: currently, users plan UDF calls with a call of the form

    let e = scalar_functions("my_udf", vec![col("a")], DataType::Float64);
    df.select(vec![e])

The proposal is to use instead:

    let f = df.registry();
    let e = f.udf("my_udf", vec![col("a")])?;  // note: no DataType::Float64
    df.select(vec![e])

so that users do not have to know the return type of the UDF they are using (they still need to set it during registration).

This will make our lives easier, and will also enable our own UDFs (e.g. sqrt) to support variable types (e.g. float32 and float64). This will be important for functions that return composite objects, such as array(), whose return type depends heavily on its input type.

Proposal: https://docs.google.com/document/d/1Kzz642ScizeKXmVE1bBlbLvR663BKQaGqVIyy9cAscY/edit?usp=sharing
Issue: https://issues.apache.org/jira/browse/ARROW-9836
PR: https://github.com/apache/arrow/pull/8032

Best,
Jorge
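
P.S. To illustrate the idea, here is a minimal, self-contained sketch of a registry that remembers each UDF's return type at registration time, so that the call site only passes a name and arguments. Everything in it (the `Registry` struct, its `register` and `udf` methods, and the simplified `Expr` and `DataType` enums) is a hypothetical stand-in written for this example. It is not the DataFusion API and not the code in the PR.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the actual DataFusion types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Float32,
    Float64,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    ScalarUdf {
        name: String,
        args: Vec<Expr>,
        return_type: DataType,
    },
}

// Helper that builds a column expression.
#[allow(dead_code)]
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

// Assumed registry: maps a UDF name to the return type declared
// when the UDF was registered.
#[derive(Default)]
struct Registry {
    return_types: HashMap<String, DataType>,
}

#[allow(dead_code)]
impl Registry {
    // The return type is stated once, at registration.
    fn register(&mut self, name: &str, return_type: DataType) {
        self.return_types.insert(name.to_string(), return_type);
    }

    // Builds a UDF call expression; the return type is looked up
    // in the registry instead of being supplied by the caller.
    fn udf(&self, name: &str, args: Vec<Expr>) -> Result<Expr, String> {
        let return_type = self
            .return_types
            .get(name)
            .cloned()
            .ok_or_else(|| format!("no UDF named '{}' is registered", name))?;
        Ok(Expr::ScalarUdf {
            name: name.to_string(),
            args,
            return_type,
        })
    }
}
```

With this shape, a caller would register "my_udf" once with DataType::Float64 and later write registry.udf("my_udf", vec![col("a")])? without repeating the type. Asking for an unregistered name returns an error rather than a silently mistyped expression. A return type that depends on the input types (the sqrt and array() cases above) would need the registry to store a function from argument types to a return type instead of a fixed DataType; this sketch does not attempt that.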