Hi all, So we have these UDFs which take <1ms to operate and we're seeing pretty poor performance around them in practice, the overhead being >10ms for the projections (this data is deeply nested with ArrayTypes and MapTypes so that could be the cause). Looking at the logs and code for ScalaUDF, I noticed that there are a series of projections which take place before and after in order to make the Rows safe and then unsafe again. Is there any way to opt out of this and input/return InternalRows to skip the performance hit of the type conversion? It doesn't immediately appear to be possible but I'd like to make sure that I'm not missing anything.
I suspect we could make this possible by checking if typetags in the register function are all internal types, if they are, passing a false value for "needs[Input|Output]Conversion" to ScalaUDF and then in ScalaUDF checking for that flag to figure out if the conversion process needs to take place. We're still left with the issue of missing a schema in the case of outputting InternalRows, but we could expose the DataType parameter rather than inferring it in the register function. Is there anything else in the code that would prevent this from working? Regards, Hamel