Greetings,

I've been trying to migrate a piece of code from Scala on Spark 2.x to PySpark 3.0.1. Part of the software includes a User-Defined Aggregate Function (UDAF), which poses a two-fold problem:
- The UserDefinedAggregateFunction abstract class is deprecated in Spark >= 3.0 in favor of the Aggregator <https://spark.apache.org/docs/latest/sql-ref-functions-udf-aggregate.html> abstract class.
- PySpark doesn't implement UDAFs. As far as I can tell, it only has a registerJavaUDAF <https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=udaf#pyspark.sql.UDFRegistration.registerJavaUDAF> function.

I therefore reimplemented the UserDefinedAggregateFunction as an Aggregator and attempted to use registerJavaUDAF() to register the Scala code in PySpark. However, I'm met with the following exception:

    AnalysisException: class <class-name> doesn't implement interface UserDefinedAggregateFunction;

I'm able to use the old class implementing the UserDefinedAggregateFunction abstract class, but keeping legacy code isn't desirable.

My understanding of the documentation led me to believe PySpark would be able to register a Scala UDAF implementing the Aggregator abstract class. I feel this is a bug, or at least that the documentation is outdated. It also means that, as far as I can tell, there is no way to use UDAFs natively with PySpark >= 3.0.

Am I missing a solution?

Regards,
G. Dugernier
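
For reference, the registration attempt described above can be sketched roughly as follows. This is only an illustration, not the actual project code: the helper name try_register_java_udaf, the SQL function name, and the class name are all hypothetical, and the jar containing the Scala class is assumed to already be on the Spark classpath. The only PySpark call used is spark.udf.registerJavaUDAF(name, javaClassName).

```python
from typing import Any, Optional, Tuple


def try_register_java_udaf(
    spark: Any, sql_name: str, java_class: str
) -> Tuple[bool, Optional[str]]:
    """Try to expose a JVM aggregate class to Spark SQL under `sql_name`.

    `spark` is expected to be a pyspark.sql.SparkSession (assumption: the jar
    that contains `java_class` is already on its classpath).
    Returns (True, None) on success, or (False, error message) on failure.
    """
    try:
        # Ask the JVM side to register the class as a SQL aggregate function.
        spark.udf.registerJavaUDAF(sql_name, java_class)
    except Exception as exc:
        # In the situation described above, this is where the
        # AnalysisException about UserDefinedAggregateFunction surfaces.
        return False, str(exc)
    return True, None
```

With a live session, a call such as try_register_java_udaf(spark, "my_agg", "com.example.MyAggregator") (hypothetical names) would be expected to return a failure tuple carrying the exception message when the class extends Aggregator, and a success tuple for the legacy UserDefinedAggregateFunction class, matching the behaviour reported above.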