Hi all,

I'm using Spark SQL in Python and want to write a UDF that takes an entire Row as its argument. I tried something like:
    def functionName(row):
        ...
        return a_string

    udfFunctionName = udf(functionName, StringType())
    df.withColumn('columnName', udfFunctionName('*'))

but this gives an error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/dataframe.py", line 1311, in withColumn
        return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
      File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 51, in deco
        raise AnalysisException(s.split(': ', 1)[1], stackTrace)
    pyspark.sql.utils.AnalysisException: u"unresolved operator 'Project [address#0,name#1,PythonUDF#functionName(*) AS columnName#26];"

Does anyone know how this can be done, or whether it is possible at all?

Thank you,
Nisrina.
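P.S. One thing I was going to try next is bundling all the columns into a single struct column so the UDF receives them as one Row. This is just a sketch and I'm not sure it works on 1.6; I've also assumed here that my name and address columns are strings:

    from pyspark.sql.functions import struct, udf
    from pyspark.sql.types import StringType

    def functionName(row):
        # 'row' should arrive as a Row, so fields are accessible by name
        # (address and name are the columns in my DataFrame)
        return row.name + ' @ ' + row.address

    udfFunctionName = udf(functionName, StringType())

    # bundle every column into one struct column and pass that to the UDF
    df.withColumn('columnName',
                  udfFunctionName(struct(*[df[c] for c in df.columns])))

If someone can confirm whether struct() is the right way to pass a whole row to a Python UDF, that would be great.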