Sojan James created ZEPPELIN-1411:
-------------------------------------

             Summary: UDF with pyspark not working
                 Key: ZEPPELIN-1411
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-1411
             Project: Zeppelin
          Issue Type: Bug
          Components: python-interpreter
    Affects Versions: 0.6.1
            Reporter: Sojan James

The following UDF example doesn't work.

{code}
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
maturity_udf = udf(lambda age: "adult" if age >= 18 else "child", StringType())
df = sqlContext.createDataFrame([{'name': 'Alice', 'age': 1}])
df.withColumn("maturity", maturity_udf(df.age))
{code}

Stack trace:

{code}
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-64075962331083004.py", line 266, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-64075962331083004.py", line 259, in <module>
    exec(code)
  File "<stdin>", line 3, in <module>
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1789, in udf
    return UserDefinedFunction(f, returnType)
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1751, in __init__
    self._judf = self._create_judf(name)
  File "/home/sjames/zeppelin/zeppelin-0.6.1-bin-all/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/functions.py", line 1758, in _create_judf
    jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
AttributeError: 'JavaMember' object has no attribute 'parseDataType'
{code}
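
Note that the traceback ends inside the `udf(...)` call itself (line 3 of the paragraph), in `_create_judf`, so the failure happens while the UDF is being constructed, before the lambda is ever applied to a row. As a sanity check, the classification logic can be run in plain Python without Spark or Zeppelin. This is only a minimal sketch: the function name `maturity` and the second sample row are assumptions for illustration and are not part of the original report.

```python
# Same predicate as the lambda passed to udf() in the report, run without Spark.
# `maturity` is a hypothetical name chosen for this illustration.
def maturity(age):
    return "adult" if age >= 18 else "child"

# The first row mirrors the report's data; the second is an assumed
# extra row added to exercise the "adult" branch.
rows = [{"name": "Alice", "age": 1}, {"name": "Bob", "age": 18}]
labels = [maturity(row["age"]) for row in rows]
print(labels)
```

If the labels come out as expected, the lambda is unlikely to be the cause, which is consistent with the `AttributeError` being raised from `ctx._ssql_ctx.parseDataType` rather than from user code.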