df:

a | b | c
---------
1 | m | n
1 | x | j
2 | m | x
...
import pyspark.sql.functions as F
from pyspark.sql.types import MapType, StringType

def _my_zip(c, d):
    # pair each b with its c, e.g. (['m', 'x'], ['n', 'j']) -> {'m': 'n', 'x': 'j'}
    return dict(zip(c, d))

my_zip = F.udf(_my_zip, MapType(StringType(), StringType()))

df.groupBy('a').agg(my_zip(F.collect_list('b'), F.collect_list('c')))
BTW, I am using Spark 1.6.1.
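For reference, a minimal sketch of the workaround that is usually suggested for this: keep the Python UDF out of agg() entirely, aggregate with built-in functions first, then apply the UDF in a select on the result. The alias names 'bs' and 'cs' below are mine, and note that in Spark 1.6 collect_list is backed by a Hive UDAF, so it may require a HiveContext:

import pyspark.sql.functions as F
from pyspark.sql.types import MapType, StringType

def _my_zip(c, d):
    return dict(zip(c, d))

my_zip = F.udf(_my_zip, MapType(StringType(), StringType()))

# aggregate with built-in functions only...
agg_df = df.groupBy('a').agg(F.collect_list('b').alias('bs'),
                             F.collect_list('c').alias('cs'))
# ...then apply the Python UDF to the already-aggregated columns
result = agg_df.select(F.col('a'), my_zip(F.col('bs'), F.col('cs')).alias('b_to_c'))

This avoids the analyzer ever seeing a Python UDF inside the aggregate expression, which is what triggers the AnalysisException quoted below.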
-
Hi,
Is there a way to write a UDF in PySpark that agg() supports?
I searched all over the docs and the internet and tested it out; some say yes,
some say no.
And when I try the "yes" code examples, they just fail with:
AnalysisException: u"expression 'pythonUDF' is neither present in the group
by, nor ...