df:

a | b | c
---------
1 | m | n
1 | x | j
2 | m | x
...
import pyspark.sql.functions as F
from pyspark.sql.types import MapType, StringType

def _my_zip(c, d):
    # pair each b with its c, e.g. (['m', 'x'], ['n', 'j']) -> {'m': 'n', 'x': 'j'}
    return dict(zip(c, d))

my_zip = F.udf(_my_zip, MapType(StringType(), StringType()))

df.groupBy('a').agg(my_zip(F.collect_list('b'), F.collect_list('c')))
BTW, I am using Spark 1.6.1.
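For reference, a minimal sketch of the workaround that is usually suggested for this: keep the Python UDF out of agg() entirely, aggregate with built-in functions first, then apply the UDF in a select on the result. The alias names 'bs' and 'cs' below are mine, and note that in Spark 1.6 collect_list is backed by a Hive UDAF, so it may require a HiveContext:

import pyspark.sql.functions as F
from pyspark.sql.types import MapType, StringType

def _my_zip(c, d):
    return dict(zip(c, d))

my_zip = F.udf(_my_zip, MapType(StringType(), StringType()))

# aggregate with built-in functions only...
agg_df = df.groupBy('a').agg(F.collect_list('b').alias('bs'),
                             F.collect_list('c').alias('cs'))
# ...then apply the Python UDF to the already-aggregated columns
result = agg_df.select(F.col('a'), my_zip(F.col('bs'), F.col('cs')).alias('b_to_c'))

This avoids the analyzer ever seeing a Python UDF inside the aggregate expression, which is what triggers the AnalysisException quoted below.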
-
Hi,
Is there a way to write a UDF in PySpark that agg() supports?
I searched all over the docs and the internet and tested it out; some say yes,
some say no.
And when I try the "yes" code examples, they just fail with:
AnalysisException: u"expression 'pythonUDF' is neither present in the group
by, nor ...