Hi Leon,
Please refer to this link:
https://docs.databricks.com/spark/latest/spark-sql/udf-python-pandas.html
I have found grouped map to be a bit tricky. Please note the
statement: "All data for a group is loaded into memory before the function
is applied. This can lead to out of memory
Thanks Silvio. I need a grouped map pandas UDF, which takes a Spark data frame
as input and outputs a Spark data frame with a different shape from the input.
Grouped map is more or less unique to pandas UDFs, and I am having trouble
finding a similar non-pandas UDF for an apples-to-apples comparison. Let me kn
Your two examples are doing different things.
The Pandas UDF is doing a grouped map, whereas your Python UDF is doing an
aggregate.
I think you want your Pandas UDF to be PandasUDFType.GROUPED_AGG? Is your
result the same?
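
To make the difference concrete, here is a minimal sketch in plain Python (no Spark needed) that models the two shapes. It is only an illustration of the semantics, not Spark's implementation: the toy rows and the helper names group_rows, grouped_map, grouped_agg, subtract_mean and mean are all made up for this example.

```python
from collections import defaultdict

# Hypothetical toy data: (key, value) rows standing in for a Spark DataFrame.
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]


def group_rows(rows):
    """Collect all values per key. The whole group is held in memory at once."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def grouped_map(rows, func):
    """Grouped-map shape: func sees one whole group and returns any number of rows."""
    out = []
    for key, values in group_rows(rows).items():
        out.extend(func(key, values))
    return out


def grouped_agg(rows, func):
    """Grouped-aggregate shape: func reduces one group to exactly one value."""
    return [(key, func(values)) for key, values in group_rows(rows).items()]


def subtract_mean(key, values):
    # Returns one output row per input row, so the row count is preserved here.
    group_mean = sum(values) / len(values)
    return [(key, v - group_mean) for v in values]


def mean(values):
    # Returns a single scalar for the whole group.
    return sum(values) / len(values)


mapped = grouped_map(rows, subtract_mean)
aggregated = grouped_agg(rows, mean)
```

With the grouped-map shape the output has as many rows as the function chooses to return per group, while the aggregate shape always yields one row per group. That is why timing one against the other does not compare like with like.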
From: Lian Jiang
Date: Sunday, April 5, 2020 at 3:28 AM
To: user
Subjec