() function
takes a numPartitions column. What other options can I explore?
--gautham
-Original Message-
From: ZHANG Wei
Sent: Thursday, May 7, 2020 1:34 AM
To: Gautham Acharya
Cc: user@spark.apache.org
Subject: Re: PyArrow Exception in Pandas UDF GROUPEDAGG()
AFAICT, there might be data skew: some partitions got too many rows,
which exceeded the memory limit. Trying .groupBy().count()
or .aggregateByKey().count() may help check the data size of each group.
If there is no data skew, increasing the .groupBy() parameter
`numPartitions` is worth a try.
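(For illustration only, not from the original thread: the per-key size
check suggested above amounts to counting rows per grouping key and
looking for keys that hold far more than their fair share. A minimal
plain-Python sketch of that idea, with toy data standing in for the
real Spark call df.groupBy("key").count():)

```python
from collections import Counter

# Toy key column standing in for the DataFrame's grouping column.
# In Spark the equivalent check would be df.groupBy("key").count().
keys = ["a"] * 9000 + ["b"] * 50 + ["c"] * 50

# Rows per key, analogous to groupBy().count().
counts = Counter(keys)

# Flag keys holding more than 2x the average group size (an arbitrary
# threshold chosen for this sketch) as likely skew candidates.
mean_size = len(keys) / len(counts)
skewed = [k for k, n in counts.items() if n > 2 * mean_size]
print(skewed)  # -> ['a']
```

If one key dominates like this, repartitioning alone will not help,
since all rows for that key still land in one group.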
--
Cheers,