Hi all,
I am new to the Spark community. Please ignore if this question doesn't make
sense.
My PySpark DataFrame takes only a fraction of the time (a few ms) for the 'Sorting' step,
but moving the data is much more expensive (> 14 sec).
Explanation:
I have a huge Arrow RecordBatches collection which is equally
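
For reference, a rough timing sketch along these lines (the column name, row count, and
the Spark 3.x settings such as spark.sql.execution.arrow.pyspark.enabled and the "noop"
benchmark sink are assumptions here, not taken from the actual job):

import time
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sort-vs-transfer-timing")
         # assumed Spark 3.x Arrow setting; on 2.x the key is spark.sql.execution.arrow.enabled
         .config("spark.sql.execution.arrow.pyspark.enabled", "true")
         .getOrCreate())

# stand-in DataFrame; in the real job the data comes from Arrow RecordBatches
df = spark.range(10_000_000).withColumnRenamed("id", "value")
sorted_df = df.sort("value")

# run the sort without collecting anything to the driver (Spark 3.x "noop" sink)
t0 = time.time()
sorted_df.write.format("noop").mode("overwrite").save()
print("sort only: %.3f s" % (time.time() - t0))

# sort plus moving the data to the driver (Arrow-backed toPandas) -- the expensive part
t0 = time.time()
pdf = sorted_df.toPandas()
print("sort + toPandas (data movement): %.3f s" % (time.time() - t0))
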
Hi All,
I have a custom implementation of K-Means that needs the data to be
grouped by a key in a DataFrame.
Now there is a big data skew for some of the keys, where the grouped row exceeds the
BufferHolder limit:

Cannot grow BufferHolder by size 17112 because the size after growing
exceeds size limitation 2147483632
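
A stripped-down sketch of the pattern I mean (column names "key" and "features" are made
up here, not the real schema). With a heavily skewed key, collect_list builds one giant
row for that key, which is what runs into BufferHolder's ~2 GB per-row limit; salting the
key is one commonly suggested way to break that row up:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kmeans-groupby-skew").getOrCreate()

# toy stand-in data: (key, feature vector); the real data is heavily skewed on "key"
df = spark.createDataFrame(
    [(i % 3, [float(i), float(2 * i)]) for i in range(1000)],
    ["key", "features"],
)

# grouping all points of a key into a single row; for a skewed key this one
# row can grow past the ~2 GB BufferHolder limit and raise the error above
grouped = df.groupBy("key").agg(F.collect_list("features").alias("points"))

# one commonly suggested workaround: salt the key so no single grouped row
# gets too large, then merge the partial groups inside the K-Means step
n_salts = 8
partial = (df.withColumn("salt", (F.rand() * n_salts).cast("int"))
             .groupBy("key", "salt")
             .agg(F.collect_list("features").alias("points_partial")))
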