Dear all,
I am trying to partition a DataFrame into windows and then, for every
column and window, apply a custom function (UDF) through Spark's Python
interface.
Within that function I reshape the column values of a window into an
m x n matrix, run a median polish on it, and afterwards return the
values as a flat list again.
This doesn't work:
import sys
from pyspark.sql import Window
from pyspark.sql import functions as func
from pyspark.sql.types import DoubleType

w = Window().partitionBy(["col"]).rowsBetween(-sys.maxsize, sys.maxsize)

def median_polish(rows, cols, values):
    # shape values as a matrix defined by rows/cols
    # compute the median polish
    # cast the matrix back to a vector
    return values

med_pol_udf = func.udf(median_polish, DoubleType())

for x in df.columns:
    if x.startswith("some string"):
        df = df.withColumn(x, med_pol_udf("rows", "cols", x).over(w))
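For reference, the median-polish step inside that function would look
roughly like this (a sketch using numpy; the function name, the fixed
iteration count, and returning only the residuals are placeholder
choices, not my final code):

import numpy as np

def median_polish_matrix(rows, cols, values, n_iter=10):
    # shape the flat value list into an m x n matrix
    mat = np.asarray(values, dtype=float).reshape(rows, cols)
    # Tukey's median polish: alternately sweep out row and column medians
    for _ in range(n_iter):
        mat -= np.median(mat, axis=1, keepdims=True)
        mat -= np.median(mat, axis=0, keepdims=True)
    # cast the matrix (of residuals) back to a flat list
    return mat.ravel().tolist()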
The issue seems to be the windowing. Can a Python UDF actually be
applied over a window like this in PySpark, or would I need to switch
to Scala?
Thanks for your help.
Best,
Simon