Dear all,

I am trying to partition a DataFrame into windows and then, for every column and window, apply a custom function (UDF) using Spark's Python interface. Within that function I reshape the column of a window into an m x n matrix, do a median polish on it, and afterwards return the values as a list again.

This doesn't work:

    import sys

    from pyspark.sql import Window
    from pyspark.sql import functions as func
    from pyspark.sql.types import DoubleType

    w = Window.partitionBy("col").rowsBetween(-sys.maxsize, sys.maxsize)

    def median_polish(rows, cols, values):
        # shape values as a matrix defined by rows/cols
        # compute the median polish
        # cast the matrix back to a vector
        return values

    med_pol_udf = func.udf(median_polish, DoubleType())

    for x in df.columns:
        if x.startswith("some string"):
            df = df.withColumn(x, med_pol_udf("rows", "cols", x).over(w))
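For completeness, the body of median_polish is only sketched above; a minimal version of what I have in mind, assuming values arrives as a flat list of rows*cols floats and using numpy (the fixed iteration count is just a placeholder), would be:

    import numpy as np

    def median_polish(rows, cols, values):
        # Reshape the flat window values into an m x n matrix.
        mat = np.asarray(values, dtype=float).reshape(rows, cols)
        # Alternately sweep out row and column medians; the residuals
        # settle after a few iterations (fixed count as a placeholder).
        for _ in range(10):
            mat -= np.median(mat, axis=1, keepdims=True)
            mat -= np.median(mat, axis=0, keepdims=True)
        # Flatten back to a plain list so Spark can consume it.
        return mat.ravel().tolist()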

The issue seems to be the windowing. Can you actually do that in PySpark, or would I need to switch to Scala?
Thanks for your help.

Best,
Simon
