Pyspark How to groupBy -> fit

Riccardo Ferrari Thu, 21 Jan 2021 07:19:57 -0800

Hi list,

I am looking for an efficient way to apply a training pipeline to each
group produced by DataFrame.groupBy.


This is very easy if you're using a pandas UDF (i.e. groupBy().apply()), but
I have not been able to find the equivalent for a Spark pipeline.

The ultimate goal is to fit multiple models, one per group of data.
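
For reference, here is a minimal sketch of the pandas route mentioned above. It
is not the Spark pipeline equivalent I am asking about. The column names
(group, x, y), the helper names fit_group and fit_per_group, and the NumPy
line fit standing in for a real model are all assumptions for illustration.
On Spark 3.0+ the grouped pandas call is spelled applyInPandas; older versions
use a GROUPED_MAP pandas_udf with groupBy().apply().

```python
import numpy as np
import pandas as pd


def fit_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit y = slope * x + intercept on one group's rows and return one row.

    Stand-in for a real per-group model; assumes columns group, x, y
    (hypothetical names) and at least two distinct x values per group.
    """
    x = pdf["x"].to_numpy(dtype=float)
    y = pdf["y"].to_numpy(dtype=float)
    # np.polyfit with deg=1 returns coefficients highest degree first.
    slope, intercept = np.polyfit(x, y, deg=1)
    return pd.DataFrame(
        {
            "group": [pdf["group"].iloc[0]],
            "slope": [float(slope)],
            "intercept": [float(intercept)],
        }
    )


def fit_per_group(sdf):
    """Run fit_group once per group of a Spark DataFrame (assumes Spark 3.0+).

    Each group is handed to fit_group as a pandas DataFrame on an executor,
    so every group must fit in that executor's memory.
    """
    schema = "group string, slope double, intercept double"
    return sdf.groupBy("group").applyInPandas(fit_group, schema=schema)
```

Calling fit_per_group(df) on a Spark DataFrame with those three columns should
give one row of fitted parameters per group. The fitting itself happens in
pandas/NumPy inside each task, not in a Spark pipeline, which is the part I am
trying to replace.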

Thanks,
