You can use your groupId as a grid parameter and filter your dataset by
this id in a pipeline stage before feeding it to the model.
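As a rough sketch in Scala (the `GroupFilter` transformer, the `groupId` column, the group id values, and `trainingData` are all assumptions for illustration, not part of any Spark API): put a filtering transformer with a `groupId` param in front of the estimator, make one grid point per group, and `Pipeline.fit` with the grid returns one fitted model per param map, i.e. one model per group.

```scala
import org.apache.spark.ml.{Pipeline, Transformer}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.tuning.ParamGridBuilder
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// Hypothetical transformer: keeps only the rows belonging to one group.
class GroupFilter(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("groupFilter"))

  final val groupId = new Param[Int](this, "groupId", "id of the group to keep")
  def setGroupId(value: Int): this.type = set(groupId, value)

  override def transform(df: Dataset[_]): DataFrame =
    df.filter(df("groupId") === $(groupId)).toDF()  // assumes a "groupId" column

  override def transformSchema(schema: StructType): StructType = schema
  override def copy(extra: ParamMap): GroupFilter = defaultCopy(extra)
}

val filter = new GroupFilter()
val lr = new LogisticRegression()
val pipeline = new Pipeline().setStages(Array(filter, lr))

// One grid point per group id (example values).
val grid = new ParamGridBuilder()
  .addGrid(filter.groupId, Array(1, 2, 3))
  .build()

// fit(dataset, paramMaps) trains one PipelineModel per ParamMap,
// so each fitted model here has seen only its own group's rows.
val models = pipeline.fit(trainingData, grid)
```

Whether the per-grid-point fits actually run in parallel depends on the Spark version and how the fitting is driven; the ml-tuning guide linked below covers the parallelism options.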
The following may help:
http://spark.apache.org/docs/latest/ml-tuning.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.ParamG
This talk demonstrates such functionality via PySpark:
https://www.youtube.com/watch?v=R-6nAwLyWCI
Xiaomeng Wan wrote on Tue, 29 Nov 2016 at
17:54:
> I want to divide big data into groups (eg groupby some id), and build one
> model for each group. I am wondering whether I can parallelize the model
> b