You should find that the first set of fits is called on the training portion of the split only, and the resulting models are evaluated on the validation portion. The final best model is then retrained on the entire dataset that was passed in, so the second fit is not a duplicate: the models kept from the first pass each saw only part of the data.

This is standard practice. Usually the dataset passed to TrainValidationSplit is itself the training part of an earlier train/test split, and the final best model is evaluated against the held-out test set.

On Wed, 27 Apr 2016 at 14:30, Dirceu Semighini Filho <dirceu.semigh...@gmail.com> wrote:
> Hi guys, I was testing a pipeline here, and found a possibly duplicated
> call to the fit method in the
> org.apache.spark.ml.tuning.TrainValidationSplit
> <https://github.com/apache/spark/blob/18c2c92580bdc27aa5129d9e7abda418a3633ea6/mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala>
> class.
> In line 110 there is a call to the est.fit method that calls fit on all
> the parameter combinations that we have set up.
> Down in line 128, after discovering which is the best model, we call
> fit again using the bestIndex. Wouldn't it be better to just access the
> result of the already-called fit method stored in the models val?
>
> Kind regards,
> Dirceu
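
To make the reply above concrete, here is a minimal sketch of the two-pass pattern in plain Python. It is not Spark's implementation: `fit_mean`, `mse`, the `shrink` parameter and this `train_validation_split` function are hypothetical stand-ins for the estimator, the evaluator, a param grid entry and the tuning class, and the data is made up. It only shows that the models from the first pass are trained on a subset, while the final model is refit on everything.

```python
import random


def fit_mean(train, shrink):
    """Hypothetical estimator: predicts a shrunken mean of the training labels."""
    mean = sum(train) / len(train)
    return {"shrink": shrink, "prediction": shrink * mean, "n_train": len(train)}


def mse(model, data):
    """Hypothetical evaluator: mean squared error of the constant prediction."""
    return sum((y - model["prediction"]) ** 2 for y in data) / len(data)


def train_validation_split(data, param_grid, train_ratio=0.75, seed=42):
    # Shuffle a copy and cut it into a training and a validation portion.
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, validation = shuffled[:cut], shuffled[cut:]

    # First pass: one fit per parameter combination, on the training portion only.
    models = [fit_mean(train, p) for p in param_grid]
    metrics = [mse(m, validation) for m in models]

    # Pick the best parameters by validation metric (lower MSE is better here).
    best_index = min(range(len(metrics)), key=metrics.__getitem__)

    # Second pass: refit with the winning parameters on the full dataset.
    best_model = fit_mean(data, param_grid[best_index])
    return models, best_index, best_model


data = [float(i % 7) for i in range(40)]
param_grid = [0.5, 1.0, 1.5]
models, best_index, best_model = train_validation_split(data, param_grid)

# Rows seen by the first-pass winner vs. the refit model.
print(models[best_index]["n_train"], best_model["n_train"])  # prints: 30 40
```

Reusing `models[best_index]` would return a model trained on 30 of the 40 rows, which is why the refit on the full input is a separate step rather than a duplicated call.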