Re: Duplicated fit into TrainValidationSplit

Nick Pentreath Wed, 27 Apr 2016 07:38:01 -0700

You should find that the first set of fits is run on the training split only,
and the resulting models are evaluated on the validation split. The final best
model is then retrained on the entire dataset, so the second call to fit is
not a duplicate: it sees more data than the earlier fits did. This is standard
practice - usually the dataset passed to TrainValidationSplit is itself the
training part of an earlier training/test split, and the final best model is
evaluated against that test set.
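
To make the flow concrete, here is a toy sketch in plain Python. It is not
Spark's code: the names fit, mse and train_validation_split, the constant
"model", and the 0.75 ratio are all made up for illustration. It only shows
why the last fit sees different data from the first ones.

```python
import random


def fit(labels, reg):
    """Toy estimator (hypothetical): a constant predictor, the label mean shrunk toward 0 by `reg`."""
    return sum(labels) / (len(labels) + reg)


def mse(model, labels):
    """Toy evaluator (hypothetical): mean squared error of the constant prediction, lower is better."""
    return sum((y - model) ** 2 for y in labels) / len(labels)


def train_validation_split(labels, param_grid, train_ratio=0.75, seed=0):
    # Split the input once into a training part and a validation part.
    shuffled = labels[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    training, validation = shuffled[:cut], shuffled[cut:]

    # First set of fits: one model per parameter setting, training part only.
    models = [fit(training, reg) for reg in param_grid]
    # Score each of those models on the held-out validation part.
    metrics = [mse(m, validation) for m in models]
    best_index = min(range(len(metrics)), key=metrics.__getitem__)

    # Second fit: refit the winning setting on the entire input, so the
    # returned model is trained on more rows than models[best_index] was.
    best_model = fit(labels, param_grid[best_index])
    return best_model, models[best_index], param_grid[best_index]


labels = [float(i % 7) for i in range(40)]
best_model, best_on_training_only, best_reg = train_validation_split(
    labels, [0.0, 5.0, 50.0]
)
print(best_reg, best_on_training_only, best_model)
```

Reusing models[best_index] would save one fit, at the cost of returning a
model that never saw the validation rows.
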
On Wed, 27 Apr 2016 at 14:30, Dirceu Semighini Filho <
dirceu.semigh...@gmail.com> wrote:


> Hi guys, I was testing a pipeline here and found a possibly duplicated
> call to the fit method in the
> org.apache.spark.ml.tuning.TrainValidationSplit
> <https://github.com/apache/spark/blob/18c2c92580bdc27aa5129d9e7abda418a3633ea6/mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala>
> class.
> On line 110 there is a call to est.fit that fits all of the parameter
> combinations we have set up.
> Further down, on line 128, after discovering which is the best model, we
> call fit again using bestIndex. Wouldn't it be better to just access the
> result of the earlier fit call, stored in the models val?
>
> Kind regards,
> Dirceu
>
