Re: [ml] Lost persistence for fold in crossvalidation.

2015-02-18 Thread Joseph Bradley
Now in JIRA form: https://issues.apache.org/jira/browse/SPARK-5844

On Tue, Feb 17, 2015 at 3:12 PM, Xiangrui Meng wrote:
> There are three different regParams defined in the grid and there are
> three folds. For simplicity, we didn't split the dataset into three and
> reuse them, but do the split

Re: [ml] Lost persistence for fold in crossvalidation.

2015-02-17 Thread Xiangrui Meng
There are three different regParams defined in the grid and there are three folds. For simplicity, we didn't split the dataset into three and reuse the splits, but instead do the split for each fold. We then need to cache 3*3 times. Note that the pipeline API is not yet optimized for performance. It would be nice
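
To make the 3*3 count concrete, here is a minimal sketch in plain Scala that only counts cache events. It is not Spark's CrossValidator code: the object name, both method names, and the regParam values are hypothetical, and the second method shows just one possible way the count could drop if each fold's training data were cached once and reused.

```scala
// Hypothetical sketch: counts how often a training set would be cached.
// This is not Spark's CrossValidator implementation; all names are illustrative.
object FoldCaching {
  // Each fit() caches its own input, so every (fold, regParam) pair caches once.
  def cachesPerFit(numFolds: Int, regParams: Seq[Double]): Int = {
    var caches = 0
    for (_ <- 0 until numFolds; _ <- regParams) {
      caches += 1 // fit() persists the fold's training data, then unpersists it
    }
    caches
  }

  // Alternative: cache each fold's training data once and reuse it for all params.
  def cachesPerFold(numFolds: Int, regParams: Seq[Double]): Int = {
    var caches = 0
    for (_ <- 0 until numFolds) {
      caches += 1                 // persist once per fold
      regParams.foreach(_ => ())  // every regParam reuses the cached fold
    }
    caches
  }

  def main(args: Array[String]): Unit = {
    // Assumed values; the thread only says that there are three regParams.
    val regParams = Seq(0.01, 0.1, 1.0)
    println(cachesPerFit(3, regParams))
    println(cachesPerFold(3, regParams))
  }
}
```

With three folds and three regParams, the first count is 9 (the 3*3 from the message) and the second is 3.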

[ml] Lost persistence for fold in crossvalidation.

2015-02-11 Thread Peter Rudenko
Hi, I have a problem using Spark 1.2 with Pipeline + GridSearch + LogisticRegression. I’ve reimplemented the LogisticRegression.fit method and commented out instances.unpersist() |override def fit(dataset:SchemaRDD, paramMap:ParamMap):LogisticRegressionModel = { println(s"Fitting dataset ${da