SVMWithSGD sits in the older RDD-based "mllib" package and is not directly compatible with the DataFrame API. I suppose one could write an ML-API wrapper around it.
However, there is LinearSVC in Spark 2.2.x:
http://spark.apache.org/docs/latest/ml-classification-regression.html#linear-support-vector-machine

I would say you should use that instead.

On Fri, 19 Jan 2018 at 13:59 Tomasz Dudek <[email protected]> wrote:

> Hello,
>
> is there any way to use CrossValidation's ParamGrid with SVMWithSGD?
>
> usually, when e.g. using RandomForest you can specify a lot of parameters,
> to automatise the param grid search (when used with CrossValidation)
>
> val algorithm = new RandomForestClassifier()
> val paramGrid = { new ParamGridBuilder()
>   .addGrid(algorithm.impurity, Array("gini", "entropy"))
>   .addGrid(algorithm.maxDepth, Array(3, 5, 10))
>   .addGrid(algorithm.numTrees, Array(2, 3, 5, 15, 50))
>   .addGrid(algorithm.minInfoGain, Array(0.01, 0.001))
>   .addGrid(algorithm.minInstancesPerNode, Array(10, 50, 500))
>   .build()
> }
>
> with SVMWithSGD however, the parameters are inside GradientDescent. You
> can explicitly tune the params, either by using SVMWithSGD's constructor or
> by calling setters here:
>
> val algorithm = new SVMWithSGD()
> algorithm.optimizer.setMiniBatchFraction(256)
>   .setNumIterations(200)
>   .setRegParam(0.01)
>
> those two ways however restrict me from using ParamGridBuilder correctly.
>
> There are no such things as algorithm.optimizer.numIterations or
> algorithm.optimizer.regParam, only setters (and ParamGrid requires Params,
> not setters)
>
> I could of course create each SVM model manually, create one huge Pipeline
> with each model saving its result to different column and then manually
> decide which performed the best. It requires a lot of coding and so far
> CrossValidation's ParamGrid did that job for me instead.
>
> Am I missing something? Is it WIP or is there any hack to do that?
>
> Yours,
> Tomasz
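
P.S. In case it helps, here is a rough sketch of how the LinearSVC route could look with ParamGridBuilder and CrossValidator, written for spark-shell on Spark 2.2 or later. The grid values, the fold count, the choice of BinaryClassificationEvaluator and the `training` DataFrame are assumptions of mine for illustration, not anything taken from your setup.

```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// LinearSVC exposes regParam and maxIter as Params, so they can go into a grid.
val algorithm = new LinearSVC()

// Illustrative values only: 3 x 2 = 6 parameter combinations.
val paramGrid = new ParamGridBuilder()
  .addGrid(algorithm.regParam, Array(0.1, 0.01, 0.001))
  .addGrid(algorithm.maxIter, Array(50, 200))
  .build()

// The evaluator reads the "rawPrediction" column that LinearSVC produces.
val cv = new CrossValidator()
  .setEstimator(algorithm)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// `training` is a hypothetical DataFrame with "label" and "features" columns:
// val cvModel = cv.fit(training)
```

This mirrors the RandomForestClassifier grid from your mail; the only real difference is which Params the estimator offers.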
