Hello everyone,

I am stuck with the (still experimental, I think) API for machine learning pipelines. I have a pipeline with a single estimator (ALS), and I want CrossValidator to try several values for the regularization parameter. For that I need to supply an Evaluator that returns a Double, which I assume should be some metric like accuracy or mean squared error. The only implementation I could find is BinaryClassificationEvaluator, and I did not understand the computation it performs.
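
For context, here is roughly how I wire things up (the ratings DataFrame, the column names, and the candidate values are placeholders; myEvaluator is the dummy Evaluator described below):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.recommendation.ALS
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")

    val pipeline = new Pipeline().setStages(Array(als))

    // three candidate values for the regularization parameter
    val paramGrid = new ParamGridBuilder()
      .addGrid(als.regParam, Array(0.01, 0.1, 1.0))
      .build()

    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEstimatorParamMaps(paramGrid)
      .setEvaluator(myEvaluator)  // the dummy Evaluator shown below
      .setNumFolds(3)

    val cvModel = cv.fit(ratings)  // ratings: DataFrame of (userId, movieId, rating)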
I could not find detailed documentation, so I implemented a dummy Evaluator that simply returns the regularization parameter, just to see whether the lower or the higher value "wins":

    val myEvaluator = new Evaluator {
      override def evaluate(dataset: DataFrame, paramMap: ParamMap): Double =
        paramMap.get(als.regParam)
          .getOrElse(throw new IllegalArgumentException("regParam not set"))
    }

On the resulting model I inspected the chosen regularization parameter like this:

    cvModel.bestModel.fittingParamMap.get(als.regParam)

It was the highest of my three candidate values. The strange thing is that if I negate the return value of the Evaluator, that line still reports the highest candidate, so I am probably working with false assumptions.

I'd be grateful if someone could point me to documentation or examples, or share a few hints.

Cheers,
Stefan