Hello everyone,

I am stuck with the (still experimental, I believe) API for machine learning
pipelines. I have a pipeline with just one estimator (ALS), and I want the
cross-validation to try different values for the regularization parameter.
For that I need to supply an Evaluator that returns a value of type Double;
I assume this should be a metric such as accuracy or mean squared error.
The only implementation I found is BinaryClassificationEvaluator, and I did
not understand the computation it performs.

I could not find detailed documentation, so I implemented a dummy Evaluator
that just returns the regularization parameter:

  import org.apache.spark.ml.Evaluator
  import org.apache.spark.ml.param.ParamMap
  import org.apache.spark.sql.DataFrame

  val dummyEvaluator = new Evaluator {
    def evaluate(dataset: DataFrame, paramMap: ParamMap): Double =
      paramMap.get(als.regParam).getOrElse(sys.error("regParam not set"))
  }

I just wanted to see whether the lower or the higher value "wins". On the
resulting model I inspected the chosen regularization parameter like this:

  cvModel.bestModel.fittingParamMap.get(als.regParam)

And it was the highest of my three regularization parameter candidates. The
strange thing is that if I negate the return value of the Evaluator, that
line still returns the highest candidate.
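
Concretely, by "negate" I mean a second dummy like this (same as above, only
the sign flipped):

  val negatedEvaluator = new Evaluator {
    def evaluate(dataset: DataFrame, paramMap: ParamMap): Double =
      -(paramMap.get(als.regParam).getOrElse(sys.error("regParam not set")))
  }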

So I am probably working from false assumptions. I'd be grateful if someone
could point me to documentation or examples, or share a few hints.

Cheers,
Stefan


