Could you check the logs to see how many iterations your LoR run takes? Does your program output the same model across different attempts?
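One quick way to check both is a sketch like the one below, assuming Spark 2.0's pyspark.ml training summary (`data` is the DataFrame from your script; `model2` is just a placeholder name for the second fit):

from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=50, regParam=0.063)
model = lr.fit(data.filter(data.test == 0))

# How many iterations did the solver actually run, and did the loss flatten out?
summary = model.summary
print "total iterations: ", summary.totalIterations
print "objective history: ", summary.objectiveHistory

# Fit a second time and compare coefficients to check determinism.
model2 = lr.fit(data.filter(data.test == 0))
print "max coefficient diff: ", max(
    abs(a - b) for a, b in
    zip(model.coefficients.toArray(), model2.coefficients.toArray()))

If the objective history is still moving at iteration 50, the solver hit maxIter before converging, which can make the fitted model differ between runs.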
Thanks
Yanbo

2016-08-12 3:08 GMT-07:00 olivierjeunen <olivierjeu...@gmail.com>:

> I'm using pyspark ML's logistic regression implementation to do some
> classification on an AWS EMR YARN cluster.
>
> The cluster consists of 10 m3.xlarge nodes and is set up as follows:
> spark.driver.memory 10g, spark.driver.cores 3, spark.executor.memory 10g,
> spark.executor.cores 4.
>
> I enabled YARN's dynamic allocation abilities.
>
> The problem is that my results are very unstable. Sometimes my application
> finishes using 13 executors total; sometimes all of them seem to die and the
> application ends up using anywhere between 100 and 200...
>
> Any insight into what could cause this stochastic behaviour would be greatly
> appreciated.
>
> The code used to run the logistic regression:
>
> from pyspark.ml.classification import LogisticRegression
> from pyspark.ml.evaluation import BinaryClassificationEvaluator
>
> data = spark.read.parquet(storage_path).repartition(80)
> lr = LogisticRegression()
> lr.setMaxIter(50)
> lr.setRegParam(0.063)
> evaluator = BinaryClassificationEvaluator()
> lrModel = lr.fit(data.filter(data.test == 0))
> predictions = lrModel.transform(data.filter(data.test == 1))
> auROC = evaluator.evaluate(predictions)
> print "auROC on test set: ", auROC
>
> The data is a dataframe of roughly 2.8 GB.