On 27/07/16 16:31, Colin Beckingham wrote:
I have a project which runs fine on both Spark 1.6.2 and 2.1.0. It
calculates a logistic model using MLlib. I compiled 2.1.0 from source
today; the 1.6.2 is a precompiled build with Hadoop. The odd thing is
that the project produces an answer in 350 sec on 1.6.2 but takes 990
sec on 2.1.0, with identical PySpark code. I'm wondering if there is
something in the setup params for 1.6 and 2.1, say the number of
executors or memory allocation, that might account for this? I'm using
just the 4 cores of my machine as master and executors.
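
To rule out a configuration difference between the two installs, one
thing I can do is pin the same settings explicitly in both and dump the
effective config from each run. A minimal sketch (the parallelism value
is a placeholder, not something from my actual setup; driver memory has
to be fixed at launch, e.g. with spark-submit --driver-memory, since it
can't be changed after the JVM starts):

    from pyspark import SparkConf, SparkContext

    # Pin the same settings in both installs so neither run silently
    # picks up different defaults from spark-defaults.conf. local[4]
    # matches the 4 cores mentioned above.
    conf = (SparkConf()
            .setMaster("local[4]")
            .set("spark.default.parallelism", "4"))
    sc = SparkContext(conf=conf)

    # Diffing this output between the 1.6.2 and 2.1.0 runs should
    # expose any setting that differs between the two environments.
    print(sorted(sc.getConf().getAll()))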
FWIW I have a bit more information. Watching the jobs as Spark runs, I
can see that during the logistic regression the PySpark call
"LogisticRegressionWithLBFGS.train()" runs "treeAggregate at
LBFGS.scala:218" in Spark 1.6.2, while the same command in Spark 2.1
runs "treeAggregate at LogisticRegression.scala:1092". The latter takes
about 3 times longer per call, there are far more of these calls, and
the result is considerably less accurate than the LBFGS version. The
rest of the process looks pretty close. So Spark 2.1 does not seem to
be running an optimized version of the logistic regression algorithm?
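
One thing worth checking (a guess on my part, not something I've
confirmed in the 2.1 source): the stage name suggests that in Spark 2.x
the RDD-based LogisticRegressionWithLBFGS delegates to the newer
spark.ml implementation, and optimizer defaults such as regParam and
tolerance may differ between releases; a tighter convergence tolerance
alone would mean more treeAggregate passes. Passing the optimizer
settings explicitly should make the two versions do comparable work.
A sketch, with illustrative (untuned) parameter values:

    from pyspark.mllib.classification import LogisticRegressionWithLBFGS
    from pyspark.mllib.regression import LabeledPoint

    # Assumes an existing SparkContext `sc` (e.g. the pyspark shell).
    # `points` stands in for the project's real RDD of LabeledPoint.
    points = sc.parallelize([
        LabeledPoint(0.0, [0.0, 1.0]),
        LabeledPoint(1.0, [1.0, 0.0]),
    ])

    # Explicit settings remove any dependence on version-specific
    # defaults, so 1.6.2 and 2.1.0 should do comparable work.
    model = LogisticRegressionWithLBFGS.train(
        points,
        iterations=100,   # cap on L-BFGS iterations
        regParam=0.01,    # L2 regularization strength
        regType="l2",
        tolerance=1e-4,   # looser tolerance => fewer treeAggregate passes
        intercept=True)

If the two versions still diverge with identical optimizer settings,
then the gap really is in the 2.1 code path rather than in the defaults.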