On Wed, Aug 13, 2014 at 2:31 PM, Davies Liu <dav...@databricks.com> wrote:
> On Wed, Aug 13, 2014 at 2:16 PM, Ignacio Zendejas
> <ignacio.zendejas...@gmail.com> wrote:
>> Yep, I thought it was a bogus comparison.
>>
>> I should rephrase my question, as it was poorly phrased: on average, how
>> much faster is Spark v. PySpark (I didn't really mean Scala v. Python)?
>> I've only used Spark and don't have a chance to test this at the moment, so
>> if anybody has these numbers or general estimates (10x, etc.), that'd be
>> great.
>
> A quick comparison by word count on a 4.3G text file (local mode):
>
> Spark: 40 seconds
> PySpark: 2 minutes and 16 seconds
>
> So PySpark is 3.4x slower than Spark.
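For reference, the word-count job benchmarked above can be sketched in plain Python (this is an illustrative stand-in for the Spark/PySpark jobs, not the actual benchmark code, which isn't shown in the thread):

```python
from collections import Counter

def word_count(lines):
    """Count word occurrences across an iterable of text lines.

    Spark/PySpark distribute this same split-and-count work across
    partitions; here it runs in a single process for illustration.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# Tiny example in place of the 4.3G input file:
counts = word_count(["to be or not to be", "to be is to do"])
```

In PySpark the equivalent pipeline incurs extra cost serializing each record between the JVM and Python worker processes, which is the overhead discussed below.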
I also tried DPark, which is a pure-Python clone of Spark:

DPark: 53 seconds

So it's roughly 2.5x faster than PySpark, because it does not have the
overhead of passing data between the JVM and Python.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org