On Wed, Aug 13, 2014 at 2:31 PM, Davies Liu <dav...@databricks.com> wrote:
> On Wed, Aug 13, 2014 at 2:16 PM, Ignacio Zendejas
> <ignacio.zendejas...@gmail.com> wrote:
>> Yep, I thought it was a bogus comparison.
>>
>> I should rephrase my question as it was poorly phrased: on average, how
>> much faster is Spark v. PySpark (I didn't really mean Scala v. Python)?
>> I've only used Spark and don't have a chance to test this at the moment so
>> if anybody has these numbers or general estimates (10x, etc), that'd be
>> great.
>
> A quick comparison by word count on 4.3G text file (local mode),
>
> Spark:  40 seconds
> PySpark: 2 minutes and 16 seconds
>
> So PySpark is 3.4x slower than Spark.

I also tried DPark, which is a pure Python clone of Spark:

DPark: 53 seconds

so it's about 2.5x faster than PySpark, because it does not have
the overhead of passing data between the JVM and Python.
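For reference, the benchmark above is presumably the classic word count. A minimal pure-Python sketch of that workload (the input file path and exact tokenization used in the benchmark are not given in the thread, so this is only illustrative):

```python
from collections import Counter

# The PySpark version of this job would be roughly:
#   sc.textFile("input.txt") \
#     .flatMap(lambda line: line.split()) \
#     .map(lambda w: (w, 1)) \
#     .reduceByKey(lambda a, b: a + b)
# Each of those lambdas runs in a Python worker process, and the records
# are serialized back and forth across the JVM/Python boundary -- that
# round trip is the overhead the numbers above are measuring.

def word_count(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

print(word_count(["to be or", "not to be"]))
```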

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org