That should have read "a lot of neat tricks", not "a lot of nest tricks". That's what I get for sending emails on my phone.... On Oct 6, 2015 8:32 PM, "Richard Eggert" <richard.egg...@gmail.com> wrote:
> Since the Python API is built on top of the Scala implementation, its
> performance can be at best roughly the same as that of the Scala API (as in
> the case of DataFrames and SQL) and at worst several orders of magnitude
> slower.
>
> Likewise, since the Scala implementation of new features necessarily needs
> to be completed before they can be ported to other languages, those other
> languages tend to lag behind the Scala API, though I believe the
> development team has been making a conscious effort for the past few months
> to get those other languages as up to date as possible before publishing
> new releases.
>
> Third-party extensions to Spark are generally written in Scala but rarely
> ported to other languages.
>
> Additionally, Scala's type system can make it a bit easier to keep track
> of the structure of your data while you are implementing complex
> transformations, at least for regular RDDs (not as helpful for
> SQL/DataFrames), and there are a lot of nest tricks you can do with
> implicits and pattern matching to improve the readability of your code.
>
> That said, there are plenty of people using the Python API quite
> effectively, so using Scala is by no means a requirement, though it does
> have several advantages.
>
> Anyway, that's my two cents.
>
> Rich
>
> On Oct 6, 2015 7:40 PM, "ayan guha" <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> 2 cents:
>>
>> 1. It should not be true anymore if DataFrames are used, because
>> regardless of the language, DataFrames use the same optimization engine
>> behind the scenes.
>> 2. This is generally true, in the sense that the Python APIs are
>> typically a little behind the Scala/Java ones.
>>
>> Best
>> Ayan
>>
>> On Wed, Oct 7, 2015 at 9:15 AM, dant <dan.tr...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I'm hearing a common theme that I should only do serious programming
>>> in Scala on Spark (1.5.1). Real power users use Scala. It is said that
>>> Python is great for analytics, but in the end the code should be
>>> rewritten in Scala to finalise. There are a number of reasons I'm
>>> hearing:
>>>
>>> 1. Spark is written in Scala, so it will always be faster than any
>>> other language implementation on top of it.
>>> 2. Spark releases always favour more features being visible and enabled
>>> for the Scala API than the Python API.
>>>
>>> Is there any truth to the above? I'm a little sceptical.
>>>
>>> Apologies for the duplication; my previous message was held up due to a
>>> subscription issue. Reposting now.
>>>
>>> Thanks
>>> Dan
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Does-feature-parity-exist-between-Spark-and-PySpark-tp24963.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
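[Editor's note] Rich's point about implicits and pattern matching improving readability can be sketched without Spark at all. The case class, helper name, and data below are invented for illustration; the implicit-class technique shown is the same general trick Spark uses to graft extra methods (e.g. pair-RDD operations) onto existing types.

```scala
// A minimal, Spark-free sketch of the "implicits and pattern matching" point.
// All names here (Event, RichEvents, totalDurationFor) are hypothetical.

case class Event(user: String, action: String, durationMs: Long)

object ReadabilityDemo {
  // An implicit class bolts a readable, domain-specific helper onto an
  // ordinary Seq[Event] -- callers just write events.totalDurationFor(...).
  implicit class RichEvents(val events: Seq[Event]) extends AnyVal {
    def totalDurationFor(action: String): Long =
      // Pattern matching destructures each record; the backticked `action`
      // pattern matches only events whose action equals the argument.
      events.collect { case Event(_, `action`, ms) => ms }.sum
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("alice", "click", 120),
      Event("bob", "view", 300),
      Event("alice", "click", 80)
    )
    println(events.totalDurationFor("click")) // prints 200
  }
}
```

Because the records are typed case classes, a misspelled field name fails at compile time rather than at runtime, which is part of the "type system keeps track of your data's structure" advantage Rich describes.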