Python APIs are sometimes a little behind the Scala APIs. Another issue that comes up is when you have dependencies on Java or Scala classes for serializing and deserializing data. Working with non-trivial Avro schemas has been a bit of a pain for me in Python, because of the difficulty of dealing with generic writers and readers from Python. At my company we use Avro heavily, and it hasn't been fun when I've tried to work with complex Avro schemas from Python. This may not be relevant to you, however... otherwise I found Python to be a great fit for Spark :)
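For context, the generic readers and writers in question are JVM classes from the org.apache.avro library. Here is a minimal sketch of reading a container file with the generic API on the Scala side, assuming a hypothetical file events.avro whose records have a userId field:

    import java.io.File
    import org.apache.avro.file.DataFileReader
    import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

    // The generic reader picks up the writer's schema from the file header,
    // so no code-generated record classes are needed.
    val datumReader = new GenericDatumReader[GenericRecord]()
    val fileReader =
      new DataFileReader[GenericRecord](new File("events.avro"), datumReader)
    while (fileReader.hasNext) {
      val record = fileReader.next()
      println(record.get("userId"))
    }
    fileReader.close()

Reaching this kind of JVM-only API from Python means either using a separate Python Avro library or crossing the Py4J bridge, which is where the friction described above tends to come from.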
On Oct 6, 2015 8:32 PM, "Richard Eggert" <richard.egg...@gmail.com> wrote:

Since the Python API is built on top of the Scala implementation, its performance can at best be roughly the same as that of the Scala API (as in the case of DataFrames and SQL), and at worst several orders of magnitude slower.

Likewise, since the Scala implementation of a new feature necessarily has to be completed before the feature can be ported to other languages, those other languages tend to lag behind the Scala API, though I believe the development team has been making a conscious effort over the past few months to get the other languages as up to date as possible before publishing new releases.

Third-party extensions to Spark are generally written in Scala and rarely ported to other languages.

Additionally, Scala's type system can make it a bit easier to keep track of the structure of your data while you are implementing complex transformations, at least for regular RDDs (it is not as helpful for SQL/DataFrames), and there are a lot of neat tricks you can do with implicits and pattern matching to improve the readability of your code (see the first sketch below the thread).

That said, there are plenty of people using the Python API quite effectively, so using Scala is by no means a requirement, though it does have several advantages.

Anyway, that's my two cents.

Rich

On Oct 6, 2015 7:40 PM, "ayan guha" <guha.a...@gmail.com> wrote:

Hi,

My 2 cents:

1. The first claim is no longer true if DataFrames are used: regardless of the language, DataFrames go through the same optimization engine behind the scenes (see the second sketch below the thread).
2. The second claim is generally true, in the sense that the Python APIs are typically a little behind the Scala/Java ones.

Best,
Ayan

On Wed, Oct 7, 2015 at 9:15 AM, dant <dan.tr...@gmail.com> wrote:

Hi,

I'm hearing a common theme that I should only do serious programming in Scala on Spark (1.5.1): real power users use Scala. It is said that Python is great for analytics, but in the end the code should be rewritten in Scala to finalise it. There are a number of reasons I'm hearing:

1. Spark is written in Scala, so it will always be faster than any other language implementation on top of it.
2. Spark releases always favour the Scala API over the Python API in terms of which features are visible and enabled.

Is there any truth to the above? I'm a little sceptical.

Apologies for the duplication; my previous message was held up due to a subscription issue. Reposting now.

Thanks,
Dan
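On Richard's point about implicits and pattern matching, here is a minimal sketch (spark-shell style) of the kind of trick he may have in mind; the pageViews RDD and its element type are hypothetical:

    import org.apache.spark.rdd.RDD

    // Pattern matching in the function literal keeps nested tuple fields
    // readable, instead of chaining opaque ._1/._2 accessors.
    def bytesPerHit(pageViews: RDD[(String, (Int, Long))]): RDD[(String, Long)] =
      pageViews.map { case (url, (hits, bytes)) => (url, bytes / hits) }

    // An implicit class quietly adds a domain-specific helper to any pair RDD.
    implicit class RichPairRDD[K, V](rdd: RDD[(K, V)]) {
      def swapped: RDD[(V, K)] = rdd.map { case (k, v) => (v, k) }
    }

The compiler checks the tuple structure in both cases, which is the type-system benefit Richard describes: a mistake in the data's shape fails at compile time rather than partway through a job.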
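On Ayan's first point, here is a minimal sketch of how to see the shared optimizer at work, using the Spark 1.5-era SQLContext API and a hypothetical people.json:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc) // sc: an existing SparkContext
    val people = sqlContext.read.json("people.json")

    // explain(true) prints the logical and physical plans produced by the
    // Catalyst optimizer; the equivalent PySpark DataFrame code builds the
    // same logical plan, so the executed physical plan is identical.
    people.filter(people("age") > 21).groupBy("age").count().explain(true)

This is why DataFrame performance is roughly language-independent, whereas plain RDD code in Python pays serialization and interpreter costs that Scala code does not.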