Since the Python API is built on top of the Scala implementation, its performance can be at best roughly the same as that of the Scala API (as in the case of DataFrames and SQL) and at worst several orders of magnitude slower.
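To make the DataFrame-versus-RDD distinction a bit more concrete, here is a toy sketch in plain Python. It is not Spark's actual API: `Col`, `Expr` and `describe` are hypothetical names made up for illustration. The idea it tries to show is that a DataFrame-style expression is only a description of the computation, which the engine can inspect and run on the JVM no matter which language built it, whereas an RDD-style lambda is opaque Python code that has to be called once per record in a Python worker.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a column reference in a DataFrame expression."""
    name: str

    def __gt__(self, other: Any) -> "Expr":
        # Comparing a column records the operation instead of evaluating it.
        return Expr(">", self, other)


@dataclass(frozen=True)
class Expr:
    """A recorded operation: data the engine can inspect, not code to call."""
    op: str
    left: Any
    right: Any


def describe(predicate: Union[Expr, Callable[[dict], bool]]) -> str:
    """Report what an engine could learn about a filter predicate."""
    if isinstance(predicate, Expr):
        # Declarative form: the full structure is visible to the engine.
        return f"plan: {predicate.left.name} {predicate.op} {predicate.right}"
    # Opaque form: all the engine knows is that it must call this function.
    return "opaque function: must be invoked once per record"


declarative = Col("age") > 30          # builds a description, runs nothing
opaque = lambda row: row["age"] > 30   # arbitrary Python code

print(describe(declarative))
print(describe(opaque))
```

In real PySpark the first style roughly corresponds to filtering a DataFrame with a column expression, and the second to passing a lambda to an RDD's `filter`.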
Likewise, since the Scala implementation of new features necessarily needs to be completed before they can be ported to other languages, those other languages tend to lag behind the Scala API, though I believe the development team has been making a conscious effort for the past few months to get those other languages as up to date as possible before publishing new releases.

Third-party extensions to Spark are generally written in Scala but rarely ported to other languages.

Additionally, Scala's type system can make it a bit easier to keep track of the structure of your data while you are implementing complex transformations, at least for regular RDDs (it is not as helpful for SQL/DataFrames), and there are a lot of neat tricks you can do with implicits and pattern matching to improve the readability of your code.

That said, there are plenty of people using the Python API quite effectively, so using Scala is by no means a requirement, though it does have several advantages.

Anyway, that's my two cents.

Rich

On Oct 6, 2015 7:40 PM, "ayan guha" <guha.a...@gmail.com> wrote:

> Hi
>
> 2 cents
>
> 1. It should not be true anymore if data frames are used. The reason is
> that, regardless of the language, DataFrames use the same optimization
> engine behind the scenes.
> 2. This is generally true in the sense that Python APIs are typically a
> little behind the Scala/Java ones.
>
> Best
> Ayan
>
> On Wed, Oct 7, 2015 at 9:15 AM, dant <dan.tr...@gmail.com> wrote:
>
>> Hi
>>
>> I'm hearing a common theme that I should only do serious programming
>> in Scala on Spark (1.5.1). Real power users use Scala. It is said that
>> Python is great for analytics but in the end the code should be
>> rewritten in Scala to finalise. There are a number of reasons I'm hearing:
>>
>> 1. Spark is written in Scala so will always be faster than any other
>> language implementation on top of it.
>> 2. Spark releases always favour more features being visible and enabled
>> for the Scala API than the Python API.
>>
>> Is there any truth to the above? I'm a little sceptical.
>>
>> Apologies for the duplication, my previous message was held up due to a
>> subscription issue. Reposting now.
>>
>> Thanks
>> Dan
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Does-feature-parity-exist-between-Spark-and-PySpark-tp24963.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> --
> Best Regards,
> Ayan Guha