That should have read "a lot of neat tricks", not "a lot of nest tricks". That's what I get for sending emails on my phone.... On Oct 6, 2015 8:32 PM, "Richard Eggert" <richard.egg...@gmail.com> wrote:
> Since the Python API is built on top of the Scala implementation, its
> performance can be at best roughly the same as that of the Scala API (as in
> the case of DataFrames and SQL) and at worst several orders of magnitude
> slower.
>
> Likewise, since the Scala implementation of new features necessarily needs
> to be completed before they can be ported to other languages, those other
> languages tend to lag behind the Scala API, though I believe the
> development team has been making a conscious effort for the past few months
> to get those other languages as up to date as possible before publishing
> new releases.
>
> Third-party extensions to Spark are generally written in Scala but rarely
> ported to other languages.
>
> Additionally, Scala's type system can make it a bit easier to keep track
> of the structure of your data while you are implementing complex
> transformations, at least for regular RDDs (not as helpful for
> SQL/DataFrames), and there are a lot of nest tricks you can do with
> implicits and pattern matching to improve the readability of your code.
>
> That said, there are plenty of people using the Python API quite
> effectively, so using Scala is by no means a requirement, though it does
> have several advantages.
>
> Anyway, that's my two cents.
>
> Rich
>
> On Oct 6, 2015 7:40 PM, "ayan guha" <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> 2 cents:
>>
>> 1. It should not be true anymore if DataFrames are used, because
>> regardless of the language, DataFrames use the same optimization engine
>> behind the scenes.
>> 2. This is generally true, in the sense that the Python APIs are
>> typically a little behind the Scala/Java ones.
>>
>> Best
>> Ayan
>>
>> On Wed, Oct 7, 2015 at 9:15 AM, dant <dan.tr...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I'm hearing a common theme that I should only do serious programming
>>> in Scala on Spark (1.5.1). Real power users use Scala. It is said that
>>> Python is great for analytics, but in the end the code should be
>>> rewritten in Scala to finalise. There are a number of reasons I'm
>>> hearing:
>>>
>>> 1. Spark is written in Scala, so it will always be faster than any
>>> other language implementation on top of it.
>>> 2. Spark releases always favour more features being visible and enabled
>>> for the Scala API than the Python API.
>>>
>>> Is there any truth to the above? I'm a little sceptical.
>>>
>>> Apologies for the duplication; my previous message was held up due to a
>>> subscription issue. Reposting now.
>>>
>>> Thanks
>>> Dan
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Does-feature-parity-exist-between-Spark-and-PySpark-tp24963.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
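[Editor's note] Rich's point about implicits and pattern matching improving readability can be sketched without Spark at all. The case class, helper name, and data below are invented for illustration; the implicit-class technique shown is the same general trick Spark uses to graft extra methods (e.g. pair-RDD operations) onto existing types.

```scala
// A minimal, Spark-free sketch of the "implicits and pattern matching" point.
// All names here (Event, RichEvents, totalDurationFor) are hypothetical.

case class Event(user: String, action: String, durationMs: Long)

object ReadabilityDemo {
  // An implicit class bolts a readable, domain-specific helper onto an
  // ordinary Seq[Event] -- callers just write events.totalDurationFor(...).
  implicit class RichEvents(val events: Seq[Event]) extends AnyVal {
    def totalDurationFor(action: String): Long =
      // Pattern matching destructures each record; the backticked `action`
      // pattern matches only events whose action equals the argument.
      events.collect { case Event(_, `action`, ms) => ms }.sum
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("alice", "click", 120),
      Event("bob", "view", 300),
      Event("alice", "click", 80)
    )
    println(events.totalDurationFor("click")) // prints 200
  }
}
```

Because the records are typed case classes, a misspelled field name fails at compile time rather than at runtime, which is part of the "type system keeps track of your data's structure" advantage Rich describes.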