Hi Spark users,

IMHO, I choose the application language to align with the language the underlying system is written in. If working on Spark, I choose Scala. If working on Hadoop, I choose Java. If working on neither, I use Python. Why? Because it will save my life, just kidding.

Best regards,
Leonard

------------------ Original ------------------
From: "Luciano Resende" <luckbr1...@gmail.com>
Send time: Tuesday, Sep 6, 2016 8:07 AM
To: "darren" <dar...@ontrenet.com>
Cc: "Mich Talebzadeh" <mich.talebza...@gmail.com>; "Jakob Odersky" <ja...@odersky.com>; "ayan guha" <guha.a...@gmail.com>; "kant kodali" <kanth...@gmail.com>; "Assaf Mendelson" <assaf.mendel...@rsa.com>; "user" <user@spark.apache.org>
Subject: Re: Scala Vs Python

On Thu, Sep 1, 2016 at 3:15 PM, darren <dar...@ontrenet.com> wrote:

> This topic is a concern for us as well. In the data science world, no one uses native Scala or Java by choice; it's R and Python, and Python is growing. Yet in Spark, Python is third in line for feature support, if at all. This is why we have decoupled from Spark in our project. It's really unfortunate that the Spark team has invested so heavily in Scala.
>
> As for speed, it comes from horizontal scaling and throughput. When you can scale outward, individual VM performance is less of an issue. Basic HPC principles.

You could still get the best of both worlds: have your data scientists write their algorithms in Python and/or R, and have a compiler/optimizer handle the optimizations needed to run them in a distributed fashion on a Spark cluster, leveraging the low-level APIs written in Java/Scala. Take a look at Apache SystemML (http://systemml.apache.org/) for more details.

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
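Concretely, the SystemML route looks roughly like the following from PySpark. This is a minimal, untested sketch assuming the `systemml` Python package and its MLContext/dml API as documented around that time; the exact constructor arguments and method signatures may vary between releases:

    from pyspark.sql import SparkSession
    from systemml import MLContext, dml

    # Spark provides the distributed runtime; SystemML's optimizer decides
    # how to partition and execute the linear algebra across the cluster.
    spark = SparkSession.builder.appName("SystemMLSketch").getOrCreate()
    ml = MLContext(spark)

    # The algorithm is written once in DML, SystemML's high-level R-like
    # language; no Scala or Java is needed on the data scientist's side.
    script = dml("""
        X = rand(rows=10000, cols=100)
        ones = matrix(1, rows=ncol(X), cols=1)
        s = sum(X %*% ones)
    """).output("s")

    result = ml.execute(script)
    print(result.get("s"))

The point of the design is that Python here is only a thin driver: the DML script is compiled into Spark jobs by SystemML, so per-record work stays in the JVM rather than crossing back into the Python interpreter.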