Hi Spark users,

IMHO, I would write the application in the same language the underlying system
itself was designed in.


If working on Spark, I choose Scala.
If working on Hadoop, I choose Java.
If working on nothing, I use Python.
Why?
Because it will save my life, just kidding.

Best regards,
Leonard
------------------ Original ------------------
From: "Luciano Resende" <luckbr1...@gmail.com>
Send time: Tuesday, Sep 6, 2016 8:07 AM
To: "darren" <dar...@ontrenet.com>
Cc: "Mich Talebzadeh" <mich.talebza...@gmail.com>; "Jakob Odersky" <ja...@odersky.com>; "ayan guha" <guha.a...@gmail.com>; "kant kodali" <kanth...@gmail.com>; "AssafMendelson" <assaf.mendel...@rsa.com>; "user" <user@spark.apache.org>
Subject: Re: Scala Vs Python

On Thu, Sep 1, 2016 at 3:15 PM, darren <dar...@ontrenet.com> wrote:
This topic is a concern for us as well. In the data science world, no one uses
native Scala or Java by choice; it's R and Python, and Python is growing. Yet
in Spark, Python is third in line for feature support, if supported at all.

This is why we have decoupled from Spark in our project. It's really
unfortunate the Spark team has invested so heavily in Scala.


As for speed, it comes from horizontal scaling and throughput. When you can
scale outward, individual VM performance is less of an issue. Basic HPC principles.

You could still try to get the best of both worlds: have your data scientists
write their algorithms in Python and/or R, and let a compiler/optimizer handle
the optimizations needed to run them in a distributed fashion on a Spark
cluster, leveraging some of the low-level APIs written in Java/Scala. Take a
look at Apache SystemML (http://systemml.apache.org/) for more details.
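
For the curious, here is a minimal sketch of that workflow using SystemML's
Python bindings (this assumes the "systemml" pip package and an existing
SparkSession named "spark"; the DML script is a toy example for illustration,
not taken from the original message):

# Author the algorithm in DML (SystemML's R-like language) from Python;
# SystemML's optimizer compiles it and executes it on the Spark cluster,
# distributing the work only when the data size warrants it.
from systemml import MLContext, dml

ml = MLContext(spark)  # assumes a running SparkSession named `spark`

# Toy script: build a random matrix and compute the sum of its squares.
script = dml("""
    X = rand(rows=1000, cols=100)
    s = sum(X ^ 2)
""").output("s")

result = ml.execute(script)
print(result.get("s"))

The authoring stays in Python (or R), while the heavy lifting stays on the
JVM side of the cluster.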

-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
