I would say the pros and cons of Python vs Scala come down to three things:
Spark itself, the languages themselves, and what kind of data engineer you
will get when you try to hire for each solution.

With PySpark you get less functionality and increased complexity from the
Py4J Java interop compared to vanilla Spark. Why would you want that? Maybe
you want the Python ML tools and have a clear use case; then go for it. If
not, avoid the increased complexity and reduced functionality of PySpark.

Python vs Scala? Idiomatic Python is a lesson in bad programming
habits/ideas; there's no other way to put it. Do you really want programmers
who enjoy coding in such a language hacking away at your system?

Scala might be far from perfect with the plethora of ways to express
yourself. But Python < 3.5 is not fit for anything except simple scripting
IMO.

For exploratory data analysis in a Jupyter notebook, PySpark seems like a
fine idea. For coding an entire ETL library including state management, the
whole kitchen including the sink: Scala every day of the week.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
