I would say the pros and cons of Python vs Scala come down to Spark itself, the languages themselves, and the kind of data engineer you will attract when you hire for each stack.
With PySpark you get less functionality and more complexity, because of the Py4J Java interop, compared to vanilla Spark. Why would you want that? If you need the Python ML tools and have a clear use case, then go for it. If not, avoid the added complexity and reduced functionality of PySpark.

Python vs Scala? Idiomatic Python is a lesson in bad programming habits and ideas; there's no other way to put it. Do you really want programmers who enjoy coding in such a language hacking away at your system? Scala is far from perfect, with its plethora of ways to express the same thing, but Python < 3.5 is not fit for anything beyond simple scripting, IMO.

For exploratory data analysis in a Jupyter notebook, PySpark seems like a fine idea. For coding an entire ETL library, including state management and the whole kitchen sink: Scala, every day of the week.
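To make the "functionality you only get on the Scala side" point concrete, here is a minimal sketch of a typed ETL step using the Dataset API with case classes, which is available in Scala/Java Spark but not in PySpark. The names (Order, CustomerTotal, totalsByCustomer) are hypothetical, just for illustration:

    import org.apache.spark.sql.{Dataset, SparkSession}

    // Hypothetical records for the sketch
    final case class Order(id: Long, customerId: Long, amount: Double)
    final case class CustomerTotal(customerId: Long, total: Double)

    object EtlSketch {
      // A typed transformation: the input and output schemas are checked at compile time
      def totalsByCustomer(orders: Dataset[Order]): Dataset[CustomerTotal] = {
        import orders.sparkSession.implicits._
        orders
          .groupByKey(_.customerId)
          .mapGroups((id, rows) => CustomerTotal(id, rows.map(_.amount).sum))
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("etl-sketch").master("local[*]").getOrCreate()
        import spark.implicits._
        val orders = Seq(Order(1L, 10L, 5.0), Order(2L, 10L, 7.5)).toDS()
        totalsByCustomer(orders).show()
        spark.stop()
      }
    }

In PySpark you would express the same thing with untyped DataFrame operations or a Pandas UDF crossing the Py4J boundary, which is where the extra complexity and the loss of compile-time checking come in.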