Dear community,

I have a general question about using Scala vs. PySpark for Spark Streaming. My understanding is that Spark Streaming runs most efficiently when the job is written in Scala, but that the same things can also be implemented in PySpark. My questions:

1) Is it completely dumb to write a streaming job in PySpark?
2) What are the technical reasons that it works best in Scala (and are they easy to understand)?
3) Has anyone seen good links with numbers on the performance difference, under what circumstances it shows up, and why?
4) Are there scenarios where using PySpark is justified, e.g. when someone doesn't feel comfortable writing the job in Scala and the number of messages per minute isn't gigantic, so performance isn't that crucial? A rough sketch of the kind of job I mean is below.
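For context, here is a minimal sketch of the kind of PySpark streaming job I have in mind: a word count over a socket source. The host/port, app name, and 10-second batch interval are just placeholders, not anything I'm committed to.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Local two-thread context; the 10-second batch interval is a placeholder.
    sc = SparkContext("local[2]", "StreamingWordCount")
    ssc = StreamingContext(sc, 10)

    # Hypothetical socket source on localhost:9999, purely for illustration.
    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    # Print the counts computed in each batch.
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()

My understanding is that the lambdas above run in separate Python worker processes, with records serialized back and forth between the JVM and Python, which is presumably where the overhead relative to Scala comes from, but I'd appreciate confirmation on that.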
Thanks for any input!
