the performance difference is expected, but you
could tune the application to reduce the gap.
Also, because the Python RDD wraps a lot, the DAG you saw is different from
the Scala one. That is also expected.
Thanks
Saisai
On Fri, May 6, 2016 at 12:47 PM, pratik gawande
mailto:pratik.gawa...@hot
Hello,
I am new to Spark. For one of my jobs I am seeing a significant performance
difference when it runs in PySpark vs. Scala. Could you please let me know if
this is known, and whether Scala is preferred over Python for writing Spark
jobs? Also, the DAG visualization shows completely different DAGs for Scala and p