Try to use DataFrames instead of RDDs.
Here's an introduction to DataFrames:
https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
2016-05-06 21:52 GMT+07:00 pratik gawande :
> Thanks Shao for the quick reply. I will look into how pyspark jobs are
> executed. Any suggestions or references to docs on how to tune pyspark jobs?
Thanks Shao for the quick reply. I will look into how pyspark jobs are executed.
Any suggestions or references to docs on how to tune pyspark jobs?
On Thu, May 5, 2016 at 10:12 PM -0700, "Saisai Shao"
<sai.sai.s...@gmail.com> wrote:
Writing an RDD-based application with pyspark brings in additional
overhead: Spark runs on the JVM whereas your Python code runs in the Python
runtime, so data has to be communicated between the JVM world and the Python
world, and that requires extra serialization/deserialization and IPC.
Also oth
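The serialization cost described above can be illustrated without Spark at
all: PySpark pickles records as they cross the JVM/Python boundary, and a
plain pickle round-trip over comparable data shows the kind of per-record
work involved (a standalone sketch, not Spark code; the data is invented):

```python
import pickle
import time

# Illustrative records; PySpark pickles tuples like these partition by partition.
records = [(i, "x" * 20) for i in range(100_000)]

start = time.perf_counter()
blobs = [pickle.dumps(r) for r in records]    # analogous to JVM -> Python worker transfer
restored = [pickle.loads(b) for b in blobs]   # analogous to Python worker -> JVM transfer
elapsed = time.perf_counter() - start

# DataFrame operations avoid this round-trip entirely by executing in the JVM.
```

This per-record cost is exactly what the DataFrame API sidesteps, which is
why it closes most of the gap between pyspark and Scala.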
Hello,
I am new to Spark. For one of my jobs I am seeing a significant performance
difference when it is run in pyspark vs. scala. Could you please let me know if
this is known and whether scala is preferred over python for writing Spark jobs?
Also, the DAG visualization shows completely different DAGs for scala and python.