Re: [EXT] Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Michael Mansour (CS)
MethodAccessorImpl.java:0 []
| ../log.txt HadoopRDD[7] at textFile at NativeMethodAccessorImpl.java:0 []
Why is that? Does pyspark do some optimizations under the hood? This debug string is really useless for debugging.

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
PythonRDD[13] at RDD at PythonRDD.scala:48 []
>>>>> | MapPartitionsRDD[12] at mapPartitions at PythonRDD.scala:422 []
>>>>> | ShuffledRDD[11] at partitionBy at NativeMethodAccessorImpl.java:0 []
>>>>> +-(2) PairwiseRDD[10] a

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread lucas.g...@gmail.com
D[12] at mapPartitions at PythonRDD.scala:422 []
>>>> | ShuffledRDD[11] at partitionBy at NativeMethodAccessorImpl.java:0 []
>>>> +-(2) PairwiseRDD[10] at reduceByKey at :1 []
>>>> | PythonRDD[9] at reduceB

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Holden Karau
scala:422 []
>>> | ShuffledRDD[11] at partitionBy at NativeMethodAccessorImpl.java:0 []
>>> +-(2) PairwiseRDD[10] at reduceByKey at :1 []
>>> | PythonRDD[9] at reduceByKey at :1 []
>>> | ../log.txt MapPartitionsRDD[8] at textFile at

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
ccessorImpl.java:0 []
>> | ../log.txt HadoopRDD[7] at textFile at NativeMethodAccessorImpl.java:0 []
>>
>> Why is that? Does pyspark do some optimizations under the hood? This debug
>> string is really useless for debugging.

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Holden Karau
7] at textFile at NativeMethodAccessorImpl.java:0 []
>
> Why is that? Does pyspark do some optimizations under the hood? This debug
> string is really useless for debugging.

[Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread pklemenkov

[Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
This Scala code:
scala> val logs = sc.textFile("big_data_specialization/log.txt").
     | filter(x => !x.contains("INFO")).
     | map(x => (x.split("\t")(1), 1)).
     | reduceByKey((x, y) => x + y)
generated the obvious lineage:
(2) ShuffledRDD[4] at reduceByKey at <console>:27 []
 +-(2) MapPartitionsRDD[3]