This Scala code:

scala> val logs = sc.textFile("big_data_specialization/log.txt").
     | filter(x => !x.contains("INFO")).
     | map(x => (x.split("\t")(1), 1)).
     | reduceByKey((x, y) => x + y)

generated obvious lineage:

(2) ShuffledRDD[4] at reduceByKey at <console>:27 []
 +-(2) MapPartitionsRDD[3] at map at <console>:27 []

The same code in PySpark generated a far less obvious one:

(2) PythonRDD[13] at RDD at PythonRDD.scala:48 []
 |  MapPartitionsRDD[12] at mapPartitions at PythonRDD.scala:422 []
 |  ShuffledRDD[11] at partitionBy at NativeMethodAccessorImpl.java:0 []
 +-(2) PairwiseRDD[10] at reduceByKey at <stdin>:1 []
    |  PythonRDD[9] at reduceByKey at <stdin>:1 []
    |  ../log.txt MapPartitionsRDD[8] at textFile at NativeMethodAccessorImpl.java:0 []
    |  ../log.txt HadoopRDD[7] at textFile at NativeMethodAccessorImpl.java:0 []
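
The PySpark snippet itself was cut off in the archive. A minimal sketch of
what it presumably looked like, reconstructed from the lineage above (the
../log.txt path and the reduceByKey at <stdin>:1 call site come from the
lineage; the exact lambdas are assumptions):

    # Hypothetical reconstruction of the elided PySpark pipeline
    logs = sc.textFile("../log.txt") \
        .filter(lambda x: "INFO" not in x) \
        .map(lambda x: (x.split("\t")[1], 1)) \
        .reduceByKey(lambda x, y: x + y)

    # Print the lineage the JVM built for this chain of operations
    print(logs.toDebugString())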

Why is that? Does pyspark do some optimizations under the hood? This debug
string is really useless for debugging.
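
A likely reading of the extra nodes (an assumption based on how PySpark
builds these operators, not something confirmed in this thread): the Python
lambdas are pipelined into opaque PythonRDD nodes evaluated in a Python
worker, reduceByKey is expressed as a local combine plus an explicit
partitionBy plus a post-shuffle merge, and the NativeMethodAccessorImpl.java:0
frames are Py4J's reflective calls from Python into the JVM. A rough sketch
of that decomposition, not PySpark's actual source:

    # Rough sketch (assumed, not PySpark's actual source) of how reduceByKey
    # decomposes on the Python side: combine within each input partition,
    # shuffle with an explicit partitionBy, then merge the partial results.
    def reduce_by_key_sketch(rdd, func, num_partitions=2):
        def combine(partition):
            acc = {}
            for k, v in partition:
                acc[k] = func(acc[k], v) if k in acc else v
            return iter(acc.items())

        local = rdd.mapPartitions(combine)            # PythonRDD[9]
        shuffled = local.partitionBy(num_partitions)  # PairwiseRDD[10] / ShuffledRDD[11]
        return shuffled.mapPartitions(combine)        # MapPartitionsRDD[12] / PythonRDD[13]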

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Core-Python-and-Scala-generate-different-DAGs-for-identical-code-tp28674.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org