Hello Dylan,

Thank you for your help. The result does look formatted after making the change. However, from the following code, I was expecting RDD types like MappedRDD and FilteredRDD to be present in the lineage, but I can only see PythonRDD and ParallelCollectionRDD [I am running in local mode].
`sc.parallelize([1,2,3,3]).map(lambda x:x**2).filter(lambda x:x>5).count()`

Note: I also tried setting the logLineage property to true, but it did not yield any additional details in the log.

Thanks,
Kanchan

On Sun, Apr 21, 2019 at 12:11 AM Dylan Guedes <djmggue...@gmail.com> wrote:

> Kanchan,
>
> the `toDebugString` output looks unformatted because in some scenarios you
> need to decode it first (can't remember the reason, though). I suggest you
> print the RDD lineage using
> `print(rdd.toDebugString().decode("utf-8"))` instead (note: this only
> occurs in PySpark).
>
> About the other question, you may use `getNumPartitions`.
>
> On Sat, Apr 20, 2019 at 2:40 PM kanchan tewary <kanchan.tew...@gmail.com>
> wrote:
>
>> Dear All,
>>
>> Greetings!
>>
>> I am new to Apache Spark and working on RDDs using pyspark. I am trying
>> to understand the logical plan provided by the toDebugString function, but
>> I find two issues: a) the output is not formatted when I print the result,
>> and b) the number of partitions is not shown.
>>
>> Can anyone direct me to any reference documentation to understand the
>> logical plan better? Or do you suggest using the DAG from the Spark UI
>> instead?
>>
>> Thanks & Best Regards,
>> Kanchan
>> Data Engineer, IBM
>

--
Thanks & Best Regards,
Kanchan
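
P.S. For anyone following the thread, the two suggestions quoted above (decoding the `toDebugString` output and asking the RDD for its partition count) can be combined into a small helper. This is only a minimal sketch: `lineage_text` and `describe` are hypothetical names made up for illustration, and the code assumes that `toDebugString()` may return either bytes (as seen in this thread) or a plain string, depending on the setup.

```python
def lineage_text(rdd):
    """Return the lineage of `rdd` as a readable string.

    Assumption: toDebugString() may return bytes (as reported in this
    thread for PySpark) or str, so both cases are handled.
    """
    debug = rdd.toDebugString()
    if isinstance(debug, bytes):
        # Decoding turns the escaped "\n" sequences into real line breaks.
        debug = debug.decode("utf-8")
    return debug


def describe(rdd):
    """Hypothetical helper: partition count followed by the lineage."""
    return "partitions: {}\n{}".format(rdd.getNumPartitions(), lineage_text(rdd))
```

With an existing SparkContext `sc`, it could be called as `print(describe(sc.parallelize([1, 2, 3, 3]).map(lambda x: x**2).filter(lambda x: x > 5)))`. Note that it is applied to the RDD before `count()`, since `count()` returns a number rather than an RDD.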