-dev +user

Hi,

Have you tried this?

scala> val df = Seq((1, 0), (2, 0), (3, 0), (4, 0)).toDF.cache
scala> df.queryExecution.executedPlan(0).execute().foreach(x => Unit)
scala> df.rdd.toDebugString
res4: String =
(4) MapPartitionsRDD[13] at rdd at <console>:26 []
 |  MapPartitionsRDD[12] at rdd at <console>:26 []
 |  MapPartitionsRDD[11] at rdd at <console>:26 []
 |  LocalTableScan [_1#41, _2#42] MapPartitionsRDD[9] at cache at <console>:23 []
 |       CachedPartitions: 4; MemorySize: 1104.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
 |  MapPartitionsRDD[8] at cache at <console>:23 []
 |  ParallelCollectionRDD[7] at cache at <console>:23 []

// maropu

On Fri, Oct 21, 2016 at 10:18 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote:

> I needed the same thing for debugging, and I just added a "count" action in
> debug mode for every step I was interested in. It's very time-consuming, but
> I don't debug very often.
>
> 2016-10-20 2:17 GMT-07:00 Andreas Hechenberger <inter...@hechenberger.me>:
>
>> Hey awesome Spark devs :)
>>
>> I am new to Spark and I have read a lot, but now I am stuck :( so please
>> be kind if I ask silly questions.
>>
>> I want to analyze some algorithms and strategies in Spark, and for one
>> experiment I want to know the size of the intermediate results between
>> iterations/jobs. Some of them are written to disk and some are in the
>> cache, I guess. I am not afraid of looking into the code (I already did),
>> but it is complex and I have no clue where to start :( It would be nice if
>> someone could point me in the right direction or tell me where I can find
>> more information about the structure of Spark core development :)
>>
>> I have already set up the development environment and I can compile Spark.
>> It was really awesome how smoothly the setup went :) Thanks for that.
>>
>> Servus
>> Andy
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>
> --
>
> *Sincerely yours,
> Egor Pakhomov*

--
---
Takeshi Yamamuro
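
A related sketch, not from the thread above: assuming a spark-shell session where spark and sc are already in scope, the cached sizes shown in the toDebugString output can also be read programmatically through SparkContext.getRDDStorageInfo (a DeveloperApi), which lists every persisted RDD with its memory and disk footprint. The example DataFrame is the same toy data used earlier in the thread.

    // Assumes spark-shell, so spark.implicits._ is already imported.
    val df = Seq((1, 0), (2, 0), (3, 0), (4, 0)).toDF.cache
    df.count()  // any action materializes the cached plan

    // Each RDDInfo entry carries the RDD name, cached-partition count, and sizes.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: cached partitions=${info.numCachedPartitions}, " +
              s"memory=${info.memSize} B, disk=${info.diskSize} B")
    }

The same numbers are visible in the Storage tab of the Spark web UI; the snippet above is just a way to log them per step without leaving the driver program.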
You have tried this? scala> val df = Seq((1, 0), (2, 0), (3, 0), (4, 0)).toDF.cache scala> df.queryExecution.executedPlan(0).execute().foreach(x => Unit) scala> df.rdd.toDebugString res4: String = (4) MapPartitionsRDD[13] at rdd at <console>:26 [] | MapPartitionsRDD[12] at rdd at <console>:26 [] | MapPartitionsRDD[11] at rdd at <console>:26 [] | LocalTableScan [_1#41, _2#42] MapPartitionsRDD[9] at cache at <console>:23 [] | CachedPartitions: 4; MemorySize: 1104.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B | MapPartitionsRDD[8] at cache at <console>:23 [] | ParallelCollectionRDD[7] at cache at <console>:23 [] // maropu On Fri, Oct 21, 2016 at 10:18 AM, Egor Pahomov <pahomov.e...@gmail.com> wrote: > I needed the same for debugging and I just added "count" action in debug > mode for every step I was interested in. It's very time-consuming, but I > debug not very often. > > 2016-10-20 2:17 GMT-07:00 Andreas Hechenberger <inter...@hechenberger.me>: > >> Hey awesome Spark-Dev's :) >> >> i am new to spark and i read a lot but now i am stuck :( so please be >> kind, if i ask silly questions. >> >> I want to analyze some algorithms and strategies in spark and for one >> experiment i want to know the size of the intermediate results between >> iterations/jobs. Some of them are written to disk and some are in the >> cache, i guess. I am not afraid of looking into the code (i already did) >> but its complex and have no clue where to start :( It would be nice if >> someone can point me in the right direction or where i can find more >> information about the structure of spark core devel :) >> >> I already setup the devel environment and i can compile spark. It was >> really awesome how smoothly the setup was :) Thx for that. >> >> Servus >> Andy >> >> --------------------------------------------------------------------- >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> >> > > > -- > > > *Sincerely yoursEgor Pakhomov* > -- --- Takeshi Yamamuro