Let me try to answer your other question. The first sortByKey job is triggered by the RangePartitioner, which samples the data to calculate the range boundaries; that sampling in turn triggers the first reduceByKey.
The second sortByKey does the real work of sorting the data according to the range boundaries just calculated, and it triggers reduceByKey again because the reduceByKey output is not cached. I agree with you that it is very confusing.

Thanks.

Zhan Zhang

On Aug 20, 2014, at 2:28 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> The reason is that some operators get pipelined into a single stage.
> rdd.map(XX).filter(YY) - this executes in a single stage since there is no
> data movement needed in between these operations.
>
> If you call toDebugString on the final RDD it will give you some information
> about the exact lineage. In Spark 1.1 this will return information about
> stage boundaries as well.
>
>
> On Wed, Aug 20, 2014 at 4:22 AM, Grzegorz Białek
> <grzegorz.bia...@codilime.com> wrote:
> Hi,
>
> I am wondering why in web UI some stages (like join, filter) are not visible.
> For example this code:
>
> val simple = sc.parallelize(Array.range(0,100))
> val simple2 = sc.parallelize(Array.range(0,100))
>
> val toJoin = simple.map(x => (x, x.toString + x.toString))
> val rdd = simple2
>   .map(x => (scala.util.Random.nextInt(100), x))
>   .join(toJoin)
>   .map { case (r, (x, s)) => (r, x)}
>   .reduceByKey(_ + _)
>   .sortByKey()
>   .cache()
> rdd.saveAsTextFile("output/1")
>
> val rdd2 = toJoin
>   .groupBy{ case (x, _) => x}
>   .filter{ case (x, _) => x < 10}
> rdd2.saveAsTextFile("output/2")
>
> println(rdd2.join(toJoin).count())
>
> in UI doesn't show join and filter stages and moreover it shows sortByKey and
> reduceByKey twice.
> Could anyone explain how it works?
>
> Thanks,
> Grzegorz
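
As a rough illustration of the explanation above, here is a minimal sketch (not the code from the thread) that caches the reduceByKey output before sorting, so the sampling job and the actual sort should be able to share it instead of recomputing it. The names reduced, sorted and result, the app name, the local[2] master, and the deterministic x % 10 keys (in place of Random.nextInt) are assumptions made only for this example; the join from the original pipeline is left out to keep it short.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; required on Spark 1.x

// Local context for the sketch; in spark-shell `sc` already exists.
val conf = new SparkConf().setAppName("stage-demo").setMaster("local[2]")
val sc = new SparkContext(conf)

val simple = sc.parallelize(Array.range(0, 100))

// Cache the reduceByKey output rather than only the final sorted RDD.
val reduced = simple
  .map(x => (x % 10, x)) // deterministic keys, assumed for the example
  .reduceByKey(_ + _)
  .cache()

// sortByKey builds a RangePartitioner, which samples `reduced` to pick
// range boundaries; that sampling runs as a job of its own.
val sorted = reduced.sortByKey()

// Print the lineage; Spark 1.1 and later also indicate stage boundaries.
println(sorted.toDebugString)

// The sort itself runs here and can read `reduced` from the cache.
val result = sorted.collect()

sc.stop()
```

With reduced cached, the web UI should no longer need to run the reduceByKey stage a second time for the sort; how skipped or reused stages are displayed depends on the Spark version.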