Let me try to answer your other question.

One sortByKey entry comes from the RangePartitioner, which samples the data to 
calculate the range boundaries; that sampling job triggers the first reduceByKey.

The second sortByKey does the real work of sorting based on the partitions 
calculated, which triggers the reduceByKey again because its result is not cached.
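
To make this concrete, here is a minimal sketch of caching the reduceByKey 
result before sorting, so that the sampling job and the actual sort can both 
read the cached partitions instead of recomputing them. The object name 
CacheBeforeSort, the helper sortedSums, and the x % 10 keying are made up for 
illustration and are not from your code. I have not checked how this shows up 
in the web UI on every Spark version, so treat it as an assumption to verify.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark 1.x

// Hypothetical example object; names are illustrative only.
object CacheBeforeSort {
  def sortedSums(sc: SparkContext): Array[(Int, Int)] = {
    val reduced = sc.parallelize(Array.range(0, 100))
      .map(x => (x % 10, x))   // key each number by its last digit
      .reduceByKey(_ + _)
      .cache()                 // keep the reduced data for later jobs

    // sortByKey samples `reduced` to pick range boundaries, then sorts it.
    val sorted = reduced.sortByKey()

    // Print the lineage of the final RDD.
    println(sorted.toDebugString)
    sorted.collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-before-sort").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try sortedSums(sc).foreach(println)
    finally sc.stop()
  }
}
```

In your program the cache() call comes after sortByKey, so it only covers the 
sorted RDD and not the reduceByKey output that the sampling step reads.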

I agree with you that it is very confusing.

Thanks.

Zhan Zhang

On Aug 20, 2014, at 2:28 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> The reason is that some operators get pipelined into a single stage. 
> rdd.map(XX).filter(YY) - this executes in a single stage since there is no 
> data movement needed in between these operations.
> 
> If you call toDebugString on the final RDD it will give you some information 
> about the exact lineage. In Spark 1.1 this will return information about 
> stage boundaries as well.
> 
> 
> On Wed, Aug 20, 2014 at 4:22 AM, Grzegorz Białek 
> <grzegorz.bia...@codilime.com> wrote:
> Hi,
> 
> I am wondering why in web UI some stages (like join, filter) are not visible. 
> For example this code:
> 
> val simple = sc.parallelize(Array.range(0,100))
> val simple2 = sc.parallelize(Array.range(0,100))
> 
>   val toJoin = simple.map(x => (x, x.toString + x.toString))
>   val rdd = simple2
>     .map(x => (scala.util.Random.nextInt(100), x))
>     .join(toJoin)
>     .map { case (r, (x, s)) => (r, x)}
>     .reduceByKey(_ + _)
>     .sortByKey()
>     .cache()
>   rdd.saveAsTextFile("output/1")
> 
>   val rdd2 = toJoin
>     .groupBy{ case (x, _) => x}
>     .filter{ case (x, _) => x < 10}
>   rdd2.saveAsTextFile("output/2")
> 
>   println(rdd2.join(toJoin).count())
> 
> in the UI doesn't show join and filter stages, and moreover it shows sortByKey 
> and reduceByKey twice.
> Could anyone explain how it works?
> 
> Thanks,
> Grzegorz
> 

