Interesting, but I'd put this on the JIRA, and also test against master first. It's entirely possible this is something else that was subsequently fixed, and maybe even backported for 2.4.4. (I can't quite reproduce it; it just makes the second job fail, which is also puzzling.)
On Fri, Aug 9, 2019 at 8:11 AM <tcon...@gmail.com> wrote:
>
> Hi,
>
> We are able to reproduce this bug in Spark 2.4 using the following program:
>
> import scala.sys.process._
> import org.apache.spark.TaskContext
>
> val res = spark.range(0, 10000 * 10000, 1).map{ x => (x % 1000, x)}.repartition(20)
> res.distinct.count
>
> // kill an executor in the stage that performs repartition(239)
> val df = res.repartition(113).cache.repartition(239).map { x =>
>   if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1) {
>     throw new Exception("pkill -f java".!!)
>   }
>   x
> }
> df.distinct.count()
>
> The first df.distinct.count correctly produces 100000000
> The second df.distinct.count incorrectly produces 99999769
>
> If the cache step is removed then the bug does not reproduce.
>
> Best regards,
> Tyson
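For what it's worth, the symptom (a handful of rows missing after a task retry) is consistent with an order-sensitive round-robin repartition: if a recomputed upstream partition yields its rows in a different order than the original attempt, round-robin routing sends some rows to different target partitions, so mixing retried and non-retried output drops some rows and duplicates others. Below is a minimal standalone simulation of that failure mode, not Spark code; the function name and partition counts are illustrative only.

```python
def round_robin_partition(rows, num_partitions):
    """Assign rows to buckets by position, mimicking order-sensitive
    round-robin repartitioning (illustrative, not Spark's implementation)."""
    buckets = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        buckets[i % num_partitions].append(row)
    return buckets

rows = list(range(10))

# First attempt: the input arrives in one order.
first = round_robin_partition(rows, 3)

# A task is retried and its input is recomputed, but the recomputed
# iterator yields the same rows in a different order (here: reversed).
recomputed = round_robin_partition(list(reversed(rows)), 3)

# Downstream consumes partitions 0 and 1 from the first attempt but
# partition 2 from the retry: rows 2, 5, 8 vanish and 1, 4, 7 duplicate.
mixed = first[0] + first[1] + recomputed[2]
print(sorted(set(mixed)))   # 7 distinct rows survive instead of 10
```

The same mixing of attempts is what an executor kill during the repartition stage can trigger, which would explain why the second distinct.count comes up short by a few hundred rows rather than failing outright.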