Hi,
I am using Spark 2.4.4 in standalone mode.
On Mon, Jan 18, 2021 at 4:26 AM Sean Owen wrote:
> Hm, FWIW I can't reproduce that on Spark 3.0.1. What version are you using?
>
> On Sun, Jan 17, 2021 at 6:22 AM Shiao-An Yuan
> wrote:
>
>> Hi folks,
>>
>>
Therefore, the first stage and the retry stage might distribute records
differently, which would cause both duplicated and lost records.
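
A minimal sketch of this failure mode, in plain Python rather than Spark.
It assumes round-robin partitioning and an input order that differs between
the first attempt and the retry. Both are assumptions for illustration; the
names (partition_round_robin, first_attempt, retry_attempt) are hypothetical
and do not come from the actual pipeline.

```python
def partition_round_robin(ordered_records, num_partitions):
    """Assign records to partitions by their position in the input order."""
    partitions = [[] for _ in range(num_partitions)]
    for position, record in enumerate(ordered_records):
        partitions[position % num_partitions].append(record)
    return partitions


records = [f"pkey-{i}" for i in range(6)]

# First attempt of the map stage sees the records in their original order.
first_attempt = partition_round_robin(records, 2)

# The retry sees the same records in a slightly different order
# (here the first two are swapped), so positions map to other partitions.
retry_order = [records[1], records[0]] + records[2:]
retry_attempt = partition_round_robin(retry_order, 2)

# Reducer 0 already finished with output from the first attempt;
# reducer 1 runs after the retry and reads the regenerated output.
output = first_attempt[0] + retry_attempt[1]

duplicated = sorted(k for k in set(output) if output.count(k) > 1)
lost = sorted(set(records) - set(output))

print("duplicated:", duplicated)
print("lost:", lost)
```

The total record count stays the same in this sketch, so a simple row count
would not reveal the problem; only a per-key check shows one key twice and
another key missing.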
Thanks,
Shiao-An Yuan
On Tue, Dec 29, 2020 at 10:00 PM Shiao-An Yuan
wrote:
> Hi folks,
>
> We recently identified a data correctness issue in our pipeline.
>
> The
& lost, I mean that duplicated "pkey" values exist in
the output file (after "reduce by key") and some "pkey" values are missing.
Since it only happens when executors are being preempted, I believe this is
the bug (nondeterministic shuffle) that SPARK-23207 is trying to solve.
Thanks,
Shiao-An Yuan
Set, I believe it is unrelated to SPARK-24243.
Can anyone give me some advice about the following tasks?
Thanks in advance.
Shiao-An Yuan