Re: Data correctness issue with Repartition + FetchFailure

2022-03-12 Thread Reynold Xin
This is why RoundRobinPartitioning shouldn't be used ... On Sat, Mar 12, 2022 at 12:08 PM, Jason Xu <jasonxu.sp...@gmail.com> wrote: > Hi Spark community, > I reported a data correctness issue in https://issues.apache.org/jira/browse/SPARK-38388 ( https://issues.apache.org/jira/b

Data correctness issue with Repartition + FetchFailure

2022-03-12 Thread Jason Xu
Hi Spark community, I reported a data correctness issue in https://issues.apache.org/jira/browse/SPARK-38388. In short, non-deterministic data + Repartition + FetchFailure could result in incorrect data. This is an issue we run into in production pipelines. I have an example to reproduce the bug i