Re: MetadataFetchFailedException due to decommission block migrations

2022-02-02 Thread Dongjoon Hyun
Thank you for sharing, Emil.

> I am willing to help develop a fix, but might need some guidance on
> how this case could be handled better.

Could you file an official Apache JIRA for your finding and also propose a PR for it with a test case? We can continue our discussion on your PR.

Dong

MetadataFetchFailedException due to decommission block migrations

2022-02-02 Thread Emil Ejbyfeldt
As noted in SPARK-34939, there is a race when using broadcast for map output status. Explanation from SPARK-34939:

> After map statuses are broadcast and the executors obtain the serialized broadcast map statuses, if any fetch failure happens afterwards, the Spark scheduler invalidates cached map statuses