Can you please explain how you realized it's wrong? Did you check CloudWatch for the same metrics and compare? Also, are you using df.cache() and expecting the shuffle read/write to go away?
On Sunday, May 26, 2024, 7:53 AM, Prem Sahoo <prem.re...@gmail.com> wrote:
Can anyone please assist me?

On Fri, May 24, 2024 at 12:29 AM Prem Sahoo <prem.re...@gmail.com> wrote:
Does anyone have a clue?

On Thu, May 23, 2024 at 11:40 AM Prem Sahoo <prem.re...@gmail.com> wrote:
Hello Team,

In the Spark UI (DAG view) there is a Stages tab. Clicking a stage shows its tasks, and each task has a "Shuffle Write Size / Records" column. That column shows wrong data when the data comes from cache/persist: the size is correct, but the record count is typically wrong, e.g. 3.2G / 7400, where the 7400 records figure is wrong. Please advise.
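One quick way to judge whether a "Size / Records" pair like 3.2G / 7400 is plausible is to divide the size by the record count and compare the result with the row width you expect from your schema. This is a minimal sketch, not part of Spark: `parse_size` and `bytes_per_record` are hypothetical helpers, and it assumes "G" means GiB (1024**3 bytes). Shuffle bytes are serialized and may be compressed, so treat the result only as an order-of-magnitude check.

```python
import re

# Assumption: the UI uses binary units (K = 1024 bytes, G = 1024**3 bytes).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Hypothetical helper: convert '3.2G', '3.2 GB', '3.2 GiB' or '512 B' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGT]?)(?:I?B)?\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def bytes_per_record(size_text, records):
    """Hypothetical helper: average shuffle bytes per record for one task."""
    if records <= 0:
        raise ValueError("records must be positive")
    return parse_size(size_text) / records


if __name__ == "__main__":
    avg = bytes_per_record("3.2G", 7400)
    print(f"{avg:,.0f} bytes/record (~{avg / 1024:,.0f} KiB)")
```

For the numbers in the report this works out to roughly 450 KiB per record. If your rows are expected to be far smaller than that, the record count is the suspicious half of the pair; if rows are genuinely large (wide nested structs, blobs), the pair may be consistent after all.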