Can you please explain how you determined that it's wrong? Did you check CloudWatch
for the same metrics and compare? Also, are you using df.cache() and expecting
the shuffle read/write to go away?
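
If it helps, one way to cross-check the UI is to sum the shuffle write metrics
per stage straight from the application's event log. The following is only a
rough sketch. It assumes event logging is enabled (spark.eventLog.enabled=true)
and that the log is uncompressed, one JSON event per line. The function name
shuffle_write_totals is hypothetical, and the JSON field names are my
recollection of the event log format, so please verify them against a log from
your Spark version.

```python
import json
from collections import defaultdict


def shuffle_write_totals(event_log_lines):
    """Sum shuffle write bytes and records per stage from event log lines.

    Assumption: each line is one JSON event, and task-end events carry
    "Stage ID" and "Task Metrics" -> "Shuffle Write Metrics" fields.
    """
    totals = defaultdict(lambda: {"bytes": 0, "records": 0})
    for line in event_log_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only task-end events carry per-task metrics.
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        # Failed tasks may have no metrics; treat missing values as zero.
        task_metrics = event.get("Task Metrics") or {}
        shuffle_write = task_metrics.get("Shuffle Write Metrics") or {}
        stage_id = event.get("Stage ID")
        totals[stage_id]["bytes"] += shuffle_write.get("Shuffle Bytes Written", 0)
        totals[stage_id]["records"] += shuffle_write.get("Shuffle Records Written", 0)
    return dict(totals)
```

You would open the event log file, pass the file object to this function, and
compare the per-stage totals with what the Stages tab shows for the cached and
uncached runs.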




On Sunday, May 26, 2024, 7:53 AM, Prem Sahoo <prem.re...@gmail.com> wrote:

Can anyone please assist me?
On Fri, May 24, 2024 at 12:29 AM Prem Sahoo <prem.re...@gmail.com> wrote:

Does anyone have a clue?
On Thu, May 23, 2024 at 11:40 AM Prem Sahoo <prem.re...@gmail.com> wrote:

Hello Team,
In the Spark DAG UI we have the Stages tab. Once you click on a stage, you can
view its tasks.
Each task has a column "Shuffle Write Size / Records". That column shows wrong
data when the task gets its data from cache/persist. Typically the record
count is wrong even though the data size is correct, e.g. 3.2G / 7400, where
the record count is wrong.
Please advise.



