I would have to check, but in principle it could be done from the streaming logs: once you detect when a shuffle operation starts and when it ends, you know the total time of the operation.
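As a rough illustration of the log-based idea, here is a minimal sketch that assumes the application was run with event logging enabled and that each line of the event log is a JSON object. The field names used below ("SparkListenerTaskEnd", "Task Metrics", "Shuffle Read Metrics", "Shuffle Bytes Written", and so on) are my assumption about the event-log layout and should be checked against your Spark version; `shuffle_summary` is a hypothetical helper, not part of Spark. It sums shuffle bytes per stage and reports the wall-clock window spanned by that stage's tasks, which is only a coarse proxy for shuffle time.

```python
import json
from collections import defaultdict


def shuffle_summary(lines):
    """Aggregate per-stage shuffle bytes and task time window from event-log lines.

    Assumption: each line is one JSON event, and task-end events carry the
    field names used below. Verify them against your own event log.
    """
    acc = defaultdict(lambda: {"read_bytes": 0, "write_bytes": 0,
                               "launch": [], "finish": []})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue  # only task-end events carry task metrics
        metrics = event.get("Task Metrics") or {}
        read = metrics.get("Shuffle Read Metrics") or {}
        write = metrics.get("Shuffle Write Metrics") or {}
        info = event.get("Task Info") or {}

        stage = acc[event["Stage ID"]]
        stage["read_bytes"] += (read.get("Remote Bytes Read", 0)
                                + read.get("Local Bytes Read", 0))
        stage["write_bytes"] += write.get("Shuffle Bytes Written", 0)
        if "Launch Time" in info:
            stage["launch"].append(info["Launch Time"])
        if "Finish Time" in info:
            stage["finish"].append(info["Finish Time"])

    summary = {}
    for stage_id, s in acc.items():
        # Window from the first task launch to the last task finish (ms).
        window = (max(s["finish"]) - min(s["launch"])
                  if s["launch"] and s["finish"] else None)
        summary[stage_id] = {"read_bytes": s["read_bytes"],
                             "write_bytes": s["write_bytes"],
                             "window_ms": window}
    return summary
```

You would call it with an open event-log file, for example `shuffle_summary(open(path))`, where `path` points at the application's event log. The totals should be comparable to the "Shuffle Read" and "Shuffle Write" columns of the Stages view, but that correspondence is something to verify, not something I have confirmed.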
https://stackoverflow.com/questions/27276884/what-is-shuffle-read-shuffle-write-in-apache-spark

On Tue, Feb 4, 2020 at 12:58, asma zgolli (<zgollia...@gmail.com>) wrote:

> Dear Spark contributors,
>
> I'm searching for a way to model Spark shuffle cost, and I wonder if there
> are mathematical formulas to compute the "Shuffle Read" and "Shuffle Write"
> sizes shown in the Stages view of the Spark UI.
> If there aren't, are there any references that would give me a head start on this?
>
> Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuffle Write
>
> Thank you for the help and the directions.
> Yours sincerely,
> Asma ZGOLLI
>
> Ph.D. student in data engineering - computer science

--
Alonso Isidoro Roman
about.me/alonso.isidoro.roman <https://about.me/alonso.isidoro.roman>