I would have to check, but in principle you could do it by inspecting the
streaming logs: once you detect when a shuffle operation starts and ends,
you can compute its total duration.
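
A rough sketch of that idea follows. It assumes the logs in question are Spark's JSON event log (written when spark.eventLog.enabled is true), with one JSON event per line. The field names ("Task Metrics", "Shuffle Read Metrics", "Shuffle Bytes Written", and so on) are what I remember of that format and should be checked against your Spark version. The function name summarize_shuffle and the sample lines are made up for illustration. Note that the duration computed here is for the whole stage, which is only an upper bound on the time spent shuffling.

```python
import json
from collections import defaultdict


def summarize_shuffle(event_lines):
    """Sum shuffle bytes per stage and record each stage's wall-clock duration.

    event_lines: iterable of strings, one JSON event per line (assumed format).
    Returns {stage_id: {"read_bytes", "write_bytes", "duration_ms"}}.
    """
    stages = defaultdict(
        lambda: {"read_bytes": 0, "write_bytes": 0, "duration_ms": None}
    )
    for line in event_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("Event")
        if kind == "SparkListenerTaskEnd":
            # Per-task metrics; failed tasks may carry no metrics at all.
            metrics = event.get("Task Metrics") or {}
            read = metrics.get("Shuffle Read Metrics") or {}
            write = metrics.get("Shuffle Write Metrics") or {}
            stage = stages[event["Stage ID"]]
            stage["read_bytes"] += (
                read.get("Remote Bytes Read", 0) + read.get("Local Bytes Read", 0)
            )
            stage["write_bytes"] += write.get("Shuffle Bytes Written", 0)
        elif kind == "SparkListenerStageCompleted":
            info = event["Stage Info"]
            start = info.get("Submission Time")
            end = info.get("Completion Time")
            if start is not None and end is not None:
                stages[info["Stage ID"]]["duration_ms"] = end - start
    return dict(stages)


# Made-up sample events, trimmed to the fields the function reads.
SAMPLE_LOG = [
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 0,
                "Task Metrics": {"Shuffle Write Metrics": {"Shuffle Bytes Written": 100}}}),
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 0,
                "Task Metrics": {"Shuffle Write Metrics": {"Shuffle Bytes Written": 200}}}),
    json.dumps({"Event": "SparkListenerStageCompleted",
                "Stage Info": {"Stage ID": 0, "Submission Time": 1000,
                               "Completion Time": 1500}}),
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 1,
                "Task Metrics": {"Shuffle Read Metrics": {"Remote Bytes Read": 250,
                                                          "Local Bytes Read": 50}}}),
    json.dumps({"Event": "SparkListenerStageCompleted",
                "Stage Info": {"Stage ID": 1, "Submission Time": 1500,
                               "Completion Time": 1800}}),
]

summary = summarize_shuffle(SAMPLE_LOG)
```

With a real application you would pass the lines of the event log file instead of SAMPLE_LOG, for example summarize_shuffle(open(path)), where path is wherever your cluster writes its event logs.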


https://stackoverflow.com/questions/27276884/what-is-shuffle-read-shuffle-write-in-apache-spark

On Tue, 4 Feb 2020 at 12:58, asma zgolli (<zgollia...@gmail.com>)
wrote:

> Dear Spark contributors,
>
> I'm searching for a way to model Spark shuffle cost, and I wonder whether
> there are mathematical formulas to compute the "Shuffle Read" and "Shuffle
> Write" sizes shown in the Stages view of the Spark UI.
> If there aren't, are there any references that could give me a head start?
> Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total |
> Input | Output | Shuffle Read | Shuffle Write
>
> Thank you for your help and directions.
> Yours sincerely,
> Asma ZGOLLI
>
> Ph.D. student in data engineering - computer science
>


-- 
Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman
