I ran it on a single job.
SparkLens adds overhead to the job duration, so I'm not ready to enable it
by default on all our jobs.

Attached is the output.

Still trying to understand what exactly it means.

On Sun, Mar 25, 2018 at 10:40 AM, Fawze Abujaber <fawz...@gmail.com> wrote:

> Nice!
>
> Shmuel, were you able to run it at the cluster level or only for a
> specific job?
>
> Did you configure it in spark-defaults.conf?
>
> On Sun, 25 Mar 2018 at 10:34 Shmuel Blitz <shmuel.bl...@similarweb.com>
> wrote:
>
>> Just to let you know, I have managed to run SparkLens on our cluster.
>>
>> I switched to the spark_1.6 branch, and also compiled against the
>> specific image of Spark we are using (cdh5.7.6).
>>
>> Now I need to figure out what the output means... :P
>>
>> Shmuel
>>
>> On Fri, Mar 23, 2018 at 7:24 PM, Fawze Abujaber <fawz...@gmail.com>
>> wrote:
>>
>>> Quick question:
>>>
>>> How do I add --jars /path/to/sparklens_2.11-0.1.0.jar to
>>> spark-defaults.conf? Should I use:
>>>
>>> spark.driver.extraClassPath /path/to/sparklens_2.11-0.1.0.jar
>>>
>>> or the spark.jars option? Could anyone give an example of how it
>>> should look, and say whether the path to the jar should be an HDFS
>>> path, since I'm using cluster mode?
>>>
>>>
>>>
>>>
>>> On Fri, Mar 23, 2018 at 6:33 AM, Fawze Abujaber <fawz...@gmail.com>
>>> wrote:
>>>
>>>> Hi Shmuel,
>>>>
>>>> Did you compile the code against the right branch for Spark 1.6?
>>>>
>>>> I tested it and it looks like it is working; I'm now testing the
>>>> branch more widely. Please use the branch for Spark 1.6.
>>>>
>>>> On Fri, Mar 23, 2018 at 12:43 AM, Shmuel Blitz <
>>>> shmuel.bl...@similarweb.com> wrote:
>>>>
>>>>> Hi Rohit,
>>>>>
>>>>> Thanks for sharing this great tool.
>>>>> I tried running a Spark job with the tool, but it failed with an
>>>>> *IncompatibleClassChangeError* exception.
>>>>>
>>>>> I have opened an issue on GitHub
>>>>> (https://github.com/qubole/sparklens/issues/1).
>>>>>
>>>>> Shmuel
>>>>>
>>>>> On Thu, Mar 22, 2018 at 5:05 PM, Shmuel Blitz <
>>>>> shmuel.bl...@similarweb.com> wrote:
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> We will give this a try and report back.
>>>>>>
>>>>>> Shmuel
>>>>>>
>>>>>> On Thu, Mar 22, 2018 at 4:22 PM, Rohit Karlupia <roh...@qubole.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks everyone!
>>>>>>> Please share how it works and how it doesn't. Both help.
>>>>>>>
>>>>>>> Fawze, I just made a few changes to make this work with Spark 1.6.
>>>>>>> Can you please try building from the *spark_1.6* branch?
>>>>>>>
>>>>>>> thanks,
>>>>>>> rohitk
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 22, 2018 at 10:18 AM, Fawze Abujaber <fawz...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It's super amazing... I see it was tested on Spark 2.0.0 and
>>>>>>>> above. What about Spark 1.6, which is still part of Cloudera's
>>>>>>>> main versions?
>>>>>>>>
>>>>>>>> We have a large number of Spark applications on version 1.6.0.
>>>>>>>>
>>>>>>>> On Thu, Mar 22, 2018 at 6:38 AM, Holden Karau
>>>>>>>> <hol...@pigscanfly.ca> wrote:
>>>>>>>>
>>>>>>>>> Super exciting! I look forward to digging through it this weekend.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 21, 2018 at 9:33 PM ☼ R Nair (रविशंकर नायर) <
>>>>>>>>> ravishankar.n...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Excellent. You filled a missing link.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Passion
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia <
>>>>>>>>>> roh...@qubole.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Happy to announce the availability of Sparklens as an open source
>>>>>>>>>>> project. It helps in understanding the scalability limits of Spark
>>>>>>>>>>> applications and can be a useful guide on the path towards tuning
>>>>>>>>>>> applications for lower runtime or cost.
>>>>>>>>>>>
>>>>>>>>>>> Please clone from here: https://github.com/qubole/sparklens
>>>>>>>>>>> Old blogpost:
>>>>>>>>>>> https://www.qubole.com/blog/introducing-quboles-spark-tuning-tool/
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>> rohitk
>>>>>>>>>>>
>>>>>>>>>>> PS: Thanks for your patience. It took a couple of months to get
>>>>>>>>>>> back on this.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Shmuel Blitz
>>>>>> Big Data Developer
>>>>>> Email: shmuel.bl...@similarweb.com
>>>>>> www.similarweb.com
>>>>>> <https://www.facebook.com/SimilarWeb/>
>>>>>> <https://www.linkedin.com/company/429838/>
>>>>>> <https://twitter.com/similarweb>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>


-- 
Shmuel Blitz
Big Data Developer
Email: shmuel.bl...@similarweb.com
www.similarweb.com
<https://www.facebook.com/SimilarWeb/>
<https://www.linkedin.com/company/429838/> <https://twitter.com/similarweb>
Printing application meterics.....

 AggregateMetrics (Application Metrics) total measurements 1869 
                NAME                        SUM                MIN           MAX                MEAN         
 diskBytesSpilled                            0.0 KB         0.0 KB         0.0 KB              0.0 KB
 executorRuntime                            15.1 hh         3.0 ms         4.0 mm             29.1 ss
 inputBytesRead                             26.1 GB         0.0 KB        43.8 MB             14.3 MB
 jvmGCTime                                  11.0 mm         0.0 ms         2.1 ss            354.0 ms
 memoryBytesSpilled                        314.2 GB         0.0 KB         1.1 GB            172.1 MB
 outputBytesWritten                          0.0 KB         0.0 KB         0.0 KB              0.0 KB
 peakExecutionMemory                         0.0 KB         0.0 KB         0.0 KB              0.0 KB
 resultSize                                 12.9 MB         2.0 KB        40.9 KB              7.1 KB
 shuffleReadBytesRead                      107.7 GB         0.0 KB       276.0 MB             59.0 MB
 shuffleReadFetchWaitTime                    2.0 ms         0.0 ms         0.0 ms              0.0 ms
 shuffleReadLocalBlocks                       2,318              0             68                   1
 shuffleReadRecordsRead               3,413,511,099              0      8,251,926           1,826,383
 shuffleReadRemoteBlocks                    291,126              0            824                 155
 shuffleWriteBytesWritten                  107.6 GB         0.0 KB       257.6 MB             58.9 MB
 shuffleWriteRecordsWritten           3,408,133,175              0      7,959,055           1,823,506
 shuffleWriteTime                            8.7 mm         0.0 ms         1.8 ss            278.2 ms
 taskDuration                               15.4 hh        12.0 ms         4.1 mm             29.7 ss



Done printing host timeline

Total Hosts 135


Host server86.cluster.com startTime 02:26:21:081 executors count 3
Host server164.cluster.com startTime 02:30:12:204 executors count 1
Host server28.cluster.com startTime 02:31:09:023 executors count 1
Host server123.cluster.com startTime 02:31:29:852 executors count 2
Host server69.cluster.com startTime 02:30:27:796 executors count 2
Host server140.cluster.com startTime 02:29:00:130 executors count 2
Host server106.cluster.com startTime 02:29:15:915 executors count 2
Host server173.cluster.com startTime 02:31:47:044 executors count 1
Host server190.cluster.com startTime 02:29:33:359 executors count 1
Host server156.cluster.com startTime 02:28:13:027 executors count 1
Host server85.cluster.com startTime 02:29:13:030 executors count 1
Host server68.cluster.com startTime 02:26:44:921 executors count 2
Host server44.cluster.com startTime 02:26:22:324 executors count 2
Host server20.cluster.com startTime 02:29:01:819 executors count 2
Host server197.cluster.com startTime 02:26:49:482 executors count 2
Host server61.cluster.com startTime 02:29:04:992 executors count 1
Host server131.cluster.com startTime 02:31:45:763 executors count 1
Host server60.cluster.com startTime 02:29:08:526 executors count 1
Host server26.cluster.com startTime 02:29:51:914 executors count 1
Host server43.cluster.com startTime 02:29:53:303 executors count 2
Host server138.cluster.com startTime 02:26:20:146 executors count 1
Host server196.cluster.com startTime 02:31:41:035 executors count 1
Host server114.cluster.com startTime 02:31:46:565 executors count 1
Host server09.cluster.com startTime 02:30:51:657 executors count 1
Host server172.cluster.com startTime 02:29:06:900 executors count 1
Host server155.cluster.com startTime 02:29:19:209 executors count 1
Host server93.cluster.com startTime 02:29:22:610 executors count 1
Host server130.cluster.com startTime 02:29:48:358 executors count 2
Host server18.cluster.com startTime 02:31:39:437 executors count 1
Host server59.cluster.com startTime 02:30:22:335 executors count 4
Host server113.cluster.com startTime 02:29:07:841 executors count 2
Host server35.cluster.com startTime 02:26:50:403 executors count 2
Host server01.cluster.com startTime 02:29:22:802 executors count 1
Host server171.cluster.com startTime 02:29:47:764 executors count 1
Host server188.cluster.com startTime 02:29:10:729 executors count 1
Host server52.cluster.com startTime 02:26:49:640 executors count 1
Host server99.cluster.com startTime 02:26:05:705 executors count 4
Host server34.cluster.com startTime 02:29:31:464 executors count 1
Host server75.cluster.com startTime 02:29:28:241 executors count 3
Host server10.cluster.com startTime 02:29:54:994 executors count 1
Host server58.cluster.com startTime 02:29:00:602 executors count 4
Host server163.cluster.com startTime 02:31:47:322 executors count 1
Host server92.cluster.com startTime 02:31:33:694 executors count 2
Host server129.cluster.com startTime 02:30:02:504 executors count 2
Host server146.cluster.com startTime 02:30:21:951 executors count 1
Host server105.cluster.com startTime 02:26:20:386 executors count 2
Host server50.cluster.com startTime 02:30:07:969 executors count 1
Host server121.cluster.com startTime 02:26:19:625 executors count 2
Host server16.cluster.com startTime 02:29:19:612 executors count 1
Host server169.cluster.com startTime 02:29:48:686 executors count 1
Host server128.cluster.com startTime 02:31:00:486 executors count 1
Host server104.cluster.com startTime 02:30:11:263 executors count 2
Host server186.cluster.com startTime 02:30:07:595 executors count 1
Host server162.cluster.com startTime 02:29:07:528 executors count 2
Host server08.cluster.com startTime 02:26:01:337 executors count 3
Host server42.cluster.com startTime 02:29:09:956 executors count 2
Host server103.cluster.com startTime 02:27:04:400 executors count 3
Host server120.cluster.com startTime 02:28:57:765 executors count 4
Host server66.cluster.com startTime 02:26:42:152 executors count 2
Host server195.cluster.com startTime 02:29:52:884 executors count 1
Host server49.cluster.com startTime 02:27:53:847 executors count 1
Host server154.cluster.com startTime 02:30:01:367 executors count 2
Host server25.cluster.com startTime 02:31:41:512 executors count 2
Host server65.cluster.com startTime 02:28:08:708 executors count 2
Host server194.cluster.com startTime 02:31:48:742 executors count 1
Host server07.cluster.com startTime 02:26:27:685 executors count 2
Host server136.cluster.com startTime 02:26:17:840 executors count 2
Host server82.cluster.com startTime 02:31:45:961 executors count 1
Host server48.cluster.com startTime 02:26:00:583 executors count 1
Host server119.cluster.com startTime 02:26:20:125 executors count 1
Host server24.cluster.com startTime 02:31:45:990 executors count 1
Host server112.cluster.com startTime 02:29:16:166 executors count 4
Host server57.cluster.com startTime 02:29:18:285 executors count 1
Host server118.cluster.com startTime 02:29:28:424 executors count 1
Host server91.cluster.com startTime 02:26:00:525 executors count 3
Host server111.cluster.com startTime 02:30:05:595 executors count 1
Host server98.cluster.com startTime 02:29:53:427 executors count 1
Host server74.cluster.com startTime 02:26:31:665 executors count 3
Host server110.cluster.com startTime 02:29:25:546 executors count 2
Host server161.cluster.com startTime 02:29:11:616 executors count 3
Host server39.cluster.com startTime 02:32:14:317 executors count 1
Host server73.cluster.com startTime 02:31:50:390 executors count 3
Host server15.cluster.com startTime 02:26:23:104 executors count 2
Host server168.cluster.com startTime 02:29:05:654 executors count 1
Host server109.cluster.com startTime 02:26:14:301 executors count 2
Host server55.cluster.com startTime 02:29:36:467 executors count 1
Host server102.cluster.com startTime 02:28:58:268 executors count 1
Host server38.cluster.com startTime 02:29:03:309 executors count 3
Host server167.cluster.com startTime 02:26:47:201 executors count 3
Host server89.cluster.com startTime 02:27:48:202 executors count 2
Host server143.cluster.com startTime 02:26:23:646 executors count 2
Host server159.cluster.com startTime 02:31:37:841 executors count 3
Host server81.cluster.com startTime 02:30:20:654 executors count 1
Host server47.cluster.com startTime 02:29:19:942 executors count 1
Host server142.cluster.com startTime 02:30:05:837 executors count 2
Host server193.cluster.com startTime 02:30:33:828 executors count 1
Host server108.cluster.com startTime 02:29:18:082 executors count 4
Host server88.cluster.com startTime 02:26:44:310 executors count 2
Host server64.cluster.com startTime 02:26:58:103 executors count 3
Host server199.cluster.com startTime 02:30:36:656 executors count 1
Host server192.cluster.com startTime 02:27:04:664 executors count 1
Host server22.cluster.com startTime 02:26:36:725 executors count 3
Host server29.cluster.com startTime 02:29:28:038 executors count 1
Host server05.cluster.com startTime 02:29:08:972 executors count 3
Host server117.cluster.com startTime 02:26:42:338 executors count 1
Host server87.cluster.com startTime 02:26:20:093 executors count 3
Host server158.cluster.com startTime 02:26:27:096 executors count 2
Host server63.cluster.com startTime 02:26:22:321 executors count 3
Host server80.cluster.com startTime 02:29:34:249 executors count 2
Host server116.cluster.com startTime 02:30:57:138 executors count 2
Host server45.cluster.com startTime 02:26:25:972 executors count 4
Host server79.cluster.com startTime 02:29:58:945 executors count 1
Host server174.cluster.com startTime 02:26:06:484 executors count 2
Host server21.cluster.com startTime 02:26:54:798 executors count 2
Host server150.cluster.com startTime 02:31:46:518 executors count 2
Host server157.cluster.com startTime 02:29:27:251 executors count 4
Host server04.cluster.com startTime 02:26:28:554 executors count 2
Host server133.cluster.com startTime 02:30:18:156 executors count 1
Host server71.cluster.com startTime 02:30:51:128 executors count 2
Host server30.cluster.com startTime 02:26:21:170 executors count 2
Host server166.cluster.com startTime 02:30:18:564 executors count 1
Host server37.cluster.com startTime 02:29:29:179 executors count 2
Host server132.cluster.com startTime 02:29:33:675 executors count 1
Host server78.cluster.com startTime 02:26:08:844 executors count 5
Host server95.cluster.com startTime 02:31:45:301 executors count 1
Host server54.cluster.com startTime 02:26:09:341 executors count 3
Host server36.cluster.com startTime 02:26:15:004 executors count 4
Host server107.cluster.com startTime 02:29:47:766 executors count 4
Host server94.cluster.com startTime 02:29:07:823 executors count 3
Host server12.cluster.com startTime 02:29:15:729 executors count 1
Host server141.cluster.com startTime 02:26:01:994 executors count 2
Host server148.cluster.com startTime 02:29:23:191 executors count 1
Host server53.cluster.com startTime 02:27:43:220 executors count 1
Host server124.cluster.com startTime 02:26:10:523 executors count 3
Host server100.cluster.com startTime 02:30:24:073 executors count 1
Done printing host timeline


Printing executors timeline....

Total Hosts 135
Total Executors 250
At 02:26 executors added 52 & removed  0 currently available 52
At 02:27 executors added 10 & removed  0 currently available 62
At 02:28 executors added 13 & removed  0 currently available 75
At 02:29 executors added 81 & removed  0 currently available 156
At 02:30 executors added 48 & removed  0 currently available 204
At 02:31 executors added 45 & removed  0 currently available 249
At 02:32 executors added 1 & removed  0 currently available 250
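
The "currently available" figure in the timeline above looks like a running total of executors added minus executors removed in each minute. A minimal Python sketch that reproduces the column from the printed numbers (the variable names are illustrative and not part of Sparklens):

```python
from itertools import accumulate

# (minute, added, removed) copied from the executors timeline above
timeline = [
    ("02:26", 52, 0),
    ("02:27", 10, 0),
    ("02:28", 13, 0),
    ("02:29", 81, 0),
    ("02:30", 48, 0),
    ("02:31", 45, 0),
    ("02:32", 1, 0),
]

# Running total of the net number of executors gained per minute
available = list(accumulate(added - removed for _, added, removed in timeline))

for (minute, _, _), count in zip(timeline, available):
    print(f"At {minute} currently available {count}")
```

The last value should equal the "Total Executors" line, which is a quick sanity check that no executor was lost during the run.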

Done printing executors timeline...



Printing Application timeline 

02:25:42:667 app started 
02:26:23:421 JOB 0 started : duration 00m 02s 
[      0        |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
02:26:23:661      Stage 0 started : duration 00m 02s 
02:26:25:888      Stage 0 ended : maxTaskTime 828 taskCount 1
02:26:25:903 JOB 0 ended 
02:26:36:216 JOB 1 started : duration 00m 02s 
[      1         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
02:26:36:498      Stage 1 started : duration 00m 02s 
02:26:38:959      Stage 1 ended : maxTaskTime 932 taskCount 1
02:26:38:961 JOB 1 ended 
02:26:39:239 JOB 2 started : duration 00m 03s 
[      2    |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
02:26:39:392      Stage 2 started : duration 00m 03s 
02:26:42:403      Stage 2 ended : maxTaskTime 1352 taskCount 1
02:26:42:404 JOB 2 ended 
02:26:42:717 JOB 3 started : duration 00m 07s 
[      3 ||||||||||||||||||||||||||||||||||||||||||||||||||||||                          ]
[      4                                                        |||||||||||||||||||||||| ]
02:26:42:740      Stage 3 started : duration 00m 04s 
02:26:47:654      Stage 3 ended : maxTaskTime 3117 taskCount 1
02:26:47:708      Stage 4 started : duration 00m 02s 
02:26:49:898      Stage 4 ended : maxTaskTime 226 taskCount 200
02:26:49:901 JOB 3 ended 
02:26:56:234 JOB 4 started : duration 08m 28s 
[      5 |||||||                                                                         ]
[      6  |||||||||||||||||||                                                            ]
[      9                   ||||||||                                                      ]
[     10     ||||||||||||||                                                              ]
[     11                                                                                 ]
[     12                     ||                                                          ]
[     13                       ||||                                                      ]
[     14                           |||||||||||||||                                       ]
[     15                                          |||||||||||||||||||||||||||||||||||||| ]
02:26:58:095      Stage 5 started : duration 00m 44s 
02:27:42:816      Stage 5 ended : maxTaskTime 37214 taskCount 23
02:27:03:478      Stage 6 started : duration 02m 04s 
02:29:07:517      Stage 6 ended : maxTaskTime 35578 taskCount 601
02:28:56:449      Stage 9 started : duration 00m 46s 
02:29:42:625      Stage 9 ended : maxTaskTime 7196 taskCount 200
02:27:22:343      Stage 10 started : duration 01m 33s 
02:28:56:333      Stage 10 ended : maxTaskTime 49203 taskCount 39
02:27:23:910      Stage 11 started : duration 00m 00s 
02:27:24:422      Stage 11 ended : maxTaskTime 298 taskCount 2
02:29:06:902      Stage 12 started : duration 00m 12s 
02:29:19:350      Stage 12 ended : maxTaskTime 11511 taskCount 200
02:29:19:413      Stage 13 started : duration 00m 25s 
02:29:44:444      Stage 13 ended : maxTaskTime 24924 taskCount 200
02:29:44:491      Stage 14 started : duration 01m 36s 
02:31:20:873      Stage 14 ended : maxTaskTime 86194 taskCount 200
02:31:20:973      Stage 15 started : duration 04m 03s 
02:35:24:346      Stage 15 ended : maxTaskTime 238747 taskCount 200
02:35:24:347 JOB 4 ended 
02:35:28:841 app ended 



Checking for job overlap...


No overlapping jobs found. Good




 Time spent in Driver vs Executors
 Driver WallClock Time    01m 02s   10.66%
 Executor WallClock Time  08m 43s   89.34%
 Total WallClock Time     09m 46s
      


Minimum possible time for the app based on the critical path (with infinite resources)   07m 59s
Minimum possible time for the app with same executors, perfect parallelism and zero skew 02m 15s
If we were to run this app with single executor and single core                          15h 08m

       
 Total cores available to the app 750

 OneCoreComputeHours: Measure of total compute power available from cluster. One core in the executor, running
                      for one hour, counts as one OneCoreComputeHour. Executors with 4 cores, will have 4 times
                      the OneCoreComputeHours compared to one with just one core. Similarly, one core executor
                      running for 4 hours will OnCoreComputeHours equal to 4 core executor running for 1 hour.

 Driver Utilization (Cluster idle because of driver)

 Total OneCoreComputeHours available                            122h 07m
 Total OneCoreComputeHours available (AutoScale Aware)           77h 25m
 OneCoreComputeHours wasted by driver                            13h 01m

 AutoScale Aware: Most of the calculations by this tool will assume that all executors are available throughout
                  the runtime of the application. The number above is printed to show possible caution to be
                  taken in interpreting the efficiency metrics.

 Cluster Utilization (Executors idle because of lack of tasks or skew)

 Executor OneCoreComputeHours available                 109h 06m
 Executor OneCoreComputeHours used                       15h 07m        13.86%
 OneCoreComputeHours wasted                              93h 59m        86.14%
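
The percentages in the Cluster Utilization block can be checked directly from the printed durations: used and wasted hours are each divided by the available executor hours. A small Python sketch of that arithmetic (the helper `to_minutes` is a hypothetical name introduced here, not a Sparklens function):

```python
def to_minutes(text):
    """Convert a duration such as '109h 06m' to whole minutes."""
    hours, minutes = text.split()
    return int(hours.rstrip("h")) * 60 + int(minutes.rstrip("m"))

# Values copied from the Cluster Utilization block above
available = to_minutes("109h 06m")
used = to_minutes("15h 07m")
wasted = to_minutes("93h 59m")

used_pct = round(100 * used / available, 2)
wasted_pct = round(100 * wasted / available, 2)

print(f"used {used_pct}%  wasted {wasted_pct}%")
```

With these inputs the used and wasted minutes add up to the available minutes, and the two percentages agree with the 13.86% and 86.14% printed in the report.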

 App Level Wastage Metrics (Driver + Executor)

 OneCoreComputeHours wasted Driver               10.66%
 OneCoreComputeHours wasted Executor             76.96%
 OneCoreComputeHours wasted Total                87.62%

       


 App completion time and cluster utilization estimates with different executor counts

 Real App Duration 09m 46s
 Model Estimation  08m 01s
 Model Error       17%

 NOTE: 1) Model error could be large when auto-scaling is enabled.
       2) Model doesn't handles multiple jobs run via thread-pool. For better insights into
          application scalability, please try such jobs one by one without thread-pool.

       
 Executor count    25  ( 10%) estimated time 17m 07s and estimated cluster utilization 70.61%
 Executor count    50  ( 20%) estimated time 12m 15s and estimated cluster utilization 49.34%
 Executor count   125  ( 50%) estimated time 08m 25s and estimated cluster utilization 28.72%
 Executor count   200  ( 80%) estimated time 08m 15s and estimated cluster utilization 18.29%
 Executor count   250  (100%) estimated time 08m 01s and estimated cluster utilization 15.06%
 Executor count   275  (110%) estimated time 08m 00s and estimated cluster utilization 13.72%
 Executor count   300  (120%) estimated time 07m 59s and estimated cluster utilization 12.61%
 Executor count   375  (150%) estimated time 07m 59s and estimated cluster utilization 10.09%
 Executor count   500  (200%) estimated time 07m 59s and estimated cluster utilization 7.57%
 Executor count   750  (300%) estimated time 07m 59s and estimated cluster utilization 5.04%
 Executor count  1000  (400%) estimated time 07m 59s and estimated cluster utilization 3.78%
 Executor count  1250  (500%) estimated time 07m 59s and estimated cluster utilization 3.03%
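
One way to read the estimated utilization figures above: the compute actually used (15h 07m of OneCoreComputeHours) divided by what the given number of executors would offer over the estimated run time. This is an assumption about how the numbers relate, not a statement of how Sparklens computes them, and it assumes 3 cores per executor (750 cores / 250 executors). A rough Python check:

```python
# Assumptions taken from the report above, not from the Sparklens source:
USED_CORE_HOURS = 15 + 7 / 60      # "Executor OneCoreComputeHours used 15h 07m"
CORES_PER_EXECUTOR = 750 // 250    # total cores / total executors, assumed uniform

def estimated_utilization(executors, minutes, seconds):
    """Used core-hours as a percentage of core-hours offered in the estimated time."""
    estimated_hours = (minutes * 60 + seconds) / 3600
    available = executors * CORES_PER_EXECUTOR * estimated_hours
    return 100 * USED_CORE_HOURS / available

# (executor count, estimated minutes, estimated seconds) from the list above
for executors, mm, ss in [(25, 17, 7), (250, 8, 1), (1250, 7, 59)]:
    print(executors, round(estimated_utilization(executors, mm, ss), 2))
```

The results land within a few hundredths of a percentage point of the printed values; the small gap is plausibly due to the report rounding durations to whole seconds and minutes.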



Total tasks in all stages 1869
Per Stage  Utilization
Stage-ID   Wall    Task      Task     IO%    Input     Output    ----Shuffle-----    -WallClockTime-    --OneCoreComputeHours---   MaxTaskMem
          Clock%  Runtime%   Count                               Input  |  Output    Measured | Ideal   Available| Used%|Wasted%
       0    0.00    0.00         1    0.0   64.0 KB    0.0 KB    0.0 KB    0.0 KB    00m 02s   00m 00s    00h 27m    0.0  100.0    0.0 KB 
       1    0.00    0.00         1    0.0   64.0 KB    0.0 KB    0.0 KB    0.0 KB    00m 02s   00m 00s    00h 30m    0.1   99.9    0.0 KB 
       2    0.00    0.00         1    0.0   90.0 KB    0.0 KB    0.0 KB    0.0 KB    00m 03s   00m 00s    00h 37m    0.1   99.9    0.0 KB 
       3    0.00    0.01         1    0.0  867.1 KB    0.0 KB    0.0 KB  148.4 KB    00m 04s   00m 00s    01h 01m    0.1   99.9    0.0 KB 
       4    0.00    0.00       200    0.0    0.0 KB    0.0 KB  148.4 KB    0.0 KB    00m 02s   00m 00s    00h 27m    0.1   99.9    0.0 KB 
       5    6.00    1.15        23    0.2  402.1 MB    0.0 KB    0.0 KB    1.3 GB    00m 44s   00m 00s    09h 19m    1.9   98.1    0.0 KB 
       6   17.00   19.92       601    7.1   17.2 GB    0.0 KB    0.0 KB    1.8 GB    02m 04s   00m 14s    25h 50m   11.7   88.3    0.0 KB 
       9    6.00    0.73       200    2.9    6.9 GB    0.0 KB  409.5 MB    2.8 GB    00m 46s   00m 00s    09h 37m    1.2   98.8    0.0 KB 
      10   13.00    2.27        39    0.3  807.8 MB    0.0 KB    0.0 KB    2.5 GB    01m 33s   00m 01s    19h 34m    1.7   98.3    0.0 KB 
      11    0.00    0.00         2    0.0   31.5 KB    0.0 KB    0.0 KB   60.0 KB    00m 00s   00m 00s    00h 06m    0.1   99.9    0.0 KB 
      12    1.00    2.15       200    0.3  758.7 MB    0.0 KB    2.3 GB    1.5 GB    00m 12s   00m 01s    02h 35m   12.6   87.4    0.0 KB 
      13    3.00    5.91       200    0.0    0.0 KB    0.0 KB    1.5 GB   47.5 GB    00m 25s   00m 04s    05h 12m   17.1   82.9    0.0 KB 
      14   13.00   19.83       200    0.0    0.0 KB    0.0 KB   50.3 GB   50.3 GB    01m 36s   00m 14s    20h 04m   14.9   85.1    0.0 KB 
      15   34.00   48.02       200    0.0    0.0 KB    0.0 KB   53.2 GB    0.0 KB    04m 03s   00m 34s    50h 42m   14.3   85.7    0.0 KB 


 Stage-ID WallClock  OneCore       Task   PRatio    -----Task------   OIRatio  |* ShuffleWrite% ReadFetch%   GC%  *|
          Stage%     ComputeHours  Count            Skew   StageSkew
      0    0.32         00h 00m       1    0.00     1.00     0.37     0.00     |*   0.00           0.00    15.10  *|
      1    0.35         00h 00m       1    0.00     1.00     0.38     0.00     |*   0.00           0.00    15.56  *|
      2    0.43         00h 00m       1    0.00     1.00     0.45     0.00     |*   0.00           0.00     8.88  *|
      3    0.70         00h 00m       1    0.00     1.00     0.63     0.17     |*   4.51           0.00     6.74  *|
      4    0.31         00h 00m     200    0.27    37.67     0.10     0.00     |*   0.00           0.04    23.79  *|
      5    6.38         00h 10m      23    0.03     1.42     0.83     3.18     |*   1.08           0.00     2.72  *|
      6   17.68         03h 00m     601    0.80     2.07     0.29     0.10     |*   0.60           0.00     1.90  *|
      9    6.58         00h 06m     200    0.27     5.20     0.16     0.38     |*   4.74          13.24     4.04  *|
     10   13.40         00h 20m      39    0.05     1.67     0.52     3.17     |*   1.10           0.00     1.96  *|
     11    0.07         00h 00m       2    0.00     1.00     0.58     1.91     |*  13.59           0.00     0.00  *|
     12    1.77         00h 19m     200    0.27     1.99     0.92     0.50     |*   1.85          19.63     3.09  *|
     13    3.57         00h 53m     200    0.27     1.59     1.00    31.42     |*   6.06          12.25     1.33  *|
     14   13.74         02h 59m     200    0.27     1.65     0.89     1.00     |*   1.84           2.38     0.83  *|
     15   34.69         07h 15m     200    0.27     1.88     0.98     0.00     |*   0.00           4.21     0.88  *|

PRatio:        Number of tasks in stage divided by number of cores. Represents degree of
               parallelism in the stage
TaskSkew:      Duration of largest task in stage divided by duration of median task.
               Represents degree of skew in the stage
TaskStageSkew: Duration of largest task in stage divided by total duration of the stage.
               Represents the impact of the largest task on stage time.
OIRatio:       Output to input ration. Total output of the stage (results + shuffle write)
               divided by total input (input data + shuffle read)
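
Two of the definitions above can be cross-checked against numbers printed elsewhere in this output: PRatio from the task counts and the 750 total cores, and TaskStageSkew from the maxTaskTime and stage start/end timestamps in the application timeline. A minimal Python sketch (the function names are illustrative; TaskSkew is left out because the median task duration is not printed):

```python
TOTAL_CORES = 750  # "Total cores available to the app" from the report

def p_ratio(task_count, cores=TOTAL_CORES):
    """Number of tasks in the stage divided by the number of cores."""
    return task_count / cores

def to_ms(stamp):
    """Convert an 'HH:MM:SS:mmm' timeline timestamp to milliseconds."""
    h, m, s, ms = (int(part) for part in stamp.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def task_stage_skew(max_task_ms, started, ended):
    """Duration of the largest task divided by the duration of the stage."""
    return max_task_ms / (to_ms(ended) - to_ms(started))

# Stage 6 has 601 tasks; stage 15 ran from 02:31:20:973 to 02:35:24:346
# with maxTaskTime 238747 (taken to be milliseconds).
print(round(p_ratio(601), 2))
print(round(task_stage_skew(238747, "02:31:20:973", "02:35:24:346"), 2))
```

Rounded to two decimals these agree with the 0.80 and 0.98 shown for stages 6 and 15 in the table. A TaskStageSkew close to 1.0, as in stage 15, means a single long task accounts for nearly the whole stage.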

These metrics below represent distribution of time within the stage

ShuffleWrite:  Amount of time spent in shuffle writes across all tasks in the given
               stage as a percentage
ReadFetch:     Amount of time spent in shuffle read across all tasks in the given
               stage as a percentage
GC:            Amount of time spent in GC across all tasks in the given stage as a
               percentage

If the stage contributes large percentage to overall application time, we could look into
these metrics to check which part (Shuffle write, read fetch or GC is responsible)


      

done