Victor Sunderland created SPARK-56093:
-----------------------------------------

             Summary: Improve History Server loading times by leveraging 
AppStatus precomputed state
                 Key: SPARK-56093
                 URL: https://issues.apache.org/jira/browse/SPARK-56093
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.2.0
            Reporter: Victor Sunderland


The history server can be very slow to replay Spark event logs, particularly 
for large, long-running jobs. We have observed replay taking more than an 
hour.

Spark's history server materializes the same AppStatus state into the 
AppStatusStore to serve the history server UI as is used to serve the live UI. 
This state is more than an order of magnitude smaller than the event log itself.

We could reuse that state instead of replaying the entire event log in the 
history server: serialize the state to a known location (we refer to these 
as 'history snapshots'). The history server could then load a snapshot from 
that known location into memory (or into whatever KVStore implementation is 
configured) and avoid the expensive recomputation and materialization.
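
To make the snapshot idea described above concrete, here is a minimal toy 
sketch in Python. It is not Spark code and does not use the AppStatusStore or 
KVStore APIs; the names replay, write_snapshot, load_state and the JSON file 
layout are all hypothetical, chosen only to illustrate "write the computed 
state once, load it later instead of replaying".

```python
import json
import os
import tempfile


def replay(events):
    """Slow path: rebuild per-job status by folding over every event."""
    state = {}
    for event in events:
        # Keys are strings so the state survives a JSON round trip unchanged.
        job = state.setdefault(str(event["job"]), {"tasks": 0, "status": "RUNNING"})
        if event["type"] == "task_end":
            job["tasks"] += 1
        elif event["type"] == "job_end":
            job["status"] = "SUCCEEDED"
    return state


def write_snapshot(state, path):
    """Serialize the already-computed state to a known location."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    # Rename into place so a reader never sees a half-written snapshot.
    os.replace(tmp, path)


def load_state(events, path):
    """Fast path: use the snapshot if present, else fall back to replay."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return replay(events)


# Hypothetical event log for a single job.
events = [
    {"job": 1, "type": "task_end"},
    {"job": 1, "type": "task_end"},
    {"job": 1, "type": "job_end"},
]

with tempfile.TemporaryDirectory() as known_location:
    path = os.path.join(known_location, "app-1.snapshot.json")
    live_state = replay(events)       # computed once, e.g. while the app runs
    write_snapshot(live_state, path)
    restored = load_state([], path)   # no events needed; the snapshot is used
```

In this sketch the fallback to replay keeps applications without a snapshot 
loadable; whether the actual proposal behaves the same way is an assumption.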

Deserialization is by far the bottleneck for large jobs. By improving this, 
we have observed >10x improvements in loading time in most cases, and the 
gains appear to grow as event log sizes go up.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
