[ 
https://issues.apache.org/jira/browse/SPARK-56093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-56093:
-----------------------------------
    Labels: pull-request-available  (was: )

> Improve History Server loading times by leveraging AppStatus precomputed state
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-56093
>                 URL: https://issues.apache.org/jira/browse/SPARK-56093
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.2.0
>            Reporter: Victor Sunderland
>            Priority: Major
>              Labels: pull-request-available
>
> The history server can be very slow to replay Spark event logs, particularly 
> for large, long-running jobs. We have observed replays taking more than an 
> hour.
> Spark's history server materializes the same AppStatus state into the 
> AppStatusStore to serve the history UI as it does to serve the live UI. 
> This state is more than an order of magnitude smaller than the event log 
> itself.
> We could reuse that state by serializing it to a known location (we refer to 
> these as 'history snapshots'), so that the history server does not have to 
> replay the entire event log. The history server could then load the snapshot 
> from that known location into memory (or whatever KVStore implementation is 
> configured) and avoid the expensive recomputation and materialization.
> Deserializing the event log is by far the bottleneck for large jobs, so by 
> avoiding it we've observed >10x improvements in most cases, and the gains 
> appear to grow as event log sizes go up.
>  
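The snapshot idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not the proposed Spark implementation: the `InMemoryStore` class, the JSON snapshot layout, the simplified event fields (`event`, `job_id`, `result`), and the function names are all hypothetical stand-ins, and none of them correspond to Spark's actual KVStore API or event log schema. It only shows the control flow: load from a snapshot at a known location if one exists, otherwise fall back to a full replay.

```python
import json
import os
import tempfile


class InMemoryStore:
    """Hypothetical stand-in for a KVStore: a dict keyed by (kind, key)."""

    def __init__(self):
        self.data = {}

    def write(self, kind, key, value):
        self.data[(kind, key)] = value


def replay(event_log_path, store):
    """Slow path: parse every event line and fold it into the store."""
    with open(event_log_path) as f:
        for line in f:
            event = json.loads(line)  # simplified, assumed event format
            if event["event"] == "job_start":
                store.write("job", event["job_id"], {"status": "RUNNING"})
            elif event["event"] == "job_end":
                store.write("job", event["job_id"], {"status": event["result"]})


def write_snapshot(store, snapshot_path):
    """Serialize the materialized state to a known location."""
    records = [
        {"kind": kind, "key": key, "value": value}
        for (kind, key), value in store.data.items()
    ]
    tmp_path = snapshot_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(records, f)
    # Rename into place so a reader never sees a half-written snapshot.
    os.replace(tmp_path, snapshot_path)


def load_app(event_log_path, snapshot_path):
    """Fast path if a snapshot exists, otherwise replay the event log."""
    store = InMemoryStore()
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            for record in json.load(f):
                store.write(record["kind"], record["key"], record["value"])
        return store, "snapshot"
    replay(event_log_path, store)
    return store, "replay"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        log_path = os.path.join(workdir, "events.jsonl")
        snapshot_path = os.path.join(workdir, "snapshot.json")
        with open(log_path, "w") as f:
            f.write(json.dumps({"event": "job_start", "job_id": 0}) + "\n")
            f.write(json.dumps({"event": "job_end", "job_id": 0,
                                "result": "SUCCEEDED"}) + "\n")

        first, source = load_app(log_path, snapshot_path)
        print(source)  # no snapshot yet, so the log is replayed
        write_snapshot(first, snapshot_path)

        second, source = load_app(log_path, snapshot_path)
        print(source)  # the snapshot is now used instead of the log
        print(first.data == second.data)
```

In this sketch the snapshot holds only the final materialized state, so its size tracks the state rather than the number of events, which is the property the description relies on.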



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

