[ https://issues.apache.org/jira/browse/SPARK-56093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-56093:
-----------------------------------
Labels: pull-request-available (was: )
> Improve History Server loading times by leveraging AppStatus precomputed state
> ------------------------------------------------------------------------------
>
> Key: SPARK-56093
> URL: https://issues.apache.org/jira/browse/SPARK-56093
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.2.0
> Reporter: Victor Sunderland
> Priority: Major
> Labels: pull-request-available
>
> The history server can be very slow to replay Spark event logs, particularly
> for large, long-running jobs. We have observed replays taking more than an
> hour.
> Spark's history server materializes the same AppStatus state into the
> AppStatusStore to serve the history server, as it does to serve the live UI.
> This state is more than an order of magnitude smaller than the event log
> itself.
> We could reuse that state by serializing it to a known location (we refer to
> these as 'history snapshots'), which avoids having to replay the entire event
> log from the history server. The history server could then load the snapshot
> into memory (or whatever KVStore implementation is configured) from that known
> location and avoid the expensive recomputation and materialization.
> Deserialization is far and away the bottleneck for large jobs, so by
> improving this, we've observed >10x improvements in most cases, and the gain
> appears to grow as event log sizes go up.
>
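For illustration only, the snapshot idea described above can be sketched in a few lines of Python. This is not Spark code and does not use Spark's AppStatusStore or KVStore APIs: the event shape, the file names, and the functions `replay`, `write_snapshot`, and `load_state` are all hypothetical. It only shows the general pattern of preferring a small precomputed state file over a full event log replay.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical event shape, e.g. {"type": "TaskEnd", "stage": 3}.
# This is NOT Spark's event log schema.


def replay(event_log_path):
    """Slow path: parse every event line and rebuild the aggregated state."""
    tasks_per_stage = Counter()
    with open(event_log_path) as f:
        for line in f:
            event = json.loads(line)
            if event["type"] == "TaskEnd":
                # String keys so the state survives a JSON round trip unchanged.
                tasks_per_stage[str(event["stage"])] += 1
    return dict(tasks_per_stage)


def write_snapshot(state, snapshot_path):
    """Serialize the already-computed state to a known location."""
    with open(snapshot_path, "w") as f:
        json.dump(state, f)


def load_state(event_log_path, snapshot_path):
    """Fast path: prefer the snapshot, fall back to a full replay."""
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            return json.load(f)
    return replay(event_log_path)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "events.jsonl")
    snapshot_path = os.path.join(workdir, "snapshot.json")

    # Write a tiny made-up event log.
    with open(log_path, "w") as f:
        for stage in [0, 0, 1]:
            f.write(json.dumps({"type": "TaskEnd", "stage": stage}) + "\n")

    state = load_state(log_path, snapshot_path)  # no snapshot yet: replays
    write_snapshot(state, snapshot_path)
    print(load_state(log_path, snapshot_path))   # served from the snapshot
```

In this sketch the snapshot holds only the aggregated state, so its size depends on the number of stages rather than the number of events, which mirrors the observation above that the materialized state is much smaller than the event log.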
--
This message was sent by Atlassian Jira
(v8.20.10#820010)