Victor Sunderland created SPARK-56093:
-----------------------------------------
Summary: Improve History Server loading times by leveraging
AppStatus precomputed state
Key: SPARK-56093
URL: https://issues.apache.org/jira/browse/SPARK-56093
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.2.0
Reporter: Victor Sunderland
The history server can be very slow to replay Spark event logs, particularly
for large, long-running jobs. We have observed replays taking more than an
hour.
To serve the history server, Spark materializes the same AppStatus state into
the AppStatusStore as it does to serve the live UI.
This state is more than an order of magnitude smaller than the event log itself.
We could instead serialize that state to a known location (we refer to this as
'history snapshots') and reuse it, avoiding a replay of the entire event log.
The history server could then load the snapshot from that known location into
memory (or whatever KVStore implementation is configured), skipping the
expensive recomputation and materialization.
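To make the proposed 'history snapshot' flow concrete, here is a minimal sketch under heavy assumptions: a plain dict stands in for the KVStore, the event log uses a made-up, simplified JSON-lines format (not Spark's actual event schema), and `replay_event_log`, `write_snapshot`, `load_snapshot` and `load_application` are hypothetical names, not Spark APIs. It only illustrates the control flow: load the precomputed state if a snapshot exists at the known location, otherwise fall back to a full replay.

```python
import json
import os
import tempfile

# Hypothetical sketch only: a dict stands in for Spark's KVStore, and the
# event format below is a simplified stand-in, not Spark's real event schema.


def replay_event_log(event_log_path):
    """Slow path: rebuild status state by deserializing and applying every event."""
    store = {}
    with open(event_log_path) as f:
        for line in f:
            event = json.loads(line)  # one JSON parse per event, the costly part
            key = f"job/{event['job_id']}"
            if event["type"] == "job_start":
                store[key] = "RUNNING"
            elif event["type"] == "job_end":
                store[key] = event["result"]
    return store


def write_snapshot(store, snapshot_path):
    """Serialize the already-materialized state to a known location."""
    with open(snapshot_path, "w") as f:
        json.dump(store, f)


def load_snapshot(snapshot_path):
    """Fast path: load the precomputed state directly, with no event replay."""
    with open(snapshot_path) as f:
        return json.load(f)


def load_application(event_log_path, snapshot_path):
    """Prefer the snapshot if one exists; otherwise replay the full event log."""
    if os.path.exists(snapshot_path):
        return load_snapshot(snapshot_path), "snapshot"
    return replay_event_log(event_log_path), "replay"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app-1.events")
        snap_path = os.path.join(tmp, "app-1.snapshot.json")
        events = [
            {"type": "job_start", "job_id": 0},
            {"type": "job_end", "job_id": 0, "result": "SUCCEEDED"},
            {"type": "job_start", "job_id": 1},
        ]
        with open(log_path, "w") as f:
            for e in events:
                f.write(json.dumps(e) + "\n")

        # No snapshot yet, so the state is rebuilt by replaying every event.
        state, source = load_application(log_path, snap_path)
        print(source, state)  # replay {'job/0': 'SUCCEEDED', 'job/1': 'RUNNING'}

        # Whoever produces the snapshot writes it once (assumed here).
        write_snapshot(state, snap_path)

        # Later loads skip the replay and read the snapshot directly.
        state, source = load_application(log_path, snap_path)
        print(source, state)  # snapshot {'job/0': 'SUCCEEDED', 'job/1': 'RUNNING'}
```

The sketch deliberately leaves open who writes the snapshot (the running application, or the history server after a first replay) and how stale or incompatible snapshots, such as those for in-progress applications, would be detected.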
Deserialization is by far the bottleneck for large jobs, so by sidestepping it
we've observed >10x improvements in loading time in most cases, and the gains
appear to grow as event log sizes increase.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)