[
https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608968#comment-15608968
]
Marcelo Vanzin commented on SPARK-18085:
----------------------------------------
bq. this really sounds very much like the Hadoop ATS V1.5. Have you looked at
that?
I haven't looked at 1.5 exactly, but I assume it's not much different from v1.
The problem with the ATS is twofold: it's an external YARN dependency, and it's
based on "immutable events"; once you write a piece of data to the ATS you
can't update it. My spec requires the ability to update objects stored in the
underlying store.
I haven't looked closely at what will be done with v2, but the blurbs I looked
at a while ago didn't really impress me much; the idea of running a separate
JVM side-by-side with the AM sounded weird to me. Also, it still has the YARN
dependency which may not be desired by a bunch of Spark users.
bq. I assume this a separate levelDB that stores just the metadata to do the
simple listing on startup?
Yes, there are n+1 dbs kept by the SHS (listing + 1 per UI).
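To make the "n+1 dbs" layout concrete, here is a minimal sketch of how the listing db plus the per-UI dbs might be laid out on disk. All names (class, directory names, file suffix) are hypothetical illustrations, not the actual SHS code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the "n+1 databases" layout: one db holding the
// application listing, plus one db per application UI. Names and paths
// are illustrative only.
public class ShsStoreLayout {
    private final Path root;

    public ShsStoreLayout(Path root) {
        this.root = root;
    }

    // The single db backing the app listing shown on SHS startup.
    public Path listingDb() {
        return root.resolve("listing.ldb");
    }

    // One db per application UI, keyed by app id and optional attempt id.
    public Path appDb(String appId, String attemptId) {
        String name = (attemptId == null) ? appId : appId + "_" + attemptId;
        return root.resolve("apps").resolve(name + ".ldb");
    }
}
```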
bq. Just a quick overall picture of what I think is being proposed without all
the incremental steps and leaving out UI parts
That's pretty accurate. As far as cleanup policy goes, the current code just
follows the existing one; I've been thinking about adding a second cleaning
thread to remove local data for files that were deleted from HDFS outside of
the SHS, but haven't gotten to that yet.
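The second-cleaner idea mentioned above could boil down to a set difference: compare the event logs still present in the remote store against the locally cached dbs and delete the orphans. A minimal sketch, with all names hypothetical (this is not the actual SHS code):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed second cleaner: find local db
// copies whose source event logs were removed from HDFS outside the SHS.
public class OrphanCleaner {
    // Returns the app ids whose local data is stale because the
    // corresponding event log no longer exists remotely.
    public static Set<String> orphans(Set<String> remoteLogs, Set<String> localDbs) {
        Set<String> stale = new HashSet<>(localDbs);
        stale.removeAll(remoteLogs);
        return stale;
    }
}
```

In practice this would run on a scheduled thread alongside the existing cleaner, deleting the local dbs returned by the diff.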
bq. How is this solving quickly listing new apps issue?
It's not, at least not explicitly. Incremental parsing would be a way to handle
that, and also hooking up with HDFS's "inotify" API to detect new files or
files being renamed. Or writing a separate "summary" file somewhere as you
suggest. But those enhancements can be done separately from this work, really
(which is why I kept SPARK-6951 a separate issue).
bq. streaming data, I'm not sure if streaming stores history at this point?
No, streaming doesn't write to the event log, so there's no streaming UI in the
SHS. This work touches streaming because I'm changing the backing store for UI
data, but I don't want to tackle streaming history at this point in time. That
can be done separately (and this work might help make it a more viable idea).
I also don't want to stray too far from the UI / SHS enhancements here. For
example, breaking up large event log files (to make logging streaming events
more palatable) would be nice, but in a sense it's orthogonal to what's being
proposed here.
Finally, note that even though I don't explicitly call this out in the
document, the proposal here is less about where the data will be stored and
more about changing the underlying architecture to allow the data to be stored
in a different place. If you take a look at the M1 code, there's an abstraction
for an external store, and I just happen to have a LevelDB implementation. You
could potentially implement an in-memory version, or an ATS version, or
something else crazy. But the main thing I want to change is the idea that UI
data is stored in memory, which is the source of most of the SHS issues we
see.
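The store abstraction described above might look something like the following minimal sketch: the UI reads and writes through an interface, and LevelDB, in-memory, or even ATS-backed implementations can be swapped in. Interface and method names here are hypothetical, not the actual M1 code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a pluggable UI store. Unlike ATS v1's
// immutable events, stored objects can be updated in place.
interface UiStore {
    void write(String key, Object value);
    <T> T read(Class<T> klass, String key);
    void delete(String key);
}

// Trivial in-memory implementation; a LevelDB-backed one would
// serialize values to disk instead of keeping them on the heap.
class InMemoryStore implements UiStore {
    private final Map<String, Object> data = new ConcurrentHashMap<>();

    @Override
    public void write(String key, Object value) {
        data.put(key, value);
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T read(Class<T> klass, String key) {
        return (T) data.get(key);
    }

    @Override
    public void delete(String key) {
        data.remove(key);
    }
}
```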
> Scalability enhancements for the History Server
> -----------------------------------------------
>
> Key: SPARK-18085
> URL: https://issues.apache.org/jira/browse/SPARK-18085
> Project: Spark
> Issue Type: Umbrella
> Components: Spark Core, Web UI
> Affects Versions: 2.0.0
> Reporter: Marcelo Vanzin
> Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues.
> I'll be attaching a document shortly describing the issues and suggesting a
> path to solving them.