[
https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608968#comment-15608968
]
Marcelo Vanzin commented on SPARK-18085:
----------------------------------------
bq. this really sounds very much like the Hadoop ATS V1.5. Have you looked at
that?
I haven't looked at 1.5 exactly, but I assume it's not much different from v1.
The problem with the ATS is twofold: it's an external YARN dependency, and it's
based on "immutable events"; once you write a piece of data to the ATS you
can't update it. My spec requires the ability to update objects stored in the
underlying store.
I haven't looked closely at what will be done with v2, but the blurbs I looked
at a while ago didn't really impress me much; the idea of running a separate
JVM side-by-side with the AM sounded weird to me. Also, it still has the YARN
dependency which may not be desired by a bunch of Spark users.
bq. I assume this a separate levelDB that stores just the metadata to do the
simple listing on startup?
Yes, there are n+1 dbs kept by the SHS (listing + 1 per UI).
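To make the "n+1 dbs" layout concrete, here is a minimal sketch of how the listing db plus the per-UI dbs might be laid out on disk. All names (class, directory names, file suffix) are hypothetical illustrations, not the actual SHS code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the "n+1 databases" layout: one db holding the
// application listing, plus one db per application UI. Names and paths
// are illustrative only.
public class ShsStoreLayout {
    private final Path root;

    public ShsStoreLayout(Path root) {
        this.root = root;
    }

    // The single db backing the app listing shown on SHS startup.
    public Path listingDb() {
        return root.resolve("listing.ldb");
    }

    // One db per application UI, keyed by app id and optional attempt id.
    public Path appDb(String appId, String attemptId) {
        String name = (attemptId == null) ? appId : appId + "_" + attemptId;
        return root.resolve("apps").resolve(name + ".ldb");
    }
}
```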
bq. Just a quick overall picture of what I think is being proposed without all
the incremental steps and leaving out UI parts
That's pretty accurate. As far as cleanup policy goes, the current code just
follows the existing one; I've been thinking about adding a second cleaning
thread to remove local data for files that were deleted from HDFS outside of
the SHS, but haven't gotten to that yet.
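The second-cleaner idea mentioned above could boil down to a set difference: compare the event logs still present in the remote store against the locally cached dbs and delete the orphans. A minimal sketch, with all names hypothetical (this is not the actual SHS code):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed second cleaner: find local db
// copies whose source event logs were removed from HDFS outside the SHS.
public class OrphanCleaner {
    // Returns the app ids whose local data is stale because the
    // corresponding event log no longer exists remotely.
    public static Set<String> orphans(Set<String> remoteLogs, Set<String> localDbs) {
        Set<String> stale = new HashSet<>(localDbs);
        stale.removeAll(remoteLogs);
        return stale;
    }
}
```

In practice this would run on a scheduled thread alongside the existing cleaner, deleting the local dbs returned by the diff.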
bq. How is this solving quickly listing new apps issue?
It's not, at least not explicitly. Incremental parsing would be a way to handle
that, and also hooking up with HDFS's "inotify" API to detect new files or
files being renamed. Or writing a separate "summary" file somewhere as you
suggest. But those enhancements can be done separately from this work, really
(which is why I kept SPARK-6951 a separate issue).
bq. streaming data, I'm not sure if streaming stores history at this point?
No, streaming doesn't write to the event log, so there's no streaming UI in the
SHS. This work touches streaming because I'm changing the backing store for UI
data, but I don't want to tackle streaming history at this point in time. That
can be done separately (and this work might help make it a more viable idea).
I also don't want to stray too far from the UI / SHS enhancements here. For
example, breaking up large event log files (to make logging streaming events
more palatable) would be nice, but in a sense it's orthogonal to what's being
proposed here.
Finally, note that even though I don't explicitly call this out in the
document, the proposal here is less about where the data will be stored and
more about changing the underlying architecture to allow the data to be stored
in a different place. If you take a look at the M1 code, there's an abstraction
for an external store, and I just happen to have a LevelDB implementation. You
could potentially implement an in-memory version, or an ATS version, or
something else crazy. But the main thing I want to change is the idea that UI
data is stored in memory, which is the source of most of the SHS issues we
see.
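The store abstraction described above might look something like the following minimal sketch: the UI reads and writes through an interface, and LevelDB, in-memory, or even ATS-backed implementations can be swapped in. Interface and method names here are hypothetical, not the actual M1 code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a pluggable UI store. Unlike ATS v1's
// immutable events, stored objects can be updated in place.
interface UiStore {
    void write(String key, Object value);
    <T> T read(Class<T> klass, String key);
    void delete(String key);
}

// Trivial in-memory implementation; a LevelDB-backed one would
// serialize values to disk instead of keeping them on the heap.
class InMemoryStore implements UiStore {
    private final Map<String, Object> data = new ConcurrentHashMap<>();

    @Override
    public void write(String key, Object value) {
        data.put(key, value);
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T read(Class<T> klass, String key) {
        return (T) data.get(key);
    }

    @Override
    public void delete(String key) {
        data.remove(key);
    }
}
```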
> Scalability enhancements for the History Server
> -----------------------------------------------
>
> Key: SPARK-18085
> URL: https://issues.apache.org/jira/browse/SPARK-18085
> Project: Spark
> Issue Type: Umbrella
> Components: Spark Core, Web UI
> Affects Versions: 2.0.0
> Reporter: Marcelo Vanzin
> Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues
> when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues.
> I'll be attaching a document shortly describing the issues and suggesting a
> path to solving them.