On 12 Apr 2016, at 00:21, Miles Crawford <mil...@allenai.org> wrote:
> Hey there. I have my Spark applications set up to write their event logs
> into S3 - this is super useful for ephemeral clusters; I can have
> persistent history even though my hosts go away.
>
> A history server is set up to view this S3 location, and that works fine
> too - at least on startup. The problem is that the history server doesn't
> seem to notice new logs arriving in the S3 bucket.
>
> Any idea how I can get it to scan the folder for new files?
>
> Thanks,
> -miles

S3 isn't a real filesystem, and an app writing to it doesn't have any data visible until one of the following happens:

- the output stream is close()'d; this happens at the end of the app
- the file is set up to be written in partitions and a partition-size boundary is crossed

Until one of those conditions is met, the history server isn't going to see anything.

If you are going to use S3 as the destination and you want to see incomplete apps, then you'll need to configure the Spark job to use a smaller partition size (64? 128? MB); there's a configuration sketch at the end of this mail.

If it's completed apps that aren't being seen by the HS, then that's a bug, though if it happens against S3 only, it's likely to be something related to directory listings.
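For the job side, here's a minimal sketch of what could go into spark-defaults.conf (or be passed with --conf to spark-submit). The bucket and path are placeholders, and I'm assuming the s3a connector, where fs.s3a.multipart.size is the property that controls the partition size; 64 MB is only a guess at a reasonable value:

    spark.eventLog.enabled              true
    spark.eventLog.dir                  s3a://my-bucket/spark-events
    # s3a multipart partition size, in bytes (64 MB here). Smaller partitions
    # mean a partition boundary is crossed sooner, so data shows up earlier.
    spark.hadoop.fs.s3a.multipart.size  67108864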
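On the history server side, the matching sketch, with the same placeholder bucket. spark.history.fs.update.interval is the period at which the server re-scans the log directory for new and updated applications (default 10s), so the scanning itself does happen; the visibility problem is on the write side:

    spark.history.fs.logDirectory       s3a://my-bucket/spark-events
    # how often the history server polls the log directory for changes
    spark.history.fs.update.interval    10s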