It is completed apps that are not showing up. I'm fine with incomplete apps
not appearing.
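
One way to narrow this down is to check whether the logs sitting in the
bucket are finished files. A minimal sketch, assuming Spark's convention of
giving an event log an `.inprogress` suffix while the app runs and renaming
it when the app ends; the key names below are made up for illustration, and
the helper `split_event_logs` is hypothetical, not part of Spark.

```python
# Assumed Spark naming convention for logs of still-running apps.
IN_PROGRESS_SUFFIX = ".inprogress"


def split_event_logs(keys):
    """Split event log names into (completed, in_progress) lists."""
    completed, in_progress = [], []
    for key in keys:
        if key.endswith(IN_PROGRESS_SUFFIX):
            in_progress.append(key)
        else:
            completed.append(key)
    return completed, in_progress


# Hypothetical keys, as they might appear in a listing of the log location.
keys = [
    "spark-logs/app-20160411-0001",
    "spark-logs/app-20160412-0002.inprogress",
]
completed, in_progress = split_event_logs(keys)
print("completed:", completed)
print("in progress:", in_progress)
```

If a key lands in the completed list but the app is still missing from the
history server UI, that points at the listing/scan side rather than at the
writer.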

On Tue, Apr 12, 2016 at 6:43 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> On 12 Apr 2016, at 00:21, Miles Crawford <mil...@allenai.org> wrote:
>
>> Hey there. I have my Spark applications set up to write their event logs
>> into S3 - this is super useful for ephemeral clusters, I can have
>> persistent history even though my hosts go away.
>>
>> A history server is set up to view this S3 location, and that works fine
>> too - at least on startup.
>>
>> The problem is that the history server doesn't seem to notice new logs
>> arriving into the S3 bucket. Any idea how I can get it to scan the folder
>> for new files?
>>
>> Thanks,
>> -miles
>
>
> S3 isn't a real filesystem, and apps writing to it don't have any data
> written until one of the following happens:
>  - the output stream is close()'d. This happens at the end of the app
>  - the file is set up to be partitioned and a partition size is crossed
>
> Until either of those conditions is met, the history server isn't going
> to see anything.
>
> If you are going to use S3 as the dest, and you want to see incomplete
> apps, then you'll need to configure the Spark job to have a smaller
> partition size (64? 128? MB).
>
> If it's completed apps that aren't being seen by the HS, then that's a
> bug, though if it's against S3 only, it's likely to be something related
> to directory listings.
>
