It is completed apps that are not showing up. I'm fine with incomplete apps not appearing.
On Tue, Apr 12, 2016 at 6:43 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 12 Apr 2016, at 00:21, Miles Crawford <mil...@allenai.org> wrote:
>
> Hey there. I have my spark applications set up to write their event logs
> into S3 - this is super useful for ephemeral clusters, I can have
> persistent history even though my hosts go away.
>
> A history server is set up to view this s3 location, and that works fine
> too - at least on startup.
>
> The problem is that the history server doesn't seem to notice new logs
> arriving into the S3 bucket. Any idea how I can get it to scan the folder
> for new files?
>
> Thanks,
> -miles
>
>
> S3 isn't a real filesystem, and apps writing to it don't have any data
> written until one of the following happens:
> - the output stream is close()'d; this happens at the end of the app
> - the file is set up to be partitioned and a partition size is crossed
>
> Until either of those conditions is met, the history server isn't going
> to see anything.
>
> If you are going to use S3 as the dest, and you want to see incomplete
> apps, then you'll need to configure the Spark job to use a smaller partition
> size (64? 128? MB).
>
> If it's completed apps that aren't being seen by the HS, then that's a
> bug, though if it's against S3 only, it's likely to be something related to
> directory listings.
>
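For anyone reading this thread later, the settings being discussed look roughly
like the sketch below. The bucket path is a placeholder, and note that
spark.history.fs.update.interval only controls how often the history server
re-scans the log directory - it can't make data visible that S3 hasn't received
yet. Check the property names against your Spark version.

    # spark-defaults.conf on the cluster that writes the event logs
    spark.eventLog.enabled            true
    spark.eventLog.dir                s3a://my-bucket/spark-event-logs   # placeholder path

    # configuration read by the history server
    spark.history.fs.logDirectory     s3a://my-bucket/spark-event-logs
    spark.history.fs.update.interval  10s   # how often the HS polls the directory for new/updated logs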