On 12 Apr 2016, at 00:21, Miles Crawford <mil...@allenai.org> wrote:
> Hey there. I have my Spark applications set up to write their event logs
> into S3 - this is super useful for ephemeral clusters; I can have
> persistent history even though my hosts go away.
>
> A history server is set up to view this S3 location, and that works fine
> too - at least on startup. The problem is that the history server doesn't
> seem to notice new logs arriving in the S3 bucket.
>
> Any idea how I can get it to scan the folder for new files?
>
> Thanks,
> -miles

S3 isn't a real filesystem, and an app writing to it doesn't have any data visible until one of the following happens:

- the output stream is close()'d; this happens at the end of the app
- the file is set up to be written in partitions and a partition-size boundary is crossed

Until one of those conditions is met, the history server isn't going to see anything.

If you are going to use S3 as the destination and you want to see incomplete apps, then you'll need to configure the Spark job to use a smaller partition size (64? 128? MB); there's a configuration sketch at the end of this mail.

If it's completed apps that aren't being seen by the HS, then that's a bug, though if it happens against S3 only, it's likely to be something related to directory listings.
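For the job side, here's a minimal sketch of what could go into spark-defaults.conf (or be passed with --conf to spark-submit). The bucket and path are placeholders, and I'm assuming the s3a connector, where fs.s3a.multipart.size is the property that controls the partition size; 64 MB is only a guess at a reasonable value:

    spark.eventLog.enabled              true
    spark.eventLog.dir                  s3a://my-bucket/spark-events
    # s3a multipart partition size, in bytes (64 MB here). Smaller partitions
    # mean a partition boundary is crossed sooner, so data shows up earlier.
    spark.hadoop.fs.s3a.multipart.size  67108864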
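On the history server side, the matching sketch, with the same placeholder bucket. spark.history.fs.update.interval is the period at which the server re-scans the log directory for new and updated applications (default 10s), so the scanning itself does happen; the visibility problem is on the write side:

    spark.history.fs.logDirectory       s3a://my-bucket/spark-events
    # how often the history server polls the log directory for changes
    spark.history.fs.update.interval    10s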