Hi Jim,

What are your checkpointing settings? Are you checkpointing to a distributed file system, such as HDFS or S3, or to the local file system? The latter should not be used in a production setting, and I would not expect it to work properly (unless the "local" file system is actually a network-mounted file system).
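To see where the space actually goes, a small script can list every chk-<n> directory under the checkpoint directory together with its total size. This is a minimal sketch and a hypothetical helper, not part of Flink. It assumes only the chk-<n> naming from Jim's listing and that the directory is readable from the machine running the script. It does not interpret anything Flink writes inside those directories.

```python
import os
import re
import sys

# Matches directory names such as "chk-4645". This naming is taken from the
# listing in the question; nothing else about Flink's layout is assumed.
CHK_PATTERN = re.compile(r"^chk-(\d+)$")


def dir_size(path):
    """Total size in bytes of all regular files below `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def summarize_checkpoints(checkpoint_dir):
    """Return (checkpoint_id, size_in_bytes) for every chk-<n> directory found
    anywhere below `checkpoint_dir`, sorted by checkpoint id."""
    results = []
    for root, dirs, _files in os.walk(checkpoint_dir):
        for name in list(dirs):
            match = CHK_PATTERN.match(name)
            if match:
                results.append((int(match.group(1)), dir_size(os.path.join(root, name))))
                dirs.remove(name)  # already measured; do not descend into it again
    return sorted(results)


if __name__ == "__main__" and len(sys.argv) > 1:
    entries = summarize_checkpoints(sys.argv[1])
    total = sum(size for _chk_id, size in entries)
    print(f"{len(entries)} chk-* directories, {total / 1024**2:.1f} MiB in total")
    # Show the five most recent checkpoints by id.
    for chk_id, size in entries[-5:]:
        print(f"chk-{chk_id}: {size} bytes")
```

Run it with the checkpoint directory as the only argument, e.g. `python checkpoint_sizes.py /path/to/checkpoints` (the script name is illustrative). Comparing the output across nodes shows whether the extra gigabytes sit in a few large checkpoints or in many small leftover ones.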
Best,
Aljoscha

> On 15. May 2017, at 17:05, Jim Langston <jlangs...@resolutebi.com> wrote:
>
> Hi all,
>
> I have a long-running streaming app saving checkpoints to
> the file system.
>
> What is the layout of the checkpoint directory? My current
> checkpoint directory has >2000 directories in it, similar to this:
>
> chk-4645
>
> Also, the directory has grown to >3GB.
>
> I have a small cluster, and all nodes were started at the same time; nothing
> has been restarted. This is occurring on one of the nodes. The others have
> about the same number of directories in the checkpoint directory, but
> they are not nearly as large.
>
> Why are there so many chk-xxxx directories? And why can they become
> so large? Is there something I should be setting in the yaml file?
>
> I was going to just remove them, but it struck me as odd that there
> are so many…
>
> Thanks
>
> Jim