Re: Data Loss in HDFS after Job failure

2016-11-15 Thread Kostas Kloudas

Re: Data Loss in HDFS after Job failure

2016-11-15 Thread Dominique Rondé
… Greets, Dominique. Sent from my Samsung device. Original message: From: Kostas Kloudas, Date: 15.11.16 15:51 (GMT+01:00), To: user@flink.apache.org, Subject: Re: Data Loss in HDFS after Job failure. Hello Dominique, I think the problem is that you set both pending prefix and

Re: Data Loss in HDFS after Job failure

2016-11-15 Thread Kostas Kloudas
Hi Dominique, Just wanted to add that the RollingSink is deprecated and will eventually be replaced by the BucketingSink, so it is worth migrating to that. Cheers, Kostas
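
A minimal migration sketch, assuming the BucketingSink and DateTimeBucketer from org.apache.flink.streaming.connectors.fs.bucketing (flink-connector-filesystem); the format string, batch size, and suffix are example values, and "stream" stands in for the Kafka-backed DataStream<String> from the original post:

    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

    BucketingSink<String> sink = new BucketingSink<String>("/some/hdfs/directory");
    // Bucket part files into per-hour directories (example format string).
    sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
    // Roll over to a new part file after roughly 128 MB (example value).
    sink.setBatchSize(1024 * 1024 * 128);
    // Keep a non-empty pending suffix so committed files stay distinguishable.
    sink.setPendingSuffix(".pending");

    stream.addSink(sink);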

Re: Data Loss in HDFS after Job failure

2016-11-15 Thread Kostas Kloudas
Hello Dominique, I think the problem is that you set both the pending prefix and suffix to “”. Doing this makes the “committed” or “finished” file paths indistinguishable from the pending ones, so they are cleaned up upon restoring. Could you undo this, and put for example a suffix “pending” or s
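
A minimal sketch of the suggested fix, assuming the setPendingPrefix/setPendingSuffix setters on RollingSink (org.apache.flink.streaming.connectors.fs); the concrete prefix and suffix values are examples, not taken from the thread:

    RollingSink<String> sink = new RollingSink<String>("/some/hdfs/directory");
    // Non-empty pending markers keep in-flight part files distinguishable
    // from committed ones, so they are not cleaned up together on restore.
    sink.setPendingPrefix("_");
    sink.setPendingSuffix(".pending");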

Data Loss in HDFS after Job failure

2016-11-15 Thread Dominique Rondé
Hi all! I have run into some strange behavior with the rolling HDFS sink. We consume events from a Kafka topic and write them into an HDFS filesystem. We use the RollingSink implementation in this way: RollingSink sink = new RollingSink("/some/hdfs/directory") // .setBucketer(new DateT
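
The snippet above is cut off mid-call; a plausible reconstruction, assuming "DateT" was the start of DateTimeBucketer and that the pending prefix and suffix were set to empty strings as the replies above indicate (everything beyond the quoted fragment is a guess):

    RollingSink<String> sink = new RollingSink<String>("/some/hdfs/directory")
        // Hypothetical per-hour bucketing; the quoted code breaks off at "DateT".
        .setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH"))
        // Empty pending prefix/suffix: per the replies above, this makes
        // finished files indistinguishable from pending ones after restore.
        .setPendingPrefix("")
        .setPendingSuffix("");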