Parquet files in streaming mode

2021-12-27 Thread Mathieu D
Hello, We're trying to use a Parquet file sink to write files to S3. When running in streaming mode, it seems that Parquet files are flushed and rolled at each checkpoint. The result is a very large number of very small Parquet files, which completely defeats the purpose of that format. Is ther

hook a callback on checkpointing failure.

2021-10-14 Thread Mathieu D
Hey there, We have some instabilities around checkpointing that we don't quite understand. In general, as soon as a checkpoint fails, our cluster does not recover to a proper state. But to better understand the mechanism, we'd like to be notified as soon as this happens, so we can jump on ou

Re: proper way to manage watermarks with messages combining multiple timestamps

2021-04-18 Thread Mathieu D
egards > Lasse Nedergaard > > On 16 Apr 2021 at 18:29, Mathieu D wrote: > > Hello, > > I'm totally new to Flink, and I'd like to make sure I understand things properly around watermarks. > > We're

proper way to manage watermarks with messages combining multiple timestamps

2021-04-16 Thread Mathieu D
Hello, I'm totally new to Flink, and I'd like to make sure I understand things properly around watermarks. We're processing messages from IoT devices. Those messages have a timestamp, and we have a first phase of processing based on this timestamp. So far so good. These messages actually "pack"