Hi Vishal,

The difference between stop-with-savepoint and stop-with-savepoint-with-drain is that the latter emits a max watermark before taking the snapshot. The idea is to trigger all pending timers and flush the contents of buffering operations such as windowing. Semantically, you should use the first option if you want to stop the job and resume it at a later point in time. Stop-with-savepoint-with-drain should only be used if you want to terminate your job and don't intend to resume it, because the max watermark destroys the correctness of any results generated after the job is resumed.
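To illustrate why resuming after a drain gives wrong results, here is a toy model in plain Python. It is not Flink code: `ToyWindowOperator` and `simulate` are made-up names, and the sketch assumes that buffered window contents survive the savepoint while the current watermark does not. It only shows the effect of firing every pending event-time window early.

```python
# Toy model only; none of these names are Flink APIs.
MAX_WATERMARK = 2**63 - 1  # same value as Java's Long.MAX_VALUE


class ToyWindowOperator:
    """Sums values per tumbling event-time window and fires on watermarks."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.buffers = {}   # window end -> buffered values (the "state")
        self.emitted = []   # (window end, sum) results sent downstream

    def on_element(self, timestamp, value):
        window_end = (timestamp // self.window_size + 1) * self.window_size
        self.buffers.setdefault(window_end, []).append(value)

    def on_watermark(self, watermark):
        # A window fires once the watermark passes its last timestamp.
        for window_end in sorted(self.buffers):
            if window_end - 1 <= watermark:
                values = self.buffers.pop(window_end)
                self.emitted.append((window_end, sum(values)))


def simulate(drain):
    op = ToyWindowOperator(window_size=10)
    op.on_element(1, 5)
    op.on_element(3, 7)          # window [0, 10) is only partly filled
    if drain:
        op.on_watermark(MAX_WATERMARK)  # drain: fire everything before the snapshot
    # "Savepoint and resume" happens here; the rest of window [0, 10) arrives.
    op.on_element(8, 1)
    op.on_watermark(10)
    return op.emitted


print(simulate(drain=False))  # one complete result for the window
print(simulate(drain=True))   # two partial results for the same window
```

Without the drain the window is emitted once with the full sum. With the drain the same window is emitted twice, each time with a partial sum, which is the kind of incorrect result I mean above.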
For the concrete problem at hand, it is difficult to say why the job does not stop. It would be helpful if you could provide us with the debug logs of such a run. I am also pulling in Arvid, who works on Flink's connector ecosystem.

Cheers,
Till

On Mon, Mar 29, 2021 at 11:08 PM Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> More interested in whether a StreamingFileSink without a drain
> negatively affects its exactly-once semantics, given that the state in the
> SP would have the offsets from Kafka + the valid lengths of the part files
> at SP. To be honest, I am not sure whether the flushed buffers on the sink
> are included in the length, or whether this is not an issue with
> StreamingFileSink. If it is the former, then I would assume it should be
> documented, and then we have to look into why this hang happens.
>
> On Mon, Mar 29, 2021 at 4:08 PM Vishal Santoshi <vishal.santo...@gmail.com>
> wrote:
>
>> Is this a known issue? We do a stop + savepoint with drain. I see no back
>> pressure on our operators. It essentially takes a SP and then the Sink
>> (StreamingFileSink to S3) just stays in the RUNNING state.
>>
>> Without drain, a stop + savepoint works fine. I would imagine drain is
>> important (flush the buffers etc.), but why this hang? (I did it 3 times
>> and waited 15 minutes each time.)
>>
>> Regards.