Re: PyFlink SQL window aggregation data not written out to file

2021-11-22 Thread Guoqin Zheng
Hi Roman, Thanks for the update and testing locally. This is very informative. -Guoqin On Mon, Nov 22, 2021 at 10:55 AM Roman Khachatryan wrote: > Hi Guoqin, > > I was able to reproduce the problem locally. I can see that at the > time of window firing the services are already closed because o

Re: PyFlink SQL window aggregation data not written out to file

2021-11-22 Thread Roman Khachatryan
Hi Guoqin, I was able to reproduce the problem locally. I can see that at the time of window firing the services are already closed because of the emitted MAX_WATERMARK. Previously, there were some discussions around waiting for all timers to complete [1], but AFAIK there was not much demand to i

Re: PyFlink SQL window aggregation data not written out to file

2021-11-17 Thread Guoqin Zheng
Hi Roman, Thanks for the detailed explanation. I did try 1.13 and 1.14, but it still didn't work. I explicitly enabled the checkpoint with: `env.enable_checkpointing(10)`. Any other configurations I need to set? Thanks, -Guoqin On Wed, Nov 17, 2021 at 4:30 AM Roman Khachatryan wrote: > Hi Gu

Re: PyFlink SQL window aggregation data not written out to file

2021-11-17 Thread Roman Khachatryan
Hi Guoqin, Thanks for the clarification. Processing time windows actually don't need watermarks: they fire when window end time comes. But the job will likely finish earlier because of the bounded input. Handling of this case was improved in 1.14 as part of FLIP-147, as well as in previous versi

Re: PyFlink SQL window aggregation data not written out to file

2021-11-15 Thread Guoqin Zheng
Hi Roman, Thanks for your quick response! Yes, it does seem to be the window closing problem. So if I change the tumble window on eventTime, which is column 'requestTime', it works fine. I guess the EOF of the test data file kicks in a watermark of Long.MAX_VALUE. But my application code needs

Re: PyFlink SQL window aggregation data not written out to file

2021-11-15 Thread Roman Khachatryan
Hi Guoqin, I think the problem might be related to watermarks and checkpointing: - if the file is too small, the only watermark will be the one after fully reading the file - in exactly once mode, sink waits for a checkpoint completion before committing the files Recently, there were some improve

PyFlink SQL window aggregation data not written out to file

2021-11-15 Thread Guoqin Zheng
Hi all, I am reposting my stackoverflow question here. So I would like to do a local test of my pyflink sql application. The sql query does a very simple job: read from source