Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-12 Thread Timo Walther
Hi Austin, your explanation for the KeyedProcessFunction implementation sounds good to me. Using the time and state primitives for this task will make the implementation not only more explicit but also more readable. Let me know whether you were able to solve your use case. Regards, Timo

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-09 Thread Austin Cawley-Edwards
Hey Timo, Hah, that's a fair point about using time. I guess I should update my statement to "as a user, I don't want to worry about *manually managing* time". That's a nice suggestion with the KeyedProcessFunction and no windows, I'll give that a shot. If I don't want to emit any duplicates, I'd

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-09 Thread Timo Walther
Hi Austin, if you don't want to worry about time at all, you should probably not use any windows because those are a time-based operation. A solution that would look a bit nicer could be to use a pure KeyedProcessFunction and implement the deduplication logic without reusing windows. In Proc
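The KeyedProcessFunction approach suggested above can be illustrated with a dependency-free sketch. The class below is plain Java, not Flink API: the map stands in for per-key `ValueState`, and `cleanup` stands in for the `onTimer` callback one would register to expire state. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of deduplication logic a KeyedProcessFunction could
// implement with keyed state and timers. Per key: emit the first record,
// swallow repeats until the entry "expires" (mimicking state-cleanup timers).
public class Deduplicator {
    private final long ttlMillis;                                 // how long a key counts as "seen"
    private final Map<String, Long> seenUntil = new HashMap<>();  // stands in for keyed ValueState

    public Deduplicator(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /** Returns true if the record should be emitted (first occurrence within the TTL). */
    public boolean process(String key, long timestamp) {
        Long expiry = seenUntil.get(key);
        if (expiry != null && timestamp < expiry) {
            return false;  // duplicate within TTL: drop it
        }
        // First sighting, or the previous entry expired: remember the key.
        // In Flink, this is where a cleanup timer would be registered.
        seenUntil.put(key, timestamp + ttlMillis);
        return true;
    }

    /** Mimics the onTimer callback: drop entries whose TTL has passed. */
    public void cleanup(long now) {
        seenUntil.values().removeIf(expiry -> expiry <= now);
    }
}
```

Unlike a windowed dedup, nothing here waits for a window to close, so the last records of a bounded stream are emitted as soon as they arrive.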

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-08 Thread Austin Cawley-Edwards
Hey Timo, Sorry for the delayed reply. I'm using the Blink planner and using non-time-based joins. I've got an example repo here that shows my query/setup [1]. It's got the manual timestamp assignment commented out for now, but that does indeed solve the issue. I'd really like to not worry about
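The "manual timestamp assignment" workaround mentioned above boils down to re-deriving event time from a field of each converted row and computing a bounded-out-of-orderness watermark from it. The sketch below is plain Java showing that bookkeeping, not Flink's `WatermarkStrategy` API; the field-as-epoch-millis assumption is hypothetical.

```java
// Plain-Java sketch of re-assigning timestamps after a Table -> DataStream
// conversion: take the event time from a row field and track the watermark
// as the highest timestamp seen minus an allowed out-of-orderness.
public class TimestampAssigner {
    private final long maxOutOfOrderness;            // allowed lateness in millis
    private long maxSeenTimestamp = Long.MIN_VALUE;  // highest event time observed

    public TimestampAssigner(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Extracts the timestamp a row carries (assumed: an epoch-millis long field). */
    public long extractTimestamp(long rowtimeField) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, rowtimeField);
        return rowtimeField;
    }

    /** Watermark = highest timestamp seen so far minus the allowed lateness. */
    public long currentWatermark() {
        return maxSeenTimestamp - maxOutOfOrderness;
    }
}
```

Note that an out-of-order record never moves the watermark backwards, which is the property downstream event-time windows rely on.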

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-05 Thread Timo Walther
Btw which planner are you using? Regards, Timo

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-05 Thread Timo Walther
Hi Austin, could you share some details of your SQL query with us? The reason why I'm asking is because I guess that the rowtime field is not inserted into the `StreamRecord` of DataStream API. The rowtime field is only inserted if there is a single field in the output of the query that is a

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-05 Thread Till Rohrmann
Hi Austin, thanks for offering to help. First I would suggest asking Timo whether this is an aspect which is still missing or whether we overlooked it. Based on that we can then take the next steps. Cheers, Till

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-02 Thread Austin Cawley-Edwards
Hey Till, Thanks for the notes. Yeah, the docs don't mention anything specific to this case, not sure if it's an uncommon one. Assigning timestamps on conversion does solve the issue. I'm happy to take a stab at implementing the feature if it is indeed missing and you all think it'd be worthwhile.

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-02 Thread Till Rohrmann
Hi Austin, yes, it should also work for ingestion time. I am not entirely sure whether event time is preserved when converting a Table into a retract stream. It should be possible and if it is not working, then I guess it is a missing feature. But I am sure that @Timo Walther knows more about it

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Austin Cawley-Edwards
Hey Till, Just a quick question on time characteristics -- this should work for IngestionTime as well, correct? Is there anything special I need to do to have the CsvTableSource/toRetractStream call to carry through the assigned timestamps, or do I have to re-assign timestamps during the conversi

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Austin Cawley-Edwards
Perfect, thanks so much Till!

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Till Rohrmann
Hi Austin, I believe that the problem is the processing time window. Unlike for event time where we send a MAX_WATERMARK at the end of the stream to trigger all remaining windows, this does not happen for processing time windows. Hence, if your stream ends and you still have an open processing tim
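The difference described above can be shown with a small dependency-free sketch. This is plain Java, not Flink API: an event-time window fires once the watermark passes its end, and a finishing bounded source emits watermark = `Long.MAX_VALUE` (the MAX_WATERMARK), flushing every pending window. A processing-time window instead waits for the wall clock, which never "arrives" if the job finishes first.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of end-of-stream behavior for event-time windows:
// windows fire when the watermark passes their end, and end-of-input
// is modeled as a MAX_WATERMARK that flushes everything still pending.
public class EndOfStreamWindows {
    private final List<Long> pendingWindowEnds = new ArrayList<>();
    private final List<Long> firedWindowEnds = new ArrayList<>();

    public void addWindow(long windowEnd) { pendingWindowEnds.add(windowEnd); }

    /** Event-time path: fire every window whose end the watermark has passed. */
    public void onWatermark(long watermark) {
        pendingWindowEnds.removeIf(end -> {
            if (watermark >= end) { firedWindowEnds.add(end); return true; }
            return false;
        });
    }

    /** End of a bounded input: sources emit MAX_WATERMARK, flushing all windows. */
    public void onEndOfInput() { onWatermark(Long.MAX_VALUE); }

    public List<Long> fired() { return firedWindowEnds; }
    public List<Long> pending() { return pendingWindowEnds; }
}
```

A processing-time window has no counterpart to `onEndOfInput`: its trigger is a wall-clock timer, so when the job switches to FINISHED before that timer fires, the window's contents are simply lost.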

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-09-30 Thread Austin Cawley-Edwards
Hey all, Thanks for your patience. I've got a small repo that reproduces the issue here: https://github.com/austince/flink-1.10-sql-windowing-error Not sure what I'm doing wrong but it feels silly. Thanks so much! Austin

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-09-29 Thread Austin Cawley-Edwards
Hey Till, Thanks for the reply -- I'll try to see if I can reproduce this in a small repo and share it with you. Best, Austin

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-09-29 Thread Till Rohrmann
Hi Austin, could you share with us the exact job you are running (including the custom window trigger)? This would help us to better understand your problem. I am also pulling in Klou and Timo who might help with the windowing logic and the Table to DataStream conversion. Cheers, Till

Streaming SQL Job Switches to FINISHED before all records processed

2020-09-28 Thread Austin Cawley-Edwards
Hey all, I'm not sure if I've missed something in the docs, but I'm having a bit of trouble with a streaming SQL job that starts w/ raw SQL queries and then transitions to a more traditional streaming job. I'm on Flink 1.10 using the Blink planner, running locally with no checkpointing. The job l