Hi Shahar,

From my understanding, if you use "GROUP BY" with aggregate functions, they
save their accumulators to SQL-internal state, and the accumulator layout is
independent of your input schema. Based on what you described, I think that's
why recovering from existing state works fine.
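To make the "independent of your input schema" point concrete, here is a
plain-Java illustration (no Flink involved; the class and names are mine): a
COUNT-style accumulator stores only the running count, never the input row, so
its state layout stays the same even if the row gets wider.

```java
// Plain-Java illustration (no Flink): a COUNT-style aggregate accumulator
// holds only the running count, not the input row itself, so its state
// layout is independent of the input schema.
public class CountAccumulatorDemo {
    static class CountAccumulator {
        long count = 0;

        // The row's shape is irrelevant here: only the count is stored.
        void accumulate(Object[] row) {
            count++;
        }
    }

    public static void main(String[] args) {
        CountAccumulator acc = new CountAccumulator();
        acc.accumulate(new Object[] {"session-1", 42});          // old schema
        acc.accumulate(new Object[] {"session-1", 42, "extra"}); // wider schema
        System.out.println(acc.count); // prints 2
    }
}
```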
I think one confusion you might have is the "toRetractStream" syntax. This
actually passes a "retracting" flag to the Flink planner to indicate how the
DataStream operators get generated from your SQL.
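For context, a minimal sketch of the pattern I mean (assuming the pre-Blink
Table API of that era; the table and field names are illustrative, and this
won't compile without Flink on the classpath):

```java
// Sketch only: assumes a StreamTableEnvironment "tableEnv" is set up and an
// "events" table is registered; names are illustrative.
Table result = tableEnv.sqlQuery(
    "SELECT sessionId, COUNT(*) AS cnt FROM events GROUP BY sessionId");

// toRetractStream holds no state of its own; it tells the planner to emit
// (flag, row) pairs, where the Boolean is true for an add and false for a
// retraction of a previously emitted row.
DataStream<Tuple2<Boolean, Row>> retracted =
    tableEnv.toRetractStream(result, Row.class);
```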

So in my understanding, there's really no "state" associated with the
"retracting stream" itself; the state belongs to the generated operators.
However, I am not an expert in Table/SQL state recovery: I recall there was
an open JIRA [1] that might be related to your question about recovering
SQL/Table-generated operators. Maybe @Fabian can provide more insight here?

Regarding the rest of the pipeline: both the "filter" and "map" operators are
stateless, and sink state recovery depends on the sink implementation you use.

--
Rong

[1] https://issues.apache.org/jira/browse/FLINK-6966

On Fri, Mar 8, 2019 at 12:07 PM shkob1 <shahar.kobrin...@gmail.com> wrote:

> Thanks Rong,
>
> I ran a quick test, changing the SQL select (adding a select field in the
> middle), then restarted the job from a savepoint, and it worked without any
> errors. I want to make sure I understand at what point the state is stored
> and how it works.
>
> Let's simplify the scenario and forget my specific case of a dynamically
> generated POJO. Let's focus on the generic steps:
> Source -> register table -> SQL select and group by session -> retracted
> stream (Row) -> transformToPojo (custom Map function) -> pushToSink
>
> And let's assume the SQL select is changed (a field is added somewhere in
> the middle of the select list).
> So: we had intermediate results in the old format that are loaded from
> state into the new Row object in the retracted stream. Is that an accurate
> statement? At which operator, and in what format, is the state stored in
> this case? Is it the SQL result/Row? Is it the POJO? Since this scenario
> does not fail for me, I'm trying to understand how/where it is handled in
> Flink.
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
