Hi Fabian,
Thanks for your information! Actually, I am not clear about the mechanism of auto-generated IDs in Flink SQL and the mechanism of how does the operator state mapping back from savepoint. I hope to get some detail information by giving an example bellow. I have two sql as samples: old sql : select id, name, sum(salary) from user_info where id == '001' group by TUMBLE(rowtime, INTERVAL '1' DAY), id, name; new sql: select id, name, sum(salary) from user_info where id == '001' and age >= '28' group by TUMBLE(rowtime, INTERVAL '1' DAY), id, name; I just add some age limitation in new SQL. Now, I want to switch the job from old one to the new one by trigger a savepoint. Flink will generate operator IDs for operators in new SQL. In this case, just from a technical point of view, the operator IDs in the savepoint of the old SQL job can match the operator IDs in the new SQL job? My understanding is that Flink will reorder the operators and generate new IDs for operators. The new IDs may not match the old IDs. This will cause some states failed to be mapped back from the old job savepoint, which naturally leads to inaccurate calculation results. I wonder if my understanding is correct. Thanks~ Jie | | shadowell | | shadow...@126.com | 签名由网易邮箱大师定制 On 7/7/2020 17:23,Fabian Hueske<fhue...@gmail.com> wrote: Hi Jie Feng, As you said, Flink translates SQL queries into streaming programs with auto-generated operator IDs. In order to start a SQL query from a savepoint, the operator IDs in the savepoint must match the IDs in the newly translated program. Right now this can only be guaranteed if you translate the same query with the same Flink version (optimizer changes might change the structure of the resulting plan even if the query is the same). This is of course a significant limitation, that the community is aware of and planning to improve in the future. I'd also like to add that it can be very difficult to assess whether it is meaningful to start a query from a savepoint that was generated with a different query. A savepoint holds intermediate data that is needed to compute the result of a query. If you update a query it is very well possible that the result computed by Flink won't be equal to the actual result of the new query. Best, Fabian Am Mo., 6. Juli 2020 um 10:50 Uhr schrieb shadowell <shadow...@126.com>: Hello, everyone, I have some unclear points when using Flink SQL. I hope to get an answer or tell me where I can find the answer. When using the DataStream API, in order to ensure that the job can recover the state from savepoint after adjustment, it is necessary to specify the uid for the operator. However, when using Flink SQL, the uid of the operator is automatically generated. If the SQL logic changes (operator order changes), when the task is restored from savepoint, will it cause some of the operator states to be unable to be mapped back, resulting in state loss? Thanks~ Jie Feng | | shadowell | | shadow...@126.com | 签名由网易邮箱大师定制