@Paul I agree with Gordon that those are useful features. The only thing I’d like to add is that I don’t believe listing operator ids will be useful to most users, they want to see UIDs which would also require changes to the Savepoint metadata file. I think that would be a good follow up but outside the scope of an initial implementation.
Seth > On May 30, 2019, at 3:05 AM, Louis <xu_soft39211...@163.com> wrote: > > +1 from my size. > > I think it will be a good feature. > > Best > -- > Louis > Email:xu_soft39211...@163.com > >> On 30 May 2019, at 15:57, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: >> >> The name "Savepoint Connector" might indeed be not that good, as it doesn't >> point out the fact that with the current design, all kinds of snapshots >> (savepoint / full or incremental checkpoints) can be read. >> >> @Paul >> That would be a very valid requirement. Querying the list of existing >> operator ids should be straight forward, as that information is in the >> snapshot metadata file. >> However, querying state names / state structure / state type would >> currently be impossible without also reading the state itself, as that >> information isn't globally available and can only be known when each key >> group is being read. We could potentially make those information available >> in the snapshot metadata file, but that would require more work. I think >> that can be a next step once we have an initial version. >> >> Cheers, >> Gordon >> >>> On Thu, May 30, 2019 at 1:21 PM Paul Lam <paullin3...@gmail.com> wrote: >>> >>> Hi Seth, >>> >>> Sorry for the confusion. I mean currently we need to know the operator id, >>> state name and the state type (eg. ListState, MapState) beforehand to get >>> the states. Is possible that we can perform a scan to get all existing >>> operator ids or state names in the savepoint? It would be good to know what >>> states are in the savepoint before we get to a specific state. >>> >>> For example, if we analyze a savepoint created weeks ago, and the >>> corresponding job has been modified since that, say, moved from KafkaSink >>> to KinesisSink, so we are not sure whether we have the Kafka sink states or >>> the Kinesis sink states in the savepoint and might need to try twice to get >>> the right one. >>> >>> I’m not familiar with the savepoint formats, so pardon me if it’s a dumb >>> question. >>> >>> Best, >>> Paul Lam >>> >>> 在 2019年5月30日,11:09,Seth Wiesman <sjwies...@gmail.com> 写道: >>> >>> Hi Paul, >>> >>> I’m not following, could you provide and example of the kind of operation >>> your describing? >>> >>> Seth >>> >>> On May 29, 2019, at 7:37 PM, Paul Lam <paullin3...@gmail.com> wrote: >>> >>> Hi Seth, >>> >>> +1 from my side. >>> >>> I was wondering if we can add a reader method to provide a full view of >>> the states instead of the state of a specific operator? It would be helpful >>> when there is some unrestored states of a previously removed operator in >>> the savepoint. >>> >>> Best, >>> Paul Lam >>> >>> 在 2019年5月30日,09:55,vino yang <yanghua1...@gmail.com> 写道: >>> >>> Hi Seth, >>> >>> Glad to see this FLIP, big +1 for this feature! >>> >>> Best, >>> Vino >>> >>> Seth Wiesman <sjwies...@gmail.com> 于2019年5月30日周四 上午7:14写道: >>> >>> Hey Everyone! >>> >>> Gordon and I have been discussing adding a savepoint connector to flink >>> for reading, writing and modifying savepoints. >>> >>> This is useful for: >>> >>> Analyzing state for interesting patterns >>> Troubleshooting or auditing jobs by checking for discrepancies in state >>> Bootstrapping state for new applications >>> Modifying savepoints such as: >>> Changing max parallelism >>> Making breaking schema changes >>> Correcting invalid state >>> >>> We are looking forward to your feedback! >>> >>> This is the FLIP: >>> >>> >>> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-43%3A+Savepoint+Connector >>> >>> Seth >>> >>> >>> >>> >>> >>> > >