This is an awesome feature.

> The name "Savepoint Connector" might indeed be not that good, as it doesn't point out the fact that with the current design, all kinds of snapshots (savepoint / full or incremental checkpoints) can be read.
@Gordon, can you add the above clarification to the FLIP page? I was wondering whether it supports (full or incremental) checkpoints.

Here is one way we could leverage this new feature. To monitor lag and stuck jobs, we have an external service that monitors the offsets committed to the Kafka broker. As you probably know, the asynchronous Kafka offset commit in notifyCheckpointComplete is best-effort. We have also run into a Kafka scalability issue for high-parallelism jobs due to the single coordinator. An alternative is to inspect the source of truth, the Flink checkpoint/savepoint, and extract the offsets from the Kafka source operator.

On Thu, May 30, 2019 at 4:51 AM Paul Lam <paullin3...@gmail.com> wrote:

> Hi,
>
> @Gordon @Seth
>
> Thanks a lot for your inputs! In general, I agree with you. The metadata querying feature is a nice-to-have but not a must-have, and it’s reasonable to make it a follow-up since it requires some extra work.
>
> Best,
> Paul Lam
>
> > On May 30, 2019, at 19:22, Seth Wiesman <sjwies...@gmail.com> wrote:
> >
> > @Paul
> >
> > I agree with Gordon that those are useful features. The only thing I’d like to add is that I don’t believe listing operator ids will be useful to most users; they want to see UIDs, which would also require changes to the savepoint metadata file. I think that would be a good follow-up but outside the scope of an initial implementation.
> >
> > Seth
> >
> >> On May 30, 2019, at 3:05 AM, Louis <xu_soft39211...@163.com> wrote:
> >>
> >> +1 from my side.
> >>
> >> I think it will be a good feature.
> >>
> >> Best
> >> --
> >> Louis
> >> Email: xu_soft39211...@163.com
> >>
> >>> On 30 May 2019, at 15:57, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
> >>>
> >>> The name "Savepoint Connector" might indeed be not that good, as it doesn't point out the fact that with the current design, all kinds of snapshots (savepoint / full or incremental checkpoints) can be read.
> >>>
> >>> @Paul
> >>> That would be a very valid requirement.
> >>> Querying the list of existing operator ids should be straightforward, as that information is in the snapshot metadata file.
> >>> However, querying state names / state structure / state type would currently be impossible without also reading the state itself, as that information isn't globally available and can only be known when each key group is being read. We could potentially make that information available in the snapshot metadata file, but that would require more work. I think that can be a next step once we have an initial version.
> >>>
> >>> Cheers,
> >>> Gordon
> >>>
> >>>> On Thu, May 30, 2019 at 1:21 PM Paul Lam <paullin3...@gmail.com> wrote:
> >>>>
> >>>> Hi Seth,
> >>>>
> >>>> Sorry for the confusion. I mean that currently we need to know the operator id, state name and state type (e.g. ListState, MapState) beforehand to get the states. Is it possible to perform a scan to get all existing operator ids or state names in the savepoint? It would be good to know what states are in the savepoint before we get to a specific state.
> >>>>
> >>>> For example, if we analyze a savepoint created weeks ago, and the corresponding job has been modified since then, say, moved from KafkaSink to KinesisSink, we are not sure whether we have the Kafka sink states or the Kinesis sink states in the savepoint and might need to try twice to get the right one.
> >>>>
> >>>> I’m not familiar with the savepoint formats, so pardon me if it’s a dumb question.
> >>>>
> >>>> Best,
> >>>> Paul Lam
> >>>>
> >>>> On May 30, 2019, at 11:09, Seth Wiesman <sjwies...@gmail.com> wrote:
> >>>>
> >>>> Hi Paul,
> >>>>
> >>>> I’m not following, could you provide an example of the kind of operation you’re describing?
> >>>>
> >>>> Seth
> >>>>
> >>>> On May 29, 2019, at 7:37 PM, Paul Lam <paullin3...@gmail.com> wrote:
> >>>>
> >>>> Hi Seth,
> >>>>
> >>>> +1 from my side.
> >>>>
> >>>> I was wondering if we can add a reader method to provide a full view of the states instead of the state of a specific operator? It would be helpful when there is some unrestored state of a previously removed operator in the savepoint.
> >>>>
> >>>> Best,
> >>>> Paul Lam
> >>>>
> >>>> On May 30, 2019, at 09:55, vino yang <yanghua1...@gmail.com> wrote:
> >>>>
> >>>> Hi Seth,
> >>>>
> >>>> Glad to see this FLIP, big +1 for this feature!
> >>>>
> >>>> Best,
> >>>> Vino
> >>>>
> >>>> On Thu, May 30, 2019 at 7:14 AM, Seth Wiesman <sjwies...@gmail.com> wrote:
> >>>>
> >>>> Hey Everyone!
> >>>>
> >>>> Gordon and I have been discussing adding a savepoint connector to Flink for reading, writing and modifying savepoints.
> >>>>
> >>>> This is useful for:
> >>>>
> >>>> - Analyzing state for interesting patterns
> >>>> - Troubleshooting or auditing jobs by checking for discrepancies in state
> >>>> - Bootstrapping state for new applications
> >>>> - Modifying savepoints, such as:
> >>>>   - Changing max parallelism
> >>>>   - Making breaking schema changes
> >>>>   - Correcting invalid state
> >>>>
> >>>> We are looking forward to your feedback!
> >>>>
> >>>> This is the FLIP:
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-43%3A+Savepoint+Connector
> >>>>
> >>>> Seth
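
As a rough illustration of the monitoring idea at the top of this message, the lag check could be sketched as below. This assumes the per-partition offsets have already been extracted from the Kafka source operator state in a checkpoint or savepoint; that extraction step is not shown, because the reader API is still being discussed in the FLIP. `compute_lag`, the topic name and the offset values are hypothetical, and both inputs are assumed to use the same offset convention.

```python
from typing import Dict, Optional, Tuple

# (topic, partition) pair used as the dictionary key.
TopicPartition = Tuple[str, int]


def compute_lag(
    snapshot_offsets: Dict[TopicPartition, int],
    log_end_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, Optional[int]]:
    """Return the lag per partition.

    A partition that exists on the broker but is missing from the
    snapshot gets None, so the caller can alert on it separately.
    """
    lag: Dict[TopicPartition, Optional[int]] = {}
    for tp, end in log_end_offsets.items():
        recorded = snapshot_offsets.get(tp)
        # Clamp at zero in case the broker-side value is slightly stale.
        lag[tp] = None if recorded is None else max(end - recorded, 0)
    return lag


# Hypothetical values: offsets read from a snapshot vs. broker log-end offsets.
snapshot_offsets = {("events", 0): 120, ("events", 1): 95}
log_end_offsets = {("events", 0): 150, ("events", 1): 95, ("events", 2): 10}

lag = compute_lag(snapshot_offsets, log_end_offsets)
print(lag)
```

Because the snapshot is the source of truth for where the job will resume, a check like this would not depend on the best-effort offset commit back to Kafka.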