this is an awesome feature.

> The name "Savepoint Connector" might indeed be not that good, as it
doesn't
point out the fact that with the current design, all kinds of snapshots
(savepoint / full or incremental checkpoints) can be read.

@Gordon can you add the above clarification to the FLIP page? I was
wondering if it supports (full or incremental) checkpoint.

Here is one way how we can leverage this new feature. For monitoring
lag/stuck problem, we have external service monitor committed offsets to
Kafka broker. As you probably know, async commit Kafka offset is
best-effort  in notifyCheckpointComplete. We have run into Kafka
scalability issue for parallelism jobs due to single coordinator.
Alternative is to inspect the source of truth of Flink checkpoint/savepoint
and extract offsets from Kafka source operator.


On Thu, May 30, 2019 at 4:51 AM Paul Lam <paullin3...@gmail.com> wrote:

> Hi,
>
> @Gordon @Seth
>
> Thanks a lot for your inputs! In general, I agree with you. The metadata
> querying feature is a nice-to-have but not a must-have, and it’s reasonable
> to make it as a follow up since it requires some extra work.
>
> Best,
> Paul Lam
>
> > 在 2019年5月30日,19:22,Seth Wiesman <sjwies...@gmail.com> 写道:
> >
> > @Paul
> >
> > I agree with Gordon that those are useful features. The only thing I’d
> like to add is that I don’t believe listing operator ids will be useful to
> most users, they want to see UIDs which would also require changes to the
> Savepoint metadata file. I think that would be a good follow up but outside
> the scope of an initial implementation.
> >
> > Seth
> >
> >> On May 30, 2019, at 3:05 AM, Louis <xu_soft39211...@163.com> wrote:
> >>
> >> +1 from my size.
> >>
> >> I think it will be a good feature.
> >>
> >> Best
> >> --
> >> Louis
> >> Email:xu_soft39211...@163.com
> >>
> >>> On 30 May 2019, at 15:57, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
> >>>
> >>> The name "Savepoint Connector" might indeed be not that good, as it
> doesn't
> >>> point out the fact that with the current design, all kinds of snapshots
> >>> (savepoint / full or incremental checkpoints) can be read.
> >>>
> >>> @Paul
> >>> That would be a very valid requirement. Querying the list of existing
> >>> operator ids should be straight forward, as that information is in the
> >>> snapshot metadata file.
> >>> However, querying state names / state structure / state type would
> >>> currently be impossible without also reading the state itself, as that
> >>> information isn't globally available and can only be known when each
> key
> >>> group is being read. We could potentially make those information
> available
> >>> in the snapshot metadata file, but that would require more work. I
> think
> >>> that can be a next step once we have an initial version.
> >>>
> >>> Cheers,
> >>> Gordon
> >>>
> >>>> On Thu, May 30, 2019 at 1:21 PM Paul Lam <paullin3...@gmail.com>
> wrote:
> >>>>
> >>>> Hi Seth,
> >>>>
> >>>> Sorry for the confusion. I mean currently we need to know the
> operator id,
> >>>> state name and the state type (eg. ListState, MapState) beforehand to
> get
> >>>> the states. Is possible that we can perform a scan to get all existing
> >>>> operator ids or state names in the savepoint? It would be good to
> know what
> >>>> states are in the savepoint before we get to a specific state.
> >>>>
> >>>> For example, if we analyze a savepoint created weeks ago, and the
> >>>> corresponding job has been modified since that, say, moved from
> KafkaSink
> >>>> to KinesisSink, so we are not sure whether we have the Kafka sink
> states or
> >>>> the Kinesis sink states in the savepoint and might need to try twice
> to get
> >>>> the right one.
> >>>>
> >>>> I’m not familiar with the savepoint formats, so pardon me if it’s a
> dumb
> >>>> question.
> >>>>
> >>>> Best,
> >>>> Paul Lam
> >>>>
> >>>> 在 2019年5月30日,11:09,Seth Wiesman <sjwies...@gmail.com> 写道:
> >>>>
> >>>> Hi Paul,
> >>>>
> >>>> I’m not following, could you provide and example of the kind of
> operation
> >>>> your describing?
> >>>>
> >>>> Seth
> >>>>
> >>>> On May 29, 2019, at 7:37 PM, Paul Lam <paullin3...@gmail.com> wrote:
> >>>>
> >>>> Hi Seth,
> >>>>
> >>>> +1 from my side.
> >>>>
> >>>> I was wondering if we can add a reader method to provide a full view
> of
> >>>> the states instead of the state of a specific operator? It would be
> helpful
> >>>> when there is some unrestored states of a previously removed operator
> in
> >>>> the savepoint.
> >>>>
> >>>> Best,
> >>>> Paul Lam
> >>>>
> >>>> 在 2019年5月30日,09:55,vino yang <yanghua1...@gmail.com> 写道:
> >>>>
> >>>> Hi Seth,
> >>>>
> >>>> Glad to see this FLIP, big +1 for this feature!
> >>>>
> >>>> Best,
> >>>> Vino
> >>>>
> >>>> Seth Wiesman <sjwies...@gmail.com> 于2019年5月30日周四 上午7:14写道:
> >>>>
> >>>> Hey Everyone!
> >>>> ​
> >>>> Gordon and I have been discussing adding a savepoint connector to
> flink
> >>>> for reading, writing and modifying savepoints.
> >>>> ​
> >>>> This is useful for:
> >>>> ​
> >>>> Analyzing state for interesting patterns
> >>>> Troubleshooting or auditing jobs by checking for discrepancies in
> state
> >>>> Bootstrapping state for new applications
> >>>> Modifying savepoints such as:
> >>>>   Changing max parallelism
> >>>>   Making breaking schema changes
> >>>>   Correcting invalid state
> >>>> ​
> >>>> We are looking forward to your feedback!
> >>>> ​
> >>>> This is the FLIP:
> >>>> ​
> >>>>
> >>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-43%3A+Savepoint+Connector
> >>>>
> >>>> Seth
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
>
>

Reply via email to