Re: [DISCUSS] FLIP-394: Add Metrics for Connector Agnostic Autoscaling

Rui Fan Fri, 17 Nov 2023 17:36:16 -0800

Thanks Mason for your feedback and update!

The sources you listed look good to me, +1 for this proposal!


Best,
Rui

On Sat, Nov 18, 2023 at 3:38 AM Mason Chen <mas.chen6...@gmail.com> wrote:

> Also, it looks like externalizing the Hive connector is unblocked based on
> the past email thread. https://issues.apache.org/jira/browse/FLINK-30064
> seems to have some progress and perhaps we shouldn't touch it for now.
>
> On Fri, Nov 17, 2023 at 11:00 AM Mason Chen <mas.chen6...@gmail.com>
> wrote:
>
> > Hi Rui and Max,
> >
> > Thanks for the feedback!
> >
> > If yes, I suggest this FLIP includes registering metric part, otherwise
> >> these metrics still cannot work.
> >
> > Yup, you understood it correctly. I'll add that to the required list of
> > work. Note that I'll include only FLIP-27 sources in the Flink repo:
> > FileSource, HybridSource, NumberSequenceSource, DataGeneratorSource, and
> > HiveSource.
> >
> > I think externalized sources could be implemented outside of this FLIP,
> as
> > other sources would have to wait for a Flink minor release and it
> wouldn't
> > make sense to track something like IcebergSource here since it isn't part
> > of the Flink project. KafkaSource is a special case since there is a lot
> of
> > usage and people use it also to verify Flink releases, so I'm open
> > to tracking that here too.
> >
> > In addition, after some further thought on the `setAssignedSplitsGauge`
> > metric, I think we need to track this in the SourceReaders. The reason is
> > that splits can be "completed" and this is only tracked by readers. There
> > are some poll based sources that only take 1 split and poll for another
> > when completed, but we cannot make that assumption in general (i.e.
> request
> > split when a split is completed). So, this needs to be tracked in the
> > reader.
> >
> > Best,
> > Mason
> >
> > On Fri, Nov 17, 2023 at 2:39 AM Maximilian Michels <m...@apache.org>
> wrote:
> >
> >> Hi Mason,
> >>
> >> Thank you for the proposal. This is a highly requested feature to make
> >> the source scaling of Flink Autoscaling generic across all sources.
> >> The current implementation handles every source individually, and if
> >> we don't find any backlog metrics, we default to using busy time only.
> >> At this point Kafka is the only supported source. We collect the
> >> backlog size (pending metrics), as well as the number of available
> >> splits / partitions.
> >>
> >> For Kafka, we always read from all splits but I like how for the
> >> generic interface we take note of both assigned and unassigned splits.
> >> This allows for more flexible integration with other sources where we
> >> might have additional splits we read from at a later point in time.
> >>
> >> Considering Rui's point, I agree it makes sense to outline the
> >> integration with existing sources. Other than that, +1 from my side
> >> for the proposal.
> >>
> >> Thanks,
> >> Max
> >>
> >> On Fri, Nov 17, 2023 at 4:06 AM Rui Fan <1996fan...@gmail.com> wrote:
> >> >
> >> > Hi Mason,
> >> >
> >> > Thank you for driving this proposal!
> >> >
> >> > Currently, Autoscaler only supports the maximum source parallelism
> >> > of KafkaSource. Introducing the generic metric to support it is good
> >> > to me, +1 for this proposal.
> >> >
> >> > I have a question:
> >> > You added the metric in the flink repo, and Autoscaler will fetch this
> >> > metric. But I didn't see any connector to register this metric.
> >> Currently,
> >> > only IteratorSourceEnumerator setUnassignedSplitsGauge,
> >> > and KafkaSource didn't register it. IIUC, if we don't do it,
> autoscaler
> >> > still cannot fetch this metric, right?
> >> >
> >> > If yes, I suggest this FLIP includes registering metric part,
> otherwise
> >> > these metrics still cannot work.
> >> >
> >> > Please correct me if I misunderstood anything, thanks~
> >> >
> >> > Best,
> >> > Rui
> >> >
> >> > On Fri, Nov 17, 2023 at 6:53 AM Mason Chen <mas.chen6...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I would like to start a discussion on FLIP-394: Add Metrics for
> >> Connector
> >> > > Agnostic Autoscaling [1].
> >> > >
> >> > > This FLIP recommends adding two metrics to make autoscaling work for
> >> > > bounded split source implementations like IcebergSource. These
> >> metrics are
> >> > > required by the Flink Kubernetes Operator autoscaler algorithm [2]
> to
> >> > > retrieve information for the backlog and the maximum source
> >> parallelism.
> >> > > The changes would affect the `@PublicEvolving`
> >> `SplitEnumeratorMetricGroup`
> >> > > API of the source connector framework.
> >> > >
> >> > > Best,
> >> > > Mason
> >> > >
> >> > > [1]
> >> > >
> >> > >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-394%3A+Add+Metrics+for+Connector+Agnostic+Autoscaling
> >> > > [2]
> >> > >
> >> > >
> >>
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/#limitations
> >> > >
> >>
> >
>

Re: [DISCUSS] FLIP-394: Add Metrics for Connector Agnostic Autoscaling

Reply via email to