Re: [DISCUSS] FLIP-394: Add Metrics for Connector Agnostic Autoscaling

Mason Chen Fri, 17 Nov 2023 11:01:27 -0800

Hi Rui and Max,

Thanks for the feedback!


If yes, I suggest this FLIP includes registering metric part, otherwise
> these metrics still cannot work.

Yup, you understood it correctly. I'll add that to the required list of
work. Note that I'll include only FLIP-27 sources in the Flink repo:
FileSource, HybridSource, NumberSequenceSource, DataGeneratorSource, and
HiveSource.

I think externalized sources could be implemented outside of this FLIP, as
other sources would have to wait for a Flink minor release and it wouldn't
make sense to track something like IcebergSource here since it isn't part
of the Flink project. KafkaSource is a special case since there is a lot of
usage and people use it also to verify Flink releases, so I'm open
to tracking that here too.

In addition, after some further thought on the `setAssignedSplitsGauge`
metric, I think we need to track this in the SourceReaders. The reason is
that splits can be "completed" and this is only tracked by readers. There
are some poll based sources that only take 1 split and poll for another
when completed, but we cannot make that assumption in general (i.e. request
split when a split is completed). So, this needs to be tracked in the
reader.

Best,
Mason

On Fri, Nov 17, 2023 at 2:39 AM Maximilian Michels <m...@apache.org> wrote:

> Hi Mason,
>
> Thank you for the proposal. This is a highly requested feature to make
> the source scaling of Flink Autoscaling generic across all sources.
> The current implementation handles every source individually, and if
> we don't find any backlog metrics, we default to using busy time only.
> At this point Kafka is the only supported source. We collect the
> backlog size (pending metrics), as well as the number of available
> splits / partitions.
>
> For Kafka, we always read from all splits but I like how for the
> generic interface we take note of both assigned and unassigned splits.
> This allows for more flexible integration with other sources where we
> might have additional splits we read from at a later point in time.
>
> Considering Rui's point, I agree it makes sense to outline the
> integration with existing sources. Other than that, +1 from my side
> for the proposal.
>
> Thanks,
> Max
>
> On Fri, Nov 17, 2023 at 4:06 AM Rui Fan <1996fan...@gmail.com> wrote:
> >
> > Hi Mason,
> >
> > Thank you for driving this proposal!
> >
> > Currently, Autoscaler only supports the maximum source parallelism
> > of KafkaSource. Introducing the generic metric to support it is good
> > to me, +1 for this proposal.
> >
> > I have a question:
> > You added the metric in the flink repo, and Autoscaler will fetch this
> > metric. But I didn't see any connector to register this metric.
> Currently,
> > only IteratorSourceEnumerator setUnassignedSplitsGauge,
> > and KafkaSource didn't register it. IIUC, if we don't do it, autoscaler
> > still cannot fetch this metric, right?
> >
> > If yes, I suggest this FLIP includes registering metric part, otherwise
> > these metrics still cannot work.
> >
> > Please correct me if I misunderstood anything, thanks~
> >
> > Best,
> > Rui
> >
> > On Fri, Nov 17, 2023 at 6:53 AM Mason Chen <mas.chen6...@gmail.com>
> wrote:
> >
> > > Hi all,
> > >
> > > I would like to start a discussion on FLIP-394: Add Metrics for
> Connector
> > > Agnostic Autoscaling [1].
> > >
> > > This FLIP recommends adding two metrics to make autoscaling work for
> > > bounded split source implementations like IcebergSource. These metrics
> are
> > > required by the Flink Kubernetes Operator autoscaler algorithm [2] to
> > > retrieve information for the backlog and the maximum source
> parallelism.
> > > The changes would affect the `@PublicEvolving`
> `SplitEnumeratorMetricGroup`
> > > API of the source connector framework.
> > >
> > > Best,
> > > Mason
> > >
> > > [1]
> > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-394%3A+Add+Metrics+for+Connector+Agnostic+Autoscaling
> > > [2]
> > >
> > >
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/#limitations
> > >
>

Re: [DISCUSS] FLIP-394: Add Metrics for Connector Agnostic Autoscaling

Reply via email to