Hi,

I believe the metrics defined as part of FLIP-33 [1] have not yet been implemented for the FileSystem connector (both Source and Sink). I've created a ticket for that [2].
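
Until that ticket is resolved, the custom-metric route Lijie describes further down the thread is the available option. As a rough, dependency-free sketch of the bookkeeping for the SplitAssigner and reader metrics from your list: LocalCounter and SplitAssignerMetrics are made-up stand-ins for illustration, not Flink classes. In a real source the counters would presumably come from the metric group exposed via SplitEnumeratorContext#metricGroup or SourceReaderContext#metricGroup instead.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Illustrative only: none of these types are part of Flink. */
public class FileSourceMetricsSketch {

    /** Hypothetical stand-in for a counter obtained from a Flink metric group. */
    static final class LocalCounter {
        private final AtomicLong value = new AtomicLong();

        void inc(long n) {
            value.addAndGet(n);
        }

        long getCount() {
            return value.get();
        }
    }

    /** Hypothetical holder for the SplitAssigner and reader metrics listed in the thread. */
    static final class SplitAssignerMetrics {
        final LocalCounter initialSplits = new LocalCounter();
        final LocalCounter readdedSplits = new LocalCounter();
        final LocalCounter readersCreated = new LocalCounter();
        final LocalCounter readerCreationNanos = new LocalCounter();

        /** Call once with the number of splits the assigner starts with. */
        void onInitialized(int splitCount) {
            initialSplits.inc(splitCount);
        }

        /** Call whenever splits are handed back to the assigner. */
        void onSplitsReadded(int splitCount) {
            readdedSplits.inc(splitCount);
        }

        /** Runs the reader factory, recording elapsed time and one created reader on success. */
        <R> R timeReaderCreation(Supplier<R> factory) {
            long start = System.nanoTime();
            R reader = factory.get();
            readerCreationNanos.inc(System.nanoTime() - start);
            readersCreated.inc(1);
            return reader;
        }
    }

    public static void main(String[] args) {
        SplitAssignerMetrics metrics = new SplitAssignerMetrics();
        metrics.onInitialized(5);
        metrics.onSplitsReadded(2);
        // The String is a placeholder for whatever reader object the format creates.
        String reader = metrics.timeReaderCreation(() -> "reader-for-split-0");
        System.out.println(reader
                + " initial=" + metrics.initialSplits.getCount()
                + " readded=" + metrics.readdedSplits.getCount()
                + " readers=" + metrics.readersCreated.getCount());
    }
}
```

The sketch only keeps running totals; per-unit-time figures such as "readers created per unit time" would have to be derived from these counts by whatever reports them.
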

Best regards,
Martijn

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
[2] https://issues.apache.org/jira/browse/FLINK-28021

On Mon, Jun 13, 2022 at 07:24, Meghajit Mazumdar <meghajit.mazum...@gojek.com> wrote:

> Hi folks,
>
> Thanks for the reply.
> We have implemented our own SplitAssigner, FileReaderFormat and
> FileReaderFormat.Reader implementations. Hence, we plan to add custom
> metrics such as these:
> 1. No. of splits SplitAssigner is initialized with, number of splits
>    re-added back to the SplitAssigner
> 2. Readers created per unit time
> 3. Time taken to create a reader
> 4. Time taken for the Reader to produce a single Row
> 5. Readers closed per unit time
> ... and some more
>
> However, since we haven't implemented our own FileSource or
> SplitEnumerator, we don't have visibility into the metrics of these
> components. We would ideally like to measure these:
> 1. Number of rows emitted by the source per unit time
> 2. Time taken by the enumerator to discover the splits
> 3. Total splits discovered
>
> Regards,
> Meghajit
>
> On Fri, Jun 10, 2022 at 10:04 PM Jing Ge <j...@ververica.com> wrote:
>
>> Hi Meghajit,
>>
>> I think it makes sense to extend the current metrics. Could you list all
>> metrics you need? Thanks!
>>
>> Best regards,
>> Jing
>>
>> On Fri, Jun 10, 2022 at 5:06 PM Lijie Wang <wangdachui9...@gmail.com>
>> wrote:
>>
>>> Hi Meghajit,
>>>
>>> As far as I know, currently, the FileSource does not have the metrics
>>> you need. You can implement your own source, and register custom metrics
>>> via `SplitEnumeratorContext#metricGroup` and
>>> `SourceReaderContext#metricGroup`.
>>>
>>> Best,
>>> Lijie
>>>
>>> On Fri, Jun 10, 2022 at 4:36 PM Meghajit Mazumdar
>>> <meghajit.mazum...@gojek.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We are working on a Flink project which uses FileSource to discover and
>>>> read Parquet files from GCS (using Flink 1.14).
>>>>
>>>> As part of this, we wanted to implement some health metrics around the
>>>> code.
>>>> I wanted to know whether Flink gathers some metrics by itself around
>>>> FileSource, e.g., number of files discovered by the SplitEnumerator, number
>>>> of files added back to SplitAssigner, time taken to process per split, etc.?
>>>>
>>>> I checked in the official documentation
>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/filesystem/>
>>>> but there don't appear to be any. Is the solution then to implement
>>>> custom metrics like this
>>>> <https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/metrics/>
>>>> ?
>>>>
>>>> *Regards,*
>>>> *Meghajit*
>>>>
>>>
>
> --
> *Regards,*
> *Meghajit*