Thank you very much for your explanation. I will keep it in mind. Best,
Francesco

> On 27 May 2020, at 15:43, Piotr Nowojski <pi...@ververica.com> wrote:
>
> Hi Francesco,
>
> As long as you do not set the update interval of the metric reporter to some very low value, there should be no visible performance degradation.
>
> Maybe worth keeping in mind: if your jobs are bounded (they work on bounded input and finish/complete at some point in time), the last metric value updated before the job completes might not necessarily reflect the end state of the job. This limitation may not apply if you use the REST API, as the Job Manager might remember the values you are looking for.
>
> Piotrek
>
>> On 27 May 2020, at 11:41, Francesco Ventura <francesco.vent...@campus.tu-berlin.de> wrote:
>>
>> Hi Piotrek,
>>
>> Thank you for your reply and for your suggestions. Just one more doubt: will using a metric reporter and custom metrics affect the performance of the running jobs in terms of execution time? Since I need the exact netRunTime of each job, maybe using the REST APIs to get the other information would be more reliable?
>>
>> Thank you. Best,
>>
>> Francesco
>>
>>> On 25 May 2020, at 19:54, Piotr Nowojski <pi...@ververica.com> wrote:
>>>
>>> Hi Francesco,
>>>
>>> Have you taken a look at the metrics? [1] And the IO metrics [2] in particular? You can use one of the pre-existing metric reporters [3] or implement a custom one. You could export metrics to some 3rd-party system and get JSONs from there, or export them to JSON directly via a custom metric reporter.
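[For reference: if you do go through the REST API, the per-vertex record counts shown in the WebUI are available from the job detail endpoint. A minimal sketch in Python, assuming a JobManager reachable at localhost:8081 and a placeholder job id; the endpoint and field names below follow the stable REST responses, but double-check them against your Flink version.]

```python
import json
import urllib.request


def vertex_cardinalities(job_detail: dict) -> dict:
    """Map each vertex name to its real read/write record counts, as
    reported in the /jobs/:jobid response (the numbers the WebUI shows)."""
    return {
        v["name"]: {
            "read-records": v["metrics"]["read-records"],
            "write-records": v["metrics"]["write-records"],
        }
        for v in job_detail.get("vertices", [])
    }


def fetch_job_detail(base_url: str, job_id: str) -> dict:
    # GET /jobs/:jobid returns job status, duration, and per-vertex metrics.
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical address and job id -- adjust to your cluster.
    detail = fetch_job_detail("http://localhost:8081", "<job-id>")
    print(json.dumps(vertex_cardinalities(detail), indent=2))
```

[The job detail response also carries a top-level "duration" field, which may be enough if the WebUI runtime figure suffices instead of netRunTime.]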
>>> Piotrek
>>>
>>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html
>>> [2] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#io
>>> [3] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#reporter
>>>
>>>> On 23 May 2020, at 11:31, Francesco Ventura <francesco.vent...@campus.tu-berlin.de> wrote:
>>>>
>>>> Hi everybody,
>>>>
>>>> I would like to collect the statistics and the real output cardinalities about the execution of many jobs as JSON files. I know there is a REST interface that can be used, but I was looking for something simpler. In practice, I would like to get the information shown in the WebUI at runtime about a job and store it as a file. I am using env.getExecutionPlan() to get the execution plan of a job with the estimated cardinalities for each operator. However, it includes only estimated cardinalities, and it can be used only before calling env.execute().
>>>>
>>>> Is there a similar way to extract the real output cardinalities of each pipeline after the execution?
>>>> Is there a place where the Flink cluster stores the history of information about executed jobs?
>>>> Is developing a REST client to extract such information the only possible way?
>>>>
>>>> I would also like to avoid adding counters to the job source code, since I am monitoring the run-time execution and should avoid anything that could interfere.
>>>>
>>>> Maybe it is a trivial problem, but I had a quick look around and could not find a solution.
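[One note on the "history of executed jobs" question: Flink ships a History Server that archives completed jobs and serves the same REST responses after the cluster is gone. A hedged configuration sketch for flink-conf.yaml; the keys below are the documented ones, but verify them and the paths against your version's docs.]

```
# flink-conf.yaml (sketch): have the JobManager archive finished jobs...
jobmanager.archive.fs.dir: hdfs:///completed-jobs/

# ...and point the History Server at the same directory.
historyserver.archive.fs.dir: hdfs:///completed-jobs/
historyserver.web.port: 8082
```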
>>>> Thank you very much,
>>>>
>>>> Francesco
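[Putting the pieces together, one way to "store the WebUI information as a file" without touching the job code is a small external script that polls the REST API and dumps each finished job's detail to a JSON file. A sketch under the same assumptions as above (JobManager at localhost:8081; endpoint names from the stable REST docs, `archive_jobs` is a hypothetical helper name):]

```python
import json
import pathlib
import urllib.request


def finished_job_ids(overview: dict) -> list:
    """Ids of jobs in the FINISHED state, from a /jobs/overview response."""
    return [j["jid"] for j in overview.get("jobs", []) if j.get("state") == "FINISHED"]


def archive_jobs(base_url: str, out_dir: str) -> None:
    # Dump the full detail of every finished job to <out_dir>/<jid>.json.
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(f"{base_url}/jobs/overview") as resp:
        overview = json.load(resp)
    for jid in finished_job_ids(overview):
        with urllib.request.urlopen(f"{base_url}/jobs/{jid}") as resp:
            (out / f"{jid}.json").write_text(json.dumps(json.load(resp), indent=2))


if __name__ == "__main__":
    archive_jobs("http://localhost:8081", "./job-stats")
```

[Since this runs outside the job, it should not interfere with the measured execution time, which was one of the constraints in the thread.]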