Thank you very much for your explanation.
I will keep it in mind.

Best,

Francesco

> On 27 May 2020, at 15:43, Piotr Nowojski <pi...@ververica.com> wrote:
> 
> Hi Francesco,
> 
> As long as you do not set the update interval of the metric reporter to some 
> very low value, there should be no visible performance degradation.
> 
> One thing worth keeping in mind: if your jobs are bounded (they work on 
> bounded input and finish/complete at some point in time), the last reported 
> metric value before the job completes might not necessarily reflect the end 
> state of the job. This limitation may not apply if you use the REST API, as 
> the Job Manager may retain the values you are looking for.
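A minimal sketch of pulling the real per-vertex record counts out of the JSON that the monitoring REST API's `/jobs/<jobid>` endpoint returns. The payload below is a hypothetical sample; the field names (`duration`, `vertices[].metrics."read-records"`/`"write-records"`) follow the stable documentation, but should be verified against the Flink version in use. Note that `duration` is wall-clock job duration in milliseconds, not necessarily identical to `netRunTime`:

```python
import json

# Hypothetical sample of the JSON returned by /jobs/<jobid>;
# check the exact field names against your Flink version's REST API docs.
sample = json.loads("""
{
  "jid": "abc123",
  "name": "WordCount",
  "state": "FINISHED",
  "duration": 4215,
  "vertices": [
    {"name": "Source", "metrics": {"read-records": 0, "write-records": 10000}},
    {"name": "Sink",   "metrics": {"read-records": 10000, "write-records": 0}}
  ]
}
""")

def vertex_cardinalities(job):
    """Map each vertex name to its real (read-records, write-records) counts."""
    return {
        v["name"]: (v["metrics"]["read-records"], v["metrics"]["write-records"])
        for v in job["vertices"]
    }

# Wall-clock duration in ms and the per-vertex output cardinalities:
print(sample["duration"], vertex_cardinalities(sample))
```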
> 
> Piotrek
> 
>> On 27 May 2020, at 11:41, Francesco Ventura 
>> <francesco.vent...@campus.tu-berlin.de> wrote:
>> 
>> Hi Piotrek,
>> 
>> Thank you for your reply and for your suggestions. Just one more doubt: 
>> will using a metrics reporter and custom metrics affect the performance of 
>> the running jobs in terms of execution time? Since I need the exact 
>> netRunTime of each job, maybe using the REST API to get the other 
>> information would be more reliable?
>> 
>> Thank you. Best,
>> 
>> Francesco
>> 
>>> On 25 May 2020, at 19:54, Piotr Nowojski <pi...@ververica.com> wrote:
>>> 
>>> Hi Francesco,
>>> 
>>> Have you taken a look at the metrics? [1] And the IO metrics [2] in 
>>> particular? You can use one of the pre-existing metric reporters [3] or 
>>> implement a custom one. You could export the metrics to some 3rd-party 
>>> system and fetch JSON from there, or export them to JSON directly via a 
>>> custom metric reporter.
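For reference, a minimal reporter configuration sketch in flink-conf.yaml, using the Slf4jReporter that ships with Flink. The 60-second interval is an illustrative value, not a recommendation; as noted above, very short intervals are what could cost performance:

```yaml
# Log all metrics via SLF4J every 60 seconds (example values).
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 60 SECONDS
```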
>>> 
>>> Piotrek
>>> 
>>> [1] 
>>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html
>>>  
>>> <https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html>
>>> [2] 
>>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#io
>>>  
>>> <https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#io>
>>> [3] 
>>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#reporter
>>>  
>>> <https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#reporter>
>>> 
>>>> On 23 May 2020, at 11:31, Francesco Ventura 
>>>> <francesco.vent...@campus.tu-berlin.de> wrote:
>>>> 
>>>> Hi everybody, 
>>>> 
>>>> I would like to collect the statistics and the real output cardinalities 
>>>> of many job executions as JSON files. I know there is a REST interface 
>>>> that can be used, but I was looking for something simpler. In practice, I 
>>>> would like to take the information shown in the WebUI at runtime for a 
>>>> job and store it in a file. I am using env.getExecutionPlan() to get the 
>>>> execution plan of a job with the estimated cardinalities for each 
>>>> operator. However, it includes only estimated cardinalities, and it can 
>>>> only be used before calling env.execute().
>>>> 
>>>> Is there a similar way to extract the real output cardinalities of each 
>>>> pipeline after the execution? 
>>>> Is there a place where the Flink cluster stores the history of the 
>>>> information about executed jobs?
>>>> Is developing a REST client the only way to extract such information? 
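Such a REST client can be quite small. A minimal sketch in Python, assuming a JobManager reachable at the default REST port 8081; the base address and job id are placeholders, and `/jobs/<jobid>` is the per-job details endpoint of the monitoring REST API:

```python
from urllib.request import urlopen
import json

# Assumed base address of the JobManager REST endpoint (default port 8081).
BASE = "http://localhost:8081"

def job_details_url(base, job_id):
    """Build the URL of the per-job details endpoint, /jobs/<jobid>."""
    return f"{base}/jobs/{job_id}"

def fetch_job_details(base, job_id):
    """Fetch and decode the JSON details of one job (needs a running cluster)."""
    with urlopen(job_details_url(base, job_id)) as resp:
        return json.load(resp)

# Example usage (requires a reachable JobManager; the job id is a placeholder):
# details = fetch_job_details(BASE, "ea632d67b7d595e5b851708ae9ad79d6")
# with open("job_stats.json", "w") as f:
#     json.dump(details, f, indent=2)
print(job_details_url(BASE, "abc"))
```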
>>>> 
>>>> I would also like to avoid adding counters to the job source code, since 
>>>> I am monitoring the run-time execution and should avoid anything that 
>>>> could interfere with it.
>>>> 
>>>> Maybe it is a trivial problem, but I had a quick look around and could 
>>>> not find a solution.
>>>> 
>>>> Thank you very much,
>>>> 
>>>> Francesco
>>> 
>> 
> 
