Re: Collecting operators real output cardinalities as json files

Francesco Ventura Wed, 27 May 2020 02:42:24 -0700

Hi Piotrek,

Thank you for your reply and for your suggestions. I have one more question.
Will using a metrics reporter and custom metrics affect the performance of the
running jobs in terms of execution time? Since I need the exact netRunTime of
each job, would it be more reliable to use the REST API to get the other
information?
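
To make the REST option concrete, here is a minimal sketch of what I have in mind. It assumes the job details endpoint (GET /jobs/<jobid>) returns a "duration" field and, for each vertex, a "metrics" object with "read-records" and "write-records"; that is my understanding of the response, not something I have verified on every Flink version. The function names, base URL, and output path are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_job_details(base_url, job_id):
    """GET <base_url>/jobs/<job_id> and return the parsed JSON body."""
    with urlopen(f"{base_url}/jobs/{job_id}") as resp:
        return json.load(resp)


def summarize_job(details):
    """Keep only the job duration and the per-vertex record counts.

    Field names are assumed from the job details response; missing
    fields come back as None instead of raising.
    """
    return {
        "jid": details.get("jid"),
        "duration_ms": details.get("duration"),
        "vertices": [
            {
                "name": v.get("name"),
                "records_in": v.get("metrics", {}).get("read-records"),
                "records_out": v.get("metrics", {}).get("write-records"),
            }
            for v in details.get("vertices", [])
        ],
    }


def save_summary(summary, path):
    """Write the summary to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(summary, fh, indent=2)
```

After a job finishes, I would call something like save_summary(summarize_job(fetch_job_details("http://localhost:8081", job_id)), "job.json"), where the host and port are placeholders for the JobManager's REST address. Since this only queries the cluster from outside, it would not require any change to the job source code.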


Thank you. Best,

Francesco

> On 25 May 2020, at 19:54, Piotr Nowojski <pi...@ververica.com>
> wrote:
> 
> Hi Francesco,
> 
> Have you taken a look at the metrics [1], and the IO metrics [2] in
> particular? You can use one of the pre-existing metric reporters [3] or
> implement a custom one. You could export the metrics to a third-party
> system and get JSON from there, or export them to JSON directly via a
> custom metric reporter.
> 
> Piotrek
> 
> [1] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html
> [2] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#io
> [3] https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#reporter
> 
>> On 23 May 2020, at 11:31, Francesco Ventura
>> <francesco.vent...@campus.tu-berlin.de> wrote:
>> 
>> Hi everybody, 
>> 
>> I would like to collect the statistics and the real output cardinalities
>> from the execution of many jobs as JSON files. I know that there is a REST
>> interface that can be used, but I was looking for something simpler. In
>> practice, I would like to take the information shown in the WebUI at runtime
>> about a job and store it as a file. I am using env.getExecutionPlan() to
>> get the execution plan of a job with the estimated cardinalities for each
>> operator. However, it includes only estimated cardinalities and can be
>> used only before calling env.execute().
>> 
>> Is there a similar way to extract the real output cardinalities of each
>> pipeline after execution?
>> Is there a place where the Flink cluster stores the history of the
>> information about executed jobs?
>> Is developing a REST client to extract such information the only possible
>> way?
>> 
>> I would also like to avoid adding counters to the job source code, since I
>> am measuring the execution run time and should avoid anything that could
>> interfere with it.
>> 
>> Maybe this is a trivial problem, but I had a quick look around and could
>> not find a solution.
>> 
>> Thank you very much,
>> 
>> Francesco
>
