So, if I'm not wrong, the right way to do this is to use accumulators. What
do you think about my proposal to add an easy way to attach an accumulator
for the written/output records to a sink?
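In the meantime, the REST-API route Chesnay suggests below can be scripted. Here is a minimal sketch, assuming the job-details endpoint (`/jobs/<jobid>`) returns a `vertices` array where each vertex carries a `metrics` object with `read-records`/`write-records` counts, per the linked REST API docs; the base URL and helper names are made up for illustration:

```python
import json
from urllib.request import urlopen

def sink_record_counts(job_details):
    """Map each vertex name to its (read-records, write-records) pair."""
    return {
        v["name"]: (v["metrics"]["read-records"], v["metrics"]["write-records"])
        for v in job_details["vertices"]
    }

def fetch_job_details(base_url, job_id):
    """GET /jobs/<jobid> from the JobManager REST endpoint.

    base_url would be something like "http://localhost:8081" (placeholder).
    """
    with urlopen("%s/jobs/%s" % (base_url, job_id)) as resp:
        return json.load(resp)
```

The sink vertex's read-records value is then the count the sink received — with the caveat Chesnay mentions that a sink chained to another operator is folded into that operator's vertex.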

On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler <ches...@apache.org>
wrote:

> Technically yes, a subset of metrics is stored in the ExecutionGraph when
> the job finishes. (This is for example where the webUI derives the values
> from for finished jobs). However these are on the task level, and will not
> contain the number of incoming records if your sink is chained to another
> operator. Changing this would be a larger endeavor, and tbh I don't see
> this happening soon.
>
> I'm afraid for now you're stuck with the REST API for finished jobs.
> (Correction for my previous mail: The metrics REST API cannot be used for
> finished jobs)
>
> Alternatively, if you'd rather work with files/JSON, you can enable job
> archiving by configuring the jobmanager.archive.fs.dir directory. When
> the job finishes this will contain a big JSON file for each job containing
> all responses that the UI would return for finished jobs.
>
>
> On 14.02.2018 12:50, Flavio Pompermaier wrote:
>
> The problem here is that I don't know the vertex id of the sink. Would it
> be possible to access the sink info by id?
> And couldn't all that info be attached to the JobExecutionResult
> (avoiding having to set up the REST connection etc.)?
>
> On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> The only way to access this info from the client is the REST API
>> <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#details-of-a-running-or-completed-job>
>> or the Metrics REST API
>> <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#rest-api-integration>.
>>
>>
>>
>> On 14.02.2018 12:38, Flavio Pompermaier wrote:
>>
>> Actually I'd like to get this number from my Java class in order to
>> update some external dataset "catalog",
>> so I'm asking if there's some programmatic way to access this info
>> (from JobExecutionResult for example).
>>
>> On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> Do you want to know how many records the sink received, or how many the
>>> sink wrote to the DB?
>>> If it's the former, you're in luck because we measure that already;
>>> check out the metrics documentation.
>>> If it's the latter, then this issue is essentially covered by FLINK-7286
>>> which aims at allowing functions
>>> to modify the numRecordsIn/numRecordsOut counts.
>>>
>>>
>>> On 14.02.2018 12:22, Flavio Pompermaier wrote:
>>>
>>> Hi to all,
>>> I have a (batch) job that writes to 1 or more sinks.
>>> Is there a way to retrieve, once the job has terminated, the number of
>>> records written to each sink?
>>> Is there any better way than using an accumulator for each sink?
>>> If that is the only way to do that, the Sink API could be enriched in
>>> order to automatically create an accumulator when required. E.g.
>>>
>>> dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
>>>             .setDrivername(...)
>>>             .setDBUrl(...)
>>>             .setQuery(...)
>>>             .addRecordsCountAccumulator("some-name") // proposed new method
>>>             .finish())
>>>
>>> Best,
>>> Flavio
>>>
>>>
>>>
>>
>>
>> --
>> Flavio Pompermaier
>> Development Department
>>
>> OKKAM S.r.l.
>> Tel. +(39) 0461 041809
>>
>>
>>
>
>
>
>
>


