So, if I'm not wrong, the right way to do this is using accumulators. What do you think about my proposal to add an easy way to attach an accumulator for the written/output records to a sink?
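For what it's worth, here is a minimal, Flink-independent sketch of the decorator pattern that an addRecordsCountAccumulator(...) option could use internally: wrap the sink and bump a named counter per written record. The Sink and CountingSink types below are stand-ins invented for illustration only, not Flink's actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposal: a sink wrapper that counts every
// record it forwards, so the count can be read back after the job ends.
// None of these types are Flink's real API; they only show the pattern.
public class CountingSinkSketch {

    /** Minimal stand-in for an output format / sink. */
    interface Sink<T> {
        void write(T record);
    }

    /** Wraps a sink and increments a counter for each written record. */
    static class CountingSink<T> implements Sink<T> {
        private final Sink<T> delegate;
        private final AtomicLong counter;

        CountingSink(Sink<T> delegate, AtomicLong counter) {
            this.delegate = delegate;
            this.counter = counter;
        }

        @Override
        public void write(T record) {
            delegate.write(record);      // forward to the real sink first
            counter.incrementAndGet();   // count only successful writes
        }
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        AtomicLong written = new AtomicLong();
        Sink<String> sink = new CountingSink<>(stored::add, written);

        sink.write("a");
        sink.write("b");
        sink.write("c");

        System.out.println(written.get()); // 3
    }
}
```

In real Flink code the counter would presumably be an accumulator registered via the runtime context rather than an AtomicLong, but the wrapping idea is the same.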
On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler <ches...@apache.org> wrote:

> Technically yes, a subset of metrics is stored in the ExecutionGraph when
> the job finishes. (This is, for example, where the web UI derives the
> values from for finished jobs.) However, these are on the task level and
> will not contain the number of incoming records if your sink is chained
> to another operator. Changing this would be a larger endeavor, and tbh I
> don't see this happening soon.
>
> I'm afraid for now you're stuck with the REST API for finished jobs.
> (Correction to my previous mail: the metrics REST API cannot be used for
> finished jobs.)
>
> Alternatively, if you would rather work with files/JSON, you can enable
> job archiving by configuring the jobmanager.archive.fs.dir directory.
> When the job finishes, this will contain a big JSON file for each job,
> containing all responses that the UI would return for finished jobs.
>
> On 14.02.2018 12:50, Flavio Pompermaier wrote:
>
> The problem here is that I don't know the vertex id of the sink. Would
> it be possible to access the sink info by id?
> And couldn't all that info be attached to the JobExecutionResult
> (avoiding having to set up the REST connection etc.)?
>
> On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler <ches...@apache.org>
> wrote:
>
>> The only way to access this info from the client is the REST API
>> <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#details-of-a-running-or-completed-job>
>> or the Metrics REST API
>> <https://ci.apache.org/projects/flink/flink-docs-master/monitoring/metrics.html#rest-api-integration>.
>>
>> On 14.02.2018 12:38, Flavio Pompermaier wrote:
>>
>> Actually I'd like to get this number from my Java class in order to
>> update some external dataset "catalog",
>> so I'm asking if there's some programmatic way to access this info
>> (from JobExecutionResult, for example).
>>
>> On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> Do you want to know how many records the sink received, or how many
>>> the sink wrote to the DB?
>>> If it's the first, you're in luck, because we measure that already;
>>> check out the metrics documentation.
>>> If it's the latter, then this issue is essentially covered by
>>> FLINK-7286, which aims at allowing functions
>>> to modify the numRecordsIn/numRecordsOut counts.
>>>
>>> On 14.02.2018 12:22, Flavio Pompermaier wrote:
>>>
>>> Hi to all,
>>> I have a (batch) job that writes to 1 or more sinks.
>>> Is there a way to retrieve, once the job has terminated, the number of
>>> records written to each sink?
>>> Is there any better way than using an accumulator for each sink?
>>> If that is the only way to do it, the Sink API could be enriched in
>>> order to automatically create an accumulator when required. E.g.
>>>
>>> dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
>>>     .setDrivername(...)
>>>     .setDBUrl(...)
>>>     .setQuery(...)
>>>     *.addRecordsCountAccumulator("some-name")*
>>>     .finish())
>>>
>>> Best,
>>> Flavio

--
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809
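To illustrate the client side of the proposal discussed above: once the job finishes, the client would look up the per-sink count by the accumulator name, in the spirit of JobExecutionResult.getAccumulatorResult(String). The ResultStub class and the accumulator names below are hypothetical stand-ins for illustration, not Flink's actual classes or values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reading back per-sink record counts after a job
// finishes. ResultStub mimics the lookup-by-name idea of Flink's
// JobExecutionResult but is NOT the real class.
public class AccumulatorLookupSketch {

    /** Minimal stand-in for a finished job's result object. */
    static class ResultStub {
        private final Map<String, Object> accumulators = new HashMap<>();

        void report(String name, long value) {
            accumulators.put(name, value);
        }

        @SuppressWarnings("unchecked")
        <T> T getAccumulatorResult(String name) {
            return (T) accumulators.get(name);
        }
    }

    public static void main(String[] args) {
        ResultStub result = new ResultStub();
        // Pretend two sinks each registered a records-written counter
        // under the name passed to addRecordsCountAccumulator(...).
        result.report("jdbc-sink-records", 42L);
        result.report("csv-sink-records", 7L);

        long jdbcCount = result.<Long>getAccumulatorResult("jdbc-sink-records");
        System.out.println(jdbcCount); // 42
    }
}
```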