
Re: Dataset rowCount accumulator

2019-02-04 Thread Flavio Pompermaier

Thinking about it, I came to the conclusion that adding a map function after the read is probably more general. Is there any "significant" difference in terms of performance between using such a dedicated map function (one that just reads a row, increments an accumulator and returns immediately) and adding this accumulator dir
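The pass-through counting map described above can be illustrated in plain Java, with no Flink dependency. In a Flink job the equivalent would typically be a `RichMapFunction` that registers a `LongCounter` via `getRuntimeContext().addAccumulator(...)` in `open()`. In this sketch a `LongAdder` stands in for the accumulator, and `CountingMap` is a hypothetical name:

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CountingMapExample {

    // Hypothetical pass-through "map": counts each row it sees and returns it unchanged.
    static final class CountingMap<T> implements Function<T, T> {
        private final LongAdder rowCount; // stands in for a Flink accumulator (assumption)

        CountingMap(LongAdder rowCount) {
            this.rowCount = rowCount;
        }

        @Override
        public T apply(T row) {
            rowCount.increment(); // one increment per row
            return row;           // no transformation: the row flows through untouched
        }
    }

    public static void main(String[] args) {
        LongAdder rowCount = new LongAdder();
        List<String> rows = List.of("a", "b", "c");

        List<String> out = rows.stream()
                .map(new CountingMap<String>(rowCount))
                .collect(Collectors.toList());

        System.out.println(out + " rows=" + rowCount.sum());
    }
}
```

Because such a map does no transformation, its per-record cost is essentially one function call plus one counter increment; whether that is measurable next to the read itself depends on the job and on whether the runtime chains the map with the source.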

Dataset rowCount accumulator

2019-02-04 Thread Flavio Pompermaier

Hi to all, we often need to track the number of rows of a dataset. In order not to add complexity to the job, we use accumulators to track this information. The problem is that we have to extend all the InputFormats we use in order to properly handle such a row-count accumulator... my question is: wh
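One generic alternative to extending every input format is to wrap whatever reader produces the rows in a counting decorator. The plain-Java sketch below uses `java.util.Iterator` as a stand-in for a reader. `CountingIterator` is a hypothetical name and this is not a Flink API: Flink's own `InputFormat` interface has more methods (such as `open`, `reachedEnd`, `nextRecord` and `close`) that a real wrapper would need to delegate.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

public class CountingIteratorExample {

    // Hypothetical decorator: delegates to any Iterator and counts the elements it hands out.
    static final class CountingIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        private final LongAdder rowCount; // stands in for a row-count accumulator (assumption)

        CountingIterator(Iterator<T> delegate, LongAdder rowCount) {
            this.delegate = delegate;
            this.rowCount = rowCount;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            T row = delegate.next(); // throws NoSuchElementException when exhausted, like the delegate
            rowCount.increment();    // count only rows actually returned
            return row;
        }
    }

    public static void main(String[] args) {
        LongAdder rowCount = new LongAdder();
        Iterator<String> rows = new CountingIterator<>(List.of("x", "y").iterator(), rowCount);
        while (rows.hasNext()) {
            rows.next();
        }
        System.out.println("rows=" + rowCount.sum());
    }
}
```

The design choice here is composition over inheritance: the counting logic lives in one wrapper instead of being copied into a subclass of each reader.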