Thinking about it, it occurred to me that adding a map function after the read
is probably more general.
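
To make the idea concrete, here is a rough, framework-agnostic sketch in plain Java of what I mean by a pass-through counting map. The names `CountingMap` and `RowCountSketch` are made up for illustration, and the `LongAdder` only stands in for whatever accumulator type the actual framework provides. In a real job the counter would be registered through the framework's own accumulator API instead.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: an identity map that counts every row passing through it.
final class CountingMap<T> implements Function<T, T> {
    // Stand-in for the framework's accumulator (assumption, not a real framework type).
    private final LongAdder rows = new LongAdder();

    @Override
    public T apply(T row) {
        rows.increment(); // count the row
        return row;       // hand it on unchanged
    }

    long count() {
        return rows.sum();
    }
}

class RowCountSketch {
    public static void main(String[] args) {
        // Stand-in for the rows produced by any input format.
        List<String> source = List.of("r1", "r2", "r3");

        CountingMap<String> counter = new CountingMap<>();
        List<String> out = source.stream()
                .map(counter) // counting step placed right after the read
                .collect(Collectors.toList());

        System.out.println(out.size() + " rows out, counter = " + counter.count());
    }
}
```

The point of this shape is that the counting step is independent of the input format, so the same map can follow any source without subclassing it.
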
Is there any significant performance difference between using such a
dedicated map function (one that just reads a row, increments an accumulator,
and returns immediately) and adding this accumulator dir
Hi all,
we often need to track the number of rows in a dataset.
To avoid adding to the job's complexity, we use accumulators to track this
information.
The problem is that we have to extend every InputFormat we use in
order to properly handle such a row-count accumulator... my question is: wh