Question about Spark best practice when counting records.
Hey Darin,
Record count metrics are coming in Spark 1.3. Can you wait until it is
released, or do you need a solution for older versions of Spark?
Kostas
On Friday, February 27, 2015, Darin McBeath wrote:
> I have a fairly large Spark
Currently, if you use accumulators inside actions (like foreach), you have a
guarantee that the values will be correct even if a partition is
recalculated. The same does NOT apply to transformations, so you cannot
rely 100% on the values there.
Pawel Szulc
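
To illustrate the distinction described above, here is a toy model in plain Python. It does not use Spark at all: the functions `run_task`, `count_in_action`, and `count_in_transformation`, as well as the sample partitions and attempt counts, are hypothetical names and data invented for this sketch. It only assumes the behaviour stated in the message, namely that updates made inside an action are applied once per partition even when a task is re-run, while updates made inside a transformation may be applied on every re-execution.

```python
# Toy model only: plain Python, no Spark. All names here are hypothetical
# and exist purely to illustrate the once-only vs. repeated update behaviour.

def run_task(partition):
    """Run one task attempt and return its local update (records seen)."""
    return sum(1 for _ in partition)

def count_in_action(partitions, attempts):
    """Action-like behaviour: merge one update per partition, ignore re-runs."""
    total = 0
    for pid, partition in enumerate(partitions):
        updates = [run_task(partition) for _ in range(attempts[pid])]
        total += updates[0]  # only a single attempt's update is merged
    return total

def count_in_transformation(partitions, attempts):
    """Transformation-like behaviour: every execution's update is merged."""
    total = 0
    for pid, partition in enumerate(partitions):
        updates = [run_task(partition) for _ in range(attempts[pid])]
        total += sum(updates)  # re-executions are counted again
    return total

# Two partitions holding 5 records in total; the second one is computed twice
# (for example because it was lost and had to be recalculated).
partitions = [["a", "b"], ["c", "d", "e"]]
attempts = [1, 2]

action_total = count_in_action(partitions, attempts)
transformation_total = count_in_transformation(partitions, attempts)
print(action_total, transformation_total)
```

With these inputs the action-style count matches the 5 real records, while the transformation-style count is inflated by the recomputed partition. That overcount is why counts gathered inside transformations should be treated as approximate.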
Fri, Feb 27, 2015, 4:54 PM Darin McBeath
Hey Darin,
Record count metrics are coming in Spark 1.3. Can you wait until it is
released, or do you need a solution for older versions of Spark?
Kostas
On Friday, February 27, 2015, Darin McBeath
wrote:
> I have a fairly large Spark job where I'm essentially creating quite a few
> RDDs, do se