I'm wondering whether keeping track of accumulation in "consistent mode" is really a case for mapping straight to a Try value, so that parsedData has type RDD[Try[...]] and counting failures is just parsedData.filter(_.isFailure).count, etc.
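Concretely, something like the rough sketch below (parseRecord, rawData, and sc are hypothetical stand-ins for whatever the real job does):

  import org.apache.spark.rdd.RDD
  import scala.util.{Try, Success}

  // Toy parser standing in for whatever parsing the job really does.
  def parseRecord(line: String): Int = line.trim.toInt

  val rawData: RDD[String] = sc.parallelize(Seq("1", "2", "oops", "4"))

  // Failures live in the data itself, so on a retry they are recomputed
  // right along with the RDD, instead of being double-counted the way a
  // legacy accumulator can be.
  val parsedData: RDD[Try[Int]] = rawData.map(line => Try(parseRecord(line)))

  val failureCount = parsedData.filter(_.isFailure).count()

  // Successes keep flowing through the normal API.
  val goodRecords: RDD[Int] = parsedData.collect { case Success(n) => n }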
Put another way: consistent-mode accumulation seems (to me) like it is trying to obey Spark's RDD compute model, in contrast with legacy accumulators, which subvert that model. I think the fact that your "option 3" sends information about accumulators down through the mapping-function API, as well as passing it through an "Option" stage, is also hinting at that idea. That might mean the idiomatic way to do consistent mode is via the existing Spark API, using constructs like Try, Either, Option, Tuple, or just a new column carrying additional accumulator channels (rough sketch of that below, after the quoted message).

On Fri, Aug 16, 2019 at 5:48 PM Holden Karau <hol...@pigscanfly.ca> wrote:

> Are folks interested in seeing data property accumulators for RDDs? I made
> a proposal for this back in Spark 2016 (
> https://docs.google.com/document/d/1lR_l1g3zMVctZXrcVjFusq2iQVpr4XvRK_UUDsDr6nk/edit
> ) but ABI compatibility was a stumbling block I couldn't design around. I
> can look at reviving it for Spark 3 or just go ahead and close out this
> idea.
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
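Here's the "additional accumulator channels" variant I had in mind, again as a rough sketch with hypothetical names (a Metrics case class standing in for whatever channels a job wants to track):

  import org.apache.spark.rdd.RDD
  import scala.util.Try

  // Per-record metric contributions carried as explicit data alongside
  // the payload, instead of as a side effect.
  case class Metrics(records: Long, failures: Long) {
    def +(other: Metrics): Metrics =
      Metrics(records + other.records, failures + other.failures)
  }

  val rawData: RDD[String] = sc.parallelize(Seq("1", "2", "oops", "4"))

  // Each element carries its own channel contribution as a second field.
  val withChannels: RDD[(Try[Int], Metrics)] = rawData.map { line =>
    val parsed = Try(line.trim.toInt)
    (parsed, Metrics(1L, if (parsed.isFailure) 1L else 0L))
  }

  // Folding the channels down is just an ordinary, deterministic action.
  val totals: Metrics = withChannels.map(_._2).reduce(_ + _)

Since the totals are computed from the data itself, a recomputed partition contributes exactly the same values, which is the "consistent" guarantee.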