Re: using accumulators in (MicroBatch) InputPartitionReader

2021-03-07 Thread Jungtaek Lim
On Thu, Mar 4, 2021 at 11:41 PM kordex wrote: > I tried to create a data source, however our use case is a bit hard as > we only know the available offsets within the tasks, not on the > driver. I therefore planned to use accumulators in the > InputPartitionReader but they se…

using accumulators in (MicroBatch) InputPartitionReader

2021-03-04 Thread kordex
I tried to create a data source, however our use case is a bit hard as we only know the available offsets within the tasks, not on the driver. I therefore planned to use accumulators in the InputPartitionReader, but they seem not to work. Example accumulation is done here: https://github.com…
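For context, the pattern under discussion, sketched in plain RDD terms (all names are illustrative): an accumulator is registered on the driver, serialized into the task closure, and updated on executors, but its merged value only becomes visible on the driver after the tasks finish, which is why it cannot report offsets back while a micro-batch is still being planned.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("acc-demo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Register a named accumulator on the driver.
    val recordsRead = sc.longAccumulator("recordsRead")

    // The accumulator is serialized into the task closure; executor-side
    // updates are merged back to the driver only as tasks complete.
    sc.parallelize(1 to 100, numSlices = 4).foreach(_ => recordsRead.add(1))

    // The merged value is visible here only after the job has finished.
    println(recordsRead.value) // 100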

Re: Data Property Accumulators

2019-08-21 Thread Erik Erlandson
…seems (to me) like it is trying to obey Spark's RDD compute model, contrasted with legacy accumulators, which subvert that model. I think the fact that your "option 3" is sending information about accumulators down through the mapping function API, as well as passing through an Option…

Data Property Accumulators

2019-08-16 Thread Holden Karau
Are folks interested in seeing data property accumulators for RDDs? I made a proposal for this back in 2016 ( https://docs.google.com/document/d/1lR_l1g3zMVctZXrcVjFusq2iQVpr4XvRK_UUDsDr6nk/edit ) but ABI compatibility was a stumbling block I couldn't design around. I can look at reviving…

Re: Accumulators of Spark 1.x no longer work with Spark 2.x

2018-03-15 Thread Sergey Zhemzhitsky
…12:55 AM, Sergey Zhemzhitsky wrote: > Hi there, > > I've noticed that accumulators of Spark 1.x no longer work with Spark > 2.x, failing with > java.lang.AssertionError: assertion failed: copyAndReset must return a > zero value copy > > It happens while serializing an…

Accumulators of Spark 1.x no longer work with Spark 2.x

2018-03-15 Thread Sergey Zhemzhitsky
Hi there, I've noticed that accumulators of Spark 1.x no longer work with Spark 2.x, failing with "java.lang.AssertionError: assertion failed: copyAndReset must return a zero value copy". It happens while serializing an accumulator here [1], although copyAndReset returns a zero-value copy for…
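For reference, a minimal AccumulatorV2 sketch satisfying the 2.x contract that this assertion enforces (the accumulator type and field names are illustrative): after copyAndReset(), which by default is copy() followed by reset(), isZero must return true.

    import org.apache.spark.util.AccumulatorV2

    class SumAccumulator extends AccumulatorV2[Long, Long] {
      private var sum = 0L

      // Must be true for a freshly reset copy, or serialization fails with
      // "copyAndReset must return a zero value copy".
      override def isZero: Boolean = sum == 0L

      override def copy(): SumAccumulator = {
        val acc = new SumAccumulator
        acc.sum = this.sum
        acc
      }

      override def reset(): Unit = { sum = 0L }
      override def add(v: Long): Unit = { sum += v }
      override def merge(other: AccumulatorV2[Long, Long]): Unit = { sum += other.value }
      override def value: Long = sum
    }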

Spark Accumulators

2017-12-03 Thread Tejeshwar J1
Hi all, I would like to read a directory containing 100 files and increment an accumulator by 1 whenever a file is read. Can anybody please help me out? Thanks, Tejeshwar
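One way to do this, as a sketch with an illustrative path: wholeTextFiles yields one (path, content) pair per file, so adding 1 per element counts files rather than lines.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("file-counter").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val filesRead = sc.longAccumulator("filesRead")

    // One (path, content) record per file, so this increments once per file.
    sc.wholeTextFiles("/path/to/dir").foreach(_ => filesRead.add(1))

    println(s"files read: ${filesRead.value}")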

Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Rachana Srivastava
I have a very simple two-line program. I get input from Kafka, save the input to a file, and count the records received. When I run this code I get two accumulator counts for each input. HashMap kafkaParams = new HashMap(); kafkaParams.put("metadata.…
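The snippet is truncated, but a common cause of this symptom is updating the accumulator inside a transformation whose result is consumed by two actions (the save and the count): the lineage is recomputed for the second action and the update is applied twice. Spark only guarantees once-per-task application for updates made inside actions. A sketch of the failure mode in batch terms (paths illustrative):

    val received = sc.longAccumulator("received")

    val counted = sc.textFile("/tmp/input").map { line =>
      received.add(1) // update inside a transformation
      line
    }

    // Two actions over the same uncached lineage recompute the map,
    // so every record is counted twice:
    counted.saveAsTextFile("/tmp/out")
    counted.count()
    println(received.value) // 2x the number of input records

    // Mitigations: cache() the mapped RDD before reusing it, or move the
    // accumulator update into a single action such as foreach.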

Re: accumulators

2014-10-17 Thread Reynold Xin
It certainly makes sense for a single streaming job, but it is definitely non-trivial to make this useful to all Spark programs. If I were to have a long-running SparkContext and submit a wide variety of jobs to it, this would make the list of accumulators very, very large. Maybe the solution…

accumulators

2014-10-16 Thread Sean McNamara
Accumulators on the stage info page show the rolling lifetime value of accumulators as well as per-task values, which is handy. I think it would be useful to add another field to the “Accumulators” table that also shows the total for the stage you are looking at (basically just a merge of the…