On Thu, Mar 4, 2021 at 11:41 PM kordex wrote:
I tried to create a data source; however, our use case is a bit hard, as we only know the available offsets within the tasks, not on the driver. I therefore planned to use accumulators in the InputPartitionReader, but they seem not to work.
Example accumulation is done here:
https://github.com
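For context, a minimal sketch of the general pattern being attempted (illustrative only, not the actual data source; the InputPartitionReader plumbing is omitted): an accumulator registered on the driver and updated from task-side code is merged back on the driver only after the tasks finish, which may be why reading it from a custom reader "seems not to work".

import org.apache.spark.sql.SparkSession

// Illustrative sketch only: executor-side accumulator updates travel back with
// task results, so the driver sees them only once the job's tasks have completed.
val spark = SparkSession.builder().appName("AccumulatorVisibility").getOrCreate()
val sc = spark.sparkContext

val recordsSeen = sc.longAccumulator("recordsSeen") // registered on the driver

// Stand-in for the per-partition read loop inside a reader.
sc.parallelize(1 to 1000, numSlices = 4).foreach(_ => recordsSeen.add(1))

// Visible here only because foreach is an action and the job has finished;
// reading the value while tasks are still running will not reflect their updates.
println(recordsSeen.value) // 1000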
...seems (to me) like it is trying to obey Spark's RDD compute model, contrasted with legacy accumulators, which subvert that model. I think the fact that your "option 3" is sending information about accumulators down through the mapping function API, as well as passing through an Option"
Are folks interested in seeing data property accumulators for RDDs? I made a proposal for this back in 2016 (https://docs.google.com/document/d/1lR_l1g3zMVctZXrcVjFusq2iQVpr4XvRK_UUDsDr6nk/edit) but ABI compatibility was a stumbling block I couldn't design around. I can look at reviving it.
12:55 AM, Sergey Zhemzhitsky wrote:
Hi there,
I've noticed that accumulators of Spark 1.x no longer work with Spark 2.x, failing with
java.lang.AssertionError: assertion failed: copyAndReset must return a zero value copy
It happens while serializing an accumulator here [1], although copyAndReset returns a zero-value copy for
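For reference, the assertion is raised when Spark serializes the accumulator and checks that copyAndReset() produces an instance whose isZero is true, so copy(), reset(), and isZero all have to agree on what "zero" means. A minimal sketch of a custom AccumulatorV2 (a hypothetical max accumulator, not the code from the message) that satisfies this contract:

import org.apache.spark.util.AccumulatorV2

// Hypothetical accumulator tracking the maximum Long seen across tasks.
// Key point for the assertion above: isZero, copy(), and reset() agree that
// "zero" means _max == Long.MinValue, so copyAndReset() (copy + reset)
// yields an instance for which isZero is true.
class MaxLongAccumulator extends AccumulatorV2[Long, Long] {
  private var _max: Long = Long.MinValue

  override def isZero: Boolean = _max == Long.MinValue

  override def copy(): MaxLongAccumulator = {
    val acc = new MaxLongAccumulator
    acc._max = _max
    acc
  }

  override def reset(): Unit = _max = Long.MinValue

  override def add(v: Long): Unit = _max = math.max(_max, v)

  override def merge(other: AccumulatorV2[Long, Long]): Unit = other match {
    case o: MaxLongAccumulator => _max = math.max(_max, o._max)
    case _ => throw new UnsupportedOperationException(s"Cannot merge with ${other.getClass}")
  }

  override def value: Long = _max
}

// Usage: register on the driver before referencing it in tasks, e.g.
// val maxSeen = new MaxLongAccumulator; sc.register(maxSeen, "maxSeen")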
Hi all,
I would like to read a directory containing 100 files and increment the accumulator value by 1 whenever a file is read.
Can anybody please help me out?
Thanks,
Tejeshwar
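A minimal sketch of one way to do this in Scala, assuming a placeholder input path: wholeTextFiles produces one record per file, so each record read corresponds to one increment.

import org.apache.spark.sql.SparkSession

// Count files as they are read by incrementing a LongAccumulator once per file.
// The path below is just an example placeholder.
val spark = SparkSession.builder().appName("FileCountAccumulator").getOrCreate()
val sc = spark.sparkContext

val filesRead = sc.longAccumulator("filesRead")

// wholeTextFiles yields one (path, content) pair per file.
sc.wholeTextFiles("/path/to/directory")
  .foreach { case (path, _) => filesRead.add(1) }

// Reliable here because foreach is an action, so each file was counted exactly once.
println(s"Files read: ${filesRead.value}")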
I have a very simple two-line program. I am getting input from Kafka, saving the input to a file, and counting the input received. My code looks like this; when I run it, I get two accumulator counts for each input.
HashMap kafkaParams = new HashMap();
kafkaParams.put("metadata.
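The usual cause of seeing the count doubled is that the accumulator is incremented inside a transformation whose lineage is evaluated by two separate actions (writing the file and counting), so every record is processed twice. A small sketch of the effect in Scala (not the original Java/Kafka code; the output path is made up):

import org.apache.spark.sql.SparkSession

// Demonstrates why an accumulator updated inside a transformation can double-count:
// each action below re-evaluates the map, and each evaluation adds to the accumulator.
val spark = SparkSession.builder().appName("DoubleCountDemo").getOrCreate()
val sc = spark.sparkContext

val received = sc.longAccumulator("received")
val data = sc.parallelize(1 to 10).map { x => received.add(1); x }

data.saveAsTextFile("/tmp/double-count-demo") // first action: accumulator reaches 10
println(data.count())                         // second action recomputes the map: accumulator reaches 20
println(received.value)                       // 20 -- "two accumulator counts for each input"

// Caching the transformed RDD (data.cache()) before the actions, or doing the
// counting inside a single foreach/foreachRDD, avoids the double count.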
It certainly makes sense for a single streaming job. But it is definitely non-trivial to make this useful to all Spark programs. If I were to have a long-running SparkContext and submit a wide variety of jobs to it, this would make the list of accumulators very, very large. Maybe the solution
Accumulators on the stage info page show the rolling lifetime value of accumulators as well as per-task values, which is handy. I think it would be useful to add another field to the “Accumulators” table that also shows the total for the stage you are looking at (basically just a merge of the