Re: using accumulators in (MicroBatch) InputPartitionReader

2021-03-07 Thread Jungtaek Lim
I'm not sure about the accumulator approach; one possible approach which might work (DISCLAIMER: a random thought) would be employing an RPC endpoint on the driver side which receives such information from executors and plays as a coordinator. Beware that Spark's RPC implementation is package priv

using accumulators in (MicroBatch) InputPartitionReader

2021-03-04 Thread kordex
I tried to create a data source, however our use case is bit hard as we do only know the available offsets within the tasks, not on the driver. I therefore planned to use accumulators in the InputPartitionReader but they seem not to work. Example accumulation is done here https://github.com/kortem