That's cool, just be aware that all you're affecting is the time
between commits, not overall correctness.
Good call on the iterator not draining the queue, I'll fix that.
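
To illustrate the point about commit timing versus correctness, here is a minimal Python sketch. It is not Spark or Kafka code: the log, the "store", and the function names are all hypothetical stand-ins. It assumes a crash happens after four records were written but only the first two were covered by a commit, so the restart replays two records.

```python
# Hypothetical log of (partition, offset, value) records; not a Kafka API.
log = [(0, offset, f"event-{offset}") for offset in range(6)]

def write_idempotent(records, store):
    # Upsert keyed by (partition, offset): a replayed record overwrites itself.
    for partition, offset, value in records:
        store[(partition, offset)] = value

def write_append(records, sink):
    # Non-idempotent sink: a replayed record is stored a second time.
    for partition, offset, value in records:
        sink.append(value)

store, appended = {}, []

# First run: offsets 0..3 are written, but the last commit only covered 0..1.
write_idempotent(log[0:4], store)
write_append(log[0:4], appended)
committed = 2  # next offset to read after restart

# Restart: replay from the committed offset, so offsets 2 and 3 are seen again.
write_idempotent(log[committed:], store)
write_append(log[committed:], appended)

# Reference result from processing the log exactly once.
reference = {}
write_idempotent(log, reference)

print(store == reference)  # True: replays did not change the idempotent store
print(len(appended))       # 8: the append-only sink holds two duplicates
```

Committing more often would shrink the replayed range, but only the idempotent write makes the final state independent of where the commit landed.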
On Sun, Oct 9, 2016 at 12:22 PM, Srikanth wrote:
I'll probably add this behavior. It's a good balance between not having to
rely on another external system just for offset management and reducing
duplicates.
I was more worried about the underlying framework using the consumer in
parallel. Will watch out for ConcurrentModificationException.
BTW, the commitQue
People may be calling commit from listeners or who knows where. Point
is it's not thread safe. If it's really important to you, it should
be pretty straightforward for you to hack on it to allow it at your
own risk. There is a check for concurrent access in the consumer, so
worst case scenario y
If I call commit in foreachRDD at the end of a batch, is there still a
possibility of another thread using the same consumer? Assuming I haven't
configured the scheduler to run parallel jobs.
On Oct 8, 2016 8:39 PM, "Cody Koeninger" wrote:
The underlying Kafka consumer isn't thread safe. Calling the actual
commit in compute means it's called in the same thread as the other
consumer calls.
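
A rough Python sketch of that idea, under stated assumptions: `SketchConsumer` and `CommitQueue` are hypothetical names, not the Kafka or Spark classes, and the ownership check only imitates the kind of concurrent-access guard mentioned in this thread. Any thread may request a commit, but the commit itself runs only when the owning thread drains the queue. Note that draining uses `popleft`, which removes entries; merely iterating over the queue would leave them in place to be committed again.

```python
import threading
from collections import deque

class SketchConsumer:
    """Hypothetical stand-in for a consumer that is not thread safe."""
    def __init__(self):
        self._owner = threading.get_ident()  # thread that created the consumer
        self.committed = {}

    def commit(self, offsets):
        # Imitation of a concurrent-access check: reject foreign threads.
        if threading.get_ident() != self._owner:
            raise RuntimeError("consumer is not safe for multi-threaded access")
        self.committed.update(offsets)

class CommitQueue:
    def __init__(self, consumer):
        self._consumer = consumer
        self._pending = deque()  # append/popleft are atomic in CPython

    def request_commit(self, offsets):
        # Callable from any thread: it only enqueues, never touches the consumer.
        self._pending.append(dict(offsets))

    def drain_and_commit(self):
        # Call from the consumer's owning thread. Removes every pending entry
        # and keeps the highest offset seen per partition.
        merged = {}
        while True:
            try:
                offsets = self._pending.popleft()
            except IndexError:
                break
            for partition, offset in offsets.items():
                merged[partition] = max(offset, merged.get(partition, -1))
        if merged:
            self._consumer.commit(merged)
        return merged

consumer = SketchConsumer()
commit_queue = CommitQueue(consumer)

# A listener on another thread asks for a commit; nothing is committed yet.
worker = threading.Thread(
    target=commit_queue.request_commit, args=({("topic", 0): 42},)
)
worker.start()
worker.join()
commit_queue.request_commit({("topic", 0): 40, ("topic", 1): 7})

# The owning thread performs the actual commit alongside its other consumer calls.
commit_queue.drain_and_commit()
print(consumer.committed)  # {('topic', 0): 42, ('topic', 1): 7}
```

A second call to `drain_and_commit()` finds the queue empty and commits nothing, which is the behavior a non-draining iterator would not give you.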
Using Kafka as an offset store only works correctly with
idempotent datastore writes anyway, so the question of when the commit
happens shou