I have a feeling something is going wrong in your source. If I were
you, I would probably do the following:
- verify that the source never produces an UnboundedSource split with no
queue associated (e.g. with Preconditions.checkState())
- make sure that advance() *really never* blocks, not even when the
queue is actually empty. You can have a look at KafkaUnboundedReader for
inspiration. Generally, if you cannot do purely async IO (i.e. just
check whether you have any data in the network buffer in the call to
advance() and return false otherwise), you would probably want to spawn
a new thread that feeds a local BlockingQueue (or something similar), as
in the sketch after this list.
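
A minimal sketch of that feeder-thread pattern, kept outside the Beam
API to stay short (class and method names are hypothetical; the same
structure goes into your UnboundedReader, and the checkState() from the
first bullet belongs wherever the queue gets assigned to the split):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.atomic.AtomicBoolean;

    // All blocking I/O happens on a background feeder thread; advance()
    // only does a non-blocking poll and therefore never stalls the harness.
    class NonBlockingReaderSketch implements AutoCloseable {

      /** Hypothetical stand-in for whatever client library you read from. */
      interface BrokerConnection {
        byte[] blockingReceive() throws InterruptedException;
      }

      private final BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(1024);
      private final AtomicBoolean running = new AtomicBoolean(true);
      private final Thread feeder;
      private byte[] current;

      NonBlockingReaderSketch(BrokerConnection connection) {
        feeder = new Thread(() -> {
          try {
            while (running.get()) {
              byte[] msg = connection.blockingReceive(); // blocking call lives here only
              if (msg != null) {
                buffer.put(msg);                         // back-pressure when the buffer is full
              }
            }
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();          // shutting down
          }
        }, "reader-feeder");
        feeder.setDaemon(true);
        feeder.start();
      }

      /** Same contract as UnboundedReader#advance(): never blocks. */
      boolean advance() {
        byte[] next = buffer.poll(); // non-blocking; null when the queue is empty
        if (next == null) {
          return false;              // no data right now, the runner will call again later
        }
        current = next;
        return true;
      }

      byte[] getCurrent() {
        return current;
      }

      @Override
      public void close() {
        running.set(false);
        feeder.interrupt();
      }
    }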
Each blocking call in advance() is a potential source of distributed
deadlocks, which might be what you are observing.
Jan
On 9/24/19 4:07 PM, Ken Barr wrote:
I might have to resort to this. I am reading from the queue in a timed manner and
properly returning false when the read times out. I have implemented a set of
statistics to enumerate advance() behavior: the number of times it successfully
reads a message, fails to read a message, etc.
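Roughly this shape, as a sketch (names and the timeout value are made up, not the
actual implementation):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    // Timed read with per-outcome counters. Note that poll(timeout) still
    // blocks advance() for up to the timeout, which is the kind of blocking
    // the reply above warns about; a plain poll() with no timeout avoids it.
    class TimedAdvanceSketch {
      private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
      private final AtomicLong successfulReads = new AtomicLong();
      private final AtomicLong emptyReads = new AtomicLong();
      private byte[] current;

      boolean advance() {
        byte[] msg;
        try {
          msg = queue.poll(10, TimeUnit.MILLISECONDS); // waits up to 10 ms for a message
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
        if (msg == null) {
          emptyReads.incrementAndGet();       // statistic: read timed out, no message
          return false;
        }
        successfulReads.incrementAndGet();    // statistic: message read successfully
        current = msg;
        return true;
      }
    }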
On 2019/09/19 19:45:49, Jan Lukavský <je...@seznam.cz> wrote:
You can ssh to the Dataflow worker and investigate a jstack of the
harness. The principal source of blocking would be waiting for data. The
logic should be implemented so that if there is no data available at the
moment, you just return false and don't wait for anything. Another
suggestion: focus on how your reader behaves when it receives no queue.
I think the proper behavior would be to return false from each call to
advance() and set the watermark to BoundedWindow.TIMESTAMP_MAX_VALUE to
indicate that there will be no more data.
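A minimal sketch of that behavior (class and field names are hypothetical):

    import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
    import org.joda.time.Instant;

    // A reader that was assigned no queue never returns data and immediately
    // reports a final watermark so the runner knows this split is done.
    class EmptySplitSketch {
      private final Object assignedQueue; // null when this split got no queue

      EmptySplitSketch(Object assignedQueue) {
        this.assignedQueue = assignedQueue;
      }

      boolean advance() {
        if (assignedQueue == null) {
          return false;                      // this split will never produce data
        }
        // ... normal non-blocking read path goes here ...
        return false;
      }

      Instant getWatermark() {
        if (assignedQueue == null) {
          return BoundedWindow.TIMESTAMP_MAX_VALUE; // no more data from this split, ever
        }
        // ... otherwise derive the watermark from the data ...
        return Instant.now();
      }
    }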
Jan