pnowojski commented on PR #21690: URL: https://github.com/apache/flink/pull/21690#issuecomment-1386850790
> If it's ok, ReaderOutput#collect and RecordEmitter#emitRecord should return a boolean as well. And the loop will be moved from SourceOperator#emitNext to SourceReader#pollNext. I wouldn't go that far, at least not if strictly necessary. `SourceReader` is already part of the stable public API (`@Public`). Even if we could modify it without braking compatibility, it would make our future live more difficult, since we would have more complicated public API to support. And of course it would make the live of users implementing their sources more difficult as well. > can you explain what Java Doc we should use to explain the semantics of the returned boolean value? This should be simple: ``` Returns true if caller can call this method more times within a single invocation of the `PushingAsyncDataInput#emitNext` method. Otherwise returned value false means the caller should return from the `PushingAsyncDataInput#emitNext` invocation as soon as possible. ``` We do not need to explain anything about the multiple inputs, mailbox, record writer/state backend availability, etc. > Also note that SourceOperator#emitNext is still checking things like shouldWaitForAlignment at every step in order to use DataOutput as designated But waiting for alignment is a pure `SourceOperator`'s implementation detail. It has nothing to do with the output, or runtime availability. > Batch processing does not have this issue. I also plan to see if there is to further mitigate this overhead in the future. No, it doesn't. But batch doesn't care about single record latency and doesn't have checkpointing. Those two requirements combined, are The Cause why non blocking input/outputs/statebackend and mailbox exists in the first place. Without those requirements indeed I would also implement the code very differently. Just have a loop in the source that emits either a large batch of records or all records until it's finished, without any checks in the middle. But I agree something doesn't feel right and I have a feeling that the overall design of the control flow could be improved. Btw, +1 @1996fanrui for including the check of statebackend availability in this bug fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org