[GitHub] [flink] pnowojski commented on pull request #21690: [FLINK-30623][runtime][bug] The canEmitBatchOfRecords should check recordWriter and changelogWriterAvailabilityProvider are available

GitBox Wed, 18 Jan 2023 02:47:50 -0800


pnowojski commented on PR #21690:
URL: https://github.com/apache/flink/pull/21690#issuecomment-1386850790


   > If it's ok, ReaderOutput#collect and RecordEmitter#emitRecord should 
return a boolean as well. And the loop will be moved from 
SourceOperator#emitNext to SourceReader#pollNext.
   
   I wouldn't go that far, at least not if strictly necessary. `SourceReader` 
is already part of the stable public API (`@Public`). Even if we could modify 
it without braking compatibility, it would make our future live more difficult, 
since we would have more complicated public API to support. And of course it 
would make the live of users implementing their sources more difficult as well.
   
   > can you explain what Java Doc we should use to explain the semantics of 
the returned boolean value?
   
   This should be simple:
   ```
   Returns true if caller can call this method more times within a single 
invocation of the `PushingAsyncDataInput#emitNext` method. Otherwise returned 
value false means the caller should return from the 
`PushingAsyncDataInput#emitNext` invocation as soon as possible.
   ```
   We do not need to explain anything about the multiple inputs, mailbox, 
record writer/state backend availability, etc. 
   
   >  Also note that SourceOperator#emitNext is still checking things like 
shouldWaitForAlignment at every step in order to use DataOutput as designated
   
   But waiting for alignment is a pure `SourceOperator`'s implementation 
detail. It has nothing to do with the output, or runtime availability.
   
   > Batch processing does not have this issue. I also plan to see if there is 
to further mitigate this overhead in the future.
   
   No, it doesn't. But batch doesn't care about single record latency and 
doesn't have checkpointing. Those two requirements combined, are The Cause why 
non blocking input/outputs/statebackend and mailbox exists in the first place. 
Without those requirements indeed I would also implement the code very 
differently. Just have a loop in the source that emits either a large batch of 
records or all records until it's finished, without any checks in the middle.
   
   But I agree something doesn't feel right and I have a feeling that the 
overall design of the control flow could be improved. 
   
   Btw, +1 @1996fanrui  for including the check of statebackend availability in 
this bug fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] pnowojski commented on pull request #21690: [FLINK-30623][runtime][bug] The canEmitBatchOfRecords should check recordWriter and changelogWriterAvailabilityProvider are available

Reply via email to