JingsongLi commented on pull request #18412: URL: https://github.com/apache/flink/pull/18412#issuecomment-1020739383
> From a Kafka connector side, the subscriber is not updated on every record, and just when the KafkaProducer is flushed it is only updated for a bulk of records (either during a checkpoint or if the internal buffer size is reached). I would definitely prefer to handle the consumer in the mailbox and not by the Kafka thread. The Kafka threads might have surprising effects on the overall pipeline stability i.e. the shutdown is blocked because the producer cannot be stopped because it is executing the metadata consumer.

Thanks for the information. Let me share my concerns:

- Performance problem: If we put every meta into the mailbox, performance will be very poor. Look at `TaskMailboxImpl.put`: every put acquires a lock, which leads to very poor throughput.
- Consistency issue: For example, if data is flushed at transaction commit time and its metas are then put into the mailbox, those metas end up belonging to the next checkpoint rather than the current one, whereas TableStore needs the metas of the current checkpoint.
- Blocking problem: I think that, by contract, callback-like interfaces should execute quickly, and we can add comments saying so. See `org.apache.kafka.clients.producer.Callback`, which likewise expects the callback to contain no heavy logic.

> Regarding adding a method to the InitContext I think that is okay. Do you think there will be ever multiple Subscribers? Maybe it is safer to already add a list instead of an optional.

I think we can let the implementer assemble it themselves if a list is ever needed.

> I am still a bit surprised that the TableStoreSink reads all metadata offsets nevertheless if they are committed in Kafka or not.

`TableStoreSink` only reads these offsets at preSnapshot time, where they are used to synchronize the full (file) data with the incremental (log) data. The records must already be flushed at that point.
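To illustrate the callback and preSnapshot points, here is a rough sketch of a metadata consumer that stays cheap enough to run on the producer thread. It is only an illustration under assumptions: `OffsetTracker`, `onCompletion`, and `snapshot` are hypothetical names and are not part of this PR, Flink, or Kafka. The callback only records the highest offset seen per partition, and the accumulated offsets are read once, at pre-snapshot time, after the producer has been flushed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names come from Flink, Kafka, or the PR. */
final class OffsetTracker {

    // Highest acknowledged offset per partition, safe to update from another thread.
    private final ConcurrentHashMap<Integer, Long> maxOffsets = new ConcurrentHashMap<>();

    /**
     * Intended to be called from the producer's callback thread.
     * It does a single atomic map update and nothing else, so it returns quickly.
     */
    void onCompletion(int partition, long offset) {
        maxOffsets.merge(partition, offset, Long::max);
    }

    /**
     * Intended to be called at pre-snapshot time, after the producer has been
     * flushed, so every record of the current checkpoint has been acknowledged.
     * Returns a copy so later callbacks do not change the snapshot.
     */
    Map<Integer, Long> snapshot() {
        return new HashMap<>(maxOffsets);
    }
}
```

With this shape, nothing is enqueued per record, and the offsets read in `snapshot()` are the ones acknowledged before the current checkpoint, provided the flush has completed first.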