yuxiqian opened a new pull request, #3680: URL: https://github.com/apache/flink-cdc/pull/3680
This fixes a schema operator hang that occurs under highly parallel load.

---

Previously, a `SchemaOperator` would request schema evolution permission first, and only then send a `FlushEvent` to downstream sinks. However, Flink treats `FlushEvent`s as normal data records and may block them while aligning checkpoint barriers. This can cause the following deadlock:

* `SchemaOperator` A has obtained schema evolution permission
* `SchemaOperator` B does not get the permission and hangs
* `SchemaOperator` A sends a `FlushEvent`, but after a checkpoint barrier
* `SchemaOperator` B receives a checkpoint barrier after the schema change event (which is blocked)

Now neither A nor B can emit any event records downstream, and the entire job blocks with the following characteristic error message (in the TM log):

```
0> Schema Registry is busy now, waiting for next request...
1> Schema Registry is busy now, waiting for next request...
2> Schema Registry is busy now, waiting for next request...
[Repeated lines]
```

---

This PR changes the workflow for requesting schema evolution permission as follows:

* `SchemaOperator`s emit a `FlushEvent` immediately when they receive a `SchemaChangeEvent`.
* A schema change request can only be permitted when a) the `SchemaRegistry` is `IDLE` and b) the requesting `SchemaOperator` has already finished flushing its data.

Since `FlushEvent`s might be emitted from multiple `SchemaOperator`s simultaneously, a nonce value that is uniquely bound to a schema change event is added to the `FlushEvent` payload.

The `WAITING_FOR_FLUSH` stage is no longer necessary, since waiting for a flush now blocks only a single `SchemaOperator` rather than the whole `SchemaRegistry`.

```
  --------
  | IDLE | -------------------A
  --------                    |
     ^                        |
     C                        |
      \                       v
  ------------          ------------
  | FINISHED | <-- B -- | APPLYING |
  ------------          ------------
```
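The permission rule and the three-state cycle described above can be illustrated with a minimal Java sketch. This is not the code from the PR: the class `SchemaRegistrySketch` and all of its method names are hypothetical, the nonce is assumed to be a `long`, and the events that trigger edges B and C are assumptions, since the description only names the states.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the registry-side permission check; not the actual Flink CDC class. */
class SchemaRegistrySketch {
    enum State { IDLE, APPLYING, FINISHED }

    private State state = State.IDLE;

    // Nonces of schema change events whose FlushEvent has already been handled downstream.
    private final Set<Long> flushedNonces = new HashSet<>();

    /** Assumed callback: the flush tagged with this nonce has completed. */
    synchronized void onFlushSuccess(long nonce) {
        flushedNonces.add(nonce);
    }

    /**
     * Edge A (IDLE -> APPLYING). Permission is granted only when the registry is IDLE
     * and the requester's own flush (identified by its nonce) has finished.
     * A rejected caller keeps its data flowing and simply retries later.
     */
    synchronized boolean requestSchemaChange(long nonce) {
        if (state != State.IDLE || !flushedNonces.contains(nonce)) {
            return false;
        }
        flushedNonces.remove(nonce);
        state = State.APPLYING;
        return true;
    }

    /** Edge B (APPLYING -> FINISHED); the trigger is an assumption of this sketch. */
    synchronized void onSchemaChangeApplied() {
        if (state != State.APPLYING) {
            throw new IllegalStateException("Expected APPLYING but was " + state);
        }
        state = State.FINISHED;
    }

    /** Edge C (FINISHED -> IDLE); the trigger is an assumption of this sketch. */
    synchronized void onResultCollected() {
        if (state != State.FINISHED) {
            throw new IllegalStateException("Expected FINISHED but was " + state);
        }
        state = State.IDLE;
    }

    synchronized State currentState() {
        return state;
    }
}
```

In this sketch, a request that arrives before its flush has completed is rejected without changing the registry state, so one waiting operator cannot hold the registry and block the others.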