[ https://issues.apache.org/jira/browse/FLINK-30413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufan Sheng updated FLINK-30413:
--------------------------------
    Summary: Drop Shared and Key_Shared subscription support in Pulsar connector  (was: Drop Share and Key_Shared subscription support in Pulsar connector)

> Drop Shared and Key_Shared subscription support in Pulsar connector
> -------------------------------------------------------------------
>
>                 Key: FLINK-30413
>                 URL: https://issues.apache.org/jira/browse/FLINK-30413
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Pulsar
>    Affects Versions: 1.17.0
>            Reporter: Yufan Sheng
>            Assignee: Yufan Sheng
>            Priority: Critical
>             Fix For: pulsar-4.0.0
>
>
> Many of the Pulsar connector's test instability issues are related to the
> {{Shared}} and {{Key_Shared}} subscriptions. These two subscription types
> are designed to consume records in an unordered way, which lets multiple
> consumers read from the same topic partition. But this feature leads to
> several drawbacks in the connector.
> 1. Performance
> Flink is a true stream processor with strong correctness guarantees.
> Supporting multiple consumers on the same partition requires additional
> guarantees, which depend on Pulsar transactions. However, the internal
> implementation of Pulsar transactions on the source side records messages
> one by one and stores all pending acknowledgement state on the client side,
> which is slow and memory inefficient.
> This means that we can only use {{Shared}} and {{Key_Shared}} on Flink with
> low throughput. That defeats the purpose of supporting these two
> subscription types, which was to increase consumption speed by adding
> multiple consumers to the same partition.
> 2. Unstable
> Pulsar transactions acknowledge messages one by one in an internal Pulsar
> topic, and this mechanism is not yet stable enough to work reliably. Many
> pending issues in the Flink JIRA are related to Pulsar transactions, and we
> don't have any workaround.
> 3. Complex
> Supporting the {{Shared}} and {{Key_Shared}} subscriptions makes the
> connector's code more complex than we expected. Every part of the code has
> to handle both an ordered and an unordered path, which is hard for
> maintainers to understand.
> 4. Necessity
> The current implementation of {{Shared}} and {{Key_Shared}} is unusable in
> a production environment. For users, this feature is not necessary: the
> bottleneck is not in consuming data from Pulsar but in processing it, and
> that can be addressed by increasing the parallelism of the processing
> operator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)