Dear Community,

I hope this email finds you well. I'd like to address an important
issue related to Apache Pulsar and discuss a solution I've proposed on
GitHub. The problem pertains to the handling of Chunk Messages after
enabling deduplication.

In the current version of Apache Pulsar, all chunks of a Chunk Message
share the same sequence ID. With the deduplication feature enabled,
however, Chunk Messages can no longer be sent. To tackle this problem,
I've proposed a fix [1] that ensures messages are not duplicated in
end-to-end delivery. While this fix addresses duplication from the
consumer's end-to-end point of view, duplicate chunks can still be
stored within topics.
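
To illustrate why a shared sequence ID conflicts with deduplication,
here is a minimal sketch. It is not the broker's actual code: the class
and method names are hypothetical, and it assumes the broker rejects
any message whose sequence ID is not greater than the last one recorded
for that producer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of sequence-ID-based deduplication.
class SequenceIdDedupSketch {
    // Producer name -> highest sequence ID accepted so far.
    private final Map<String, Long> highestSequenceId = new HashMap<>();

    // Returns true if the message is accepted, false if treated as a duplicate.
    boolean accept(String producer, long sequenceId) {
        Long last = highestSequenceId.get(producer);
        if (last != null && sequenceId <= last) {
            return false; // assumed rule: not greater than the last ID means duplicate
        }
        highestSequenceId.put(producer, sequenceId);
        return true;
    }
}
```

Under that assumption, the first chunk of a message with sequence ID 7
is accepted, while every later chunk of the same message carries the
same ID 7 and is rejected as a duplicate.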

To address this concern, I believe we should introduce a "Chunk ID
map" at the Broker level, similar to the existing "sequence ID map",
to facilitate effective filtering. However, implementing this raises
a challenge: each producer then needs two Long values stored at the
same time (sequence ID and chunk ID). The snapshot of the sequence ID
map is stored through the properties of the cursor
(Map<String, Long>), which can hold only one Long per producer. To
resolve this, I've opened a second proposal [2] that introduces
"mark DeleteProperties" (Map<String, String>) to replace the current
properties (Map<String, Long>) field.
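
As a rough sketch of the idea, the code below tracks both values per
producer and snapshots them into a Map<String, String>. Everything
here is an assumption made for illustration: the class name, the
"sequenceId:chunkId" string encoding, and the convention that a
non-chunked message uses chunk ID 0. It does not reflect the code in
either PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: deduplicate on (sequence ID, chunk ID) per producer.
class ChunkAwareDedupSketch {
    // Producer name -> {last accepted sequence ID, last accepted chunk ID}.
    private final Map<String, long[]> lastAccepted = new HashMap<>();

    // Accept a message only if it is strictly ahead of the last accepted pair.
    boolean accept(String producer, long sequenceId, long chunkId) {
        long[] last = lastAccepted.get(producer);
        boolean duplicate = last != null
                && (sequenceId < last[0]
                    || (sequenceId == last[0] && chunkId <= last[1]));
        if (duplicate) {
            return false;
        }
        lastAccepted.put(producer, new long[] {sequenceId, chunkId});
        return true;
    }

    // Snapshot both values per producer as one String (assumed encoding).
    Map<String, String> snapshot() {
        Map<String, String> props = new HashMap<>();
        lastAccepted.forEach((producer, last) ->
                props.put(producer, last[0] + ":" + last[1]));
        return props;
    }

    // Rebuild the in-memory state from a snapshot produced by snapshot().
    static ChunkAwareDedupSketch restore(Map<String, String> props) {
        ChunkAwareDedupSketch dedup = new ChunkAwareDedupSketch();
        props.forEach((producer, value) -> {
            String[] parts = value.split(":");
            dedup.lastAccepted.put(producer,
                    new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])});
        });
        return dedup;
    }
}
```

A Map<String, Long> could not carry both numbers under a single
producer key without adding extra keys, which is the motivation for a
String-valued map in this sketch.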

I'd appreciate it if you could review both PRs and share your
feedback. Thank you for your time and attention. I look forward to
your opinions and recommendations.

Warm regards,
Xiangying

[1] https://github.com/apache/pulsar/pull/20948
[2] https://github.com/apache/pulsar/pull/21027