> 1. SequenceID: 0, ChunkID: 0
> 2. SequenceID: 0, ChunkID: 1
> 3. SequenceID: 0, ChunkID: 0
> 4. SequenceID: 0, ChunkID: 1
> 5. SequenceID: 0, ChunkID: 2
> For the existing behavior, the consumer assembles messages 3,4,5 into
> the original large message. But the changes brought about by this PIP
> will cause the consumer to use messages 1,2,5 for assembly. There is
> no guarantee that the producer will split the message in the same way
> twice before and after. For example, the producer's maxMessageSize may
> be different. This may cause the consumer to receive a corrupt
> message.

Good point.


Thanks
Yubiao Feng

On Wed, Aug 23, 2023 at 12:34 PM Zike Yang <z...@apache.org> wrote:

> Hi, xiangying,
>
> Thanks for your PIP.
>
> IIUC, this may change the existing behavior and may introduce
> inconsistencies.
> Suppose that we have a large message with 3 chunks. But the producer
> crashes and resends the message after sending the chunk-1. It will
> send a total of 5 messages to the Pulsar topic:
>
> 1. SequenceID: 0, ChunkID: 0
> 2. SequenceID: 0, ChunkID: 1
> 3. SequenceID: 0, ChunkID: 0   -> This message will be dropped
> 4. SequenceID: 0, ChunkID: 1    -> Will also be dropped
> 5. SequenceID: 0, ChunkID: 2    -> The last chunk of the message
>
> For the existing behavior, the consumer assembles messages 3,4,5 into
> the original large message. But the changes brought about by this PIP
> will cause the consumer to use messages 1,2,5 for assembly. There is
> no guarantee that the producer will split the message in the same way
> twice before and after. For example, the producer's maxMessageSize may
> be different. This may cause the consumer to receive a corrupt
> message.
>
> Also, this PIP increases the complexity of handling chunks on the
> broker side. Brokers should, in general, treat the chunk as a normal
> message.
>
> I think a simpler, better approach is to only check the deduplication
> for the last chunk of the large message. The consumer only gets the
> whole message after receiving the last chunk. We don't need to check
> the deduplication for all previous chunks. Also, by doing this we only
> need a bug fix; we don't need to introduce a new PIP.
>
> BR,
> Zike Yang
>
> On Fri, Aug 18, 2023 at 7:54 PM Xiangying Meng <xiangy...@apache.org>
> wrote:
> >
> > Dear Community,
> >
> > I hope this email finds you well. I'd like to address an important
> > issue related to Apache Pulsar and discuss a solution I've proposed on
> > GitHub. The problem pertains to the handling of Chunk Messages after
> > enabling deduplication.
> >
> > In the current version of Apache Pulsar, all chunks of a Chunk Message
> > share the same sequence ID. However, enabling the deduplication
> > feature results in an inability to send Chunk Messages. To tackle this
> > problem, I've proposed a solution [1] that ensures messages are not
> > duplicated throughout end-to-end delivery. While this fix addresses
> > the duplication issue for end-to-end messages, there remains a
> > possibility of duplicate chunks within topics.
> >
> > To address this concern, I believe we should introduce a "Chunk ID
> > map" at the Broker level, similar to the existing "sequence ID map",
> > to facilitate effective filtering. However, implementing this has led
> > to a challenge: a producer requires storage for two Long values
> > simultaneously (sequence ID and chunk ID). The snapshot of the
> > sequence ID map is stored through the properties of the cursor
> > (Map<String, Long>), so it cannot hold the two Longs (sequence ID,
> > chunk ID) that correspond to one producer. To resolve this, I've
> > put forward a second proposal [2] involving the introduction of a
> > "mark DeleteProperties" (Map<String, String>) to replace the
> > current properties (Map<String, Long>) field.
> >
> > I'd appreciate it if you could carefully review both PRs and share
> > your feedback and insights. Thank you for your time and attention.
> > I look forward to your opinions and recommendations.
> >
> > Warm regards,
> > Xiangying
> >
> > [1] https://github.com/apache/pulsar/pull/20948
> > [2] https://github.com/apache/pulsar/pull/21027
>
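
For readers following the 1,2,5 versus 3,4,5 scenario quoted above, here is a minimal sketch of it in Java. It is an illustration under stated assumptions, not Pulsar code: the Chunk class, the assemble and dropDuplicateChunkIds methods, the "abcdefghi" payload, and the two split sizes (4 characters, then 3) are all hypothetical. It also assumes that a consumer restarts assembly whenever it sees chunk ID 0, which is how the thread describes the existing behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these types are Pulsar classes.
class ChunkAssemblySketch {

    // Hypothetical stand-in for one chunk as published to the topic.
    static final class Chunk {
        final long sequenceId;
        final int chunkId;
        final int numChunks;
        final String payload;

        Chunk(long sequenceId, int chunkId, int numChunks, String payload) {
            this.sequenceId = sequenceId;
            this.chunkId = chunkId;
            this.numChunks = numChunks;
            this.payload = payload;
        }
    }

    // The five messages from the scenario. Assumed payload "abcdefghi":
    // the first attempt splits every 4 characters, the resend every 3.
    static List<Chunk> scenario() {
        return Arrays.asList(
            new Chunk(0, 0, 3, "abcd"),  // 1. first attempt
            new Chunk(0, 1, 3, "efgh"),  // 2. first attempt, then crash
            new Chunk(0, 0, 3, "abc"),   // 3. resend
            new Chunk(0, 1, 3, "def"),   // 4. resend
            new Chunk(0, 2, 3, "ghi"));  // 5. last chunk of the resend
    }

    // Assumed consumer behavior: chunk ID 0 discards any partial message
    // and restarts assembly; chunks that arrive out of order are skipped.
    static String assemble(List<Chunk> received) {
        StringBuilder buffer = new StringBuilder();
        int expected = 0;
        for (Chunk c : received) {
            if (c.chunkId == 0) {
                buffer.setLength(0);
                expected = 0;
            }
            if (c.chunkId != expected) {
                continue;
            }
            buffer.append(c.payload);
            expected++;
            if (expected == c.numChunks) {
                return buffer.toString();
            }
        }
        return null; // message never completed
    }

    // Hypothetical broker-side filter in the spirit of a "Chunk ID map":
    // drop a chunk whose ID is not greater than the last one stored for
    // the same sequence ID.
    static List<Chunk> dropDuplicateChunkIds(List<Chunk> published) {
        Map<Long, Integer> lastChunkId = new HashMap<>();
        List<Chunk> stored = new ArrayList<>();
        for (Chunk c : published) {
            Integer last = lastChunkId.get(c.sequenceId);
            if (last != null && c.chunkId <= last) {
                continue; // treated as a duplicate chunk
            }
            lastChunkId.put(c.sequenceId, c.chunkId);
            stored.add(c);
        }
        return stored;
    }

    public static void main(String[] args) {
        List<Chunk> published = scenario();
        System.out.println(assemble(published));
        System.out.println(assemble(dropDuplicateChunkIds(published)));
    }
}
```

With all five messages in the topic, the sketch assembles messages 3,4,5 and prints abcdefghi. With the chunk ID filter applied first, only messages 1,2,5 remain and the result is abcdefghghi, which is the corruption described above when the two attempts are split differently.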
