Hi Haiting,

I've noticed Shadow Topic is *not* covered in the documentation.

Can you please add documentation for it?

It's such a great feature, it's almost a waste not having it documented.

Once it is, I can it to the proud list of features Pulsar have in this
lovely page we recently launched: https://pulsar.apache.org/features/


On Fri, Jul 29, 2022 at 12:29 PM Haiting Jiang <jianghait...@gmail.com>
wrote:

> > I think that there are still some references to shadow_message_id but
> > IIUC we don't need to add that field anymore because we are going to
> > use
>
>
> > optional MessageIdData message_id = 9;
>
>
> > is this correct ?
>
>
> Yes, no more `shadow_message_id`. Sorry I missed these references, already
> updated.
>
> Thanks,
> Haiting
>
> On Fri, Jul 29, 2022 at 5:15 PM Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
> > Il giorno ven 29 lug 2022 alle ore 09:05 Haiting Jiang
> > <jianghait...@apache.org> ha scritto:
> > >
> > > Hi Enrico,
> > >
> > > Any further suggestion on this PIP?
> > > If not, I would like to raise a  revote on this in a few days.
> >
> > now the PIP LGTM
> >
> > I think that there are still some references to shadow_message_id but
> > IIUC we don't need to add that field anymore because we are going to
> > use
> >
> > optional MessageIdData message_id = 9;
> >
> > is this correct ?
> >
> > Enrico
> >
> > >
> > > Thanks,
> > > Haiting
> > >
> > > On 2022/07/07 11:30:59 Haiting Jiang wrote:
> > > > Hi Enrico,
> > > >
> > > > Thanks for your feedback.
> > > >
> > > > On 2022/07/05 08:03:43 Enrico Olivelli wrote:
> > > > > I have a couple of additional questions.
> > > > >
> > > > > 1. Security
> > > > > What about security permissions about the shadow topic ?
> > > > > We are reading from another topic.
> > > > > I think we must clarify the decisions in the PIP
> > > >
> > > > As shadow topic is usually in another namespace, it would have its
> own
> > > > independent permission settings, and we can configure different
> > permissions
> > > > for source topic and shadow topic. So there would be no guarantee
> that
> > you are
> > > > allowed to consume shadow topic if you have permission to consume
> > source
> > > > topic.
> > > >
> > > > On the other hand, we uses topic policy to store shadow topic
> > settings, so a
> > > > new policy permission item needs be added as PolicyName.SHADOW_TOPIC,
> > and user
> > > > must have PolicyOperation.WRITE to this policy to create/delete
> shadow
> > topics.
> > > >
> > > > >
> > > > > 2. Truncation and deletion
> > > > > What happens when you truncate or delete the source topic ?
> > > > > please add a paragraph on the PIP
> > > > >
> > > >
> > > > 1. Truncation, from command `bin/pulsar-admin topics truncate
> > source-topic`.
> > > > For source topic truncation, nothing changes. It still move all
> > cursors to the
> > > > end of the topic and delete all inactive ledgers.
> > > > As shadow topic will watch `ManagedLedgerInfo` in metadata store,
> once
> > it
> > > > knows ledgers deleted, all cursors will skip all deleted ledgers.
> > > >
> > > > 2. Deletion, from command `bin/pulsar-admin topics delete
> > source-topic`.
> > > > Like geo-replication, topic deletion is forbidden if topic have
> shadow
> > > > replicators, users have to delete shadow topics first. Here is the
> new
> > admin
> > > > API for managing shadow topics with source topic in
> > > > `org.apache.pulsar.client.admin.Topics` :
> > > > ```
> > > > void createShadowTopic(String sourceTopicName, String
> shadowTopicName);
> > > > void deleteShadowTopic(String sourceTopicName, String
> shadowTopicName);
> > > > List<String> admin.topics().getShadowTopics(String sourceTopicName);
> > > >
> > > > //And their async version methods.
> > > > ```
> > > > And this requires new REST interfaces in admin server, where
> > > > ```
> > > > PATH = "/{tenant}/{namespace}/{topic}/shadowTopics";
> > > > METHOD = POST/DELETE/GET;
> > > > ```
> > > >
> > > > > 3. Offloaders
> > > > > We are talking about BK metadata, how do Shadow Topics work with
> > > > > Offloaded ledgers ?
> > > > > Please clarify in the PIP
> > > >
> > > > Offloading a ledger is a kind of writing operation to topic's
> > metadata, so
> > > > shadow topic can't offload ledgers to other long term storage.
> > However, for
> > > > ledgers thats are already offloaded by source topic, it's expected to
> > support
> > > > reading from offload ledgers in shadow topic, just like read from
> > source
> > > > topic.
> > > >
> > > > The implementation depends on shadow topic watching
> > `ManagedLedgerInfo` in
> > > > metadata store, and if LedgerInfo.offloadContext is updated by source
> > topic
> > > > offloader, shadow topic can get fully information to get a readHandle
> > from
> > > > ledgerOffload. And of course, the pre-condition is the shadow topic
> > must have
> > > > the same offload driver settings.
> > > >
> > > > >
> > > > > 4. Changes in the number of partitions
> > > > > the PIP says that the number of partitions must match the source
> > topic.
> > > > > Are we preventing changes to the number of partitions in the source
> > topic ?
> > > > >
> > > >
> > > > No, the updates on partition number will be synced to the shadow
> topic.
> > > > A source topic or partition will be responsible for the creation and
> > deletion
> > > > of its corresponding shadow topic partitions.
> > > >
> > > > > 5. Topic stats
> > > > > We should add information on the source topic and on the shadow
> > topic.
> > > > > Please clarify or draft your intentions in the PIP
> > > > >
> > > >
> > > > For topic stats on source topic, as shadow replicator will reuse most
> > of current
> > > > PersistentReplicator, the ReplicatorStatsImpl also can be applied to
> > shadow replicators.
> > > > And we need to add a new field in `TopicStatsImpl` like
> > geo-replication:
> > > > ```
> > > > Map<String /*shadow topic name*/, ReplicatorStatsImpl>
> > shadowReplication;
> > > > ```
> > > >
> > > > As for topic stats on shadow topic, previous `TopicStatsImpl` still
> > applies.
> > > > And I don't see any other stats need to be added at this point.
> > > >
> > > >
> > > > > 6. GeoReplication
> > > > > I guess that GeoReplication will not be possible for shadow topics.
> > > > > Please clarify on the PIP
> > > > >
> > > >
> > > > Yes, this is decided by the nature of shadow topic that it don't have
> > the write access to BK.
> > > > And I don't see the necessary of supporting GeoReplication for shadow
> > topics.
> > > > We can make source topic geo-replicated and create the same shadow
> > topic in each clusters.
> > > >
> > > > > I believe that this feature is very powerful, but we must design it
> > > > > carefully and discuss
> > > > > about all the edge cases. Otherwise we will end up in something
> that
> > > > > is half-baked
> > > > > and we will have to resolve edge cases while developing or after
> > going
> > > > > to production.
> > > > >
> > > > > Every feature must be fully integrated with the rest of Pulsar
> > > > >
> > > > > Enrico
> > > > >
> > > > > Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang
> > > > > <jianghait...@apache.org> ha scritto:
> > > > > >
> > > > > > Hi Penghui
> > > > > >
> > > > > > On 2022/06/29 04:07:35 PengHui Li wrote:
> > > > > > > Hi Haiting,
> > > > > > >
> > > > > > > Thanks for the explanation. I'm clear for now.
> > > > > > >
> > > > > > > Pulsar functions also can do such things by connecting data
> from
> > one topic
> > > > > > > to another topic.
> > > > > > > But the difference is this proposal only copies the data to the
> > cache of
> > > > > > > another topic, and the data not
> > > > > > > in the cache is also available by reading from ledgers.
> > > > > > >
> > > > > > > And this approach also follows benefits compared with
> > replicating data to
> > > > > > > multiple "real" topics.
> > > > > > >
> > > > > > > - reuse the topic metadata
> > > > > > > - the same message ID which easy for troubleshooting
> > > > > > >
> > > > > > > Just one question
> > > > > > >
> > > > > > > >>>>>>>
> > > > > > > ```
> > > > > > > message CommandSend { // ... // message id for shadow topic
> > optional
> > > > > > >    MessageIdData shadow_message_id = 9; }
> > > > > > > ```
> > > > > > >
> > > > > > > Can we get the message ID from the replicated data to avoid
> > introducing a
> > > > > > > new command?
> > > > > > > Or use a marker message to avoid broker-to-broker directly
> > protobuf command
> > > > > > > interaction.
> > > > > > >
> > > > > > Sorry for not wrote it clearly. CommandSend is not a new command.
> > It's exactly the main
> > > > > > command producer used to send message to broker. The only change
> > is add a new field in it.
> > > > > > The whole command proto would be like this:
> > > > > > ```
> > > > > > message CommandSend {
> > > > > >     required uint64 producer_id = 1;
> > > > > >     required uint64 sequence_id = 2;
> > > > > >     optional int32 num_messages = 3 [default = 1];
> > > > > >     optional uint64 txnid_least_bits = 4 [default = 0];
> > > > > >     optional uint64 txnid_most_bits = 5 [default = 0];
> > > > > >
> > > > > >     /// Add highest sequence id to support batch message with
> > external sequence id
> > > > > >     optional uint64 highest_sequence_id = 6 [default = 0];
> > > > > >     optional bool is_chunk     =7 [default = false];
> > > > > >
> > > > > >     // Specify if the message being published is a Pulsar marker
> > or not
> > > > > >     optional bool marker = 8 [default = false];
> > > > > >
> > > > > >     // message id for shadow topic
> > > > > >     optional MessageIdData shadow_message_id = 9;
> > > > > > }
> > > > > > ```
> > > > > > So there won't be any broker-to-broker directly protobuf command
> > interactions.
> > > > > >
> > > > > > Thanks,
> > > > > > Haiting
> > > > > >
> > > > > > > Thanks,
> > > > > > > Penghui
> > > > > > >
> > > > > > > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang <
> > jianghait...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Penghui & Asaf:
> > > > > > > >
> > > > > > > > Please allow me to provide some more detailes about
> > **metadata**
> > > > > > > > synchronization
> > > > > > > > between source topic and shadow topic.
> > > > > > > >
> > > > > > > > 1.When shadow topic initializes, it will read from metadata
> > store path
> > > > > > > > "/managed-ledgers/{source_topic_ledger_name}", which contains
> > all the
> > > > > > > > managed ledger info. We don't
> > > > > > > > need to read the  ledger information from source topic
> broker.
> > > > > > > >
> > > > > > > > 2. When shadow topic received new message from replicator, if
> > the ledger
> > > > > > > > id of the message
> > > > > > > >  is the same as the last ledger, it just updates the LAC. If
> > not, it will
> > > > > > > > update ledger list from metadata,
> > > > > > > > and then open the new ledger handle and update the LAC.
> > > > > > > >
> > > > > > > > As for the copy itself and add shadow message id in
> > CommandSend, it mostly
> > > > > > > > serves the purpose
> > > > > > > > of filling the EntryCache.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Haiting
> > > > > > > >
> > > > > > > > On 2022/06/23 02:08:46 PengHui Li wrote:
> > > > > > > > > > One question comes to mind here: Why not simply read the
> > ledger
> > > > > > > > information
> > > > > > > > > from original topic, without copy?
> > > > > > > > >
> > > > > > > > > I think this is a good idea.
> > > > > > > > >
> > > > > > > > > Penghui
> > > > > > > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org,
> wrote:
> > > > > > > > > >
> > > > > > > > > > One question comes to mind here: Why not simply read the
> > ledger
> > > > > > > > information
> > > > > > > > > > from original topic, without copy?
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > > > BR,
> > > > Haiting
> > > >
> >
>

Reply via email to