Il giorno ven 29 lug 2022 alle ore 09:05 Haiting Jiang <jianghait...@apache.org> ha scritto: > > Hi Enrico, > > Any further suggestion on this PIP? > If not, I would like to raise a revote on this in a few days.
now the PIP LGTM I think that there are still some references to shadow_message_id but IIUC we don't need to add that field anymore because we are going to use optional MessageIdData message_id = 9; is this correct ? Enrico > > Thanks, > Haiting > > On 2022/07/07 11:30:59 Haiting Jiang wrote: > > Hi Enrico, > > > > Thanks for your feedback. > > > > On 2022/07/05 08:03:43 Enrico Olivelli wrote: > > > I have a couple of additional questions. > > > > > > 1. Security > > > What about security permissions about the shadow topic ? > > > We are reading from another topic. > > > I think we must clarify the decisions in the PIP > > > > As shadow topic is usually in another namespace, it would have its own > > independent permission settings, and we can configure different permissions > > for source topic and shadow topic. So there would be no guarantee that you > > are > > allowed to consume shadow topic if you have permission to consume source > > topic. > > > > On the other hand, we uses topic policy to store shadow topic settings, so a > > new policy permission item needs be added as PolicyName.SHADOW_TOPIC, and > > user > > must have PolicyOperation.WRITE to this policy to create/delete shadow > > topics. > > > > > > > > 2. Truncation and deletion > > > What happens when you truncate or delete the source topic ? > > > please add a paragraph on the PIP > > > > > > > 1. Truncation, from command `bin/pulsar-admin topics truncate source-topic`. > > For source topic truncation, nothing changes. It still move all cursors to > > the > > end of the topic and delete all inactive ledgers. > > As shadow topic will watch `ManagedLedgerInfo` in metadata store, once it > > knows ledgers deleted, all cursors will skip all deleted ledgers. > > > > 2. Deletion, from command `bin/pulsar-admin topics delete source-topic`. > > Like geo-replication, topic deletion is forbidden if topic have shadow > > replicators, users have to delete shadow topics first. Here is the new admin > > API for managing shadow topics with source topic in > > `org.apache.pulsar.client.admin.Topics` : > > ``` > > void createShadowTopic(String sourceTopicName, String shadowTopicName); > > void deleteShadowTopic(String sourceTopicName, String shadowTopicName); > > List<String> admin.topics().getShadowTopics(String sourceTopicName); > > > > //And their async version methods. > > ``` > > And this requires new REST interfaces in admin server, where > > ``` > > PATH = "/{tenant}/{namespace}/{topic}/shadowTopics"; > > METHOD = POST/DELETE/GET; > > ``` > > > > > 3. Offloaders > > > We are talking about BK metadata, how do Shadow Topics work with > > > Offloaded ledgers ? > > > Please clarify in the PIP > > > > Offloading a ledger is a kind of writing operation to topic's metadata, so > > shadow topic can't offload ledgers to other long term storage. However, for > > ledgers thats are already offloaded by source topic, it's expected to > > support > > reading from offload ledgers in shadow topic, just like read from source > > topic. > > > > The implementation depends on shadow topic watching `ManagedLedgerInfo` in > > metadata store, and if LedgerInfo.offloadContext is updated by source topic > > offloader, shadow topic can get fully information to get a readHandle from > > ledgerOffload. And of course, the pre-condition is the shadow topic must > > have > > the same offload driver settings. > > > > > > > > 4. Changes in the number of partitions > > > the PIP says that the number of partitions must match the source topic. > > > Are we preventing changes to the number of partitions in the source topic > > > ? > > > > > > > No, the updates on partition number will be synced to the shadow topic. > > A source topic or partition will be responsible for the creation and > > deletion > > of its corresponding shadow topic partitions. > > > > > 5. Topic stats > > > We should add information on the source topic and on the shadow topic. > > > Please clarify or draft your intentions in the PIP > > > > > > > For topic stats on source topic, as shadow replicator will reuse most of > > current > > PersistentReplicator, the ReplicatorStatsImpl also can be applied to shadow > > replicators. > > And we need to add a new field in `TopicStatsImpl` like geo-replication: > > ``` > > Map<String /*shadow topic name*/, ReplicatorStatsImpl> shadowReplication; > > ``` > > > > As for topic stats on shadow topic, previous `TopicStatsImpl` still applies. > > And I don't see any other stats need to be added at this point. > > > > > > > 6. GeoReplication > > > I guess that GeoReplication will not be possible for shadow topics. > > > Please clarify on the PIP > > > > > > > Yes, this is decided by the nature of shadow topic that it don't have the > > write access to BK. > > And I don't see the necessary of supporting GeoReplication for shadow > > topics. > > We can make source topic geo-replicated and create the same shadow topic in > > each clusters. > > > > > I believe that this feature is very powerful, but we must design it > > > carefully and discuss > > > about all the edge cases. Otherwise we will end up in something that > > > is half-baked > > > and we will have to resolve edge cases while developing or after going > > > to production. > > > > > > Every feature must be fully integrated with the rest of Pulsar > > > > > > Enrico > > > > > > Il giorno mer 29 giu 2022 alle ore 08:40 Haiting Jiang > > > <jianghait...@apache.org> ha scritto: > > > > > > > > Hi Penghui > > > > > > > > On 2022/06/29 04:07:35 PengHui Li wrote: > > > > > Hi Haiting, > > > > > > > > > > Thanks for the explanation. I'm clear for now. > > > > > > > > > > Pulsar functions also can do such things by connecting data from one > > > > > topic > > > > > to another topic. > > > > > But the difference is this proposal only copies the data to the cache > > > > > of > > > > > another topic, and the data not > > > > > in the cache is also available by reading from ledgers. > > > > > > > > > > And this approach also follows benefits compared with replicating > > > > > data to > > > > > multiple "real" topics. > > > > > > > > > > - reuse the topic metadata > > > > > - the same message ID which easy for troubleshooting > > > > > > > > > > Just one question > > > > > > > > > > >>>>>>> > > > > > ``` > > > > > message CommandSend { // ... // message id for shadow topic optional > > > > > MessageIdData shadow_message_id = 9; } > > > > > ``` > > > > > > > > > > Can we get the message ID from the replicated data to avoid > > > > > introducing a > > > > > new command? > > > > > Or use a marker message to avoid broker-to-broker directly protobuf > > > > > command > > > > > interaction. > > > > > > > > > Sorry for not wrote it clearly. CommandSend is not a new command. It's > > > > exactly the main > > > > command producer used to send message to broker. The only change is add > > > > a new field in it. > > > > The whole command proto would be like this: > > > > ``` > > > > message CommandSend { > > > > required uint64 producer_id = 1; > > > > required uint64 sequence_id = 2; > > > > optional int32 num_messages = 3 [default = 1]; > > > > optional uint64 txnid_least_bits = 4 [default = 0]; > > > > optional uint64 txnid_most_bits = 5 [default = 0]; > > > > > > > > /// Add highest sequence id to support batch message with external > > > > sequence id > > > > optional uint64 highest_sequence_id = 6 [default = 0]; > > > > optional bool is_chunk =7 [default = false]; > > > > > > > > // Specify if the message being published is a Pulsar marker or not > > > > optional bool marker = 8 [default = false]; > > > > > > > > // message id for shadow topic > > > > optional MessageIdData shadow_message_id = 9; > > > > } > > > > ``` > > > > So there won't be any broker-to-broker directly protobuf command > > > > interactions. > > > > > > > > Thanks, > > > > Haiting > > > > > > > > > Thanks, > > > > > Penghui > > > > > > > > > > On Wed, Jun 29, 2022 at 10:31 AM Haiting Jiang > > > > > <jianghait...@apache.org> > > > > > wrote: > > > > > > > > > > > Hi Penghui & Asaf: > > > > > > > > > > > > Please allow me to provide some more detailes about **metadata** > > > > > > synchronization > > > > > > between source topic and shadow topic. > > > > > > > > > > > > 1.When shadow topic initializes, it will read from metadata store > > > > > > path > > > > > > "/managed-ledgers/{source_topic_ledger_name}", which contains all > > > > > > the > > > > > > managed ledger info. We don't > > > > > > need to read the ledger information from source topic broker. > > > > > > > > > > > > 2. When shadow topic received new message from replicator, if the > > > > > > ledger > > > > > > id of the message > > > > > > is the same as the last ledger, it just updates the LAC. If not, > > > > > > it will > > > > > > update ledger list from metadata, > > > > > > and then open the new ledger handle and update the LAC. > > > > > > > > > > > > As for the copy itself and add shadow message id in CommandSend, it > > > > > > mostly > > > > > > serves the purpose > > > > > > of filling the EntryCache. > > > > > > > > > > > > Thanks, > > > > > > Haiting > > > > > > > > > > > > On 2022/06/23 02:08:46 PengHui Li wrote: > > > > > > > > One question comes to mind here: Why not simply read the ledger > > > > > > information > > > > > > > from original topic, without copy? > > > > > > > > > > > > > > I think this is a good idea. > > > > > > > > > > > > > > Penghui > > > > > > > On Jun 22, 2022, 23:57 +0800, dev@pulsar.apache.org, wrote: > > > > > > > > > > > > > > > > One question comes to mind here: Why not simply read the ledger > > > > > > information > > > > > > > > from original topic, without copy? > > > > > > > > > > > > > > > > > > > > > > > > > BR, > > Haiting > >