Hi Dave,

On 2022/06/23 03:59:35 Dave Fisher wrote:
> 
> On Jun 21, 2022, at 1:00 AM, Haiting Jiang <jianghait...@apache.org> wrote:
> > 
> > Hi Pulsar community:
> > 
> > I opened a PIP to discuss "Shadow Topic, an alternative way to support
> > readonly topic ownership."
> > 
> > Proposal Link: https://github.com/apache/pulsar/issues/16153
> > 
> > ---
> > 
> > ## Motivation
> > 
> > The motivation is the same as PIP-63[1], with a new broadcast use case of
> > supporting 100K subscriptions in a single topic.
> > 1. The bandwidth of a broker limits the number of subscriptions for a single
> >   topic.
> > 2. Subscriptions are competing for the network bandwidth on brokers.
> >   Different subscriptions might have different levels of severity.
> > 3. When synchronizing cross-city message reading, cross-city access needs to
> >   be minimized.
> > 4. [New] Broadcast with 100K subscriptions. There is a limit on the number
> >   of subscriptions a single topic can support. Hongjie from NTT Lab tested
> >   that with 40K subscriptions in a single topic, the client needs about
> >   20 min to start all client connections, and at a producer rate of
> >   1 msg/s, the average end-to-end latency is about 2.9s. For 100K
> >   subscriptions, the connection start time and E2E latency are beyond
> >   consideration.
> 
> Have you tested performance of two topics each with 40k subscriptions at the 
> same time in the same cluster?
> 
> I think that might simulate the notion of shadow topics in action and see if 
> much performance is actually gained by this notion of splitting.

I have not tested it yet. But as long as the bottleneck of this use case is
not the metadata store, then from the perspective of the current architecture,
the number of subscriptions Pulsar can support can be scaled horizontally.

Also, the subscription limit of a single topic can be optimized, as Penghui
did in GitHub PRs #16245, #16243, and #16241.

> It seems to me that a better approach would be to have multiple local pulsar 
> clusters and balance the subscriptions between those.

With this approach, we have to replicate data storage. This is not tolerable
for other use cases (like 1, 2, 3) when the data flow is quite large.
This is also why the original PIP-63 listed it under rejected alternatives,
see
https://github.com/apache/pulsar/wiki/PIP-63%3A-Readonly-Topic-Ownership-Support#rejected-alternatives

> I’m concerned that this shadow topic approach is adding new complexity to 
> Pulsar without a clear understanding of all of the impacts.
Yes, this is exactly the reason I prefer this new approach over splitting
the original PR #11960 just for easier review.
This approach would be much simpler and have less impact on the current
implementation. It would be appreciated if you could describe
some more specific impacts.

Thanks,
Haiting

> Thanks,
> Dave
> 
> > 
> > However, it's too complicated to implement with the original PIP-63
> > proposal. The changed code is already over 3K lines, see PR #11960[2], and
> > there are still some problems left:
> > 1. The LAC in the readonly topic is updated in a polling pattern, which
> >   increases the load on bookies.
> > 2. The message data of the readonly topic won't be cached in the broker,
> >   which increases the network usage between broker and bookie when more
> >   than one subscriber is tail-reading.
> > 3. All the subscriptions are managed in the original writable topic, so the
> >   maximum supported number of subscriptions is not scalable.
> > 
> > This PIP tries to come up with a simpler solution to support readonly topic
> > ownership and solve the problems the previous PR left. The main idea of this
> > solution is to reuse the feature of geo-replication, but instead of
> > duplicating storage, it shares underlying bookie ledgers between different
> > topics.
> > 
> > ## Goal
> > 
> > The goal is to introduce **Shadow Topic** as a new type of topic to support
> > readonly topic ownership. Just as its name implies, a shadow topic is the
> > shadow of some normal persistent topic (let's call it the source topic
> > here). The source topic and the shadow topic must have the same number of
> > partitions, or both be non-partitioned. Multiple shadow topics can be
> > created from a source topic.
> > 
> > A shadow topic shares the underlying bookie ledgers of its source topic.
> > Users can't produce messages to a shadow topic directly, and a shadow topic
> > doesn't create any new ledgers for messages; all messages in a shadow topic
> > come from its source topic.
> > 
> > A shadow topic has its own subscriptions and doesn't share them with its
> > source topic. This means the shadow topic has its own cursor ledger to store
> > persistent mark-delete info for each persistent subscription.
> > 
> > The message sync procedure of a shadow topic is supported by shadow
> > replication, which is very similar to geo-replication, with these
> > differences:
> > 1. Geo-replication only works between topics with the same name in different
> >   broker clusters. Shadow topics have no naming limitation, and they can be
> >   in the same cluster.
> > 2. Geo-replication duplicates data storage, but shadow topics don't.
> > 3. Geo-replication replicates data in both directions (it's bidirectional),
> >   but shadow replication only has a one-way data flow.
> > 
> > 
> > ## API Changes
> > 
> > 1. PulsarApi.proto.
> > 
> > A shadow topic needs to know the original message id of the replicated
> > messages in order to update the new ledger and LAC. So we need to add a
> > `shadow_message_id` to CommandSend for the replicator.
> > 
> > ```
> > message CommandSend {
> >   // ...
> >   // message id for shadow topic
> >   optional MessageIdData shadow_message_id = 9;
> > }
> > ```
> > 
> > 2. Admin API for creating shadow topic with source topic
> > ```
> >   admin.topics().createShadowTopic(source-topic-name, shadow-topic-name)
> > ```
> > 
> > ## Implementation
> > 
> > A picture showing the relations between key components is added in the
> > GitHub issue [3].
> > 
> > There are two key changes for implementation.
> > 1. How to replicate messages to shadow topics.
> > 2. How shadow topics manage shared ledger info.
> > 
> > ### 1. How to replicate messages to shadow topics. 
> > 
> > This part is mostly implemented by `ShadowReplicator`, which extends
> > `PersistentReplicator` introduced in geo-replication. The shadow topic list
> > is added as a new topic policy of the source topic. The source topic manages
> > the lifecycle of all the replicators. The key is to add `shadow_message_id`
> > when producing messages to shadow topics.
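To make the replication step above concrete, here is a minimal Java sketch, not the actual implementation. It only shows the idea that each forwarded message carries its original position from the source topic. `SourcePositionSketch`, `ShadowSendSketch`, and `ShadowReplicatorSketch` are hypothetical stand-ins and not Pulsar classes; having one replicator per shadow topic is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the (ledgerId, entryId) of a message in the source topic.
final class SourcePositionSketch {
    final long ledgerId;
    final long entryId;

    SourcePositionSketch(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }
}

// Hypothetical stand-in for a CommandSend that carries the proposed shadow_message_id.
final class ShadowSendSketch {
    final String shadowTopic;
    final SourcePositionSketch shadowMessageId;
    final byte[] payload;

    ShadowSendSketch(String shadowTopic, SourcePositionSketch shadowMessageId, byte[] payload) {
        this.shadowTopic = shadowTopic;
        this.shadowMessageId = shadowMessageId;
        this.payload = payload;
    }
}

// Assumption: one replicator per shadow topic, created and closed by the source topic.
final class ShadowReplicatorSketch {
    private final String shadowTopic;
    private final List<ShadowSendSketch> sent = new ArrayList<>();

    ShadowReplicatorSketch(String shadowTopic) {
        this.shadowTopic = shadowTopic;
    }

    // Forward one source entry, tagging it with its original position.
    ShadowSendSketch replicate(long ledgerId, long entryId, byte[] payload) {
        ShadowSendSketch cmd =
                new ShadowSendSketch(shadowTopic, new SourcePositionSketch(ledgerId, entryId), payload);
        sent.add(cmd);
        return cmd;
    }

    List<ShadowSendSketch> sent() {
        return sent;
    }
}
```

The receiving side can then use the carried position to locate the entry in the shared ledger instead of assigning a new message id.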
> > 
> > ### 2. How shadow topics manage shared ledger info.
> > 
> > This part is mostly implemented by `ShadowManagedLedger`, which extends
> > the current `ManagedLedgerImpl` with two key overridden methods.
> > 
> > 1. `initialize(..)`
> > a. Fetch the ManagedLedgerInfo of the source topic instead of the current
> >   shadow topic. The source topic name is stored in the topic policy of the
> >   shadow topic.
> > b. Open the last ledger and read the explicit LAC from the bookie, instead
> >   of creating a new ledger. Reading the LAC here requires that the source
> >   topic enables the explicit LAC feature by setting
> >   `bookkeeperExplicitLacIntervalInMills` to a non-zero value in broker.conf.
> > c. Do not start checkLedgerRollTask, which tries to roll over the ledger
> >   periodically.
> > 
> > 2. `internalAsyncAddEntry()`: instead of writing entry data to the bookie,
> >   it only updates the metadata of ledgers, like `currentLedger` and
> >   `lastConfirmedEntry`, and puts the replicated message into `EntryCache`.
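As a rough illustration of the second override, the following Java sketch models an add-entry path that only records metadata and caches the payload. `ShadowLedgerSketch` and its fields are hypothetical and much simpler than `ManagedLedgerImpl`; the sketch also assumes replicated entries arrive in order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the add-entry path in a shadow managed ledger.
final class ShadowLedgerSketch {
    private long currentLedgerId = -1L;
    private long lastConfirmedEntryId = -1L;
    // Simplified entry cache keyed by "ledgerId:entryId".
    private final Map<String, byte[]> entryCache = new HashMap<>();

    // Called with the original position carried in shadow_message_id.
    void addReplicatedEntry(long ledgerId, long entryId, byte[] payload) {
        // No bookie write here: the entry already exists in the source ledger.
        currentLedgerId = ledgerId;
        lastConfirmedEntryId = entryId;
        entryCache.put(ledgerId + ":" + entryId, payload);
    }

    long currentLedgerId() {
        return currentLedgerId;
    }

    long lastConfirmedEntryId() {
        return lastConfirmedEntryId;
    }

    // Returns the cached payload, or null if the entry is not cached.
    byte[] cached(long ledgerId, long entryId) {
        return entryCache.get(ledgerId + ":" + entryId);
    }
}
```

Because the payload lands in the cache, tail-reading subscribers on the shadow topic can be served without an extra read from the bookie.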
> > 
> > Besides, some other problems need to be taken care of.
> > - Any ledger metadata updates need to be synced to the shadow topic,
> >  including ledger offloading or ledger deletion. The shadow topic needs to
> >  watch the ledger info updates in the metadata store and apply them in time.
> > - The locally cached LAC of `LedgerHandle` won't be updated in time, so we
> >  need to refresh the LAC when a managed cursor requests entries beyond the
> >  known LAC.
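The LAC refresh in the last point could look roughly like the Java sketch below. It is only an illustration under assumptions: `LacRefreshSketch` is a hypothetical class, and the `LongSupplier` stands in for whatever call actually reads the LAC from the bookie.

```java
import java.util.function.LongSupplier;

// Hypothetical model of refreshing the LAC only when a read goes beyond it.
final class LacRefreshSketch {
    private final LongSupplier readLacFromBookie; // stand-in for the real bookie read
    private long knownLac;
    private int refreshes;

    LacRefreshSketch(long initialLac, LongSupplier readLacFromBookie) {
        this.knownLac = initialLac;
        this.readLacFromBookie = readLacFromBookie;
    }

    // Returns true if entryId is readable, refreshing the LAC only when needed.
    boolean canRead(long entryId) {
        if (entryId > knownLac) {
            refreshes++;
            // Never move the known LAC backwards.
            knownLac = Math.max(knownLac, readLacFromBookie.getAsLong());
        }
        return entryId <= knownLac;
    }

    long knownLac() {
        return knownLac;
    }

    int refreshes() {
        return refreshes;
    }
}
```

Reads at or below the known LAC never trigger a refresh, so the extra bookie traffic is limited to cursors that have caught up with the tail.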
> > 
> > ## Rejected Alternatives
> > 
> > See PIP-63[1].
> > 
> > ## References
> > [1] https://github.com/apache/pulsar/wiki/PIP-63%3A-Readonly-Topic-Ownership-Support
> > [2] https://github.com/apache/pulsar/pull/11960 
> > [3] https://github.com/apache/pulsar/issues/16153
> > 
> > 
> > BR,
> > Haiting Jiang
> 
> 
