>> Could we add a system topic that has exactly one partition per broker?
> I think this depends on which type of topic we use for the events. If it is
> nonpersistent, I think this approach would work because the events wouldn't
> outlive the broker.

So, what happens to events in a nonpersistent topic when the broker goes
down? How would data loss be prevented in that scenario?

> However, if it is persistent, it would become
> problematic when a broker stops running because the topic would then need
> to be served by another broker in order to read from it.

What's the concern here? When the topic is unloaded, another broker will
pick up the data from the bookies.

--
Devin G. Bost

On Fri, Apr 23, 2021, 12:06 PM Michael Marshall <mikemars...@gmail.com> wrote:

> Thanks for the clarification, Joe. I now see the nuance between the data
> and admin paths. One way to possibly remove these updates from the data
> flow is to make it a process that watches the metastore and sends metastore
> changes to a topic. That would remove it completely from the data path.
> However, there is still the problem of where that process gets scheduled
> and how to ensure that it is collocated with the topic to which it is
> publishing messages. Further, this paradigm could mean that some events
> might get missed, and it would put additional load on the metastore.
>
> > What are you hoping to accomplish by knowing when a topic is automatically
> > created?
>
> I am looking to solve several problems. First, I want to discover new
> topics so that I can create subscriptions for them. I use pulsar to buffer
> data, so using a simple regex consumer that discovers topics, creates
> subscriptions, and immediately consumes from those topics is not a viable
> solution. Also, I have an external database that exposes topic stats joined
> with relevant business metadata for each topic. I want to know when topics
> are deleted so I can update the database appropriately. I want to avoid any
> solution that requires polling Pulsar's admin API.
>
> As Joe pointed out, there are many other potential cluster events that
> could be useful. This PIP could be more general than my initial proposal.
>
> > Could we add a system topic that has exactly one partition per broker?
>
> I think this depends on which type of topic we use for the events. If it is
> nonpersistent, I think this approach would work because the events wouldn't
> outlive the broker. However, if it is persistent, it would become
> problematic when a broker stops running because the topic would then need
> to be served by another broker in order to read from it.
>
> One alternative might be to put a system topic partition in each namespace
> bundle. Given that all topics exist within a bundle and bundles split but
> don't join, it would guarantee that events would be local to the target
> topic without needing to worry about joining event logs. This would require
> a change to how topics are put into bundles though, as they are currently
> assigned based on a hash of their name. This approach would only make sense
> for events that are specific to a namespace, like topic/subscription
> creation/deletion.
>
> We may need multiple strategies regarding topic placement for different
> types of audit events. For example, some broker events are not namespaced,
> and as such, they likely belong in the `pulsar/system` namespace.
> Namespaced events would make sense in their source namespace, much like the
> `__change_events` topic exists in each namespace where topic level policies
> are allowed.
>
> Perhaps I should put together a Google doc for this proposal to make it
> easier to collaborate on specific details. I can tell that there is
> interest in this feature and that it will require a careful design.
>
> Thanks for all of your feedback,
> Michael
>
>
> On Fri, Apr 23, 2021 at 7:29 AM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> > Could we add a system topic that has exactly one partition per broker?
> >
> > On Thu, Apr 22, 2021 at 11:22 PM Joe Francis <j...@verizonmedia.com.invalid>
> > wrote:
> >
> > > To be clear, I would love to have this feature. But I would not use this
> > > feature if that means whenever a broker that hosts a "system topic" has a
> > > hiccup, it would result in an outage for N other brokers. I run 100+
> > > brokers/million+ topics in a cluster (hence an "audit topic" would be
> > > wonderful for all kinds of purposes), and would not want a "system topic"
> > > as the single point of failure.
> > >
> > > So you have to make this log local to the broker, or sacrifice the
> > > reliability of the log (best case log). Local log has its advantages -
> > > you can log a lot more about the system itself into it, (eg: security
> > > events like failed auth etc), but you will need to provide an aggregate
> > > view for the cluster as a whole from all the brokers
> > >
> > > Joe
> > >
> > >
> > > On Thu, Apr 22, 2021 at 6:10 AM Joe Francis <j...@verizonmedia.com> wrote:
> > >
> > > > Completely disagree that we have accepted this risk with PIP-39. That is
> > > > different because it is an admin flow. A failure in a namespace policy
> > > > change does not affect data flow.
> > > >
> > > > What you are proposing is in the data path. Topics and subs are
> > > > created in the data flow path. Failure means outages. PIP-39 is not going
> > > > to help you there.
> > > >
> > > > Joe
> > > >
> > > > On Wed, Apr 21, 2021 at 11:10 PM Michael Marshall <mikemars...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi Joe,
> > > >>
> > > >> I agree there is a risk in adding more interdependencies between
> > > >> brokers. I will point out that we have already accepted this risk with
> > > >> the implementation of PIP 39, which propagates namespace policy changes
> > > >> to other brokers using messages sent to a system topic.
> > > >> However, that doesn't necessarily mean we should build more
> > > >> interdependencies between brokers.
> > > >>
> > > >> Here is the link to PIP 39:
> > > >>
> > > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_pulsar_wiki_PIP-2D39-253A-2DNamespace-2DChange-2DEvents&d=DwIBaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=ke-uQGYh--_pwtgn13szgq-axZcRVTJXoSurefbZEk4&m=ACSaHFP9BC5MVQZZWMp0KvSJnvwUr4Jvd08xKKbQWBI&s=G_K7a-seNfGGb-Z4Wy0Q5iMrbdL2j9WCoMUWfwUH5RY&e=
> > > >> .
> > > >>
> > > >> I will look into the implementation of PIP 39 to better understand its
> > > >> design, as I think it will likely influence this feature's design.
> > > >>
> > > >> Thanks,
> > > >> Michael
> > > >>
> > > >> On Wed, Apr 21, 2021 at 5:50 PM Joe F <joefranc...@gmail.com> wrote:
> > > >>
> > > >> > I would be very careful about implementing such a feature, because of
> > > >> > introducing undesirable interdependencies. Broker processes only talk
> > > >> > to the metadata store or data store. This keeps brokers isolated from
> > > >> > each other - one broker is not dependent on the functioning of another
> > > >> > broker.
> > > >> >
> > > >> > A broker publishing to a topic hosted on another broker (which for eg:
> > > >> > is serving "system topic"), sets up an undesirable dependency, which
> > > >> > reduces total system resiliency and availability for the cluster. These
> > > >> > are better implemented as notifications off the metadata changes.
> > > >> >
> > > >> > Good feature, but needs careful thought to do it right
> > > >> > Joe
> > > >> >
> > > >> > On Wed, Apr 21, 2021 at 4:03 PM Michael Marshall <mikemars...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Thanks for your response, PengHui.
> > > >> > >
> > > >> > > I think this feature would be useful to end users for cluster
> > > >> > > management, which is why I want to contribute a first class feature
> > > >> > > instead of writing my own plugin that would add little value to the
> > > >> > > community.
> > > >> > >
> > > >> > > > With the broker interceptor you can intercept all the REST API
> > > >> > > > request and response, Pulsar commands between the broker and clients.
> > > >> > >
> > > >> > > Based on looking through the interceptor trait, I don't see a way to
> > > >> > > trigger events based on auto created/deleted topics. For example,
> > > >> > > when a producer connects to a broker for a nonexistent topic (assuming
> > > >> > > auto topic creation is allowed), a managed ledger, and thus a topic, is
> > > >> > > created without ever interacting with that interceptor trait. The same
> > > >> > > appears to be true for garbage collected topics. I think we'll need
> > > >> > > more than this interceptor to properly capture all cases where topics
> > > >> > > are created or deleted.
> > > >> > >
> > > >> > > Regarding my reference to potential further work, it does appear that
> > > >> > > low level auditing of connections and pulsar commands could be covered
> > > >> > > by the interceptor. However, it would still be on the end user to
> > > >> > > implement such functionality.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Michael
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Apr 21, 2021 at 3:51 AM PengHui Li <codelipeng...@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi Michael,
> > > >> > > >
> > > >> > > > Currently, Pulsar supports a pluggable Broker Interceptor, you can
> > > >> > > > find it here
> > > >> > > >
> > > >> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_pulsar_blob_6704f12104219611164aa2bb5bbdfc929613f1bf_pulsar-2Dbroker_src_main_java_org_apache_pulsar_broker_intercept_BrokerInterceptor.java&d=DwIBaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=ke-uQGYh--_pwtgn13szgq-axZcRVTJXoSurefbZEk4&m=ACSaHFP9BC5MVQZZWMp0KvSJnvwUr4Jvd08xKKbQWBI&s=6Li1guS8lImjrxPo9A0nnQAmDMnYEKHlAGqlVYvB8Ug&e=
> > > >> > > >
> > > >> > > > With the broker interceptor you can intercept all the REST API
> > > >> > > > request and response, Pulsar commands between the broker and clients.
> > > >> > > > This can be used to audit the system events.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Penghui
> > > >> > > > On Apr 21, 2021, 5:13 AM +0800, Michael Marshall
> > > >> > > > <mikemars...@gmail.com>, wrote:
> > > >> > > > > Hello all,
> > > >> > > > >
> > > >> > > > > I would like to propose adding a new feature to Pulsar that will
> > > >> > > > > require a PIP. In addition to feedback on the proposed feature, I
> > > >> > > > > am looking for guidance on how to go about creating the PIP. Thanks
> > > >> > > > > for any help you can provide.
> > > >> > > > >
> > > >> > > > > I would like to add an optional system topic where topic creation
> > > >> > > > > and topic deletion events are published.
> > > >> > > > > This feature will make it easier to leverage the auto topic
> > > >> > > > > creation and inactive topic deletion features by providing a way
> > > >> > > > > for users to reactively discover changes to topics. The largest
> > > >> > > > > benefit is that users won't need to poll for these updates with an
> > > >> > > > > admin client. Instead, they will get them as messages.
> > > >> > > > >
> > > >> > > > > I looked to see if an equivalent feature already exists, but I
> > > >> > > > > don't see one. For reference, the `PatternMultiTopicsConsumerImpl`
> > > >> > > > > currently polls for all topics in a namespace and then does set
> > > >> > > > > operations to compute the "new" topics to which it should
> > > >> > > > > subscribe. This client implementation could possibly leverage the
> > > >> > > > > new feature.
> > > >> > > > >
> > > >> > > > > There are still details I need to work out, like how it will work
> > > >> > > > > for partitioned vs unpartitioned topics and what kind of guarantees
> > > >> > > > > we have regarding messaging semantics (I think we'll want at least
> > > >> > > > > once message delivery here). I plan to include these details in the
> > > >> > > > > PIP with discussions about trade offs for different
> > > >> > > > > implementations.
> > > >> > > > >
> > > >> > > > > Does this feature sound helpful and reasonable to others? If so, is
> > > >> > > > > the next step to formally write a proposal in a Google Doc or to
> > > >> > > > > put together a doc on the Pulsar GitHub Wiki?
> > > >> > > > >
> > > >> > > > > Related and/or future work to consider in this design: I can see
> > > >> > > > > adding different system topics for these types of auditable system
> > > >> > > > > events. We currently rely on log lines as our primary way for end
> > > >> > > > > users to audit system events, e.g. a producer connecting to a
> > > >> > > > > broker or a subscription getting created, but we could instead have
> > > >> > > > > topics that represent streams of these different kinds of events. A
> > > >> > > > > persistent topic could make these audit events more durable and
> > > >> > > > > more structured which should lend themselves to being more easily
> > > >> > > > > analyzed. Further, users could choose to turn on/off these audit
> > > >> > > > > events, perhaps at the broker or namespace level, to fit their own
> > > >> > > > > needs.
> > > >> > > > >
> > > >> > > > > Let me know what you think and how I should proceed.
> > > >> > > > >
> > > >> > > > > Regards,
> > > >> > > > > Michael Marshall
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
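
For readers following the thread: the polling approach mentioned in the
original proposal (list all topics in a namespace, then use set operations
to find the "new" ones) can be sketched roughly as below. This is a minimal
illustration in Python, not the code of `PatternMultiTopicsConsumerImpl`.
The function name `diff_topics` and the topic names are hypothetical, and
how each topic list is fetched from the cluster is deliberately left out.

```python
def diff_topics(previous, current):
    """Compare two polls of a namespace's topic list.

    Returns (created, deleted) as sorted lists of topic names.
    Hypothetical helper for illustration only.
    """
    previous, current = set(previous), set(current)
    created = sorted(current - previous)  # present now, absent before
    deleted = sorted(previous - current)  # present before, absent now
    return created, deleted


# Two hypothetical polls of the same namespace.
first_poll = ["persistent://tenant/ns/a", "persistent://tenant/ns/b"]
second_poll = ["persistent://tenant/ns/b", "persistent://tenant/ns/c"]

created, deleted = diff_topics(first_poll, second_poll)
print("created:", created)
print("deleted:", deleted)
```

Note that a topic created and deleted between two polls appears in neither
list, which is a limitation inherent to polling.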