>> Could we add a system topic that has exactly one partition per broker?

> I think this depends on which type of topic we use for the events. If it is
> nonpersistent, I think this approach would work because the events wouldn't
> outlive the broker.

So, what happens to events in a nonpersistent topic when the broker goes
down? How would data loss be prevented in that scenario?

> However, if it is persistent, it would become
> problematic when a broker stops running because the topic would then need
> to be served by another broker in order to read from it.

What's the concern here? When the topic is unloaded, another broker will
pick up the data from the bookies.
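
To make the two questions above concrete, here is a toy Java model of both
cases. It is only a sketch: `ToyCluster`, its methods, and the broker and
topic names are made up for illustration and are not Pulsar APIs. It assumes
persistent events live in storage shared by every broker (standing in for the
bookies), while nonpersistent events exist only in the serving broker's memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Pulsar classes. */
final class ToyCluster {
    // Stands in for the bookies: storage shared by all brokers.
    private final Map<String, List<String>> bookies = new HashMap<>();
    // Stands in for each broker's memory: lost when that broker stops.
    private final Map<String, Map<String, List<String>>> brokerMemory = new HashMap<>();
    // Which broker currently serves each topic.
    private final Map<String, String> owner = new HashMap<>();

    void assign(String topic, String broker) {
        owner.put(topic, broker);
        brokerMemory.computeIfAbsent(broker, b -> new HashMap<>());
    }

    void publish(String topic, String event, boolean persistent) {
        if (persistent) {
            bookies.computeIfAbsent(topic, t -> new ArrayList<>()).add(event);
        } else {
            brokerMemory.get(owner.get(topic))
                    .computeIfAbsent(topic, t -> new ArrayList<>()).add(event);
        }
    }

    /** Drops the failed broker's memory and moves its topics to the survivor. */
    void failBroker(String failed, String survivor) {
        brokerMemory.remove(failed);
        brokerMemory.computeIfAbsent(survivor, b -> new HashMap<>());
        owner.replaceAll((topic, broker) -> broker.equals(failed) ? survivor : broker);
    }

    List<String> read(String topic, boolean persistent) {
        if (persistent) {
            return bookies.getOrDefault(topic, List.of());
        }
        return brokerMemory.get(owner.get(topic)).getOrDefault(topic, List.of());
    }

    public static void main(String[] args) {
        ToyCluster c = new ToyCluster();
        c.assign("audit-persistent", "broker-1");
        c.assign("audit-nonpersistent", "broker-1");
        c.publish("audit-persistent", "topic-created", true);
        c.publish("audit-nonpersistent", "topic-created", false);
        c.failBroker("broker-1", "broker-2");
        System.out.println(c.read("audit-persistent", true));     // [topic-created]
        System.out.println(c.read("audit-nonpersistent", false)); // []
    }
}
```

In this model the persistent event is still readable after `broker-1` stops,
because `broker-2` reads it from shared storage, while the nonpersistent event
is gone. A real broker is more involved than this, but the trade-off is the same.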

--
Devin G. Bost

On Fri, Apr 23, 2021, 12:06 PM Michael Marshall <mikemars...@gmail.com>
wrote:

> Thanks for the clarification, Joe. I now see the nuance between the data
> and admin paths. One way to possibly remove these updates from the data
> flow is to make it a process that watches the metastore and sends metastore
> changes to a topic. That would remove it completely from the data path.
> However, there is still the problem of where that process gets scheduled
> and how to ensure that it is collocated with the topic to which it is
> publishing messages. Further, this paradigm could mean that some events
> might get missed, and it would put additional load on the metastore.
>
> > What are you hoping to accomplish by knowing when a topic is automatically
> > created?
>
> I am looking to solve several problems. First, I want to discover new
> topics so that I can create subscriptions for them. I use Pulsar to buffer
> data, so using a simple regex consumer that discovers topics, creates
> subscriptions, and immediately consumes from those topics is not a viable
> solution. Also, I have an external database that exposes topic stats joined
> with relevant business metadata for each topic. I want to know when topics
> are deleted so I can update the database appropriately. I want to avoid any
> solution that requires polling Pulsar's admin API.
>
> As Joe pointed out, there are many other potential cluster events that
> could be useful. This PIP could be more general than my initial proposal.
>
> > Could we add a system topic that has exactly one partition per broker?
>
> I think this depends on which type of topic we use for the events. If it is
> nonpersistent, I think this approach would work because the events wouldn't
> outlive the broker. However, if it is persistent, it would become
> problematic when a broker stops running because the topic would then need
> to be served by another broker in order to read from it.
>
> One alternative might be to put a system topic partition in each namespace
> bundle. Given that all topics exist within a bundle and bundles split but
> don't join, it would guarantee that events would be local to the target
> topic without needing to worry about joining event logs. This would require
> a change to how topics are put into bundles though, as they are currently
> assigned based on a hash of their name. This approach would only make sense
> for events that are specific to a namespace, like topic/subscription
> creation/deletion.
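
A rough sketch of why hash-based placement matters here. It assumes, for
illustration only, that each bundle owns a contiguous range of a 32-bit hash
space and that topic names are hashed with CRC32. `ToyBundles` and its methods
are hypothetical, not Pulsar classes, and the real hash function and bundle
boundaries may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;
import java.util.zip.CRC32;

/** Toy model of hash-range bundles; not Pulsar's actual bundle classes. */
final class ToyBundles {
    private static final long HASH_SPACE = 1L << 32;
    // Lower bound of each bundle's range; the last bundle ends at 2^32.
    private final TreeSet<Long> lowerBounds = new TreeSet<>();

    ToyBundles() {
        lowerBounds.add(0L); // start with one bundle covering the whole hash space
    }

    /** Assumption: CRC32 of the topic name. The real hash function may differ. */
    static long hash(String topic) {
        CRC32 crc = new CRC32();
        crc.update(topic.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // always in [0, 2^32)
    }

    /** Returns the lower bound that identifies the bundle owning this topic. */
    long bundleOf(String topic) {
        return lowerBounds.floor(hash(topic));
    }

    /** Exclusive upper bound of the bundle that starts at the given lower bound. */
    long upperBoundOf(long lower) {
        Long next = lowerBounds.higher(lower);
        return next == null ? HASH_SPACE : next;
    }

    /** Splits a bundle at its midpoint. There is deliberately no merge operation. */
    void split(long lower) {
        if (!lowerBounds.contains(lower)) {
            throw new IllegalArgumentException("no bundle starts at " + lower);
        }
        long upper = upperBoundOf(lower);
        if (upper - lower < 2) {
            throw new IllegalStateException("bundle too small to split");
        }
        lowerBounds.add(lower + (upper - lower) / 2);
    }

    public static void main(String[] args) {
        ToyBundles bundles = new ToyBundles();
        String topic = "persistent://tenant/ns/orders"; // made-up topic name
        long before = bundles.bundleOf(topic);
        bundles.split(before);
        long after = bundles.bundleOf(topic);
        System.out.println("before split: [" + before + ", " + HASH_SPACE + ")");
        System.out.println("after split:  [" + after + ", " + bundles.upperBoundOf(after) + ")");
    }
}
```

Because a bundle only ever splits, a topic's new bundle range is always
contained in its old one. But a topic lands wherever its name hashes, so a
per-bundle event topic cannot be placed just by choosing a name, which is the
change to bundle assignment mentioned above.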
>
> We may need multiple strategies regarding topic placement for different
> types of audit events. For example, some broker events are not namespaced,
> and as such, they likely belong in the `pulsar/system` namespace.
> Namespaced events would make sense in their source namespace, much like the
> `__change_events` topic exists in each namespace where topic level policies
> are allowed.
>
> Perhaps I should put together a Google doc for this proposal to make it
> easier to collaborate on specific details. I can tell that there is
> interest in this feature and that it will require a careful design.
>
> Thanks for all of your feedback,
> Michael
>
>
> On Fri, Apr 23, 2021 at 7:29 AM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> > Could we add a system topic that has exactly one partition per broker?
> >
> > On Thu, Apr 22, 2021 at 11:22 PM Joe Francis <j...@verizonmedia.com.invalid>
> > wrote:
> >
> > > To be clear, I would love to have this feature. But I would not use this
> > > feature if that means whenever a broker that hosts a "system topic" has a
> > > hiccup, it would result in an outage for N other brokers. I run 100+
> > > brokers/million+ topics in a cluster (hence an "audit topic" would be
> > > wonderful for all kinds of purposes), and would not want a "system topic"
> > > as the single point of failure.
> > >
> > > So you have to make this log local to the broker, or sacrifice the
> > > reliability of the log (best case log). A local log has its advantages:
> > > you can log a lot more about the system itself into it (e.g. security
> > > events like failed auth), but you will need to provide an aggregate view
> > > for the cluster as a whole from all the brokers.
> > >
> > > Joe
> > >
> > >
> > >
> > >
> > > On Thu, Apr 22, 2021 at 6:10 AM Joe Francis <j...@verizonmedia.com> wrote:
> > >
> > > > Completely disagree that we have accepted this risk with PIP-39. That is
> > > > different because it is an admin flow. A failure in a namespace policy
> > > > change does not affect data flow.
> > > >
> > > > What you are proposing is in the data path. Topics and subs are
> > > > created in the data flow path. Failure means outages. PIP-39 is not going
> > > > to help you there.
> > > >
> > > > Joe
> > > >
> > > > On Wed, Apr 21, 2021 at 11:10 PM Michael Marshall <mikemars...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi Joe,
> > > >>
> > > >> I agree there is a risk in adding more interdependencies between brokers. I
> > > >> will point out that we have already accepted this risk with the
> > > >> implementation of PIP 39, which propagates namespace policy changes to
> > > >> other brokers using messages sent to a system topic. However, that doesn't
> > > >> necessarily mean we should build more interdependencies between brokers.
> > > >>
> > > >> Here is the link to PIP 39:
> > > >>
> > > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_pulsar_wiki_PIP-2D39-253A-2DNamespace-2DChange-2DEvents&d=DwIBaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=ke-uQGYh--_pwtgn13szgq-axZcRVTJXoSurefbZEk4&m=ACSaHFP9BC5MVQZZWMp0KvSJnvwUr4Jvd08xKKbQWBI&s=G_K7a-seNfGGb-Z4Wy0Q5iMrbdL2j9WCoMUWfwUH5RY&e=
> > > >>
> > > >> I will look into the implementation of PIP 39 to better understand its
> > > >> design, as I think it will likely influence this feature's design.
> > > >>
> > > >> Thanks,
> > > >> Michael
> > > >>
> > > >> On Wed, Apr 21, 2021 at 5:50 PM Joe F <joefranc...@gmail.com> wrote:
> > > >>
> > > >> > I would be very careful about implementing such a feature, because of
> > > >> > introducing undesirable interdependencies. Broker processes only talk to
> > > >> > the metadata store or data store. This keeps brokers isolated from each
> > > >> > other - one broker is not dependent on the functioning of another broker.
> > > >> >
> > > >> > A broker publishing to a topic hosted on another broker (which, e.g., is
> > > >> > serving a "system topic") sets up an undesirable dependency, which reduces
> > > >> > total system resiliency and availability for the cluster. These are better
> > > >> > implemented as notifications off the metadata changes.
> > > >> >
> > > >> > Good feature, but needs careful thought to do it right
> > > >> > Joe
> > > >> >
> > > >> > On Wed, Apr 21, 2021 at 4:03 PM Michael Marshall <mikemars...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Thanks for your response, PengHui.
> > > >> > >
> > > >> > > I think this feature would be useful to end users for cluster management,
> > > >> > > which is why I want to contribute a first class feature instead of writing
> > > >> > > my own plugin that would add little value to the community.
> > > >> > >
> > > >> > > > With the broker interceptor you can intercept all the REST API request
> > > >> > > > and response, Pulsar commands between the broker and clients.
> > > >> > >
> > > >> > > Based on looking through the interceptor trait, I don't see a way to
> > > >> > > trigger events based on auto created/deleted topics. For example, when a
> > > >> > > producer connects to a broker for a nonexistent topic (assuming auto topic
> > > >> > > creation is allowed), a managed ledger, and thus a topic, is created
> > > >> > > without ever interacting with that interceptor trait. The same appears to
> > > >> > > be true for garbage collected topics. I think we'll need more than this
> > > >> > > interceptor to properly capture all cases where topics are created or
> > > >> > > deleted.
> > > >> > >
> > > >> > > Regarding my reference to potential further work, it does appear that low
> > > >> > > level auditing of connections and Pulsar commands could be covered by the
> > > >> > > interceptor. However, it would still be on the end user to implement such
> > > >> > > functionality.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Michael
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Apr 21, 2021 at 3:51 AM PengHui Li <codelipeng...@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi Michael,
> > > >> > > >
> > > >> > > > Currently, Pulsar supports a pluggable Broker Interceptor; you can find
> > > >> > > > it here:
> > > >> > > >
> > > >> > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_pulsar_blob_6704f12104219611164aa2bb5bbdfc929613f1bf_pulsar-2Dbroker_src_main_java_org_apache_pulsar_broker_intercept_BrokerInterceptor.java&d=DwIBaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=ke-uQGYh--_pwtgn13szgq-axZcRVTJXoSurefbZEk4&m=ACSaHFP9BC5MVQZZWMp0KvSJnvwUr4Jvd08xKKbQWBI&s=6Li1guS8lImjrxPo9A0nnQAmDMnYEKHlAGqlVYvB8Ug&e=
> > > >> > > >
> > > >> > > > With the broker interceptor you can intercept all the REST API request
> > > >> > > > and response, Pulsar commands between the broker and clients.
> > > >> > > > This can be used to audit the system events.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Penghui
> > > >> > > > On Apr 21, 2021, 5:13 AM +0800, Michael Marshall <mikemars...@gmail.com>,
> > > >> > > > wrote:
> > > >> > > > > Hello all,
> > > >> > > > >
> > > >> > > > > I would like to propose adding a new feature to Pulsar that will require a
> > > >> > > > > PIP. In addition to feedback on the proposed feature, I am looking for
> > > >> > > > > guidance on how to go about creating the PIP. Thanks for any help you can
> > > >> > > > > provide.
> > > >> > > > >
> > > >> > > > > I would like to add an optional system topic where topic creation and topic
> > > >> > > > > deletion events are published. This feature will make it easier to leverage
> > > >> > > > > the auto topic creation and inactive topic deletion features by providing a
> > > >> > > > > way for users to reactively discover changes to topics. The largest benefit
> > > >> > > > > is that users won't need to poll for these updates with an admin client.
> > > >> > > > > Instead, they will get them as messages.
> > > >> > > > >
> > > >> > > > > I looked to see if an equivalent feature already exists, but I don't see
> > > >> > > > > one. For reference, the `PatternMultiTopicsConsumerImpl` currently polls
> > > >> > > > > for all topics in a namespace and then does set operations to compute the
> > > >> > > > > "new" topics to which it should subscribe. This client implementation could
> > > >> > > > > possibly leverage the new feature.
> > > >> > > > >
> > > >> > > > > There are still details I need to work out, like how it will work for
> > > >> > > > > partitioned vs unpartitioned topics and what kind of guarantees we have
> > > >> > > > > regarding messaging semantics (I think we'll want at least once message
> > > >> > > > > delivery here). I plan to include these details in the PIP with discussions
> > > >> > > > > about trade offs for different implementations.
> > > >> > > > >
> > > >> > > > > Does this feature sound helpful and reasonable to others? If so, is the
> > > >> > > > > next step to formally write a proposal in a Google Doc or to put together a
> > > >> > > > > doc on the Pulsar GitHub Wiki?
> > > >> > > > >
> > > >> > > > > Related and/or future work to consider in this design: I can see adding
> > > >> > > > > different system topics for these types of auditable system events. We
> > > >> > > > > currently rely on log lines as our primary way for end users to audit
> > > >> > > > > system events, e.g. a producer connecting to a broker or a subscription
> > > >> > > > > getting created, but we could instead have topics that represent streams of
> > > >> > > > > these different kinds of events. A persistent topic could make these audit
> > > >> > > > > events more durable and more structured which should lend themselves to
> > > >> > > > > being more easily analyzed. Further, users could choose to turn on/off
> > > >> > > > > these audit events, perhaps at the broker or namespace level, to fit their
> > > >> > > > > own needs.
> > > >> > > > >
> > > >> > > > > Let me know what you think and how I should proceed.
> > > >> > > > >
> > > >> > > > > Regards,
> > > >> > > > > Michael Marshall
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
> >
>
