Re: [DISCUSSION] PIP-135: Include MetadataStore backend for Etcd

2022-01-12 Thread Joe F
+1

On Wed, Jan 12, 2022 at 3:52 PM Aloys Zhang  wrote:

> +1
>
> 陳智弘 wrote on Wed, Jan 12, 2022 at 10:19:
>
> > +1
> >
> > Haiting Jiang wrote on Wed, Jan 12, 2022 at 09:50:
> >
> > > +1
> > >
> > > On 2022/01/12 01:44:21 PengHui Li wrote:
> > > > +1
> > > >
> > > > Penghui
> > > >
> > > > On Wed, Jan 12, 2022 at 8:39 AM mattison chao <
> mattisonc...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Wed, 12 Jan 2022 at 08:09, Matteo Merli 
> > wrote:
> > > > >
> > > > > > https://github.com/apache/pulsar/issues/13717
> > > > > >
> > > > > > -
> > > > > >
> > > > > > ## Motivation
> > > > > >
> > > > > > Since all the pieces that composed the proposal in PIP-45 were
> > > > > > finally merged and are currently ready for the 2.10 release, it is
> > > > > > now possible to add other metadata backends that can be used to
> > > > > > support a BookKeeper + Pulsar cluster.
> > > > > >
> > > > > > One of the popular systems most commonly used as an alternative to
> > > > > > ZooKeeper is Etcd, thus it makes sense to have this as the first
> > > > > > non-ZooKeeper implementation.
> > > > > >
> > > > > > ## Goal
> > > > > >
> > > > > > Provide an Etcd implementation for the `MetadataStore` API. This
> > > > > > will allow users to deploy Pulsar clusters using an Etcd service
> > > > > > for the metadata, and it will not require the presence of
> > > > > > ZooKeeper.
> > > > > >
> > > > > > ## Implementation
> > > > > >
> > > > > >  * Use the existing jetcd Java client library for Etcd
> > > > > >  * Extend the `AbstractBatchedMetadataStore` class, in order to
> > > > > >    reuse the transparent batching logic that will be shared with
> > > > > >    the ZooKeeper implementation.
> > > > > >
> > > > > > Work in progress: https://github.com/apache/pulsar/pull/13225
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Matteo Merli
> > > > > > 
> > > > > >
> > > > >
> > > >
> > >
> >
>
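
The `AbstractBatchedMetadataStore` mentioned under Implementation exists to coalesce many small metadata reads into fewer backend round-trips, which is why both the ZooKeeper and Etcd backends can share it. The idea can be sketched as follows (all names here are illustrative, not Pulsar's actual classes):

```python
from collections import deque

class BatchedMetadataStore:
    """Illustrative sketch of transparent request batching: individual
    reads are queued and flushed to the backend as one batch,
    amortizing round-trips."""

    def __init__(self, backend_multi_get, max_batch=128):
        self._backend_multi_get = backend_multi_get  # one round-trip, many keys
        self._queue = deque()
        self._max_batch = max_batch

    def enqueue_get(self, key):
        self._queue.append(key)

    def flush(self):
        """Drain up to max_batch queued reads into a single backend call."""
        batch = []
        while self._queue and len(batch) < self._max_batch:
            batch.append(self._queue.popleft())
        return self._backend_multi_get(batch) if batch else {}

# Example: a fake backend that records how many round-trips it served.
calls = []
def fake_backend(keys):
    calls.append(keys)
    return {k: f"value-of-{k}" for k in keys}

store = BatchedMetadataStore(fake_backend)
for k in ("/admin/tenants", "/admin/clusters", "/admin/policies"):
    store.enqueue_get(k)
result = store.flush()
# Three logical reads were served by a single backend round-trip.
```

The same queue-and-flush pattern applies to writes; the real implementation presumably flushes on a short timer as well as a size threshold.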


Re: [VOTE] PIP-135: Include MetadataStore backend for Etcd

2022-01-15 Thread Joe F
+1 (binding)

On Sat, Jan 15, 2022 at 4:46 AM Enrico Olivelli  wrote:

> On Sat, Jan 15, 2022 at 09:10, tamer Abdlatif
> wrote:
>
> > Will that affect the existing ZK metadata in a Pulsar instance when we
> > upgrade from a lower version to 2.10? In other words, do we need a
> > metadata migration to switch from ZK to Etcd?
> >
>
> There is no need to migrate.
>
> Most probably the first release will bring this feature as
> non-production-ready, and it will take some time to stabilise.
>
> Enrico
>
>
>
> >
> > Thanks
> > Tamer
> >
> >
> >
> > On Fri, 14 Jan 2022, 22:52 Matteo Merli,  wrote:
> >
> > > https://github.com/apache/pulsar/issues/13717
> > >
> > > -
> > >
> > > ## Motivation
> > >
> > > Since all the pieces that composed the proposal in PIP-45 were finally
> > > merged and are currently ready for the 2.10 release, it is now possible
> > > to add other metadata backends that can be used to support a BookKeeper
> > > + Pulsar cluster.
> > >
> > > One of the popular systems most commonly used as an alternative to
> > > ZooKeeper is Etcd, thus it makes sense to have this as the first
> > > non-ZooKeeper implementation.
> > >
> > > ## Goal
> > >
> > > Provide an Etcd implementation for the `MetadataStore` API. This will
> > > allow users to deploy Pulsar clusters using an Etcd service for the
> > > metadata, and it will not require the presence of ZooKeeper.
> > >
> > > ## Implementation
> > >
> > >  * Use the existing jetcd Java client library for Etcd
> > >  * Extend the `AbstractBatchedMetadataStore` class, in order to reuse
> > >    the transparent batching logic that will be shared with the
> > >    ZooKeeper implementation.
> > >
> > > Work in progress: https://github.com/apache/pulsar/pull/13225
> > >
> > > --
> > > Matteo Merli
> > > 
> > >
> >
>
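
A related practical note on the migration question above: the metadata backend is selected through the store URL in the broker configuration, so no data migration is performed automatically — Etcd is a deployment-time choice. A hedged sketch of what the configuration might look like (the property names and the `etcd:` URL scheme are assumptions to verify against the released documentation):

```properties
# Hypothetical broker.conf fragment -- verify names against your Pulsar version.
# ZooKeeper backend (default):
# metadataStoreUrl=zk:zk-host:2181
# Etcd backend, as enabled by PIP-135:
metadataStoreUrl=etcd:http://etcd-host:2379
configurationMetadataStoreUrl=etcd:http://etcd-host:2379
```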


Re: [DISCUSS] PIP-136: Sync Pulsar policies across multiple clouds

2022-02-03 Thread Joe F
>On my first reading, it wasn't clear if there was only one topic
required for this feature. I now see that the topic is not tied to a
specific tenant or namespace. As such, we can avoid complicated
authorization questions by putting the required event topic(s) into a
"system" tenant and namespace

We should still consider complicated questions. We can say why we chose not
to address one, or why it does not apply, for a particular situation.

Many namespace policies are administered by tenants. As such, any tenant
can load this topic. Is it possible for one abusive tenant to make your
system topic dysfunctional?

Pulsar committers should think about
(1) scenarios where the Pulsar cluster operators and tenant admins are
different entities, and tenants can be malicious or, more probably, write
bad code that will produce malicious outcomes;
(2) whether the changes introduce additional SPOFs into the cluster.

I don't think this PIP has those issues, but as a matter of practice, I
would like to see backend/system PIPs consider these questions and
explicitly state the conclusions with rationale.

Joe


On Wed, Feb 2, 2022 at 9:27 PM Michael Marshall 
wrote:

> Thanks for your responses.
>
> > I don't see a need of protobuf for this particular usecase
>
> If no one else feels strongly on this point, I am good with using a POJO.
>
> > It doesn't matter if it's system-topic or not because it's
> > configurable and the admin of the system can decide and configure it
> > according to the required persistent policy.
>
> On my first reading, it wasn't clear if there was only one topic
> required for this feature. I now see that the topic is not tied to a
> specific tenant or namespace. As such, we can avoid complicated
> authorization questions by putting the required event topic(s) into a
> "system" tenant and namespace, by default. The `pulsar/system` tenant
> and namespace seem appropriate to me.
>
> > I would keep the system topic
> > separate because this topic serves a specific purpose with specific
> schema,
> > replication policy and retention policy.
>
> I think we need a more formal definition for system topics. This topic
> is exactly the kind of topic I would call a system topic: its intended
> producers and consumers are Pulsar components. However, because
> this feature can live on a topic in a system namespace, we can avoid
> the classification discussion for this PIP.
>
> > Source region will have a broker which will create a failover consumer
> > on that topic, and the broker with the active consumer will watch the
> > metadata changes and publish them to the event topic.
>
> How do we designate the host broker? Is it manual? How does it work
> when the host broker is removed from the cluster?
>
> If we collocate the active consumer with the broker hosting the event
> topic, can we skip creating the failover consumer?
>
> > PIP briefly talks about it but I will update the PIP with more
> > explanation.
>
> I look forward to seeing more about this design for conflict resolution.
>
> Thanks,
> Michael
>
>
>
> On Tue, Feb 1, 2022 at 3:01 AM Rajan Dhabalia 
> wrote:
> >
> > Please find my response inline.
> >
> > On Mon, Jan 31, 2022 at 9:17 PM Michael Marshall 
> > wrote:
> >
> > > I think this is a very appropriate direction to take Pulsar's
> > > geo-replication. Your proposal is essentially to make the
> > > inter-cluster configuration event driven. This increases fault
> > > tolerance and better decouples clusters.
> > >
> > > Thank you for your detailed proposal. After reading through it, I have
> > > some questions :)
> > >
> > > 1. What do you think about using protobuf to define the event
> > > protocol? I know we already have a topic policy event stream
> > > defined with Java POJOs, but since this feature is specifically
> > > designed for egressing cloud providers, ensuring compact data transfer
> > > would keep egress costs down. Additionally, protobuf can help make it
> > > clear that the schema is strict, should evolve thoughtfully, and
> > > should be designed to work between clusters of different versions.
> > >
> >
> > >>> I don't see a need for protobuf for this particular use case, for
> > two reasons:
> > a. policy changes don't generate huge traffic (on the order of 1 rps),
> > b. and it doesn't need performance optimization.
> > It should be similar to storing the policy as text instead of protobuf,
> > which doesn't impact footprint size or performance due to the limited
> > number of update operations and relatively low complexity. I agree that
> > protobuf could be another option, but in this case it's not needed.
> > Also, a POJO can support schema and versioning.
> >
> >
> >
> > >
> > > 2. In your view, which tenant/namespace will host
> > > `metadataSyncEventTopic`? Will there be several of these topics or is
> > > it just hosted in a system tenant/namespace? This question gets back
> > > to my questions about system topics on this mailing list last week
> [0].
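
The conflict-resolution design Michael asks about above was still being written up at this point. As a purely illustrative example of the class of solution such a design needs — not the PIP's actual mechanism — a deterministic last-writer-wins merge makes every cluster converge on the same policy state regardless of the order in which events arrive:

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class PolicyEvent:
    """One policy-change event replicated between clusters (illustrative)."""
    resource: str       # e.g. "tenant/namespace"
    policy: str         # e.g. "retention"
    value: object
    timestamp: int      # event time assigned by the source cluster
    source_cluster: str

def apply_events(events):
    """Last-writer-wins merge: for each (resource, policy), keep the event
    with the highest timestamp, breaking ties by cluster name so every
    cluster converges to the same state regardless of arrival order."""
    state = {}
    for ev in events:
        key = (ev.resource, ev.policy)
        cur = state.get(key)
        if cur is None or (ev.timestamp, ev.source_cluster) > (cur.timestamp, cur.source_cluster):
            state[key] = ev
    return {k: v.value for k, v in state.items()}

a = PolicyEvent("t1/ns1", "retention", "7d", 100, "us-west")
b = PolicyEvent("t1/ns1", "retention", "3d", 120, "us-east")
# Arrival order differs per cluster, but the merged result is identical.
assert apply_events([a, b]) == apply_events([b, a]) == {("t1/ns1", "retention"): "3d"}
```

The tie-break on cluster name matters: without it, two events carrying equal timestamps could be resolved differently on different clusters.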

Re: [DISCUSS] Use the non-persistent topic to sync bundle load data

2022-03-17 Thread Joe F
IIRC, there is a historical load profile for each topic that feeds into
decisions by the load balancer.

What happens during a cluster startup with this new proposal?




On Thu, Mar 17, 2022 at 7:50 AM PengHui Li  wrote:

> > But which brokers will own that topic ?
> in a Pulsar cluster with a high level of isolation of tenants, we must
> ensure that:
> - at least one broker is allowed to own the topic
> - brokers dedicated to tenants do not own the topic
> With the current approach the data is on ZooKeeper, and this is shared
> among all the brokers
>
> We have the "pulsar/system" namespace, which can be used to maintain
> system topics. If users require broker isolation, it's all transparent.
>
> Using a topic, we can also share the data among all brokers.
> Whoever wants a copy of the data only needs to create a reader on startup.
> And we have introduced TableView, which will make it easier to cache
> the load data and perform cache updates.
>
> > Another point:
> will users be allowed to produce/consume this topic? How do we deal
> with permissions?
>
> Good point. We should block users' producers/consumers, so that only
> the super user can access the system topic.
>
> Thanks,
> Penghui
>
> On Thu, Mar 17, 2022 at 10:08 PM Enrico Olivelli 
> wrote:
>
> > On Thu, Mar 17, 2022 at 02:42, PengHui Li
> > wrote:
> > >
> > > >  we do not know
> > > anything about the availability of the owner of the topic.
> > >
> > > If the owner broker is not available, other brokers will take over.
> > >
> > > > We could make it simpler and when a broker wants to push its data, it
> > > looks
> > > up the REST address of the "leader broker" and then pushes the data to
> > it,
> > > I mean, without involving a "topic"
> > >
> > > Any broker may become the leader broker, in this case, the brokers need
> > to
> > > know all the addresses of the brokers in the cluster. With the topic
> > > approach,
> > > they only need to know the topic name.
> >
> > I thought about this a little more.
> > Using a non-persistent topic makes sense. So I am closer to being
> > convinced about this move.
> >
> > But which brokers will own that topic ?
> > in a Pulsar cluster with a high level of isolation of tenants, we must
> > ensure that:
> > - at least one broker is allowed to own the topic
> > - brokers dedicated to tenants do not own the topic
> > With the current approach the data is on ZooKeeper, and this is shared
> > among all the brokers
> >
> > Another point:
> > will users be allowed to produce/consume this topic? How do we deal
> > with permissions?
> >
> >
> > Enrico
> >
> > >
> > > Penghui
> > >
> > > On Thu, Mar 17, 2022 at 12:35 AM Enrico Olivelli 
> > > wrote:
> > >
> > > > But in order to read from a topic you need a broker that is the
> > > > owner of the special "temporary topic".
> > > >
> > > > While the metadata service (ZooKeeper) is already a central point and
> > it is
> > > > meant to be available (otherwise Pulsar doesn't work), we do not know
> > > > anything about the availability of the owner of the topic.
> > > >
> > > > Or do you mean to create a special topic that is always owned by the
> > > > "leader broker" ?
> > > >
> > > > We could make it simpler and when a broker wants to push its data, it
> > looks
> > > > up the REST address of the "leader broker" and then pushes the data
> to
> > it,
> > > > I mean, without involving a "topic".
> > > >
> > > >
> > > > Enrico
> > > >
> > > >
> > > >
> > > > On Wed, Mar 16, 2022 at 12:55, PengHui Li
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > The load data doesn't need to be persisted to the storage layer;
> > > > > using a non-persistent topic is more efficient.
> > > > >
> > > > > Thanks,
> > > > > Penghui
> > > > >
> > > > > On Wed, Mar 16, 2022 at 2:14 PM Kai Wang
> > 
> > > > > wrote:
> > > > >
> > > > > > Hi Pulsar Community,
> > > > > >
> > > > > > Currently, the Pulsar LoadManager uses ZooKeeper to store the
> > > > > > local broker data: the LoadReportUpdaterTask reports the local
> > > > > > load data to ZooKeeper, and the leader broker collects the load
> > > > > > data and stores it in ZooKeeper.
> > > > > >
> > > > > > When we have a lot of brokers and bundles, this load data will
> > > > > > put some pressure on ZooKeeper.
> > > > > >
> > > > > > Since the load data is not strongly consistent, we can use
> > > > > > non-persistent topics to sync it. And it will reduce our
> > > > > > dependence on ZooKeeper.
> > > > > >
> > > > > > If this proposal is acceptable, I will draft a PIP.
> > > > > >
> > > > > > Any suggestions are appreciated.
> > > > > >
> > > > > > Thanks,
> > > > > > Kai
> > > > > >
> > > > >
> > > >
> >
>
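
The TableView-based caching PengHui describes can be pictured as a compacted, latest-value-per-key view over the load-report stream. A minimal sketch (illustrative names, not Pulsar's API):

```python
class LoadReportTableView:
    """Illustrative TableView-style cache: consume a stream of per-broker
    load reports from a (non-persistent) topic and keep only the latest
    value per broker, which is all the load balancer needs."""

    def __init__(self):
        self._latest = {}

    def on_message(self, broker, report):
        # Later reports simply overwrite earlier ones -- no history is kept,
        # which is why losing messages on a non-persistent topic is tolerable.
        self._latest[broker] = report

    def snapshot(self):
        return dict(self._latest)

view = LoadReportTableView()
view.on_message("broker-1", {"cpu": 0.42, "bundles": 120})
view.on_message("broker-2", {"cpu": 0.80, "bundles": 300})
view.on_message("broker-1", {"cpu": 0.55, "bundles": 118})  # supersedes the first

assert view.snapshot()["broker-1"]["cpu"] == 0.55
```

This also hints at an answer to Joe's cluster-startup question: the view is rebuilt from the next round of reports, while any historical load profile would still need a persistent store such as ZK.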


Re: [DISCUSS] Use the non-persistent topic to sync bundle load data

2022-03-17 Thread Joe F
To clarify: the historical profile that is persisted in ZK.

On Thu, Mar 17, 2022 at 11:54 AM Joe F  wrote:

> IIRC, there is a historical  load profile for topic that feeds into
> decisions by the load balancer.
>
> What happens during a cluster startup, with this new proposal?
>
>
>
>
> On Thu, Mar 17, 2022 at 7:50 AM PengHui Li  wrote:
>
>> > But which brokers will own that topic ?
>> in a Pulsar cluster with a high level of isolation of tenants, we must
>> ensure that:
>> - at least one broker is allowed to own the topic
>> - brokers dedicated to tenants do not own the topic
>> With the current approach the data is on ZooKeeper, and this is shared
>> among all the brokers
>>
>> We have the "pulsar/system" namespace, which can be used to maintain
>> system topics. If users require broker isolation, it's all transparent.
>>
>> Using a topic, we can also share the data among all brokers.
>> Whoever wants a copy of the data only needs to create a reader on startup.
>> And we have introduced TableView, which will make it easier to cache
>> the load data and perform cache updates.
>>
>> > Another point:
>> will users be allowed to produce/consume this topic? How do we deal
>> with permissions?
>>
>> Good point. We should block users' producers/consumers, so that only
>> the super user can access the system topic.
>>
>> Thanks,
>> Penghui
>>
>> On Thu, Mar 17, 2022 at 10:08 PM Enrico Olivelli 
>> wrote:
>>
>> > On Thu, Mar 17, 2022 at 02:42, PengHui Li
>> > wrote:
>> > >
>> > > >  we do not know
>> > > anything about the availability of the owner of the topic.
>> > >
>> > > If the owner broker is not available, other brokers will take over.
>> > >
>> > > > We could make it simpler and when a broker wants to push its data,
>> it
>> > > looks
>> > > up the REST address of the "leader broker" and then pushes the data to
>> > it,
>> > > I mean, without involving a "topic"
>> > >
>> > > Any broker may become the leader broker, in this case, the brokers
>> need
>> > to
>> > > know all the addresses of the brokers in the cluster. With the topic
>> > > approach,
>> > > they only need to know the topic name.
>> >
>> > I thought about this a little more.
>> > Using a non-persistent topic makes sense. So I am closer to being
>> > convinced about this move.
>> >
>> > But which brokers will own that topic ?
>> > in a Pulsar cluster with a high level of isolation of tenants, we must
>> > ensure that:
>> > - at least one broker is allowed to own the topic
>> > - brokers dedicated to tenants do not own the topic
>> > With the current approach the data is on ZooKeeper, and this is shared
>> > among all the brokers
>> >
>> > Another point:
>> > will users be allowed to produce/consume this topic? How do we deal
>> > with permissions?
>> >
>> >
>> > Enrico
>> >
>> > >
>> > > Penghui
>> > >
>> > > On Thu, Mar 17, 2022 at 12:35 AM Enrico Olivelli > >
>> > > wrote:
>> > >
>> > > > But in order to read from a topic you need a broker that is the
>> > > > owner of the special "temporary topic".
>> > > >
>> > > > While the metadata service (ZooKeeper) is already a central point
>> and
>> > it is
>> > > > meant to be available (otherwise Pulsar doesn't work), we do not
>> know
>> > > > anything about the availability of the owner of the topic.
>> > > >
>> > > > Or do you mean to create a special topic that is always owned by the
>> > > > "leader broker" ?
>> > > >
>> > > > We could make it simpler and when a broker wants to push its data,
>> it
>> > looks
>> > > > up the REST address of the "leader broker" and then pushes the data
>> to
>> > it,
>> > > > I mean, without involving a "topic".
>> > > >
>> > > >
>> > > > Enrico
>> > > >
>> > > >
>> > > >
>> > > > On Wed, Mar 16, 2022 at 12:55, PengHui Li
>> > > > wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > The load data doesn't need to be persisted to the storage layer;
>> > > > > using a non-persistent topic is more efficient.
>> > > > >
>> > > > > Thanks,
>> > > > > Penghui
>> > > > >
>> > > > > On Wed, Mar 16, 2022 at 2:14 PM Kai Wang
>> > 
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Pulsar Community,
>> > > > > >
>> > > > > > Currently, the Pulsar LoadManager uses ZooKeeper to store the
>> > > > > > local broker data: the LoadReportUpdaterTask reports the local
>> > > > > > load data to ZooKeeper, and the leader broker collects the load
>> > > > > > data and stores it in ZooKeeper.
>> > > > > >
>> > > > > > When we have a lot of brokers and bundles, this load data will
>> > > > > > put some pressure on ZooKeeper.
>> > > > > >
>> > > > > > Since the load data is not strongly consistent, we can use
>> > > > > > non-persistent topics to sync it. And it will reduce our
>> > > > > > dependence on ZooKeeper.
>> > > > > >
>> > > > > > If this proposal is acceptable, I will draft a PIP.
>> > > > > >
>> > > > > > Any suggestions are appreciated.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Kai
>> > > > > >
>> > > > >
>> > > >
>> >
>>
>


Re: [VOTE] PIP-136: Sync Pulsar policies across multiple clouds

2022-03-18 Thread Joe F
+1

On Thu, Mar 17, 2022 at 12:07 PM Rajan Dhabalia 
wrote:

> Hi,
>
> I would like to start VOTE on PIP-136:
> https://github.com/apache/pulsar/issues/13728
>
> Thanks,
> Rajan
>
> On Tue, Feb 8, 2022 at 4:58 PM Rajan Dhabalia 
> wrote:
>
> >
> > >> How do we designate the host broker? Is it manual? How does it work
> > when the host broker is removed from the cluster?
> > No, it will not be manual. As I explained earlier, the broker which has
> > the active failover consumer for remote events will be the publisher for
> > metadata updates. If that broker is removed, then a new failover
> > consumer/broker will be selected for the same.
> >
> > >> I look forward to seeing more about this design for conflict
> resolution.
> > Sure, I have updated the PIP to handle such race conditions:
> > https://github.com/apache/pulsar/issues/13728
> >
> >
> > >> (1) scenarios where the Pulsar cluster operators and tenant admins
> are
> > different entities and tenants can be malicious, or more probably, write
> > bad code that will produce malicious outcomes.
> > I agree, Pulsar should have a provision to prevent scenarios where
> > changes from one tenant in a cluster can impact other clusters. This PIP
> > assumes the tenant/admin will be the same at both ends, but that may
> > not be true in all cases. We can add an enhancement later, or we can
> > create a separate PIP to start a discussion on possible solutions.
> >
> > Thanks,
> > Rajan
> >
> > On Thu, Feb 3, 2022 at 9:59 AM Joe F  wrote:
> >
> >> >On my first reading, it wasn't clear if there was only one topic
> >> required for this feature. I now see that the topic is not tied to a
> >> specific tenant or namespace. As such, we can avoid complicated
> >> authorization questions by putting the required event topic(s) into a
> >> "system" tenant and namespace
> >>
> >> We should consider complicated questions. We can say why we chose not to
> >> address it, or why it does not apply. for a particular situation
> >>
> >> Many namespace policies are administered by tenants.  As such any tenant
> >> can load this topic.  Is it possible for one abusive tenant to make your
> >> system topic dysfunctional?
> >>
> >> Pulsar committers should think about
> >> (1) scenarios where the Pulsar cluster operators and tenant admins  are
> >> different entities and tenants can be malicious, or more probably, write
> >> bad code that will produce malicious outcomes.
> >> (2) whether the changes introduce  additional SPOFs into the cluster.
> >>
> >> I don't think this PIP has those issues, but  as a matter of practice, I
> >> would like to see backend/system PIPs consider these questions  and
> >> explicitly state the conclusions with rationale
> >>
> >> Joe
> >>
> >>
> >> On Wed, Feb 2, 2022 at 9:27 PM Michael Marshall 
> >> wrote:
> >>
> >> > Thanks for your responses.
> >> >
> >> > > I don't see a need of protobuf for this particular usecase
> >> >
> >> > If no one else feels strongly on this point, I am good with using a
> >> POJO.
> >> >
> >> > > It doesn't matter if it's system-topic or not because it's
> >> > > configurable and the admin of the system can decide and configure it
> >> > > according to the required persistent policy.
> >> >
> >> > On my first reading, it wasn't clear if there was only one topic
> >> > required for this feature. I now see that the topic is not tied to a
> >> > specific tenant or namespace. As such, we can avoid complicated
> >> > authorization questions by putting the required event topic(s) into a
> >> > "system" tenant and namespace, by default. The `pulsar/system` tenant
> >> > and namespace seem appropriate to me.
> >> >
> >> > > I would keep the system topic
> >> > > separate because this topic serves a specific purpose with specific
> >> > schema,
> >> > > replication policy and retention policy.
> >> >
> >> > I think we need a more formal definition for system topics. This topic
> >> > is exactly the kind of topic I would call a system topic: its intended
> >> > producers and consumers are Pulsar components. However, because
> >> > this feature can live on a topic in a
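
Rajan's answer above is that the publishing broker is chosen by a failover subscription rather than designated manually. The essential invariant — exactly one active consumer, with automatic takeover — can be sketched like this (real failover selection also involves consumer priorities and ordering details; this keeps only the core behavior):

```python
class FailoverSubscription:
    """Illustrative failover-subscription semantics: of all connected
    consumers, exactly one is active; when it disconnects, the next takes
    over. PIP-136 uses this to pick the single broker that publishes
    metadata updates, with no manual designation needed."""

    def __init__(self):
        self._consumers = []  # connection order

    def connect(self, name):
        self._consumers.append(name)

    def disconnect(self, name):
        self._consumers.remove(name)

    def active(self):
        return self._consumers[0] if self._consumers else None

sub = FailoverSubscription()
for broker in ("broker-1", "broker-2", "broker-3"):
    sub.connect(broker)
assert sub.active() == "broker-1"
sub.disconnect("broker-1")   # e.g. the broker is removed from the cluster
assert sub.active() == "broker-2"
```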

Re: [DISCUSS] [PIP-152] Support subscription level dispatch rate limiter setting.

2022-04-12 Thread Joe F
-1

The rate limits that are currently in place are there to protect the Pulsar
service and to manage multi-tenancy on the broker. They are not meant as a
feature to manage demand-side throttling.

In my opinion, this is best done as a client-side feature. There is no
need to add complexity to the broker to manage demand-side throttling.
Just throttle demand on the client side.



-joe

On Mon, Apr 11, 2022 at 7:54 PM Zixuan Liu  wrote:

> +1
>
> Zixuan
>
> mattison chao wrote on Tue, Apr 12, 2022 at 09:24:
>
> > +1
> >
> > Looks like a very useful feature. Thank you.
> >
> > Best,
> > Mattison
> >
> > On Mon, 11 Apr 2022 at 08:55, PengHui Li  wrote:
> >
> > > +1
> > >
> > > Penghui
> > >
> > > On Sat, Apr 9, 2022 at 4:24 PM Haiting Jiang 
> > > wrote:
> > >
> > > > Hi Pulsar community,
> > > >
> > > > I created a PIP to add support for subscription level dispatch rate
> > > > limiter setting.
> > > >
> > > > The proposal can be found:
> > https://github.com/apache/pulsar/issues/15094
> > > >
> > > > 
> > > >
> > > > ## Motivation
> > > >
> > > > Currently, for message dispatch rate limiter in a subscription , we
> > have
> > > 3
> > > > level setting :
> > > > - Broker level setting: configured with
> > > > `dispatchThrottlingRatePerSubscriptionInMsg` and
> > > > `dispatchThrottlingRatePerSubscriptionInByte` in broker.conf
> > > > - Namespace level setting: configured with
> > > >
> `org.apache.pulsar.client.admin.Namespaces#setSubscriptionDispatchRate`
> > > > - Topic level setting: configured with
> > > >
> > >
> >
> `org.apache.pulsar.client.admin.TopicPolicies#setSubscriptionDispatchRate`
> > > >
> > > > As we all know, in the pub-sub messaging model, different subscribers
> > > > of the same topic process the messages for various purposes, and they
> > > > may have different requirements for the message dispatch rate limiter.
> > > > Here are some use cases in my organization:
> > > > - On the client side, subscriptions have different
> > > >   max-process-capacity. If the dispatch rate is too large, they may
> > > >   crash their downstream services.
> > > > - We are billing based on the max message rate of the subscription.
> > > >   Some are sensitive to budgets and willing to pay less for lower
> > > >   throughput.
> > > >
> > > >
> > > > ## Goal
> > > >
> > > > Support subscription level dispatch rate limiter setting.
> > > >
> > > > ## API Changes
> > > >
> > > >
> > > > 1. Add client api in org.apache.pulsar.client.admin.TopicPolicies.
> > > > ```
> > > > void getSubscriptionDispatchRate(String topic, String sub) throws
> > > > PulsarAdminException;
> > > > void getSubscriptionDispatchRate(String topic, String sub, boolean
> > > > applied) throws PulsarAdminException;
> > > > void setSubscriptionDispatchRate(String topic, String sub,
> DispatchRate
> > > > dispatchRate) throws PulsarAdminException;
> > > > void removeSubscriptionDispatchRate(String topic, String sub) throws
> > > > PulsarAdminException;
> > > >
> > > > //And the async version of these methods.
> > > >
> > > > ```
> > > >
> > > > 2. Add new admin  API
> > > >
> > > > ```
> > > > @PUT @DELETE @GET
> > > > @Path("/{tenant}/{namespace}/{topic}/{subName}/dispatchRate")
> > > >
> > > > ```
> > > >
> > > > ## Implementation
> > > >
> > > > The rate limiter itself is already implemented with each
> > > > subscription. We only need to update the rate limiter settings if a
> > > > subscription-level config is set.
> > > > I propose to just add a new field in
> > > > `org.apache.pulsar.common.policies.data.TopicPolicies` to store the
> > data.
> > > > ```
> > > > private Map<String, DispatchRate>
> > > > subscriptionDispatchRateMap;
> > > > ```
> > > > And subscription level rate limiter setting has higher priority than
> > > topic
> > > > level. We need to calculate the applied value when we create the
> > > > subscription or any level config is changed.
> > > >
> > > > ## Rejected Alternatives
> > > > None yet.
> > > >
> > > >
> > > > Thanks,
> > > > Haiting
> > > >
> > >
> >
>
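
For the mechanics the PIP describes — an "applied" value computed from the most specific configured level — the precedence resolution can be sketched as follows (illustrative, not Pulsar's code):

```python
def applied_dispatch_rate(broker_rate, namespace_rate=None,
                          topic_rate=None, subscription_rate=None):
    """Illustrative precedence resolution for the 'applied' value described
    in the PIP: the most specific configured level wins
    (subscription > topic > namespace > broker)."""
    for rate in (subscription_rate, topic_rate, namespace_rate, broker_rate):
        if rate is not None:
            return rate
    return None

# Broker default 1000 msg/s; topic overrides to 500; one slow subscription
# is capped at 50 while other subscriptions on the topic still get 500.
assert applied_dispatch_rate(1000, topic_rate=500, subscription_rate=50) == 50
assert applied_dispatch_rate(1000, topic_rate=500) == 500
assert applied_dispatch_rate(1000) == 1000
```

As the PIP notes, the applied value must be recomputed whenever a subscription is created or the config at any level changes.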


Re: [DISCUSS] PIP-181: Pulsar Shell

2022-07-02 Thread Joe F
Why are there 2 PIPs being discussed as PIP-181? There's another
discussion on dev with:

[DISCUSS] PIP-181: Provide new load balance placement strategy
implementation for ModularLoadManagerStrategy

On Sat, Jul 2, 2022 at 1:13 AM Haiting Jiang 
wrote:

> +1, Great feature.
>
> Thanks,
> Haiting
>
> On 2022/07/01 16:56:08 Nicolò Boschi wrote:
> > Yes it's not a good idea to add such new features to active release
> > branches.
> > However, the shell will work with older cluster versions (as long as the
> > Java client is compatible). Also, you will be able to download the shell
> > tarball (which just contains the minimal pieces needed to run the shell).
> >
> > Nicolò Boschi
> >
> >
> > > On Fri, Jul 1, 2022 at 18:44, tison
> > > wrote:
> >
> > > Hi Xiaoyu,
> > >
> > > IIUC patch release must not include new features but only bug fixes.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > > Anon Hxy wrote on Fri, Jul 1, 2022 at 23:59:
> > >
> > > > Hi Nicolò Boschi,
> > > >
> > > > The Pulsar Shell is really cool and I like it. And also I have a
> > > question:
> > > >
> > > > >  I'd like to target this feature for 2.11
> > > >
> > > > Will it be possible to use Pulsar Shell in the legacy version of
> Pulsar,
> > > in
> > > > other words, could we cherry-pick the PR to another branch easily?
> > > >
> > > > Thanks,
> > > > Xiaoyu Hou
> > > >
> > > > > Nicolò Boschi wrote on Fri, Jul 1, 2022 at 20:40:
> > > >
> > > > > I updated the issue linking all the implementations pull requests.
> > > > > If you open the issue you will be able to see 2 videos that better
> > > > explain
> > > > > how the tool works.
> > > > > https://github.com/apache/pulsar/issues/16250
> > > > >
> > > > > Let me know if you have any feedback
> > > > >
> > > > > I'd like to target this feature for 2.11
> > > > >
> > > > > BR,
> > > > > Nicolò Boschi
> > > > >
> > > > >
> > > > > On Tue, Jun 28, 2022 at 10:06, Nicolò Boschi <
> > > > > boschi1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I opened a new PIP about Pulsar CLI tools.
> > > > > > Looking forward to seeing comments and suggestions.
> > > > > >
> > > > > > PIP: https://github.com/apache/pulsar/issues/16250
> > > > > >
> > > > > > I posted a short video that shows how the new tool will work:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://user-images.githubusercontent.com/23314389/176125261-35e123a1-1826-4553-b912-28d00914c0e4.mp4
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > ## Motivation
> > > > > >
> > > > > > Currently Pulsar comes with a couple of utility scripts with the
> goal
> > > > of
> > > > > > managing an existing cluster, test behaviours and verify
> > > performances:
> > > > > > these tools are available as SH script inside the `bin`
> directory.
> > > > > > The `pulsar-admin` is the CLI tool supposed to help users and
> > > operators
> > > > > to
> > > > > > configure the system, operate over policies, install functions
> and
> > > much
> > > > > > else.
> > > > > >
> > > > > > This proposal basically aims to solve two different problems:
> > > > > >
> > > > > > 1. `pulsar-admin` is terribly slow. Every time the script is
> > > > triggered, a
> > > > > > new JVM process is spawned. The JVM process creation is heavy and
> > > most
> > > > of
> > > > > > the time is spent by the JVM initialization process. A very
> common
> > > use
> > > > > case
> > > > > > for cluster operators is to create scripts with several commands
> with
> > > > the
> > > > > > goal of initialize the cluster, initialize a specific tenant
> > > > (namespaces,
> > > > > > topics, policies, functions..); in this case, one JVM is
> initialized
> > > > for
> > > > > > each scripts leads to waste of time and resources.
> > > > > >
> > > > > > 2. User experience. The current design of the Pulsar CLIs can be
> > > > > improved.
> > > > > > There are a couple of aspects that may be annoying for a user
> and can
> > > > > > discourage a user to use Pulsar.
> > > > > > 1. Poking around available commands and options in a CLI tool
> > > > > > (`pulsar-admin` for instance, but it's the same for
> `pulsar-perf` and
> > > > > > `pulsar-client`) is slow and hard. In order to discover commands
> and
> > > > > > options you need to use `-h` option and, since the performance
> issue
> > > > > > pointed at 1., it can be annoying and time-consuming.
> Autocomplete
> > > > > feature
> > > > > > could be a real game-changer in this context.
> > > > > > 2. Different CLI tools. There are a couple of different shell
> > > > > scripts.
> > > > > > They have different goals and it's okay to keep them separated.
> > > > However,
> > > > > > they raise a barrier for a non-expert in Pulsar who doesn't have a
> > > > > convenient
> > > > > > entry point.
> > > > > >
> > > > > > ## Goal
> > > > > >
> > > > > > Address all the issues in the previous section with a single
> > > solution.
> > > > > >
> > > > > > ## API Changes
> > > > > >
> > > > > > A new shell

Re: [DISCUSS] PIP-184: Cluster migration or Blue-Green cluster deployment support in Pulsar

2022-07-13 Thread Joe F
Hi Rajan,
>
> > For consumers, the message will be sent when there are no more messages
> to
> > read?
> >
> Yes.


I think it would be helpful to add a bit more detail on the consumer
switch (esp. shared consumers). There is a window of transition where
"dispatched-but-not-acked" messages might need to be dispatched again.

Dispatch will then need to move to the new cluster to continue (which
creates some interesting issues), or the migration of consumption to the
new cluster must be held up until the topic is completely consumed on the
old cluster. What is the behavior here? It would be helpful to be clear on
this.
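To make the redelivery window concrete: one way applications cope with it is to make consumption idempotent. Below is a hedged sketch in plain Java (illustrative only, not Pulsar client code; tracking processed message IDs at the application level is an assumption, not something the proposal mandates):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical idempotent handler: during a blue->green switch, messages
// that were dispatched but not acked on the old cluster may be delivered
// again on the new one, so the application tracks processed IDs and skips
// duplicates.
public class DedupeSketch {
    private final Set<String> processed = new HashSet<>();

    // Returns true when the message should be processed, false on a duplicate.
    public boolean handle(String messageId) {
        return processed.add(messageId); // add() returns false if already seen
    }

    public static void main(String[] args) {
        DedupeSketch s = new DedupeSketch();
        System.out.println(s.handle("m-1")); // true: first delivery
        System.out.println(s.handle("m-2")); // true
        System.out.println(s.handle("m-1")); // false: redelivered after switch
    }
}
```

In practice the processed-ID set would need bounding (for example, a TTL or a per-subscription watermark), which is omitted here.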

The other point is about how replication would move over (Blue -> Red) and
(Green -> Red). To maintain ordering, will Green delay replication until
Blue has finished replication? Is the expectation that this is
operator-managed?

There are potentially other dependencies that might be impacted. Not asking
that all these be solved, but clarity would be nice: features a, b, c will
work through this migration; features x, y, z will not work and are not
supported by this migration (or will need to be dealt with manually).

-joe


On Wed, Jul 13, 2022 at 9:31 AM Rajan Dhabalia  wrote:

> On Wed, Jul 13, 2022 at 2:30 AM Enrico Olivelli 
> wrote:
>
> > Very interesting. Overall I support this proposal
> >
> > A couple of questions:
> > - What about monitoring the status of the migration?
> >
> There will be a cluster-level state. Once all topics are migrated and
> deleted, the broker will change the cluster's state to migration completed.
> So, it will not require any additional monitoring.
>
>
> > - Should we block all maintenance operations on the "blue" cluster ?
> > like deleting/creating stuff
> >
> The broker iterates over all topics and first marks them terminated. So, it
> will eventually block operations on all topics.
>
>
> > - Should we stop ledger trimming and offloading ?
> >
> As topics will be terminated, they will not receive any new writes, so I
> don't think there will be any issues in that process.
>
>
> > - What about authentication/authorization between the two clusters ?
> > should we provide a dedicated set of credentials for the "migration"?
> >
> The new cluster should have identical policies, including auth, compared to
> the old cluster, and the transition should be seamless for users without any
> downtime or user involvement. You can use PIP-136 (
> https://github.com/apache/pulsar/issues/16424) to synchronize policies on
> new cluster before redirecting traffic on the cluster.
>
> Thanks,
> Rajan
>
>
> >
> > thanks
> > Enrico
> >
> > On Wed, Jul 13, 2022 at 10:55 AM Asaf Mesika
> >  wrote:
> > >
> > > Few questions
> > >
> > > "CompletableFuture asyncMigrate();"
> > > Does this method only change the status of the managed ledger?
> > >
> > > "message ManagedLedgerInfo {
> > >
> > >// Flag to check if topic is terminated and migrated to different
> > cluster
> > >optional bool migrated = 4;
> > >
> > > }"
> > >
> > > This flag then is only changed to true when it has finished migration:
> > i.e.
> > > no new messages were written, all existing consumers finished reading
> all
> > > messages and disconnected and the topic can now be deleted?
> > >
> > > "Broker sends topic migration message to client so, producer/consumer
> at
> > > client side can handle redirection accordingly"
> > >
> > > For producers, the message will be sent the moment the status of the
> > topic
> > > has changed, so all messages from there on will be written to the new
> > > cluster?
> > > For consumers, the message will be sent when there are no more messages
> > to
> > > read?
> > >
> > >
> > >
> > > On Tue, Jul 12, 2022 at 8:23 PM Rajan Dhabalia 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > We have created PIP-184 which helps users to perform cluster
> migration
> > with
> > > > Apache Pulsar. Cluster migration or Blue-Green cluster deployment is
> > one of
> > > > the proven solutions to migrate live traffic from one cluster to
> > another.
> > > > For example, applications running on Kubernetes sometimes
> > require
> > > > a Kubernetes cluster upgrade, which can cause downtime for the entire
> > > > application during the upgrade. Blue-green
> deployment
> > is
> > > > an application release model that gradually transfers user traffic
> > from a
> > > > previous version of an app or microservice to a nearly identical new
> > > > release—both of which are running in production.
> > > >
> > > > The old version can be called the blue environment while the new
> > version
> > > > can be known as the green environment. Once production traffic is
> fully
> > > > transferred from blue to green, blue can standby in case of rollback
> > or be
> > > > pulled from production and updated to become the template upon which
> > the
> > > > next update is made. We need such capability in Apache Pulsar to
> > migrate
> > > > live traffic from the blue cluster to the green cluster s

Re: [DISCUSS] PIP-204: Reactive Java client for Apache Pulsar

2022-08-30 Thread Joe F
+1

On Tue, Aug 30, 2022 at 9:37 AM Matteo Merli  wrote:

> +1
>
>
> --
> Matteo Merli
> 
>
> On Mon, Aug 29, 2022 at 5:56 AM Lari Hotari  wrote:
> >
> > Hi all,
> >
> > I have drafted PIP-204: Reactive Java client for Apache Pulsar.
> >
> > PIP link:
> > https://github.com/apache/pulsar/issues/17335
> >
> > Here's a copy of the contents of the GH issue for your references:
> >
> > Motivation
> >
> > There's a need to "go reactive from end-to-end" when building modern
> > reactive applications with platforms such as Spring Reactive.
> > There are ways to adapt the Apache Pulsar Java client async API calls to
> > Reactive Streams with a few lines of code.
> > However, a lot will be missing and achieving the complete solution will
> > require much more effort.
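To illustrate the "few lines of code" adaptation mentioned above, here is a hedged sketch using only the JDK's java.util.concurrent.Flow interfaces; `sendAsync` here is a stand-in for a Pulsar-style async method, not a call into the actual client:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;

// Sketch: adapt a CompletableFuture-returning async call to a Reactive
// Streams publisher (JDK Flow API). Simplified: it ignores repeated
// request() calls and cancellation, which a real adapter must handle.
public class FlowAdapterSketch {
    // Stand-in for an async client call such as a producer's sendAsync.
    static CompletableFuture<String> sendAsync(String msg) {
        return CompletableFuture.completedFuture("id-for-" + msg);
    }

    static <T> Flow.Publisher<T> toPublisher(CompletableFuture<T> future) {
        return subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) {
                future.whenComplete((value, err) -> {
                    if (err != null) subscriber.onError(err);
                    else { subscriber.onNext(value); subscriber.onComplete(); }
                });
            }
            @Override public void cancel() { }
        });
    }

    public static void main(String[] args) {
        toPublisher(sendAsync("hello")).subscribe(new Flow.Subscriber<String>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(1); }
            @Override public void onNext(String item) { System.out.println(item); }
            @Override public void onError(Throwable t) { t.printStackTrace(); }
            @Override public void onComplete() { }
        });
        // prints "id-for-hello"
    }
}
```

A real adapter must honor repeated request() calls, cancellation, and the full Reactive Streams specification rules; that gap is exactly the "much more effort" the proposal points out.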
> >
> > A better solution would be to have first-class support for Reactive Streams
> in
> > Apache Pulsar.
> >
> > Reactive Streams is an
> interoperability
> > specification and there are multiple implementations for the JVM.
> > It's not about a single programming language.
> > For example, a Reactive client for Apache Pulsar supporting Reactive
> > Streams can be used together with Project Reactor / Spring Reactive, Akka
> > Streams, RxJava 3, Vert.x, SmallRye Mutiny (RedHat/Quarkus) and others.
> > Goal
> >
> > Provide Reactive Java client for Apache Pulsar
> >
> > The Reactive Java client for Apache Pulsar exposes a Reactive Streams
> > compatible Reactive client API for Apache Pulsar.
> > Reactive programming is about non-blocking applications that are
> > asynchronous and event-driven and require a small number of threads to
> > scale. The Reactive Java client for Apache Pulsar supports non-blocking
> > reactive asynchronous back pressure for producing and consuming messages
> so
> > that the producing or consuming pipeline doesn't get overwhelmed.
> > Libraries that support Reactive Streams provide a programming model that
> is
> > efficient and optimal for message producing and consuming (processing)
> use
> > cases.
> > API Changes
> >
> > Establish a Reactive Streams compatible client API for Apache Pulsar.
> > This client will be published in Maven central as a library.
> > Implementation
> >
> > There's an existing proof-of-concept available at
> > https://github.com/datastax/pulsar .
> > This implementation will be used as a reference for an entirely new
> > implementation that is started as a new repository under the Apache
> Pulsar
> > project.
> >
> > The proposal for the repository location is
> > https://github.com/apache/pulsar-client-reactive .
> > The Maven central group Id is "org.apache.pulsar" and the main artifact
> id
> > is "pulsar-client-reactive".
> > The root package name is "org.apache.pulsar.reactive.client".
> >
> > The implementation will provide an interface module that abstracts the
> > Reactive client API.
> > This interface is implemented by wrapping the current Apache Pulsar Java
> > client and adapts the existing async Java API to the Reactive client
> > API.
> > The reason for this particular detail is that it is possible to provide a
> > native Reactive client later while having the possibility to start
> > developing applications immediately using the Reactive client API.
> > Applications depending on the API will be able to migrate to use the
> native
> > Reactive client with minor or no changes when it becomes available.
> > Anything else?
> >
> > By having an official Reactive Java client for Apache Pulsar, it will
> > provide a way to contribute and improve the official client.
> > Other opensource projects might want to provide support for using Apache
> > Pulsar within reactive application frameworks. Without an official
> reactive
> > client, this becomes hard, since open source projects would like to use
> > stable client dependencies instead of a hobby project provided by an
> > individual.
> > There are several members within the existing Apache Pulsar contributors
> > and committers that have expressed the desire to contribute to a Reactive
> > client for Apache Pulsar and are willing to maintain the new repository.
> > With the new repository and sub-project we will most likely see new
> active
> > contributors and could possibly appoint new Apache Pulsar committers to
> the
> > project to empower the developers working on this new sub-project.
> >
> > I'm looking forward to the discussion.
> >
> >
> > BR,
> >
> >
> > Lari
>


Re: [VOTE] Accept DotPulsar as part of Apache Pulsar project

2020-01-14 Thread Joe F
+1

On Tue, Jan 14, 2020 at 6:36 AM Yu Liu  wrote:

> +1 👏
>
> On Tue, Jan 14, 2020 at 10:19 AM Guangning E  wrote:
>
> > +1, 👏👏👏
> >
> > Thanks,
> > Guangning
> >
> > > xiaolong ran  wrote on Tue, Jan 14, 2020 at 10:15 AM:
> >
> > > > Thanks Danske for the contribution, LGTM +1
> > >
> > > --
> > > Thanks
> > > Xiaolong Ran
> > >
> > > > On Jan 13, 2020, at 9:06 PM, Sijie Guo  wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I'd like to start a vote thread for accepting DotPulsar [1] as part
> of
> > > > Apache Pulsar project. DotPulsar is a pulsar client library for .NET
> > > which
> > > > was originally developed by Danske Commodities A/S
> > > > 
> > > >
> > > > Please vote with your opinions. The vote will be open for at least 72
> > > hours.
> > > >
> > > > [1] https://github.com/danske-commodities/dotpulsar
> > > > [2] PIP-53:
> > > >
> > >
> >
> https://github.com/apache/pulsar/wiki/PIP-53%3A-Contribute-DotPulsar-to-Apache-Pulsar
> > > >
> > > > Thanks,
> > > > Sijie
> > >
> > >
> >
>


Re: [DISCUSS] PIP 57: Improve Broker's Zookeeper Session Timeout Handling

2020-02-22 Thread Joe F
My concerns are listed in the PR comments.

A broker is allowed to operate on a (resource) bundle under a lock. When a
broker loses its session, the lock ownership COULD be lost. The right thing
at this point is to give up the resource and re-acquire it. (In fact,
shutdown is just a shortcut to doing exactly this.) The broker continuing
to operate, ASSUMING that it owns the bundle, violates the axiom that the
resource is protected by the lock. It breaks fundamental distributed
systems principles for two nodes to own an exclusive resource concurrently.

It does not matter even if no other broker grabbed the resource in the
meantime and the original broker successfully re-acquires the lock after
session loss. There is no way for the original broker to ascertain this
a priori, for it to justify operating on the resource AS IF it never
lost the lock.

It may be possible that underlying lower-level locks prevent
catastrophe, but that does not validate this violation of basic
principles. Not only will it make it incredibly difficult to assert the
correctness of the system, it also makes the system more complex and
difficult to maintain going forward.

The Global ZK and BK uses of ZK are not comparable to this situation. Doing
something like this would be incorrect in any distributed system. The only
way something like this could even be attempted is if the broker can freeze
for the window of time between losing the session and reacquiring it.

Joe



On Fri, Feb 21, 2020 at 8:27 PM PengHui Li  wrote:

> Hi all,
>
> I have drafted a proposal for improving broker's Zookeeper session timeout
> handling. You can find at
> https://github.com/apache/pulsar/wiki/PIP-57%3A-Improve-Broker%27s-Zookeeper-Session-Timeout-Handling
>
> Also I copy it to the email thread for easier to view. Any suggestions or
> ideas welcome to join the discussion.
>
>
> PIP 57: Improve Broker's Zookeeper Session Timeout Handling
> Motivation
> In Pulsar, brokers use Zookeeper as the configuration store and for
> maintaining broker metadata. We can also call these the Global Zookeeper and
> the Local Zookeeper.
> The Global Zookeeper maintains the namespace policies, cluster metadata,
> and partitioned topic metadata. To reduce read operations on Zookeeper,
> each broker has a cache for global Zookeeper. The Global Zookeeper cache
> updates on znode changed. Currently, when the present session timeout
> happens on global Zookeeper, a new session starts. Broker does not create
> any EPHEMERAL znodes on global Zookeeper.
> The Local Zookeeper maintains the local cluster metadata, such as broker
> load data, topic ownership data, managed ledger metadata, and Bookie rack
> information. Both broker load data and topic ownership data are stored as
> EPHEMERAL znodes on the Local Zookeeper. Currently, when a session timeout
> happens on the Local Zookeeper, the broker shuts itself down.
> Shutting down a broker results in an ownership change of the topics that
> the broker owned. However, we encountered many problems related to the
> current session timeout handling, such as a broker with a long JVM GC pause,
> or the Local Zookeeper under a high workload. The latter in particular may
> cause all brokers to shut down.
> So, the purpose of this proposal is to improve session timeout handling on
> Local Zookeeper to avoid unnecessary broker shutdown.
> Approach
> As with the Global Zookeeper session timeout handling and the Zookeeper
> session timeout handling in BookKeeper, a new session should start when the
> present session times out.
> If a new session fails to start, the broker retries several times; the
> number of retries depends on the broker's configuration. If the session
> still cannot be started after those retries, the broker still needs to be
> shut down, since this may be a problem with the Zookeeper cluster. The
> user needs to restart the broker after the zookeeper cluster returns to
> normal.
> If a new session starts successfully, the issue is slightly more
> complicated, so I will introduce each scenario separately.
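The retry handling described above can be sketched as follows (illustrative helper names such as `startSession`, not actual broker code):

```java
import java.util.function.BooleanSupplier;

// Sketch of the proposed handling: retry starting a new ZooKeeper session a
// configured number of times; if none succeeds, the caller shuts the broker
// down, matching the current behavior.
public class SessionRetrySketch {
    static boolean tryStartSession(int maxRetries, BooleanSupplier startSession) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (startSession.getAsBoolean()) {
                return true; // new session established; proceed to re-own bundles
            }
        }
        return false; // still failing: shut the broker down
    }

    public static void main(String[] args) {
        // Simulated: the first two attempts fail, the third succeeds.
        int[] calls = {0};
        boolean ok = tryStartSession(5, () -> ++calls[0] >= 3);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 3 attempts
    }
}
```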
> Topic ownership data handling
> The topic ownership data maintains all namespace bundles owned by the
> broker. In Zookeeper, an EPHEMERAL znode is created for each namespace
> bundle. When a session timeout happens on the local Zookeeper, all of the
> EPHEMERAL znodes maintained by this broker will be deleted automatically. We
> need some mechanism to avoid unnecessary ownership transfer of the
> bundles. Since the broker caches the owned bundles in memory, the broker
> can use the cache to re-own the bundles.
> First, when the broker tries to re-own the bundle, if the znode of the
> bundle exists at Zookeeper and the owner is this broker, it may be that
> Zookeeper has not deleted the znode yet. The broker should check whether the
> ephemeral owner is the current session ID. If not, the broker should wait
> for the znode deletion.
> Then the broker tries to own the bundle. If the broker succeeds in owning
> the bundle, it means the bundle is not owned by other brokers; the brok
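The re-own procedure above boils down to a small decision table. A hedged sketch (hypothetical helper and names, not the actual broker implementation):

```java
// Decision table for the re-own step after a new session starts: the broker
// inspects the bundle's znode and either acquires the lock, keeps the bundle,
// waits for the stale znode to be deleted, or unloads the bundle.
public class ReownSketch {
    enum Action { ACQUIRE, KEEP, WAIT_FOR_DELETE, UNLOAD }

    static Action decide(boolean znodeExists, boolean ownerIsSelf,
                         boolean ephemeralOwnerIsCurrentSession) {
        if (!znodeExists) return Action.ACQUIRE;                 // lock free: try to own
        if (!ownerIsSelf) return Action.UNLOAD;                  // owned by another broker
        if (ephemeralOwnerIsCurrentSession) return Action.KEEP;  // already re-owned
        return Action.WAIT_FOR_DELETE; // stale znode left by the expired session
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false, false)); // ACQUIRE
        System.out.println(decide(true, true, false));   // WAIT_FOR_DELETE
        System.out.println(decide(true, false, false));  // UNLOAD
    }
}
```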

Re: [DISCUSS] PIP 57: Improve Broker's Zookeeper Session Timeout Handling

2020-02-22 Thread Joe F
On Sat, Feb 22, 2020 at 6:28 PM PengHui Li  wrote:

> Hi, joe
>
> The fundamental correctness is guaranteed by the fencing mechanism
> provided by Apache BookKeeper and the CAS operation provided by the
> metadata storage. Both fencing and CAS operations will prevent two owners
> updating data or metadata at the same time.


This may be, as I said: "It may be possible that underlying lower level
locks may prevent catastrophe, but that does not validate this violation
of basic principles." I am familiar enough with how BK works, and how
CAS works for ML metadata storage, to know where all the bodies are buried.

This default shutdown behavior isn’t changed. We just introduce an
> alternative way for improving stability when Zookeeper is doing leader
> election or becomes unavailable


This is just a claim. I would argue that it does the opposite.


>
>
According to the following rules, I think this will not break current
> system principles.
>
>
> 1. If the znode for the bundle is deleted, this is consistent with the
> current behavior. The broker who acquires for the lock first will become
> the owner.
> 2. If the znode for the bundle is not deleted, other brokers are also unable
> to acquire the lock. Both the broker that re-creates the session and other
> brokers need to wait for the znode deletion. Then the broker who acquires
> for the lock first will become the owner.
> 3. If the bundle ownership changed, the broker that re-creates the session
> is unable to acquire the lock. So the broker should unload the bundle. This is
> also consistent with current ownership change behavior.
> 4. Also, if unexpected exceptions are thrown during the re-own process, the
> broker needs to shut itself down.
>
>
I don't think you address the issue I have raised. Say 30 secs is the
timeout. Let us say broker B1 lost its connection at t. Then B1 loses the
session at t+30 secs. With your logic, B1 continues to service the topic
as if it still owns it. Meanwhile B2 acquires the topic at t+31 and loses
its connection at t+32 (and loses its session at t+62). At t+62 B3 acquires
it, and loses its connection at t+63. Now B4 acquires it. B4 crashes. Now
the original broker B1 reacquires the session and goes on as if nothing
occurred in between, merrily operating as if nothing happened in the
meantime (and so could B2 and B3).

All fine, as you say,  because of lower level locks in BK and ML to prevent
catastrophe...

If you want to make the case that bundle ownership does not guarantee
underlying topic ownership, and topic ownership is arbitrated by BK/
ML (metadata), then explicitly make that case. Then we can debate the
merits of that, and see if the code and design allow for it. Because as
it is, that is not how Pulsar is designed. Currently, topic ownership is
arbitrated by the bundle lock. This is not a change that should casually
be slipped into the system.

And my original question still stands: if session loss is such an issue
for some use cases, why not raise the session timeout? The broker can
safely keep the session for longer. That's far preferable to running the
risk of doing this.


>
> Thanks,
> Penghui
> On Feb 22, 2020, 12:27 +0800, PengHui Li , wrote:
> > Hi all,
> >
> > I have drafted a proposal for improving broker's Zookeeper session
> timeout handling. You can find at
> https://github.com/apache/pulsar/wiki/PIP-57%3A-Improve-Broker%27s-Zookeeper-Session-Timeout-Handling
> >
> > Also I copy it to the email thread for easier to view. Any suggestions
> or ideas welcome to join the discussion.
> >
> >
> > PIP 57: Improve Broker's Zookeeper Session Timeout Handling
> > Motivation
> > In Pulsar, brokers use Zookeeper as the configuration store and for
> maintaining broker metadata. We can also call these the Global Zookeeper and Local
> Zookeeper.
> > The Global Zookeeper maintains the namespace policies, cluster metadata,
> and partitioned topic metadata. To reduce read operations on Zookeeper,
> each broker has a cache for global Zookeeper. The Global Zookeeper cache
> updates on znode changed. Currently, when the present session timeout
> happens on global Zookeeper, a new session starts. Broker does not create
> any EPHEMERAL znodes on global Zookeeper.
> > The Local Zookeeper maintains the local cluster metadata, such as broker
> load data, topic ownership data, managed ledger metadata, and Bookie rack
> information. Both broker load data and topic ownership data are stored as
> EPHEMERAL znodes on the Local Zookeeper. Currently, when a session timeout happens
> on the Local Zookeeper, the broker shuts itself down.
> > Shutting down a broker results in an ownership change of the topics that the broker
> owned. However, we encountered many problems related to the current
> session timeout handling, such as a broker with a long JVM GC pause, or the Local
> Zookeeper under a high workload. The latter in particular may cause all brokers to
> shut down.
> > So, the purpose of this proposal is to improve session timeout handling
> on Local Zookeeper to avoid unneces

Re: [DISCUSS] PIP 57: Improve Broker's Zookeeper Session Timeout Handling

2020-02-24 Thread Joe F
Sijie, Penghui,

Thank you. Can we get the PIP to be a more detailed write-up? I would
like this PIP to be more comprehensive.

>>Hence we need to draw an agreement on understanding
>>WHAT is actually guarantees the correctness in current Pulsar design. We
>>then can move forward with a conclusion about how to do it.

That would be great. I have been thinking recently about whether we can
formalize the system through a TLA model. It would be ideal, but it will
also require BK to provide one. Whether we use TLA or not, we should have
an understanding of WHAT actually guarantees the correctness in the current
Pulsar design _written down_. At least then we will have a model against
which we can align changes and fixes.

Joe

On Sun, Feb 23, 2020 at 11:44 PM Sijie Guo  wrote:

> Sorry for the late reply.
>
> Joe - There are two things I would like to clarify first.
>
> 1) I think you have a misunderstanding about the zookeeper lock "ephemeral
> znode" and bookkeeper/ML fencing. Let's step back to understand the current
> Pulsar's behavior first.
>
> - A zookeeper lock doesn't prevent a dual-writer situation from happening.
> A simple case: Node A creates a lock and Node B stands by. If the lock (the
> ephemeral znode) expires at zookeeper, Node B acquires the lock and becomes
> the owner. But Node A might NOT receive the session-expire notification
> because of a network partition, hence A still thinks it is the owner. So
> there is a given duration in which both A and B think they are the owners.
>
> - The correctness of using a zookeeper lock should be gated by the
> exclusiveness of a resource. In ML, the exclusiveness is provided by
> single-writer semantics offered by bookkeeper and CAS operations offered by
> zookeeper.
>
> So a zookeeper lock (or an external locking mechanism) only ensures
> "stable" ownership of a resource over a long duration, but it doesn't
> prevent dual ownership. The resource itself should provide a mechanism to ensure
> exclusiveness. BookKeeper/ML does that via fencing. HBase uses ACL to
> achieve "fencing" for regions.
>
> Martin Kleppmann wrote a blog post about this. It is a well-written blog
> post to check out.
>
> https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
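The fencing idea referenced in that post can be illustrated with a toy store that rejects stale writers. This is a sketch of the general technique only, not BookKeeper's actual fencing protocol:

```java
// A resource that enforces exclusiveness itself: each writer presents a
// monotonically increasing fencing token (think: an epoch handed out with
// the lock); writes with a token older than the highest one seen so far are
// rejected, so a paused ex-owner that wakes up late cannot corrupt state.
public class FencingSketch {
    private long highestToken = -1;
    private String state = "";

    synchronized boolean write(long token, String data) {
        if (token < highestToken) {
            return false; // stale owner: fenced out
        }
        highestToken = token;
        state = data;
        return true;
    }

    public static void main(String[] args) {
        FencingSketch store = new FencingSketch();
        System.out.println(store.write(1, "from-A")); // true: A holds the lock
        System.out.println(store.write(2, "from-B")); // true: B took over
        System.out.println(store.write(1, "late-A")); // false: A was fenced
    }
}
```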
>
> 2) Increasing the session timeout doesn't prevent session expiry. Session
> expiry can still happen when the zookeeper leader crashes, the client
> pauses, or there are network hiccups. Increasing the session timeout also
> introduces side effects: the broker fail-over time will increase as well,
> which means the topic-unavailable duration is increased when a broker goes
> down.
>
>
> ---
>
> Penghui, Joe,
>
> I think there are multiple things coupled in this discussion relate to
> zookeeper.
>
> ZooKeeper is mainly used for two places.
>
> 1) "ownership" for bundles and service discovery for "brokers". It uses
> ephemeral znodes. They are session related and will be impacted by session
> expiries.
> 2) metadata management for both policies and ML. It uses persistent znodes.
> They are not session related. But unfortunately, they are also impacted by
> session expiries. Because zookeeper ties session management to connection
> management.
>
> For 2), it is safe to retry creating a zookeeper client to establish a
> session when the session expired. We can give as high session expire time
> as possible, since they don't impact failover time.
>
> So the main discussion should be about 1) - whether we are safe to
> re-establish a zookeeper session to re-acquire bundles after the previous
> session is expired. Hence we need to draw an agreement on understanding
> WHAT is actually guarantees the correctness in current Pulsar design. We
> then can move forward with a conclusion about how to do it.
>
> - Sijie
>
> On Sat, Feb 22, 2020 at 11:40 PM Joe F  wrote:
>
> > On Sat, Feb 22, 2020 at 6:28 PM PengHui Li 
> > wrote:
> >
> > > Hi, joe
> > >
> > > The fundamental correctness is guaranteed by the fencing mechanism
> > > provided by Apache BookKeeper and the CAS operation provided by the
> > > metadata storage. Both fencing and CAS operations will prevent two
> owners
> > > updating data or metadata at the same time.
> >
> >
> > This may be, as I said  "It may be possible that underlying lower level
> > locks may prevent catastrophe, but that does not validate this violation
> > of  basic principles. " I am far too familiar with how BK works, and how
> > CAS works for ML metadata storage, to know where all the bodies are
> buried.
> >
> > This default shutdown behavior isn’t ch

Re: Reschedule Pulsar Summit SF to August/September

2020-03-10 Thread Joe F
Thank you Sijie. Given what we know, this is the right thing to do at this
time

Joe

On Tue, Mar 10, 2020 at 10:38 AM Sijie Guo  wrote:

> Dear all,
>
> In light of growing concern and the further spread of the COVID-19
> (Corona) virus and after close consultation with event stakeholders and the
> organizing parties, we have decided to reschedule the Pulsar Summit San
> Francisco *from April to August/September*. It was decided that with due
> regard to the health and safety of our attendees, our hosting city, and the
> ever-increasing travel restrictions, it was necessary to reschedule the
> event.
>
> We are still working closely with the conference center on confirming the
> new date for the conference. We are holding on publishing the schedule and
> opening the registration until the rescheduled date is confirmed. Thank you
> very much for your understanding!
>
> Please reach out to us if you have any questions.
>
> - Sijie on behalf of Pulsar Summit organizers
>


Re: ReadOnly Topic Ownership Support

2020-05-21 Thread Joe F
Very useful feature.

I would like the proposers to think just beyond scaling consumers. If done
right, this has the potential to open up a lot of use cases in ML, where
you need to reprocess old/archived data. Being able to spin up read-only
brokers (dedicated brokers that read from tiered storage, without
interfering with the current flow of the stream) is extremely valuable. With
small tweaks to this PIP, about data access boundaries, and without a lot of
additional complexity to this proposal, that can be achieved.

On Tue, May 12, 2020 at 5:37 AM Jia Zhai  wrote:

> 👍
>
> Best Regards.
>
>
> Jia Zhai
>
> Beijing, China
>
> Mobile: +86 15810491983
>
>
>
>
> On Fri, May 8, 2020 at 4:29 AM Sijie Guo  wrote:
>
> > Dezhi, thank you for sharing the proposal!
> >
> > It is great to see Tencent started contributing this great feature back
> to
> > Pulsar! This feature will unlock a lot of new capabilities of Pulsar.
> >
> > I have moved the proposal to
> >
> >
> https://github.com/apache/pulsar/wiki/PIP-63:-Readonly-Topic-Ownership-Support
> >
> > - Sijie
> >
> >
> > On Thu, May 7, 2020 at 5:23 AM dezhi liu  wrote:
> >
> > > Hi all,
> > > Here is a suggest (PIP) ReadOnly Topic Ownership Support
> > > 
> > > # PIP-63: ReadOnly Topic Ownership Support
> > >
> > > * Author: Penghui LI, Jia Zhai, Sijie Guo, Dezhi Liu
> > >
> > > ## Motivation
> > > People usually use Pulsar as an event bus or event center to unify all
> > > their message data or event data.
> > > The same set of event data will usually be shared across multiple
> > > applications. Problems occur when the number of subscriptions on the same
> > topic
> > > increases.
> > >
> > > - The bandwidth of a broker limits the number of subscriptions for a
> > single
> > > topic.
> > > - Subscriptions are competing for network bandwidth on brokers.
> Different
> > > subscriptions might have different levels of severity.
> > > - When synchronizing cross-city message reading, cross-city access
> needs
> > to
> > > be minimized.
> > >
> > > This proposal adds readonly topic ownership support. If
> > > Pulsar supports readonly ownership, users can then use it to set up a
> > (few)
> > > separate broker clusters for readonly access, to segregate the
> > > consumption traffic by service severity. This would also allow Pulsar
> > > to support a large number of subscriptions.
> > >
> > > ## Changes
> > > There are a few key changes for supporting readonly topic ownership.
> > >
> > > - how does readonly topic owner read data
> > > - how does readonly topic owner keep metadata in-sync
> > > - how does readonly topic owner handle acknowledges
> > >
> > > The first two problems have been well addressed in DistributedLog. We
> can
> > > just add similar features in managed ledger.
> > >
> > > ### How readonly topic owner read data
> > >
> > > In order for a readonly topic owner to keep reading data in a streaming
> way,
> > > the managed ledger should be able to refresh its LAC. The easiest
> change
> > > is to call `readLastAddConfirmedAsync` when a cursor requests entries
> > > beyond the existing LAC. A more advanced approach is to switch the regular
> > read
> > > entries request to bookkeeper’s long poll read requests. However, long
> > poll
> > > read requests are not supported in the bookkeeper v2 protocol.
> > >
> > > Required Changes:
> > >
> > > - Refresh LastAddConfirmed when a managed cursor requests entries
> beyond
> > > known LAC.
> > > - Enable `explicitLac` at managed ledger. So the topic writable owner
> > will
> > > periodically advance LAC, which will make sure readonly owner will be
> > able
> > > to catch up with the latest data.
> > >
> > > ### How readonly topic owner keep metadata in-sync
> > >
> > > Ledgers are rolled at a given interval. Readonly topic owner should
> find
> > a
> > > way to know when a ledger has been rolled. There are a couple of options.
> > > These options are categorized into two approaches : notification vs
> > > polling.
> > >
> > > *Notification*
> > >
> > > A) use zookeeper watcher. Readonly topic owner will set a watcher at
> the
> > > managed ledger’s metadata. So it will be notified when a ledger is
> > rolled.
> > > B) similar as A), introduce a “notification” request between readonly
> > topic
> > > owner and writable topic owner. Writable topic owner notifies readonly
> > > topic owner with metadata changes.
> > >
> > > *Polling*
> > >
> > > C) Readonly Broker polling zookeeper to see if there is new metadata,
> > > *only* when LAC in the last ledger has not been advanced for a given
> > > interval. Readonly Broker checks zookeeper to see if there is a new
> > ledger
> > > rolled.
> > > D) Readonly Broker polls new metadata by reading events from a system topic
> > of
> > > the write broker cluster; the write broker adds ledger metadata change events to
> > the
> > > system topic when the mledger metadata updates.
> > >
> > > Solution C) will be the simplest solution to start with
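Option C) above can be sketched roughly as follows. All names here are hypothetical, and a real implementation would hang off the managed ledger's timers rather than a hand-rolled class; the sketch only shows the "poll metadata only when the LAC is stagnant" decision.

```java
import java.util.function.LongSupplier;

// Illustrative sketch of option C; names are made up, not Pulsar API.
class StaleLacPollerSketch {
    private final LongSupplier currentLac;   // LAC of the last known ledger
    private final long staleIntervalMs;
    private long lastLac = -1;
    private long lastAdvanceTime;

    StaleLacPollerSketch(LongSupplier currentLac, long staleIntervalMs, long now) {
        this.currentLac = currentLac;
        this.staleIntervalMs = staleIntervalMs;
        this.lastAdvanceTime = now;
    }

    // Returns true when the readonly broker should check zookeeper
    // for a newly rolled ledger.
    boolean shouldPollMetadata(long now) {
        long lac = currentLac.getAsLong();
        if (lac != lastLac) {
            lastLac = lac;            // LAC still advancing: no need to poll
            lastAdvanceTime = now;
            return false;
        }
        return now - lastAdvanceTime >= staleIntervalMs;
    }
}
```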
> > >
> > > ### How does readonly topic

Re: Proposal for Consumer Filtering in Pulsar brokers

2020-11-16 Thread Joe F
We have had discussions on the community list about server-side logic
previously. I would like to set the specific proposal in this PIP aside,
and address what this PIP is implicitly changing in core Pulsar design. I
want to have an explicit discussion on that topic: what is the path for
server-side business logic in Pulsar?

Pulsar has been designed to do a few things very well. It is designed to
be run as a hosted service, meaning it can be scaled horizontally by adding
storage or compute hardware as traffic or tenants on the service grow. It
is optimized for data streaming at throughput and scale, and does
multi-tenancy extremely well. Part of that design is that there is no
business logic in the data flow path. Since business logic lives outside
of the core data flow path in Pulsar, the core is optimized for data flow:
do plain byte movement - no ser/de, no byte copy, no computations - and do
it extremely well. Other systems, like Kafka and Kinesis, have taken the
same approach: no to server-side business logic.

This particular PIP may be expensive on the server, or not. The next PIP
could be, and there is no rationale to stop adding any kind of business
logic into the broker once this concept is allowed.

Selective consumers are an anti-pattern for data flow systems. There are
systems out there that support implementation of business logic in the data
flow path, and they don't scale. Take the example of AMQ. AMQ allows
JMS/SQL-92 expressions server side. Once the door to this anti-pattern is
opened, there is no rhyme or reason to deny anything, up to and including a
full-blown SQL query evaluation in the dispatch path.

So why not allow that? Why not allow full-blown expression evaluation in
the data flow path?

Unfortunately there is no way to answer this without bringing up the
conflict of interest between small users and large-scale users running
multi-tenant hosted Pulsar at huge traffic volumes.

For low-scale, single (or few) tenant installations, efficiency of flow,
latency, and throughput are not the driving concern. In a small cluster,
the implications of cost and scale are minimal in absolute terms when
server-side business logic is executed.

For large-scale users (like me) this is a no-go. There are many problems
with this that make it very difficult to run a hosted platform with
predictable SLAs once users can introduce business logic into the broker.
These are on top of the performance and cost implications.

First, broker throughput and performance become unpredictable. The
current Pulsar load model (which is used in the load manager for load
balancing) becomes unusable. Not only that, there will be no pre-computed
model that can be used in the load manager. Since the producer and
consumer arbitrarily decide what the business logic is, and the computation
can change based on the data, the model itself becomes dynamic and the
load manager has to rebuild the model any time a user updates the business
logic. That is a tall order, worth years of work to implement.

Second, this introduces the noisy neighbor issue. Two tenants will happily
run on the same broker, until one of them decides to change the logic on
the subscription, and suddenly the quality for the other tenant is degraded
because the broker is impacted. The system operator of the cluster now has
to get involved out of the blue, because one tenant made a change.
Basically, any tenant can disrupt the system by triggering additional
business logic on the server, or with specific data patterns that make
the business logic expensive on the server.

Third, this makes provisioning capacity impossible. Today Pulsar users can
be provisioned on flow: bandwidth in/out, messages in/out. With server-side
business logic, there is some arbitrary overhead that needs to be accounted
for in the capacity calculation.

We, who run Pulsar as a hosted service, do not want any of our tenants to
introduce server-side logic into the service. To do it well requires a load
balancer that can continuously and dynamically adjust its load model and
capacity model (perhaps based on ML over the traffic). The scope of
building such a system would convert Pulsar from a data streaming project
into a load balancer/resource manager project. The only viable solution
would be to give each tenant their own dedicated servers, at which point
all claims to multi-tenancy in Pulsar should be dropped.


So large multi-tenant clusters will have big problems with the addition of
business logic into the broker.

But this problem - Pulsar users attempting to add server-side logic into
Pulsar - is not going to go away. There will always be yet another new user
who will ask for adding 'one more simple implementation' of server-side
business logic into the broker.

My suggestion here is simple. Make the dispatcher a configurable module.
Let users who want to do server-side logic configure their own
computational logic in custom dispatchers and use it.

Re: [PIP-78] Split the individual acknowledgments into multiple entries

2021-01-19 Thread Joe F
I have a simpler question. Just storing the message-ids raw will fit ~300K
entries in one ledger entry. With the bitmap changes, we can store a
couple of million within one 5MB ledger entry. So can you tell us what
numbers of unacked messages are creating a problem? What exactly are the
issues you face, and at what numbers of unacked messages, memory use, etc.?
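For rough context, the figures quoted here can be reproduced with back-of-the-envelope arithmetic, assuming a ~5MB maximum ledger entry and 16 bytes per raw (ledgerId, entryId) pair; the per-id bitmap cost is an assumed average, not a measured number.

```java
// Back-of-the-envelope arithmetic only; real cursor ledger entries carry
// extra framing and metadata, so treat these as rough upper bounds.
class AckStorageEstimate {
    static final long ENTRY_BYTES = 5L * 1024 * 1024; // ~5MB max ledger entry
    static final long RAW_ID_BYTES = 16;              // (ledgerId, entryId) as two longs

    // Raw message-ids: ~327K per entry, matching the "~300K" figure above.
    static long rawIdsPerEntry() {
        return ENTRY_BYTES / RAW_ID_BYTES;
    }

    // Bitmap encoding: at an assumed average cost of bitsPerId per id,
    // e.g. 16 bits/id gives ~2.6M ids, the "couple of million" figure.
    static long bitmapIdsPerEntry(long bitsPerId) {
        return ENTRY_BYTES * 8 / bitsPerId;
    }
}
```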

I have my own concerns about this proposal, but I would like to understand
the problem first

Joe

On Sun, Jan 17, 2021 at 10:16 PM Sijie Guo  wrote:

> Hi Lin,
>
> Thank you and Penghui for drafting this! We have seen a lot of pain points
> with `managedLedgerMaxUnackedRangesToPersist` when enabling delayed
> messages. Glad that you and Penghui are spending time on resolving this!
>
> Overall the proposal looks good. But I have a couple of questions about the
> proposal.
>
> 1. What happens if the broker fails to write the entry marker? For example,
> at t0, the broker flushes dirty pages and successfully writes an entry
> marker. At t1, the broker tries to flush dirty pages but fails to write
> the new entry marker. How can you recover the entry marker?
>
> 2.  When a broker crashes and recovers the managed ledger, the cursor
> ledger is not writable anymore. Are you going to create a new cursor ledger
> and copy all the entries from the old cursor ledger to the new one?
>
> It would be good if you can clarify these two questions.
>
> - Sijie
>
> On Sun, Jan 17, 2021 at 9:48 PM linlin  wrote:
>
> > Hi, community:
> > Recently we encountered some problems when using individual
> > acknowledgments, such as:
> > when the amount of acknowledgments is large, entry writing fails; a large
> > amount of cache causes OOM; etc.
> > So I drafted a PIP at
> > https://docs.google.com/document/d/1uQtyb8t6X04v2vrSrdGWLFkuCkBcGYZbqK8XsVJ4qkU/edit?usp=sharing
> > and any feedback is welcome.
> >
>


Re: [Discuss] PIP to add system topic for topic creation/deletion events

2021-04-21 Thread Joe F
I would be very careful about implementing such a feature, because it
introduces undesirable interdependencies. Broker processes only talk to
the metadata store or data store. This keeps brokers isolated from each
other: one broker is not dependent on the functioning of another broker.

A broker publishing to a topic hosted on another broker (which, for
example, is serving the "system topic") sets up an undesirable dependency,
which reduces total system resiliency and availability for the cluster.
These are better implemented as notifications off the metadata changes.

Good feature, but it needs careful thought to do it right.
Joe

On Wed, Apr 21, 2021 at 4:03 PM Michael Marshall 
wrote:

> Thanks for your response, PengHui.
>
> I think this feature would be useful to end users for cluster management,
> which is why I want to contribute a first class feature instead of writing
> my own plugin that would add little value to the community.
>
> > With the broker interceptor you can intercept all the REST API request
> and response, Pulsar commands between the broker and clients.
>
> Based on looking through the interceptor trait, I don't see a way to
> trigger events based on auto created/deleted topics. For example, when a
> producer connects to a broker for a nonexistent topic (assuming auto topic
> creation is allowed), a managed ledger, and thus a topic, is created
> without ever interacting with that interceptor trait. The same appears to
> be true for garbage collected topics. I think we'll need more than this
> interceptor to properly capture all cases where topics are created or
> deleted.
>
> Regarding my reference to potential further work, it does appear that low
> level auditing of connections and pulsar commands could be covered by the
> interceptor. However, it would still be on the end user to implement such
> functionality.
>
> Thanks,
> Michael
>
>
> On Wed, Apr 21, 2021 at 3:51 AM PengHui Li 
> wrote:
>
> > Hi Michael,
> >
> > Currently, Pulsar supports a pluginable Broker Interceptor, you can find
> > it here
> >
> https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-broker/src/main/java/org/apache/pulsar/broker/intercept/BrokerInterceptor.java
> >
> > With the broker interceptor you can intercept all the REST API request
> and
> > response, Pulsar commands between the broker and clients.
> > This can be used to audit the system events.
> >
> > Thanks,
> > Penghui
> > On Apr 21, 2021, 5:13 AM +0800, Michael Marshall  >,
> > wrote:
> > > Hello all,
> > >
> > > I would like to propose adding a new feature to Pulsar that will
> require
> > a
> > > PIP. In addition to feedback on the proposed feature, I am looking for
> > > guidance on how to go about creating the PIP. Thanks for any help you
> can
> > > provide.
> > >
> > > I would like to add an optional system topic where topic creation and
> > topic
> > > deletion events are published. This feature will make it easier to
> > leverage
> > > the auto topic creation and inactive topic deletion features by
> > providing a
> > > way for users to reactively discover changes to topics. The largest
> > benefit
> > > is that users won't need to poll for these updates with an admin
> client.
> > > Instead, they will get them as messages.
> > >
> > > I looked to see if an equivalent feature already exists, but I don't
> see
> > > one. For reference, the `PatternMultiTopicsConsumerImpl` currently
> polls
> > > for all topics in a namespace and then does set operations to compute
> the
> > > "new" topics to which it should subscribe. This client implementation
> > could
> > > possibly leverage the new feature.
> > >
> > > There are still details I need to work out, like how it will work for
> > > partitioned vs unpartitioned topics and what kind of guarantees we have
> > > regarding messaging semantics (I think we'll want at least once message
> > > delivery here). I plan to include these details in the PIP with
> > discussions
> > > about trade offs for different implementations.
> > >
> > > Does this feature sound helpful and reasonable to others? If so, is the
> > > next step to formally write a proposal in a Google Doc or to put
> > together a
> > > doc on the Pulsar GitHub Wiki?
> > >
> > > Related and/or future work to consider in this design: I can see adding
> > > different system topics for these types of auditable system events. We
> > > currently rely on log lines as our primary way for end users to audit
> > > system events, e.g. a producer connecting to a broker or a subscription
> > > getting created, but we could instead have topics that represent
> streams
> > of
> > > these different kinds of events. A persistent topic could make these
> > audit
> > > events more durable and more structured which should lend themselves to
> > > being more easily analyzed. Further, users could choose to turn on/off
> > > these audit events, perhaps at the broker or namespace level, to fit
> > their
> > > own needs.
> > >
> > > L

Re: Lack of retries on TooManyRequests

2021-08-06 Thread Joe F
Suppose you have about a million topics and your Pulsar cluster goes down
(say, ZK is down)... many millions of producers and consumers are now
anxiously awaiting the cluster to come back... a fun experience for the
first broker that comes up. Don't ask me how I know; ref `git blame` on
ServerCnx.java#L429.
The broker limit was added to get through a cold restart.

-j


On Fri, Aug 6, 2021 at 12:29 PM Ivan Kelly  wrote:

> Inline
>
> > In that scenario,
> > should we block or fail fast and let the application decide which is what
> > we do today? Also, should we distinguish between the two scenarios, i.e.
> > broker sends the error vs client internally throws the error?
> While I agree that the client limit is different to the broker limit,
> how likely are we to hit the client limit? 50k lookups is a lot. How
> many topics/partitions will a single client be talking to?
>
> Broker level limiting is a funny one. What we've seen is that
> TooManyRequests will only trigger if the server has to go to zookeeper
> to look up the topic. Otherwise, if the broker has cached the
> assignment, you'll never hit TooManyRequests, as the handler is pretty
> much synchronous from that point. What is more likely to happen is
> that the request will time out as it is queued in the TCP queue while
> waiting for other lookups to be processed. So TooManyRequests and
> request timeout are basically equivalent in the bad case.
>
> In terms of what the client should do, it should probably be
> configurable. In most cases, the default will be to block. The client
> isn't going to go "oh well, pulsar down, time to go home". Most
> likely, if we error, process will crash, restart and try the same
> thing again.
>
> -Ivan
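The configurable block-vs-fail behavior Ivan describes could look roughly like the retry loop below. `TooManyRequestsException` and the lookup supplier here are stand-ins, not the real Pulsar client types; the sketch only shows bounded retries with capped exponential backoff.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: the exception type and lookup call are stand-ins
// for the real Pulsar client lookup path.
class LookupRetrySketch {
    static class TooManyRequestsException extends RuntimeException {}

    static <T> T lookupWithBackoff(Supplier<T> lookup, int maxRetries,
                                   long initialBackoffMs) throws InterruptedException {
        long backoff = initialBackoffMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return lookup.get();
            } catch (TooManyRequestsException e) {
                if (attempt >= maxRetries) {
                    throw e; // fail fast once the retry budget is exhausted
                }
                TimeUnit.MILLISECONDS.sleep(backoff);
                backoff = Math.min(backoff * 2, 30_000); // cap the backoff
            }
        }
    }
}
```

Setting `maxRetries` to a large value approximates the "block" default Ivan suggests, while a small value gives the fail-fast behavior.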
>


Re: Proposing a meetup organizing committee

2021-08-20 Thread Joe F
The conversation here seems incoherent because of a few factors. One is the
use of "community" and "project" interchangeably, as required to support
this proposal: "community" in one context to support holding
meetups/conferences, while at the same time asking the PMC to manage this
effort under the project. Adding to that confusion is that part of this
conversation is happening on the private list.

There are things assumed in this proposal that are implied, not explicit.
The issue is not the PMC and the creation of a sub-committee/working
group/umbrella group (however you name it), but what it implies.

Consider *"(with representatives from multiple vendors as well as
unaffiliated participants)"*. That seems like corporations/vendors
getting rights/endorsements/blessings, via some governance/PMC-blessed
roles, bypassing Apache meritocracy for individuals - in this case, by
means of "sub-committees/working/umbrella groups".

- It is very clear that ASF does not allow corporations to participate
directly in Apache project management.
- It is also clear that there is nothing limiting any vendor - other than
compliance with ASF policy - from marketing, selling software, or
organizing conferences, meetups, etc.

So what is new here in this proposal? Other than "vendor representation"
as a means to bypass the meritocratic constraint on the project, and
introduce vendor rights and privileges into the project?

*>> "what we meant to say in the Marketing/Communications working group
proposal is that we wanted a diversity of members, rather than all
volunteers to be from the same company or dominated by one company."*

The vast majority of the Pulsar PMC and committers are not affiliated with
any vendor, and are just Pulsar users.
Vendor representation, by itself, is not a basis for anything in ASF
projects. Vendors are not directly represented in the project; it's
individuals. This seems like asserting that vendor neutrality trumps merit,
and that merit should be sacrificed for vendor neutrality. I find that hard
to buy. Marketing smells of commercial activity, dragging the PMC into
vendor business activities.


*>>Having an Umbrella Group also prevents or at least makes it tougher for
the “wild west” of meetup organizations to happen.  For Apache Hadoop, both
Cloudera and Hortonworks sponsored competing meetups early on, which led to
tons of problems for that community around vendor neutrality.*
This seems a roundabout way of demanding that the PMC should
mediate/endorse/coordinate among vendors, under the perceived cloud of
"else bad things will happen".
[As an aside, neither Cloudera nor Hortonworks had any rights by virtue of
just being a vendor; there were merited individuals in both camps.]


*>>but it would be highly unfortunate for the PMC to say "we don't want to
be responsible for this AND no one from the community is allowed to do this
either",*
Enforcing compliance with ASF policy cannot be equated with prohibiting
anyone. There is nothing prohibiting vendors/users/groups from hosting
their own groups/meetups/events. ASF already has an event/branding policy
that lays out how this can be done; it is neutral and allows anyone to host
events.

Vendors and users are also free to associate in whatever manner they
choose, and to host events, subject to the same ASF policy. They don't
need the PMC to manage this under the project flag to do so. Anyone can
follow ASF policy and have as many events as needed. The more of these
events, the better.

This proposal implicitly demands that being a vendor, by itself, should
confer some privileges/rights or blessings from the project PMC (call it
membership in a working group/subcommittee/umbrella group), and that the
PMC should get into the business of running/marketing vendor activities.
That stands the Apache policy of vendor neutrality on its head.
It essentially insists that the PMC actively market all vendors,
instead of none.

I think there is no reason for the PMC/project to take on a "project
should manage vendors/vendor activities" role, or to provide rights to
vendors. It is not the PMC's role to manage vendors, mediate between
vendors, or promote/market vendor interests.

There is a well-established, time-tested ASF policy on events and
branding, and there is no need to invent a new one. This proposal is a
solution in search of a problem.

-j


On Thu, Aug 19, 2021 at 4:26 PM Aaron Williams  wrote:

> Hello all,
>
> I think that there is some confusion on a couple of terms.
>
> Vendor Neutrality - What we said caused a lot of confusion. What we meant
> to say in the Marketing/Communications working group proposal is that we
> wanted a diversity of members, rather than all volunteers being from the
> same company or dominated by one company. Community members want to
> volunteer to promote the project and not a company or group of companies.
> If they feel that their hard work is used to promote the Community the

Re: Protocol Handlers in Pulsar Proxy

2021-08-28 Thread Joe F
To give some history and context: the Pulsar proxy was meant to be a
barebones TCP proxy when it was built. Its sole reason to exist was to
forward network traffic to the right host. Discovery/authn/z was a
dependency. The way it came around, it was for a narrow use case (and it
was not for k8s): a quick-and-dirty solution.

I am all for making the life of protocol developers easier. I'm just
concerned that something that was hastily done is now evolving into a
full-fledged service in a piecemeal, spaghetti style. (There was another
proposal a few days ago for dynamic proxy roles.)

We can let the proxy evolve like the proverbial ball of mud, or put some
thought into it.

-j

On Fri, Aug 27, 2021 at 2:04 PM Michael Marshall 
wrote:

> +1 Thanks for your proposal, Enrico.
>
> I completely agree that the Pulsar Proxy is an integral component in a
> Pulsar cluster running on k8s. Further, considering that the proxy
> interacts with clients as if it were a broker and that we already support
> protocol handlers in the broker, I think it is a logical next step to add
> support for protocol handlers in the proxy.
>
> I look forward to reviewing the PIP.
>
> - Michael
>
> On Fri, Aug 27, 2021 at 8:21 AM Enrico Olivelli 
> wrote:
>
> > Hello,
> > Currently we have the ability to add Protocol Handlers to the Pulsar
> > Broker, this is great, because you can add your code that uses internal
> > Pulsar APIs and implement your own protocols.
> >
> > When you run Pulsar in k8s (and this is happening more and more) you need
> > to run the Pulsar proxy.
> > The Pulsar proxy is put in front of a Pulsar Cluster and allows clients
> > outside of the cluster to access with a single endpoint (the proxy may be
> > replicated, but let's not enter too much into the details).
> >
> > As we are doing for the Pulsar Broker I would like to add support for
> > adding ProtocolHandlers to the Pulsar Proxy service.
> > The API will be the same, apart from the fact that you have access to the
> > PulsarProxy object instead of the PulsarBroker.
> >
> > It would be great to see this feature, if you have ever come to create
> your
> > own Pulsar proxy in front of a ProtocolHandler you had to deal with:
> > - Broker Discovery
> > - Authentication
> > - Authorization
> >
> > Reimplementing this, using APIs that are not officially exported by the
> > internals of Pulsar code, is very error prone and also it is very
> difficult
> > to follow Pulsar evolution.
> >
> > PHs for the Proxy will ease the Deployment of Pulsar with PHs as you do
> not
> > need to add other Services/Pods to your cluster.
> >
> > If this idea sounds good to you I will be happy to write up a PIP and
> send
> > the implementation.
> >
> > Enrico
> >
>


Re: PIP-93 Pulsar Proxy Protocol Handlers

2021-09-08 Thread Joe F
Enrico, my initial comment when you brought up PHs was in relation to the
larger question about proxying, rather than looking at this in the limited
fashion of how to make it easy to add new PHs in the proxy.

But specifically with this, here are my comments. Two very distinct
abstractions are being mixed up here, and I'm not sure whether that is a
good idea or not.

The proxy was designed to move bits and bytes, without interpretation,
from one network to another. The issue with Pulsar is that it requires
some interpretation of the data to find which server a client should
connect to. Protocol translation crept into the proxy just to be able to
ask this question. Since auth is required to answer this question, auth
also crept in. Essentially the proxy was built as a TCP proxy, not as a
wire protocol translator. Some additional hacky things needed to be done
to make it work as a TCP proxy, and in my opinion those things should
die away to the fullest extent possible.

Because of all this, the current implementation is not ideal. Its usage
is highly restricted in actual deployments, because of potential security
risks if the proxy is misconfigured. One needs to be strict about setting
up the proxy to meet security standards in highly regulated environments.



>And we faced the limitation of the need to create a new proxy service for
>each new protocol, but all of these "proxy services" have in common
>most of the features of the Pulsar proxy.
>When we also came to deal with System Architects it was clear the
>requirement to have only one single "place" to put all of the interactions
>at "cluster level" with Pulsar.

Good idea; a single place seems right. Can the proxy answer the traffic
routing question without interpreting the data? Essentially, move what is
done within the proxy now to a well-known service within the cluster, and
use that?

>I think this is a good picture of what I mean:
>- PH in the Broker -> add protocols inside the Broker, works for owned topics
>- PH in the Proxy -> add protocols in front of the whole Cluster
>There is a good amount of processing that should be executed on the proxy,
>and it is not good to run it on a broker.

Is a TCP proxy a good place to do wire protocol translation (computation)?
Especially if that translation is a good amount of processing? If it's not
good to run this much processing on the broker, then it's even worse to run
it on a network proxy. I can foresee this as a path that will lead to
cluster and load management creeping into the proxy, as soon as you move
beyond what a single proxy can handle.

But I think these issues (of n/w vs. protocol translation) are moot when
you look at the larger needs of a generic proxy that will support ingress,
configurable protocol handlers, load balancing, etc. for use with Pulsar.
You can run a bunch of Pulsar's proxies today, and there is no means to
manage them properly, e.g. load balance between them, manage them as a
cluster, or have affinity of proxies to topics/tenants. This applies even
before this PIP (and more so once you add more processing into the proxy).

The Pulsar proxy, as it is, is not amenable to creating anything like a
service mesh. It would demand a lot of work in the proxy. Hence my
initial comment about the proxy eventually becoming a mudball, and why we
should rethink this entire proxy.

It is tempting to evolve the Pulsar proxy into a service that supports
everything: ingress, transformation chains, cluster management, etc. This
will eventually end up duplicating something which already exists
elsewhere. My take is that this is better done by building on top of
something like Envoy (or similar), which has built-in and mature features
and is supported by a wide user base.

-j

On Tue, Sep 7, 2021 at 11:11 PM Enrico Olivelli  wrote:

> (ping)
>
>
> Il giorno ven 3 set 2021 alle ore 14:06 Enrico Olivelli <
> eolive...@gmail.com>
> ha scritto:
>
> > Sijie,
> > Thanks for your questions, answers inline below.
> >
> > Il giorno gio 2 set 2021 alle ore 02:23 Sijie Guo 
> ha
> > scritto:
> >
> >> I would like to see the clarification between the broker protocol
> handlers
> >> and proxy protocol handlers before moving it to a vote thread.
> >>
> >
> > A PH in the broker is very useful as it allows you to directly access the
> > ManagedLedger and implement high performance adapters for
> > other wire protocols.
> > The bigger limitation is that you can access efficiently only the topics
> > owned by the local broker.
> > If you try to forward/proxy the request to another broker (you can do it,
> > and this was Matteo's suggestion at the latest Video Community meeting)
> > you have the downside that the broker has to waste resources to do the
> > "proxy work"
> > and you generally want a broker machine to be used only to deal with the
> > local traffic.
> >
> > The load balancing mechanism of the brokers is not meant to deal with
> > additional work due to proxying requests r

Re: Topic metadata

2021-09-22 Thread Joe F
I don't think pulsar should increase its dependency on ZK. Can we store it
somewhere else?

On Wed, Sep 22, 2021, 3:30 PM Matteo Merli  wrote:

> We already have a mechanism to store custom properties in managed
> ledgers. So we don't need to store it separately.
> --
> Matteo Merli
> 
>
> On Wed, Sep 22, 2021 at 3:21 PM Michael Marshall 
> wrote:
> >
> > Hi Enrico,
> >
> > This sounds like a useful feature. Do you expect to use it internally in
> > pulsar or should it be limited to external applications?
> >
> > In thinking about authorization, what level of permission should be
> > required to create/update/delete these labels? It seems to me that these
> > should require some kind of admin permission.
> >
> > Thanks,
> > Michael
> >
> > On Sun, Sep 19, 2021 at 7:21 PM PengHui Li  wrote:
> >
> > > Hi enrico
> > >
> > > +1, sorry I missed the email before. It should be a useful feature for
> > > Pulsar users. I had a discussion with the Flink Pulsar connector
> > > authors several days ago; they are also looking for this feature.
> > >
> > > Thanks
> > > Penghui
> > >
> > > Enrico Olivelli 于2021年9月12日 周日17:24写道:
> > >
> > > > Hello,
> > > > I would like to have the ability to store metadata about topics.
> > > >
> > > > This would be very useful, as with metadata you could add labels and
> > > > other pieces of information that would allow you to define the purpose
> > > > of a topic, or custom application-level properties.
> > > > This feature will allow application level diagnostic tools and
> > > maintenance
> > > > tools to not need external databases to store such metadata.
> > > >
> > > > I imagine that we could add a simple key value map (String keys and
> > > String
> > > > values) to the topic.
> > > > These metadata could be set during topic creation and also updated.
> > > >
> > > > I would store these metadata on the Metadata service (zookeeper).
> > > >
> > > > Is there any interest or any ongoing effort in this direction?
> > > >
> > > >
> > > > Best regards
> > > > Enrico
> > > >
> > >
>


Re: PIP-95 Live migration of producer consumer from one cluster to another.

2021-09-24 Thread Joe F
Is this mostly addressing a topic cutover or a whole cluster cutover?

What is "pulsar-admin clusters set-generation --generation "?

That seems to dangle in the doc without any explanation.





On Thu, Sep 16, 2021 at 11:19 AM Prashant Kumar <
prashant.kumar.si...@gmail.com> wrote:

> Dear Pulsar community.
> I am happy to submit a proposal for *Seamless migration of live traffic
> from one cluster to another cluster.*
>
> Marked down version of the proposal
> <
> https://github.com/pkumar-singh/pulsar/wiki/PIP-95:-Live-migration-of-producer-consumer-or-reader-from-one-Pulsar-cluster-to-another
> >
> .
>
> https://github.com/pkumar-singh/pulsar/wiki/PIP-95:-Live-migration-of-producer-consumer-or-reader-from-one-Pulsar-cluster-to-another
>
> Requesting your support.
> Thanks and regards
> -Prashant
>


Re: Correct semantics of producer close

2021-09-28 Thread Joe F
Clients should not depend on any of this behaviour, since the broker is at
the other end of an unreliable network connection. The semantic
differences are kind of meaningless from a usability point of view, since
flushing on close != published. What exactly does "graceful" convey here?
Flush the buffer on the client end and hope it makes it to the server.

Is there a difference whether you flush (or process) pending messages or
not? There is no guarantee that either case will ensure the message is
published.

The only way to ensure that messages are published is to wait for the ack.
The correct model is to wait for the return of the blocking API, or wait
for completion of the future from the async API, then handle any publish
errors, and only then close the producer.
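That model can be sketched with futures as below. The `Producer` interface here is a stand-in, not the real `org.apache.pulsar.client.api.Producer`; the point is only the ordering: await every ack, surface any publish error, and only then close.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the "wait for acks, then close" pattern; Producer is a
// stand-in interface, not the real Pulsar client type.
class SafeCloseSketch {
    interface Producer {
        CompletableFuture<Long> sendAsync(byte[] payload);
        CompletableFuture<Void> closeAsync();
    }

    // Close only after every outstanding send has completed. If any send
    // fails, the combined future fails and close is not attempted, so the
    // caller can handle publish errors first.
    static CompletableFuture<Void> publishThenClose(Producer producer, List<byte[]> msgs) {
        CompletableFuture<?>[] pending = msgs.stream()
                .map(producer::sendAsync)
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(pending)
                .thenCompose(ignored -> producer.closeAsync());
    }
}
```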


On Mon, Sep 27, 2021 at 8:50 PM Yunze Xu 
wrote:

> Hi all,
>
> Recently I found a PR (https://github.com/apache/pulsar/pull/12195 <
> https://github.com/apache/pulsar/pull/12195>) that
> modifies the existing semantics of producer close. There're already some
> communications in this PR, but I think it's better to start a discussion
> here
> to let more know.
>
> The existing implementation of producer close is:
> 1. Cancel all timers, including send and batch container
> (`batchMessageContainer`).
> 2. Complete all pending messages (`pendingMessages`) with
> `AlreadyCloseException`.
>
> See `ProducerImpl#closeAsync` for details.
>
> But the JavaDoc of `Producer#closeAsync` is:
>
> > No more writes will be accepted from this producer. Waits until all
> pending write request are persisted.
>
> Anyway, the document and implementation are inconsistent. Specifically,
> we need to define the behavior for how to process `pendingMessages` and
> `batchMessageContainer` when the producer calls `closeAsync`.
>
> 1. batchMessageContainer: contains the buffered single messages
> (`Message`).
> 2. pendingMessages: all inflight messages (`OpSendMsg`) in network.
>
> IMO, from the JavaDoc, only `pendingMessages` should be processed and the
> messages in `batchMessageContainer` should be discarded.
>
> Other clients might have already implemented semantics similar to the
> Java client's. If we changed the semantics now, the behaviors among
> different clients might become inconsistent.
>
> Should we add a configuration to support graceful close to follow the
> docs? Or
> just change the current behavior?
>
> Thanks,
> Yunze
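The explicit "graceful" alternative Yunze asks about, flushing before closing, is something an application can already compose itself. This is a minimal sketch using CompletableFuture stand-ins for the real Producer#flushAsync and Producer#closeAsync calls (which would talk to a broker):

```java
import java.util.concurrent.CompletableFuture;

public class FlushThenClose {
    // Stand-ins for Producer#flushAsync and Producer#closeAsync; the real
    // methods return futures completed by the client once the broker acks.
    static CompletableFuture<Void> flushAsync() {
        return CompletableFuture.completedFuture(null);
    }

    static CompletableFuture<Void> closeAsync() {
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        // Explicit graceful shutdown, independent of what closeAsync does
        // internally: drain buffered and in-flight messages first, then close.
        flushAsync()
                .thenCompose(ignored -> closeAsync())
                .join();
        System.out.println("producer closed after flush");
    }
}
```

With this pattern in user code, the semantics of `closeAsync` itself matter less: callers who need the flush behavior can opt in explicitly.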


Re: Correct semantics of producer close

2021-09-28 Thread Joe F
essages` method includes logic
> > to call `ProducerImpl#failPendingBatchMessages`, which implies that
> > these batched, but not sent, messages have been historically
> > considered "pending".
> >
> > If we view the Javadoc as non-binding, I think my guiding influence
> > for the new design would be that the `closeAsync` method should result
> > in a "graceful" shutdown of the client.
> >
> >> What exactly does "graceful" convey here?
> >
> > This is a great question, and will likely drive the design here. I
> > view graceful to mean that the producer attempts to avoid artificial
> > failures. That means trying to drain the queue instead of
> > automatically failing all of the queue's callbacks. The tradeoff is
> > that closing the producer takes longer. This reasoning would justify
> > my claim that we should first flush the `batchMessageContainer`
> > instead of failing the batch without any effort at delivery, as that
> > would be artificial.
> >
> >> There is no guarantee that either case will ensure the message
> >> is published.
> >
> > I don't think that implementing `closeAsync` with graceful shutdown
> > logic implies a guarantee of message publishing. Rather, it guarantees
> > that failures will be the result of a real exception or a timeout.
> > Since calling `closeAsync` prevents additional messages from
> > delivering, users leveraging this functionality might be operating
> > with "at most once" delivery semantics where they'd prefer to deliver
> > the messages if possible, but they aren't going to delay application
> > shutdown indefinitely to deliver its last messages. If users need
> > stronger guarantees about whether their messages are delivered, they
> > are probably already using the flush methods to ensure that the
> > producer's queues are empty before calling `closeAsync`.
> >
> > I also agree that in all of these cases, we're assuming that users are
> > capturing references to the async callbacks and then making business
> > logic decisions based on the results of those callbacks.
> >
> > Thanks,
> > Michael
> >
> > On Tue, Sep 28, 2021 at 4:58 AM Yunze Xu 
> wrote:
> >>
> >> I can’t agree more, just like what I’ve said in PR 12195:
> >>
> >>> At any case, when you choose `sendAsync`, you should always make use
> of the returned future to confirm the result of all messages. In Kafka,
> it's the send callback.
> >>
> >> But I found many users are confused about the current behavior,
> >> especially those used to Kafka's close semantics. They might expect a
> >> simple attempt to flush existing messages, which works in a simple test
> >> environment, even though there's no guarantee in exception cases.
> >>
> >>
> >>
> >>> 2021年9月28日 下午4:37,Joe F  写道:
> >>>
> >>> Clients should not depend on any of this behaviour, since the broker
> is at
> >>> the other end of an unreliable  network connection. The
> >>> semantic differences are kind of meaningless from a usability point,
> since
> >>> flushing on close =/= published.  What exactly does "graceful" convey
> >>> here?  Flush the  buffer on the client end and hope it makes it to the
> >>> server.
> >>>
> >>> Is there a  difference whether you flush(or process) pending messages
> or
> >>> not? There is no guarantee that either case will ensure the message is
> >>> published.
> >>>
> >>> The only way to ensure that messages are published is to wait for the
> ack.
> >>> The correct model should be to wait for return on the blocking API, or
> wait
> >>> for future completion of the async API, then handle any publish errors
> and
> >>> then only close the producer.
> >>>
> >>>
> >>> On Mon, Sep 27, 2021 at 8:50 PM Yunze Xu  >
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> Recently I found a PR (https://github.com/apache/pulsar/pull/12195 <
> >>>> https://github.com/apache/pulsar/pull/12195>) that
> >>>> modifies the existing semantics of producer close. There're already
> some
> >>>> communications in this PR, but I think it's better to start a
> discussion
> >>>> here
> >>>> to let more know.
> >>>>
> >>>> The existing

Re: Correct semantics of producer close

2021-10-01 Thread Joe F
; >> On Wed, Sep 29, 2021 at 7:04 AM Enrico Olivelli 
> wrote:
> > >>>
> > >>> I agree that we must ensure that every pending callback must be
> completed
> > >>> eventually (timeout or another error is not a problem),
> > >>> because we cannot let the client application hang forever.
> > >>> I believe that the application can perform a flush() explicitly and
> also
> > >>> wait for every callback to be executed if that is the requirement.
> > >>>
> > >>> Usually you call close() when:
> > >>> 1. you have a serious problem: you already know that there is a hard
> error,
> > >>> and you want to close the Producer or the Application and possibly
> start a
> > >>> new one to recover
> > >>> 2. you are shutting down your application or component: you have
> control
> > >>> over the callbacks, so you can wait for them to complete
> > >>>
> > >>> So case 2. can be covered by the application. We have to support
> case 1:
> > >>> fail fast and close (no need for flush()) .
> > >>>
> > >>> In my experience trying to implement "graceful stops" adds only
> complexity
> > >>> and false hopes to the users.
> > >>>
> > >>> Enrico
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Sep 29, 2021 at 13:58 Nirvana <1572139...@qq.com.invalid>
> > >>> wrote:
> > >>>
> > >>>> I agree to try to ensure "at most once" when closing.
> > >>>>
> > >>>>
> > >>>> > That would still get controlled by send timeout, after that,
> the send
> > >>>> will fail and close should proceed.
> > >>>> This sounds more in line with "at most once".
> > >>>>
> > >>>>
> > >>>> -- Original Message --
> > >>>> From: "dev" <
> > >>>> matteo.me...@gmail.com>;
> > >>>> Sent: Wednesday, September 29, 2021, 3:55 PM
> > >>>> To: "Dev"
> > >>>> Subject: Re: Correct semantics of producer close
> > >>>>
> > >>>>
> > >>>>
> > >>>> > but equally they might be
> > >>>> > surprised when closeAsync doesn't complete because the pending
> > >>>> messages
> > >>>> > can't be cleared
> > >>>>
> > >>>> That would still get controlled by send timeout, after that, the
> send
> > >>>> will fail and close should proceed.
> > >>>>
> > >>>> --
> > >>>> Matteo Merli
> > >>>>  > >>>>
> > >>>> On Wed, Sep 29, 2021 at 12:52 AM Jack Vanlightly
> > >>>>  > >>>> >
> > >>>> > I can see both sides of the argument regarding whether to flush
> > >>>> pending
> > >>>> > messages or not. But I think what is definitely in the
> contract is
> > >>>> not to
> > >>>> > discard any callbacks causing user code to block forever. No
> matter
> > >>>> what,
> > >>>> > we must always call the callbacks.
> > >>>> >
> > >>>> > Personally, I am in favour of a close operation not flushing
> pending
> > >>>> > messages (and I define pending here as any message that has a
> > >>>> callback).
> > >>>> > The reason is that if we wait for all pending messages to be
> sent
> > >>>> then we
> > >>>> > now face a number of edge cases that could cause the close
> operation
> > >>>> to
> > >>>> > take a very long time to complete. What if the user code
> really just
> > >>>> needs
> > >>>> > to close the producer right now? If we amend the documentation
> to
> > >>>> make it
> > >>>> > clear that close does not flush pending messages then the user
> is now
> > >>>> able
> > >>>> > to explicitly craft the behaviour they need. If they want all
> messages
> > >>>
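Matteo's point in the thread above, that the send timeout bounds how long close can block, can be illustrated with CompletableFuture#orTimeout (Java 9+). The never-completing future here is a stand-in for a message stuck in flight:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class BoundedClose {
    public static void main(String[] args) {
        // A send that never completes: a stand-in for a message that
        // is never acked by the broker.
        CompletableFuture<Void> stuckSend = new CompletableFuture<>();

        // Bound the wait the way the client's send timeout would: after the
        // deadline the pending send fails exceptionally and close can proceed.
        stuckSend.orTimeout(100, TimeUnit.MILLISECONDS)
                .exceptionally(timeout -> {
                    System.out.println("send timed out; close can proceed");
                    return null;
                })
                .join();
    }
}
```

This keeps the guarantee Jack asks for: every callback is eventually completed (here, with a timeout), so `closeAsync` cannot hang forever.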

Re: [PIP] Broker extensions to provide operators of enterprise-wide clusters better control and flexibility

2021-11-18 Thread Joe F
Agree with Enrico.

I am not clear on how this (allowing interception of write and read
operations of a managed ledger and modification of the payload) would work
with e2e encryption. That is literally a MITM proposal.

Joe

On Thu, Nov 18, 2021 at 9:04 AM Enrico Olivelli  wrote:

> Madhavan,
> Thanks for sharing your PIP.
> It looks interesting, but I see a major problem with this approach.
> Basically we would be adding a way to tweak everything in Pulsar, from
> Connections to what we are reading and writing to storage.
>
> This feature will become very hard to maintain for users, as Pulsar changes
> and there are things that may be different in the future.
>
> We recently had other PIPs that try to add more flexibility and add code
> into Pulsar.
>
> It is not clear to me what kinds of operations you want to cover;
> perhaps we could provide dedicated extensibility points that fulfill your
> needs with specific APIs, which we can maintain and for which we can
> guarantee compatibility in the future.
>
>
> Enrico
>
> On Thu, Nov 18, 2021 at 16:31 Narayanan, Madhavan wrote:
>
> > Hi All,
> >
> >I request your help to review, discuss and resolve the problem and
> > solution approach outlined in the PIP entry
> > https://github.com/apache/pulsar/issues/12858
> >
> > Regards,
> > Madhavan
> >
>


Re: Pulsar release 2.2

2018-09-21 Thread Joe F
I was planning to cut the release now, but it seems we still have a few
PRs that are almost ready but not yet committed. Should I hold off till
tomorrow morning?

Joe

On Mon, Sep 17, 2018 at 6:48 PM 李鹏辉gmail  wrote:

> Please contain this PR(
> https://github.com/apache/incubator-pulsar/pull/2543 <
> https://github.com/apache/incubator-pulsar/pull/2543>)  as far as
> possible.
>
> Thanks.
>
> > On Sep 18, 2018, at 02:30, Joe F wrote:
> >
> > Now that 2.1.1 is completed, I intend to start the Pulsar 2.2 release
> > process in a few days, and would like to freeze the code this Friday.
> >
> > Please ensure that all your committed PR's are merged and complete. I
> still
> > see 11 outstanding items.
> > https://github.com/apache/incubator-pulsar/milestone/16
> >
> > Are we OK with pushing some of these to the next release, if not
> completed
> > by Friday ?
> >
> > Joe
>
>


Re: Pulsar release 2.2

2018-10-02 Thread Joe F
On it. The branch is created. The release instructions are kind of stale
at this point; I'm working through them.

On Tue, Oct 2, 2018 at 12:32 PM Sanjeev Kulkarni 
wrote:

> Hi Joe,
> Have we started the 2.2 release process yet?
> Thanks!
>
> On Thu, Sep 27, 2018 at 9:18 PM Dave Fisher  wrote:
>
> > Hurray for rolling the first TLP release!
> >
> > Regards,
> > Dave
> >
> > Sent from my iPhone
> >
> > > On Sep 27, 2018, at 11:43 PM, Joe Francis 
> wrote:
> > >
> > > Good. I will get this going.
> > >
> > > Joe
> > >
> > >> On Thu, Sep 27, 2018 at 8:30 PM, Sijie Guo 
> wrote:
> > >>
> > >> Hi Joe,
> > >>
> > >> I think all the issues for 2.2 are merged :) we are ready to go with
> the
> > >> release :)
> > >>
> > >> - Sijie
> > >>
> > >>
> > >> On Fri, Sep 21, 2018 at 5:54 PM Matteo Merli 
> > >> wrote:
> > >>
> > >>> Hopefully there should be no issues because github transparently
> > >> redirects
> > >>> the repository after renames, though... of course there will be some
> > >> things
> > >>> to polish here and there :)
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Sep 21, 2018 at 5:49 PM Dave Fisher 
> > >> wrote:
> > >>>
> > >>>> BTW - the Infra team just now moved the repository out of the
> > >> incubator!
> > >>>>
> > >>>> Check the release scripts!
> > >>>>
> > >>>> Sent from my iPhone
> > >>>>
> > >>>>> On Sep 21, 2018, at 4:29 PM, Sijie Guo  wrote:
> > >>>>>
> > >>>>> I would suggest holding off a bit. trying to get those PR merged.
> > >>>>>
> > >>>>> - Sijie
> > >>>>>
> > >>>>>> On Fri, Sep 21, 2018 at 4:26 PM Joe F  wrote:
> > >>>>>>
> > >>>>>> I was planning to cut the release now, but seems like we still
> have
> > >> a
> > >>>> few
> > >>>>>> PRs almost ready, but not committed. Should I hold off till
> tomorrow
> > >>>>>> morning?
> > >>>>>>
> > >>>>>> Joe
> > >>>>>>
> > >>>>>>> On Mon, Sep 17, 2018 at 6:48 PM 李鹏辉gmail <
> codelipeng...@gmail.com>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>> Please contain this PR(
> > >>>>>>> https://github.com/apache/incubator-pulsar/pull/2543 <
> > >>>>>>> https://github.com/apache/incubator-pulsar/pull/2543>)  as far
> as
> > >>>>>>> possible.
> > >>>>>>>
> > >>>>>>> Thanks.
> > >>>>>>>
> > >>>>>>>> On Sep 18, 2018, at 02:30, Joe F wrote:
> > >>>>>>>>
> > >>>>>>>> Now that 2.1.1 is completed, I intend to start the Pulsar 2.2
> > >>> release
> > >>>>>>>> process in a few days, and would like to freeze the code this
> > >>> Friday.
> > >>>>>>>>
> > >>>>>>>> Please ensure that all your committed PR's are merged and
> > >> complete.
> > >>> I
> > >>>>>>> still
> > >>>>>>>> see 11 outstanding items.
> > >>>>>>>> https://github.com/apache/incubator-pulsar/milestone/16
> > >>>>>>>>
> > >>>>>>>> Are we OK with pushing some of these to the next release, if not
> > >>>>>>> completed
> > >>>>>>>> by Friday ?
> > >>>>>>>>
> > >>>>>>>> Joe
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>> --
> > >>> Matteo Merli
> > >>> 
> > >>>
> > >>
> >
> >
>


Re: Pulsar release 2.2

2018-10-02 Thread Joe F
Please, if you know of a problem that is bound to cause issues in release
testing, notify the people working on the release. It will save duplicated
and unnecessary effort.

I will reset the process, recreate the 2.2 branch, and restart.

Joe




23:56:31.971 [Timer-0] ERROR
org.apache.pulsar.functions.runtime.ProcessRuntime - Extracted Process
death exception
java.lang.RuntimeException:
at
org.apache.pulsar.functions.runtime.ProcessRuntime.tryExtractingDeathException(ProcessRuntime.java:287)
[org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
at
org.apache.pulsar.functions.runtime.ProcessRuntime.isAlive(ProcessRuntime.java:274)
[org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
at
org.apache.pulsar.functions.runtime.RuntimeSpawner$1.run(RuntimeSpawner.java:85)
[org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
at java.util.TimerThread.mainLoop(Timer.java:555) [?:1.8.0_181]
at java.util.TimerThread.run(Timer.java:505) [?:1.8.0_181]
23:56:31.972 [Timer-0] ERROR
org.apache.pulsar.functions.runtime.RuntimeSpawner -
test/test-namespace/example-java.lang.RuntimeException:  Function Container
is dead with exception.. restarting


Exception in thread "main"
org.apache.pulsar.functions.runtime.shaded.com.google.protobuf.InvalidProtocolBufferException:
Expect message object but got:
"{\"tenant\":\"test\",\"namespace\":\"test-namespace\",\"name\":\"example\",\"className\":\"org.apache.pulsar.functions.api.examples.ExclamationFunction\",\"userConfig\":\"{\"PublishTopic\":\"test_result\"}\",\"autoAck\":true,\"parallelism\":1,\"source\":{\"typeClassName\":\"java.lang.String\",\"inputSpecs\":{\"test_src\":{}}},\"sink\":{\"topic\":\"test_result\",\"typeClassName\":\"java.lang.String\"},\"resources\":{}}"
at
org.apache.pulsar.functions.runtime.shaded.com.google.protobuf.util.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1296)
at
org.apache.pulsar.functions.runtime.shaded.com.google.protobuf.util.JsonFormat$ParserImpl.merge(JsonFormat.java:1273)
at
org.apache.pulsar.functions.runtime.shaded.com.google.protobuf.util.JsonFormat$ParserImpl.merge(JsonFormat.java:1155)
at
org.apache.pulsar.functions.runtime.shaded.com.google.protobuf.util.JsonFormat$Parser.merge(JsonFormat.java:338)
at
org.apache.pulsar.functions.runtime.JavaInstanceMain.start(JavaInstanceMain.java:114)





On Tue, Oct 2, 2018 at 3:21 PM Joe F  wrote:

> On  it.  The branch is created.  The release instructions are kind of
> stale at this point. I'm working through them
>
> On Tue, Oct 2, 2018 at 12:32 PM Sanjeev Kulkarni 
> wrote:
>
>> Hi Joe,
>> Have we started the 2.2 release process yet?
>> Thanks!
>>
>> On Thu, Sep 27, 2018 at 9:18 PM Dave Fisher 
>> wrote:
>>
>> > Hurray for rolling the first TLP release!
>> >
>> > Regards,
>> > Dave
>> >
>> > Sent from my iPhone
>> >
>> > > On Sep 27, 2018, at 11:43 PM, Joe Francis 
>> wrote:
>> > >
>> > > Good. I will get this going.
>> > >
>> > > Joe
>> > >
>> > >> On Thu, Sep 27, 2018 at 8:30 PM, Sijie Guo 
>> wrote:
>> > >>
>> > >> Hi Joe,
>> > >>
>> > >> I think all the issues for 2.2 are merged :) we are ready to go with
>> the
>> > >> release :)
>> > >>
>> > >> - Sijie
>> > >>
>> > >>
>> > >> On Fri, Sep 21, 2018 at 5:54 PM Matteo Merli > >
>> > >> wrote:
>> > >>
>> > >>> Hopefully there should be no issues because github transparently
>> > >> redirects
>> > >>> the repository after renames, though... of course there will be some
>> > >> things
>> > >>> to polish here and there :)
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Fri, Sep 21, 2018 at 5:49 PM Dave Fisher 
>> > >> wrote:
>> > >>>
>> > >>>> BTW - the Infra team just now moved the repository out of the
>> > >> incubator!
>> > >>>>
>> > >>>> Check the release scripts!
>> > >>>>
>> > >>>> Sent from my iPhone
>> > >>>>
>> > >>>>> On Sep 21, 2018, at 4:29 PM, Sijie Guo 
>> wrote:
>> > &

Re: Pulsar release 2.2

2018-10-03 Thread Joe F
Yes, I know -- I spent quite some time debugging this in 2.2, and found
that it was fixed in master after I cut 2.2.

Joe

On Wed, Oct 3, 2018 at 12:10 AM Rajan Dhabalia  wrote:

> Hi Joe,
>
> This bug <https://github.com/apache/pulsar/pull/2704> has been fixed in
> master with #2693 <https://github.com/apache/pulsar/pull/2693>. So,
> rebasing with master will fix it.
>
> Thanks,
> Rajan
>
> On Tue, Oct 2, 2018 at 7:16 PM, Joe F  wrote:
>
> > Please, if you know of a problem that is bound to cause issue in release
> > testing, notify the people working on the release. It will save duplicate
> > and unnecessary efforts
> >
> >  I will  the reset the process, recreate 2.2 and reset and restart.
> >
> > Joe
> >
> >
> >
> >
> > 23:56:31.971 [Timer-0] ERROR
> > org.apache.pulsar.functions.runtime.ProcessRuntime - Extracted Process
> > death exception
> > java.lang.RuntimeException:
> > at
> > org.apache.pulsar.functions.runtime.ProcessRuntime.
> > tryExtractingDeathException(ProcessRuntime.java:287)
> > [org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
> > at
> > org.apache.pulsar.functions.runtime.ProcessRuntime.
> > isAlive(ProcessRuntime.java:274)
> > [org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
> > at
> > org.apache.pulsar.functions.runtime.RuntimeSpawner$1.run(
> > RuntimeSpawner.java:85)
> > [org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
> > at java.util.TimerThread.mainLoop(Timer.java:555) [?:1.8.0_181]
> > at java.util.TimerThread.run(Timer.java:505) [?:1.8.0_181]
> > 23:56:31.972 [Timer-0] ERROR
> > org.apache.pulsar.functions.runtime.RuntimeSpawner -
> > test/test-namespace/example-java.lang.RuntimeException:  Function
> > Container
> > is dead with exception.. restarting
> >
> >
> > Exception in thread "main"
> > org.apache.pulsar.functions.runtime.shaded.com.google.protobuf.
> > InvalidProtocolBufferException:
> > Expect message object but got:
> > "{\"tenant\":\"test\",\"namespace\":\"test-namespace\"
> > ,\"name\":\"example\",\"className\":\"org.apache.
> > pulsar.functions.api.examples.ExclamationFunction\",\"userConfig\":\"{\"
> > PublishTopic\":\"test_result\"}\",\"autoAck\":true,\"
> > parallelism\":1,\"source\":{\"typeClassName\":\"java.lang.
> > String\",\"inputSpecs\":{\"test_src\":{}}},\"sink\":{\"
> > topic\":\"test_result\",\"typeClassName\":\"java.lang.
> > String\"},\"resources\":{}}"
> > at
> > org.apache.pulsar.functions.runtime.shaded.com.google.
> > protobuf.util.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1296)
> > at
> > org.apache.pulsar.functions.runtime.shaded.com.google.
> > protobuf.util.JsonFormat$ParserImpl.merge(JsonFormat.java:1273)
> > at
> > org.apache.pulsar.functions.runtime.shaded.com.google.
> > protobuf.util.JsonFormat$ParserImpl.merge(JsonFormat.java:1155)
> > at
> > org.apache.pulsar.functions.runtime.shaded.com.google.
> > protobuf.util.JsonFormat$Parser.merge(JsonFormat.java:338)
> > at
> > org.apache.pulsar.functions.runtime.JavaInstanceMain.
> > start(JavaInstanceMain.java:114)
> >
> >
> >
> >
> >
> > On Tue, Oct 2, 2018 at 3:21 PM Joe F  wrote:
> >
> > > On  it.  The branch is created.  The release instructions are kind of
> > > stale at this point. I'm working through them
> > >
> > > On Tue, Oct 2, 2018 at 12:32 PM Sanjeev Kulkarni 
> > > wrote:
> > >
> > >> Hi Joe,
> > >> Have we started the 2.2 release process yet?
> > >> Thanks!
> > >>
> > >> On Thu, Sep 27, 2018 at 9:18 PM Dave Fisher 
> > >> wrote:
> > >>
> > >> > Hurray for rolling the first TLP release!
> > >> >
> > >> > Regards,
> > >> > Dave
> > >> >
> > >> > Sent from my iPhone
> > >> >
> > >> > > On Sep 27, 2018, at 11:43 PM, Joe Francis 
> > >> wrote:
> > >> > >
> > >> > > Good. I will get this going.
> > >> > >
> > >> > > Joe
> > >> > >
> > >> > >> On Thu, Sep 

Re: Pulsar release 2.2

2018-10-04 Thread Joe F
Update on the 2.2 release: found some issues with schema enforcement when
testing yesterday, and Sijie is working on fixing them.

Joe

On Wed, Oct 3, 2018 at 10:13 AM Joe F  wrote:

> Yes, I know. -- I spend quite some time debugging this in 2.2, and found
> that it was fixed in master after I cut 2.2 ..
>
> Joe
>
> On Wed, Oct 3, 2018 at 12:10 AM Rajan Dhabalia 
> wrote:
>
>> Hi Joe,
>>
>> This bug <https://github.com/apache/pulsar/pull/2704> has been fixed in
>> master with #2693 <https://github.com/apache/pulsar/pull/2693>. So,
>> rebasing with master will fix it.
>>
>> Thanks,
>> Rajan
>>
>> On Tue, Oct 2, 2018 at 7:16 PM, Joe F  wrote:
>>
>> > Please, if you know of a problem that is bound to cause issue in release
>> > testing, notify the people working on the release. It will save
>> duplicate
>> > and unnecessary efforts
>> >
>> >  I will  the reset the process, recreate 2.2 and reset and restart.
>> >
>> > Joe
>> >
>> >
>> >
>> >
>> > 23:56:31.971 [Timer-0] ERROR
>> > org.apache.pulsar.functions.runtime.ProcessRuntime - Extracted Process
>> > death exception
>> > java.lang.RuntimeException:
>> > at
>> > org.apache.pulsar.functions.runtime.ProcessRuntime.
>> > tryExtractingDeathException(ProcessRuntime.java:287)
>> > [org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
>> > at
>> > org.apache.pulsar.functions.runtime.ProcessRuntime.
>> > isAlive(ProcessRuntime.java:274)
>> > [org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
>> > at
>> > org.apache.pulsar.functions.runtime.RuntimeSpawner$1.run(
>> > RuntimeSpawner.java:85)
>> > [org.apache.pulsar-pulsar-functions-runtime-2.2.0.jar:2.2.0]
>> > at java.util.TimerThread.mainLoop(Timer.java:555) [?:1.8.0_181]
>> > at java.util.TimerThread.run(Timer.java:505) [?:1.8.0_181]
>> > 23:56:31.972 [Timer-0] ERROR
>> > org.apache.pulsar.functions.runtime.RuntimeSpawner -
>> > test/test-namespace/example-java.lang.RuntimeException:  Function
>> > Container
>> > is dead with exception.. restarting
>> >
>> >
>> > Exception in thread "main"
>> > org.apache.pulsar.functions.runtime.shaded.com.google.protobuf.
>> > InvalidProtocolBufferException:
>> > Expect message object but got:
>> > "{\"tenant\":\"test\",\"namespace\":\"test-namespace\"
>> > ,\"name\":\"example\",\"className\":\"org.apache.
>> > pulsar.functions.api.examples.ExclamationFunction\",\"userConfig\":\"{\"
>> > PublishTopic\":\"test_result\"}\",\"autoAck\":true,\"
>> > parallelism\":1,\"source\":{\"typeClassName\":\"java.lang.
>> > String\",\"inputSpecs\":{\"test_src\":{}}},\"sink\":{\"
>> > topic\":\"test_result\",\"typeClassName\":\"java.lang.
>> > String\"},\"resources\":{}}"
>> > at
>> > org.apache.pulsar.functions.runtime.shaded.com.google.
>> > protobuf.util.JsonFormat$ParserImpl.mergeMessage(JsonFormat.java:1296)
>> > at
>> > org.apache.pulsar.functions.runtime.shaded.com.google.
>> > protobuf.util.JsonFormat$ParserImpl.merge(JsonFormat.java:1273)
>> > at
>> > org.apache.pulsar.functions.runtime.shaded.com.google.
>> > protobuf.util.JsonFormat$ParserImpl.merge(JsonFormat.java:1155)
>> > at
>> > org.apache.pulsar.functions.runtime.shaded.com.google.
>> > protobuf.util.JsonFormat$Parser.merge(JsonFormat.java:338)
>> > at
>> > org.apache.pulsar.functions.runtime.JavaInstanceMain.
>> > start(JavaInstanceMain.java:114)
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Oct 2, 2018 at 3:21 PM Joe F  wrote:
>> >
>> > > On  it.  The branch is created.  The release instructions are kind of
>> > > stale at this point. I'm working through them
>> > >
>> > > On Tue, Oct 2, 2018 at 12:32 PM Sanjeev Kulkarni > >
>> > > wrote:
>> > >
>> > >> Hi Joe,
>> > >> Have we started the 2.2 release process yet?
>> > >> Thanks!
>> > >>
>> > >> On Thu, Sep 27, 2018 at 9:18 P

[VOTE] Pulsar Release 2.2.0 Candidate 1

2018-10-10 Thread Joe F
This is the first release candidate for Apache Pulsar, version  2.2.0

It adds new features and also fixes for various issues from 2.1.1

 * Pulsar Java Client Interceptors
 * Integration of functions and io with schema registry
 * Dead Letter Topic
 * Flink Source connector
 * JDBC Sink Connector
 * HDFS Sink Connector
 * Google Cloud Storage Offloader
 * Pulsar SQL


A complete list of enhancements and fixes can be viewed at
https://github.com/apache/pulsar/milestone/16?closed=1


*** Please download, test and vote on this release. This vote will stay
open for at least 72 hours ***

Note that we are voting upon the source (tag); binaries are provided for
convenience.

Source and binary files:

https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.2.0-candidate-1/

SHA-512 checksums:

591abefd40ce20b1a1d76717c5322b749ce5254d982b653632ea8e06f99115f1a9f1772139e184f524dbd81d6d230c59b3a6032abcc6185b4d7b06b79a3592c2
./apache-pulsar-2.2.0-bin.tar.gz

b35d4f6a1e5313c51b1d4caabc6e2d88ab30602b6b59436531c512c692b6f9dbb585d1f03a57761dae5b5ab045b3d3281bd49690e0a1a374b0a4ee01bfafac48
./apache-pulsar-2.2.0-src.tar.gz

7dcf6a94eb22785366c2b98cf98c1804b3828798e17d9700a438a41d3934a4f26f415c17003be1c4e664cce98d50f3dd937ccf927f98a5e0de10a3256cb1a4fc
./apache-pulsar-io-connectors-2.2.0-bin.tar.gz


Maven staging repo:
https://repository.apache.org/content/repositories/orgapachepulsar-1029

The tag to be voted upon:
v2.2.0-candidate-1 dd509298aad74f720089f675884196f0cf3
https://github.com/apache/pulsar/releases/tag/v2.2.0-candidate-1

Pulsar's KEYS file containing PGP keys we use to sign the release:
https://dist.apache.org/repos/dist/release/pulsar/KEYS

Please download the source package, and follow the README to build and
run the Pulsar standalone service.

Here is a guide for validating a release candidate:
https://github.com/apache/pulsar/wiki/Release-Candidate-Validation
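Anyone validating a candidate can check the downloaded artifacts against the SHA-512 checksums above with `sha512sum -c`, or portably in Java. This is an illustrative sketch: the file name in the comment is the artifact from this email, everything else is generic:

```java
import java.security.MessageDigest;

public class VerifyChecksum {
    // Hex-encode a SHA-512 digest so it can be compared with the
    // published checksum string from the vote email.
    static String sha512Hex(byte[] data) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-512").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Against a downloaded artifact you would hash the file bytes, e.g.
        //   sha512Hex(Files.readAllBytes(Path.of("apache-pulsar-2.2.0-bin.tar.gz")))
        // and compare the result with the checksum listed above.
        System.out.println(sha512Hex("test".getBytes()).length()); // 128 hex chars
    }
}
```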



Joe


[CANCEL] [VOTE] Pulsar Release 2.2.0 Candidate 1

2018-10-11 Thread Joe F
Given the issue identified, this vote is cancelled until
https://github.com/apache/pulsar/issues/2778 is fixed.

Candidate 2 will be put up for vote after this fix is completed.

Thank you
Joe

-- Forwarded message -
From: Matteo Merli 
Date: Thu, Oct 11, 2018 at 3:30 PM
Subject: Re: [VOTE] Pulsar Release 2.2.0 Candidate 1
To: 


We have found there are some issues with Netty shading in master (that are
also reflected in this candidate). This impacts all clients that have other
Netty dependencies in the classpath.

Working on a fix.


Matteo

On Wed, Oct 10, 2018 at 5:16 PM Joe F  wrote:

> This is the first release candidate for Apache Pulsar, version  2.2.0
>
> It adds new features and also fixes for various issues from 2.1.1
>
>  * Pulsar Java Client Interceptors
>  * Integration of functions and io with schema registry
>  * Dead Letter Topic
>  * Flink Source connector
>  * JDBC Sink Connector
>  * HDFS Sink Connector
>  * Google Cloud Storage Offloader
>  * Pulsar SQL
>
>
> A complete list of enhancements and fixes can be viewed at
> https://github.com/apache/pulsar/milestone/16?closed=1
>
>
> *** Please download, test and vote on this release. This vote will stay
> open for at least 72 hours ***
>
> Note that we are voting upon the source (tag), binaries are provided for
> convenience.
>
> Source and binary files:
>
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.2.0-candidate-1/
>
> SHA-512 checksums:
>
>
>
591abefd40ce20b1a1d76717c5322b749ce5254d982b653632ea8e06f99115f1a9f1772139e184f524dbd81d6d230c59b3a6032abcc6185b4d7b06b79a3592c2
> ./apache-pulsar-2.2.0-bin.tar.gz
>
>
>
b35d4f6a1e5313c51b1d4caabc6e2d88ab30602b6b59436531c512c692b6f9dbb585d1f03a57761dae5b5ab045b3d3281bd49690e0a1a374b0a4ee01bfafac48
> ./apache-pulsar-2.2.0-src.tar.gz
>
>
>
7dcf6a94eb22785366c2b98cf98c1804b3828798e17d9700a438a41d3934a4f26f415c17003be1c4e664cce98d50f3dd937ccf927f98a5e0de10a3256cb1a4fc
> ./apache-pulsar-io-connectors-2.2.0-bin.tar.gz
>
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1029
>
> The tag to be voted upon:
> v2.2.0-candidate-1 dd509298aad74f720089f675884196f0cf3
> https://github.com/apache/pulsar/releases/tag/v2.2.0-candidate-1
>
> Pulsar's KEYS file containing PGP keys we use to sign the release:
> https://dist.apache.org/repos/dist/release/pulsar/KEYS
>
> Please download the source package, and follow the README to build and
> run the Pulsar standalone service.
>
> Here is a guide for validating a release candidate:
> https://github.com/apache/pulsar/wiki/Release-Candidate-Validation
> <
>
https://github.com/apache/incubator-pulsar/wiki/Release-Candidate-Validation
> >
>
>
> Joe
>


Release 2.2.0 is blocked for archive size limit

2018-10-15 Thread Joe F
Issue INFRA-17151 - Increase SVN size limits for archive


Our packages have busted the current size limit

Joe


[VOTE] Pulsar Release 2.2.0 Candidate 2

2018-10-16 Thread Joe F
This is the Second release candidate for Apache Pulsar, version  2.2.0

Release 2.2.0 adds new features and also fixes for various issues from 2.1.1

 * Pulsar Java Client Interceptors
 * Integration of functions and io with schema registry
 * Dead Letter Topic
 * Flink Source connector
 * JDBC Sink Connector
 * HDFS Sink Connector
 * Google Cloud Storage Offloader
 * Pulsar SQL


A complete list of enhancements and fixes can be viewed at
https://github.com/apache/pulsar/milestone/16?closed=1


*** Please download, test and vote on this release. This vote will stay
open for at least 72 hours ***

Note that we are voting upon the source (tag); binaries are provided for
convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.2.0-candidate-2/

SHA-512 checksums:

71513019e2470c3835864a8dc0f3934bbf50c4ee27fd58c0dc0c670eb3c6e88382665265d1138e0c3cf72bb55b5cff7b63887072b739b8e506bfe8185b544063
./apache-pulsar-2.2.0-bin.tar.gz

5b60358f8ea52f5dce5c69b21fe31730bb200fb257982058bfefd74404ed1249937c2eed49422cbea15bfd80b8dcf4fcb80acaa14d46dd016bd56646f2433eff
./apache-pulsar-2.2.0-src.tar.gz

a33536daccb3a5b4cf095fa98c048cda3d3b05bbfbe50d87b0a117c435038ddd8aa2b6f429538f46ace4647c780f1bcae8abfb7465b69058f4185a9e5dfd2a9f
./apache-pulsar-io-connectors-2.2.0-bin.tar.gz


Maven staging repo:

https://repository.apache.org/content/repositories/orgapachepulsar-1030/

The tag to be voted upon:
v2.2.0-candidate-2 8e4b35f2e0b4f0a5fa4dfa94aeb40bfb366f990e
https://github.com/apache/pulsar/releases/tag/v2.2.0-candidate-2

Pulsar's KEYS file containing PGP keys we use to sign the release:
https://dist.apache.org/repos/dist/release/pulsar/KEYS

Please download the source package, and follow the README to build and
run the Pulsar standalone service.

Here is a guide for validating a release candidate:
https://github.com/apache/pulsar/wiki/Release-Candidate-Validation



Re: [VOTE] Pulsar Release 2.2.0 Candidate 2

2018-10-23 Thread Joe F
The vote is now closed for Pulsar 2.2.0 Release Candidate 2, with 9 +1s
(7 binding) and no -1s.

Binding +1
Masahiro Sakamoto
Jim Jagielski
Hiroyuki Sakai
Rajan Dhabalia
Jerry Peng
Nozomi Kurihara
Matteo Merli

Non-binding +1
Yuto Furuta
Ali Ahmed

No  -1

Thank you for your work in validating the release

Joe

On Fri, Oct 19, 2018 at 8:11 AM Masahiro Sakamoto 
wrote:

> +1
>
> - Check signatures
> - Src distribution
>   - RAT check
>   - compile and unit tests
> - Bin distribution
>   - standalone/producer/consumer/function worked
>   - can build Go client using RPM/DEB packages
>
> --
> Masahiro Sakamoto
> Yahoo Japan Corp.
> E-mail: massa...@yahoo-corp.jp
> --
>
> > -Original Message-
> > From: Joe F [mailto:j...@apache.org]
> > Sent: Wednesday, October 17, 2018 3:34 AM
> > To: dev@pulsar.apache.org
> > Subject: [VOTE] Pulsar Release 2.2.0 Candidate 2
> >
> > This is the Second release candidate for Apache Pulsar, version  2.2.0
> >
> > Release 2.2.0 adds new features and also fixes for various issues from
> 2.1.1
> >
> >  * Pulsar Java Client Interceptors
> >  * Integration of functions and io with schema registry
> >  * Dead Letter Topic
> >  * Flink Source connector
> >  * JDBC Sink Connector
> >  * HDFS Sink Connector
> >  * Google Cloud Storage Offloader
> >  * Pulsar SQL
> >
> >
> > A complete list of enhancements and fixes can be viewed at
> > https://github.com/apache/pulsar/milestone/16?closed=1
> >
> >
> > *** Please download, test and vote on this release. This vote will stay
> > open for at least 72 hours ***
> >
> > Note that we are voting upon the source (tag), binaries are provided for
> > convenience.
> >
> > Source and binary files:
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.2.0-candidate-2
> > /
> >
> > SHA-512 checksums:
> >
> > 71513019e2470c3835864a8dc0f3934bbf50c4ee27fd58c0dc0c670eb3c6e883826652
> > 65d1138e0c3cf72bb55b5cff7b63887072b739b8e506bfe8185b544063
> > ./apache-pulsar-2.2.0-bin.tar.gz
> >
> > 5b60358f8ea52f5dce5c69b21fe31730bb200fb257982058bfefd74404ed1249937c2e
> > ed49422cbea15bfd80b8dcf4fcb80acaa14d46dd016bd56646f2433eff
> > ./apache-pulsar-2.2.0-src.tar.gz
> >
> > a33536daccb3a5b4cf095fa98c048cda3d3b05bbfbe50d87b0a117c435038ddd8aa2b6
> > f429538f46ace4647c780f1bcae8abfb7465b69058f4185a9e5dfd2a9f
> > ./apache-pulsar-io-connectors-2.2.0-bin.tar.gz
> >
> >
> > Maven staging repo:
> >
> > https://repository.apache.org/content/repositories/orgapachepulsar-103
> > 0/
> >
> > The tag to be voted upon:
> > v2.2.0-candidate-2 (commit 8e4b35f2e0b4f0a5fa4dfa94aeb40bfb366f990e)
> > https://github.com/apache/pulsar/releases/tag/v2.2.0-candidate-2
> >
> > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > https://dist.apache.org/repos/dist/release/pulsar/KEYS
> >
> > Please download the source package, and follow the README to build
> and
> > run the Pulsar standalone service.
> >
> > Here is a guide for validating a release candidate:
> > https://github.com/apache/pulsar/wiki/Release-Candidate-Validation
>


Subject: [ANNOUNCE] Apache Pulsar 2.2.0 released

2018-10-25 Thread Joe F
The Apache Pulsar team is proud to announce Apache Pulsar version 2.2.0.

This is the first release of Apache Pulsar as an Apache Top Level Project.

Pulsar is a highly scalable, low latency messaging platform running on
commodity hardware. It provides simple pub-sub semantics over topics,
guaranteed at-least-once delivery of messages, automatic cursor management
for subscribers, and cross-datacenter replication. It also provides
lightweight stream-native processing through Pulsar Functions.

Release 2.2.0 introduces several major features and improvements from many
contributors. These major features include:

 * Pulsar SQL, which enables users to query structured data stored within
   Pulsar
 * Pulsar I/O, which adds connector support for HDFS and JDBC sinks and a
   Flink source
 * Pulsar Tiered Storage, which adds support for a Google Cloud Storage
   offloader
 * Pulsar Functions integration with the Pulsar Schema registry
 * Dead Letter Topic
 * Pulsar Java Client Interceptors

For more details about the Pulsar 2.2.0 release and downloads, visit:
https://pulsar.apache.org/download

Release Notes are at:
http://pulsar.apache.org/release-notes

We would like to thank the contributors that made the release possible.

Regards,
The Pulsar Team


Re: [DISCUSS] PIP 26: Delayed Message Delivery

2018-11-09 Thread Joe F
I am not a fan of adding complexity to the dispatch path, and I will always
have serious concerns about proposals that do so, including this one.

 In general, I would prefer Pulsar to keep the dispatch path simple and
efficient, and avoid server side implementations of business logic.
Streaming at scale, at low latency is what  I think Pulsar should do.  I am
biased here, because that is one of the reasons Pulsar got created
originally, at a time when there were many other message brokers out there
( and many under the Apache umbrella too)

All those other message brokers do all kinds of server-side logic -
filtering, transforming, scheduling, and so on. All of those systems have
more or less ended up with bottlenecks and  complexity.  And  this is not
without reason. Message queues are queues, and most of the server side
logic implementations are attempts to make a queue into a database. A
system that is optimized for flow as a queue, will not be good as a
database, and vice-versa.

I think the right way to do this kind of business logic is in the client or
leverage Pulsar functions, and the core broker dispatch path and process
space should just deal with performance and flow at scale

Joe




On Thu, Nov 8, 2018 at 1:39 PM 李鹏辉gmail  wrote:

> Dear all
>
> This is a PIP to add feature of delayed message delivery.
>
> ## Motivation
> Scheduled and delayed message delivery is a very common feature to support
> in a message system. Basically individual message can have a header which
> will be set by publisher and based on the header value the broker should
> hold on delivering messages until the configured delay or the scheduled
> time is met.
>
> ## Usage
> The delayed message delivery feature is enabled per message at producer
> side.
>
> Delayed messages publish example in client side:
>
> ```java
> // message to be delivered after the configured delay interval
> producer.newMessage().delayAt(3L, TimeUnit.MINUTES).value("Hello
> Pulsar!").send();
>
> // message to be delivered at the configured time
> producer.newMessage().scheduleAt(new Date(2018, 10, 31, 23, 0, 0))
>         .value("Hello Pulsar!").send();
> ```
>
> To enable or disable delay message feature:
>
> ```shell
> pulsar-admin namespaces
>
> enable-delayed-message   Enable delayed message for all topics of the
> namespace
>   Usage: enable-delayed-message [options] tenant/namespace
>
> Options:
>   -p --time-partition-granularity
>         Granularity of the time partitions. Every time partition
>         will be stored in ledgers, and the current time partition
>         will be loaded in memory and organized in a
>         TimeWheel. (eg: 30s, 5m, 1h, 3d, 2w)
>         Default: 5m
>   -t --tick-duration
>         The duration between ticks in the TimeWheel. Ticks per
>         wheel are calculated as time-partition-granularity /
>         tick-duration before loading a time partition into a
>         TimeWheel. (eg: 500ms, 1s, 5m)
>         Default: 1s
>
> disable-delayed-message  Disable delayed message for all topics of
> the namespace
>   Usage: disable-delayed-message tenant/namespace
> ```
>
> ## Design
>
> ### Delayed Message Index
>
> The “DelayedMessageIndex” will be implemented using a [TimeWheel approach](
> http://www.cs.columbia.edu/~nahum/w6998/papers/sosp87-timing-wheels.pdf).
> We will be maintaining a delayed index, indexing the delayed message by its
> time and actual message id.
>
> The index is partitioned by the delayed time. Each time partition will be
> stored using one (or few) ledger(s). For example, if we are configuring
> the index to be partitioned by 5 minutes, we will store the index data for
> every 5 minutes by its delayed time. The latest time partition will be
> loaded in memory and organized in a TimeWheel.
>
> The TimeWheel is indexed by ticks. For example, if we configured the tick
> to be 1 second, we will be maintaining 300 ticks for 5 minutes’ index. A
> timer task is scheduled every tick, and it will pick the indexed message
> from the TimeWheel and dispatch them to the real consumers.
>
> After completing dispatching the messages in current TimeWheel, it will
> load the TimeWheel from the next time partition.
>
> The delayed message options `time-partition-granularity` and `tick-duration`
> may be reset to adapt to changes in delayed message throughput, but
> `time-partition-granularity` can't be shrunk. For example, with an existing
> config of time-partition-granularity = 5m and tick-duration = 1s, the
> delayed message index is stored in 300 slots. If the granularity is
> increased to 10m, the TimeWheel for the next time partition is initialized
> with 600 slots, which is enough to hold an already existing 5m partition.
> But if the granularity is decreased to 2m, the TimeWheel can't load an
> already existing 5m partition into 120 slots. So shrinking the granularity
> is disallowed for now; this could be improved by splitting time partitions
> as they are loaded
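The TimeWheel indexing and tick-based dispatch described in the PIP above can be sketched in plain Java. This is a minimal single-level wheel under the stated configuration (fixed tick duration; slots = time-partition-granularity / tick-duration); the class and method names are illustrative, not Pulsar APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal single-level timing wheel: one slot per tick within the
// current time partition; each slot holds the ids of delayed messages
// whose delivery time falls in that tick.
class TimeWheel {
    private final long tickMillis;
    private final List<List<String>> slots;
    private long currentTick;   // absolute tick corresponding to slot 0

    TimeWheel(long tickMillis, int numSlots, long startMillis) {
        this.tickMillis = tickMillis;
        this.slots = new ArrayList<>();
        for (int i = 0; i < numSlots; i++) slots.add(new ArrayList<>());
        this.currentTick = startMillis / tickMillis;
    }

    // Index a delayed message by its delivery time; returns false when the
    // deadline falls outside this wheel's time partition (it would then be
    // stored in a later partition's ledger instead).
    boolean add(String messageId, long deliverAtMillis) {
        long offset = deliverAtMillis / tickMillis - currentTick;
        if (offset < 0 || offset >= slots.size()) return false;
        slots.get((int) offset).add(messageId);
        return true;
    }

    // Called once per tick by a timer task: drain the expired slot so the
    // dispatcher can deliver those messages to the real consumers.
    List<String> advance() {
        List<String> due = slots.remove(0);
        slots.add(new ArrayList<>());
        currentTick++;
        return due;
    }
}
```

A timer task would call `advance()` once per tick; when the current partition is exhausted, the next time partition's index would be loaded from its ledger into a fresh wheel.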

Re: [DISCUSSION] Delayed message delivery

2019-01-24 Thread Joe F
  To me this discussion presupposes that a streaming system should provide
a service like a database. Before we   discuss about how to implement this,
we should look at whether this is something that fits into what is the core
of Pulsar. I still have the same concerns against doing this in the broker
dispatch side.

What exactly is the delayed delivery use case?  Random insertion, dynamic
sorting,  and deletion from the top of the sort.  That is a priority queue.
It is best implemented as a heap. For larger sets it's some sort of tree
structure. You can simulate that on a database with an index.

Random insertion and deletion is not what FIFO queues like Pulsar are
designed for.  The closest thing I can think of with Pulsar is to build an
in-mem priority queue in a Pulsar function, feed it from an input topic and
publish the top of the queue into a separate output topic.  In fact the
entire logic proposed in PIP-26 can be done outside the broker in a Pulsar
function.

For a small scale setup, these distinctions do not matter - you can use a
database as a queue and a queue as a database. But at any larger scale, a
streaming system is not the correct solution for a priority queue use case,
whether it's Pulsar or some other streaming system. So far I have not seen
any mention of the target scale for the design, or the specific use case
requirements

-joe


On Sat, Jan 19, 2019 at 6:43 PM PengHui Li  wrote:

> Hi All,
>
> Actually, I also prefer to simplify at broker side.
>
> If Pulsar supports setting an arbitrary timeout on each message, then absent
> a cluster failure or consumer failure it needs to behave normally (deliver
> on time). Otherwise, users need to understand how Pulsar dispatches messages
> and how the limit on unacked messages changes the delayed message behavior.
> This may lead users to hesitate, and the feature may be misused when the
> user does not fully understand how it works.
>
> When users depend on the arbitrary timeout feature, they just need to keep
> the producer and consumer working well, and the Pulsar administrator needs
> to keep the cluster working well.
>
> I don't think it is very necessary for Pulsar to support this feature
> (arbitrary timeout per message). In most scenarios, #3155 can work well. In
> a few cases, even with arbitrary-timeout support on the client side, I
> believe it still cannot meet the requirements of all delayed-message use
> cases.
>
> To be clear, I'm not against supporting an arbitrary timeout on each message
> on the client side; maybe this is useful for other users. In some of our
> scenarios, we also need a more functional alternative (a task service).
>
> Of course, if we can integrate a task service, we can use Pulsar to
> guarantee delivery of messages while the task service guarantees that
> messages are sent to Pulsar successfully. Or the Pulsar broker could support
> a filter server; this way users can implement their own task services.
>
> Ezequiel Lovelle  于2019年1月20日周日 上午12:28写道:
>
> > > If the goal is to minimize the amount of redeliveries from broker ->
> > client, there are multiple ways to achieve that with the client based
> > approach
> > (eg. send message id and delay time instead of the full payload to
> > consumers
> > as Ivan proposed).
> >
> > But the main reason to put this logic on the client side was to avoid
> > adding delay-related logic on the broker side; to do these optimisations
> > the broker must be aware of delayed messages and send only the message id
> > and delay time without the payload.
> >
> > > I don't necessarily agree with that. NTP is widely available
> > and understood. Any application that's doing anything time-related would
> > have
> > to make sure the clocks are reasonably synced.
> >
> > Yep, that's true, but from my point of view a system that depends on
> client
> > side clock is weaker than a system that does this kind of calculation at
> > a more controlled environment aka backend. This adds one more factor that
> > depends on the user doing things right, which is not always the case.
> >
> > One possible solution might be for the broker to periodically send its
> > current epoch time and have the client do the calculations with this data,
> > or send the epoch time once at subscription and do the rest of the
> > calculations as a delta from the broker's time as a base (time flows
> > equally for both; the important thing is which one is positioned at the
> > very present time).
> >
> > Anyway, this approach sounds like a hack, simply because it avoids doing
> > the time calculations in the backend.
> >
> > > Lastly, I do agree client-side approaches have better scalability than
> > > server-side approaches in most cases. However, I don't believe that is
> > > the case here, and I don't see anyone with a clear explanation of why a
> > > broker approach is less scalable than the client-side approach.
> >
> > Yes, I agree with this. At least for fixed time delay at pr #3155.
> >
> > The only remaining concern to me would be GC usage of the stored positions
> > next to expire, anyw

Re: [DISCUSS] Skip tests for documentation related changes

2019-01-28 Thread Joe F
+1 to Sanjeev's suggestion

On Mon, Jan 28, 2019 at 12:15 PM Sanjeev Kulkarni 
wrote:

> If developers are in charge of checking the checkbox, it might lead to
> errors. Any way to make it automatic? Since docs are restricted to certain
> areas of repo, maybe we can have some rules around that?
>
> On Mon, Jan 28, 2019 at 12:12 PM Sijie Guo  wrote:
>
> > Hi all,
> >
> > Currently for every documentation change, we run 3 precommit jobs: Java,
> > C++, and integration tests. None of them actually tests the documentation
> > change; this wastes Jenkins resources and makes the merge process for
> > documentation changes take a much longer time.
> >
> > So I am proposing :
> >
> > - add a separate precommit job for documentation-only changes. e.g.
> > `Jenkins: Documentation Tests`
> >
> > - provide a checkbox in description
> >   * [ ] documentation-only change
> >   * [ ] code-only change
> >
> > - if `[x] documentation-only` is checked, then java, c++ and integration
> > tests will be skipped.
> > - if `[x] code-only` is checked, then documentation tests will be
> skipped.
> >
> >
> > I would like to see what other people think about this.
> >
> > - Sijie
> >
>


Re: PIP 28: Improve pulsar proxy log messages

2019-01-30 Thread Joe F
I run Pulsar proxy in production, and the same concern here.

I don't think we can get any of these metrics unless we start parsing the
protocol, and it's definitely going to make everything slower and create
additional memory and GC pressure.

Joe


On Wed, Jan 30, 2019 at 11:08 AM Matteo Merli 
wrote:

> Missed to comment on this :)
>
> One issue might arise from the fact that proxy is not actually parsing
> each and every request.
>
> The proxy only "speaks" the Pulsar protocol during the initial
> Connect/Connected handshake, in which the proxy forwards the client
> credentials and routes the connection to the appropriate broker.
>
> After the initial handshake, the proxy is essentially degrading itself
> into a 1-1 TCP proxy, patching 2 TCP connections through
> without checking the commands anymore. That's the reason we only have
> metrics around bytes/s and "operation/s" where
> operation maps to a "buffer" we're getting from a socket (with no
> direct relation to IP packets).
>
>
> --
> Matteo Merli
> 
>
> On Wed, Jan 30, 2019 at 9:19 AM Sijie Guo  wrote:
> >
> > Yao,
> >
> > Thank you for your proposal! The proposal looks good to me +1.
> >
> > In general, the ASF is using lazy consensus for a lot of things, like
> > adopting PIPs. basically, if there is no objection coming up within a
> > period (typically 1~2 days), you are good to pull the trigger and send
> PRs
> > :-)
> >
> > - Sijie
> >
> > On Sun, Jan 27, 2019 at 11:39 AM sun yao  .invalid>
> > wrote:
> >
> > >
> > >
> > > Hi folks,
> > >   Pulsar Proxy is almost a gateway for all Pulsar requests. It would be
> > > helpful if it could record more details about the traffic, like source,
> > > target, session id, and response time (at different stages) for each
> > > request, even the message body. I am proposing an improvement for the
> > > Pulsar proxy; for more information see the PIP:
> > > https://github.com/apache/pulsar/wiki/PIP-28%3A-Improve-pulsar-proxy-log-messages
> > > Feel free to let me know any ideas or suggestions for this feature.
> > > Thanks.
> > >
>


Re: [DISCUSSION] Delayed message delivery

2019-02-13 Thread Joe F
Delayed subscription is simpler, and probably worth doing in the broker IF
done right.

How is this different from a subscription running behind?  Why does
supporting that require this complex a change in the dispatcher, when we
already support backlogged subscribers?

I am extremely wary of changes in the dispatcher. The recent changes made
to support DLQ caused major problems with garbage collection, broker
failure  and service interruptions for us. Even though we ARE NOT using the
DLQ feature. Not a pleasant experience.

This is a very performance sensitive piece of code, and it should be
treated as such.

Joe



On Wed, Feb 13, 2019 at 3:58 PM Sijie Guo  wrote:

> Hi all,
>
> I am going to wrap up the discussion regarding delayed delivery use cases.
>
> For arbitrary delayed delivery, there are a few +1s to doing PIP-26 in
> functions. I am assuming that we will go down this path, unless there are
> other proposals.
>
> However there is a use case Lovelle pointed out about "Fixed Delayed
> Message". More specifically it is
> https://github.com/apache/pulsar/pull/3155
> (The caption in #3155 is a bit misleading). IMO it is a "delayed
> subscription", basically all messages in the subscription is delayed to
> dispatch in a given time interval. The consensus of this feature is not yet
> achieved. Basically, there will be two approaches for this:
>
> a) DONT treat "fixed delayed message" as a different case. Just use the
> same approach as in PIP-26.
> b) treat "fixed delayed message" as a different case, e.g. we can better
> call it "delayed subscription" or whatever can distinguish it from general
> arbitrary delayed delivery. Use the approach proposed/discussed in #3155.
>
> I would like the community to discuss this and also come to an agreement.
> So Lovelle can move forward with the approach agreed by the community.
>
> Thanks,
> Sijie
>
> On Tue, Jan 29, 2019 at 6:30 AM Ezequiel Lovelle <
> ezequiellove...@gmail.com>
> wrote:
>
> > "I agree, but that is *not what #3155 tries to achieve."
> >
> > This typo made this phrase nonsense, sorry!
> >
> > On Mon, 28 Jan 2019, 16:44 Ezequiel Lovelle  > wrote:
> >
> > > > What exactly is the delayed delivery use case?
> > >
> > > This is helpful on systems relaying on pulsar for persistent guarantees
> > > and using it for synchronization or some sort of checks, but on such
> > > systems is common to have some overhead committing data on persistent
> > > storage maybe due to buffered mechanism or distributing the data across
> > > the network before being available.
> > >
> > > Surely would be more use cases I don't came across right now.
> > >
> > > > Random insertion and deletion is not what FIFO queues like Pulsar are
> > > designed for.
> > >
> > > I agree, but that is now what #3155 tries to achieve. #3155 is just a
> > > fixed delay for all message in a consumer, that's the reason that the
> > > implementation of #3155 is quite trivial.
> > >
> > > +1 from me for doing PIP-26 in functions.
> > >
> > > --
> > > *Ezequiel Lovelle*
> > >
> > >
> > > On Sat, 26 Jan 2019 at 09:57, Yuva raj  wrote:
> > >
> > >> Considering the way pulsar is built +1 for doing PIP-26 in functions.
> I
> > am
> > >> more of thinking in a way like publish it pulsar we will make it
> > available
> > >> in a different queuing system if you need priority and delay messages
> > >> support. Pulsar functions would go enough for this kind of use cases.
> > >>
> > >> On Fri, 25 Jan 2019 at 22:29, Ivan Kelly  wrote:
> > >>
> > >> > > Correct. PIP-26 can be implemented in Functions. I believe the
> last
> > >> > > discussion in PIP-26 thread kind of agree on functions approach.
> > >> > > If the community is okay with PIP-26 in functions, I think that is
> > >> > probably
> > >> > > a good approach to start.
> > >> >
> > >> > +1 for doing it in functions.
> > >> >
> > >> > -Ivan
> > >> >
> > >>
> > >>
> > >> --
> > >> *Thanks*
> > >>
> > >> *Yuvaraj L*
> > >>
> > >
> >
>


Re: [DISCUSS] PIP 33: Replicated subscriptions

2019-04-29 Thread Joe F
I have suggestions for an alternate solution.

If source message-ids were known for replicated messages, a composite
cursor can be maintained for replicated subscriptions as an n-tuple.  Since
messages are ordered from a source, it would be possible to restart from a
known cursor n-tuple in any cluster by  a combination of cursor
positioning  _and_ filtering

A simple way to approximate this is for each cluster to insert its own
ticker marks into the topic. A ticker carries the message id as the
message body. The ticker mark can be inserted every 't' time interval or
every 'n' messages as needed.

The n-tuple of the tickers from each cluster is a well-known state  that
can be re-started anywhere by proper positioning and filtering

That is a simpler solution for users to understand and trouble-shoot.  It
would be resilient to cluster failures, and does NOT require all clusters
to be up, to determine cursor position. No cross-cluster
communication/ordering is needed.

But it will require skipping messages from specific sources as needed, and
storing the n-tuple as part of cursor state

Joe

On Mon, Mar 25, 2019 at 10:24 PM Sijie Guo  wrote:

> On Mon, Mar 25, 2019 at 4:14 PM Matteo Merli  wrote:
>
> > On Sun, Mar 24, 2019 at 9:54 PM Sijie Guo  wrote:
> > >
> > > Ivan, Matteo, thank you for the writeup!
> > >
> > > I have a few more questions
> > >
> > > - How does this handle ack individual vs ack cumulatively? It seems it
> is
> > > ignored at this moment. But it is good to have some discussions around
> > how
> > > to extend the approach to support them and how easy to do that.
> >
> > Yes, it's stated that current proposal only replicates the mark-delete
> > position.
> > (Will clarify it better in the wiki)
> >
> > Of course the markers approach works well with cumulative acks (since
> they
> > just moves the mark-delete position), but it will work with
> > individual-acks too
> > in most of the scenarios.
> >
> > Keep in mind that in all cases a cluster failover will involve some
> number
> > of duplicates (configurable with the frequency of the snapshot).
> >
> > With individual acks, if all messages are acked within a short amount of
> > time,
> > for example, 1 second, comparable to the snapshot frequency, then there
> > will be no problem and no practical difference from the cumulative ack
> > scenario.
> >
> > Conversely, if some messages can stay unacked for much longer amount
> > of time,  while other messages are being acked, that will lead to a
> larger
> > amount of duplicates during cluster failover.
> >
> > Regarding at how support this case better, I replied below in the
> > "alternative
> > design" answer.
> >
> > > - Do we need to change the dispatcher, and what are the changes?
> >
> > This approach does not require any change in the dispatcher code. The
> > only change in the consumer handling is to filter out the marker messages
> > since they don't need to go back to consumers.
> >
>
>
> How does this
>
>
> >
> > > - If a region goes down, the approach can't take any snapshots. Does it
> > > mean "acknowledge" will be kind of suspended until the the region is
> > > brought back? I guess it is related to how dispatcher is changed to
> react
> > > this snapshot.  It it unclear to me from the proposal. It would be good
> > if
> > > we have more clarifications around it.
> >
> > First off, to clarify, this issue is only relevant when there are 3 or
> > more clusters
> > in the replication set.
> >
> > If one of the cluster is not reachable, the snapshots will not be
> > taken. A consumer
> > will still keep acknowledging locally but these acks won't be
> > replicated in the other
> > clusters. Therefore in case of a cluster failover, the subscription
> > will be rolled back
> > to a much earlier position.
>
>
> > This is not a problem with 2 clusters since if the other cluster is down,
> > we
> > we cannot failover to it anyway.
> >
>
> The question here will be more about how to fail back. If a snapshot is not
> taken, then nothing is *exchanged*
> between the clusters. How does this proposal handle failing back?
>
> In other words, what are the sequences for people to do failover and
> failback?
> It would be good to have an example to demonstrate the sequences, so that
> users will have a clear picture on how to use this feature.
>
>
> >
> > When we have 3+ clusters, though, we can only sustain 1 cluster
> > failure because, after that,
> > the snapshot will not make progress.
> >
> > Typically, though, the purpose of replicated subscriptions is to have
> > the option to fail out
> > of a failed cluster, which in this case it will work.
> >
> > What it won't work would be failing over from A to C when cluster B is
> > down. To define "won't work"
> > is that consumer will go to C but will find the cursor to an older
> > position. No data loss, but potentially
> > a big number of dups.
> >
> > This is to protect for the case of messages exchanged between B and C
> > clusters before

Re: [DISCUSS] PIP 33: Replicated subscriptions

2019-05-02 Thread Joe F
Let me try to explain my suggestion better.

First, about positions in an ordered stream:  Consider a simple stream  of
events when there is no identifier  on each event about its relative
position in the stream. But after every 'n' events there is a ticker event
carrying a monotonically increasing sequence id.

  For eg: after, say, every 4th event,  the stream
generator inserts a ticker  into the stream.
  Then the stream will be like a, a, a,  a,  (a1),  a,
a, a, a, (a2), a, a, a, a (a3) a  and so on.
A reader can establish its position based on
these "ticker" events (like freeway mile markers).

Assertion 1: A ticker position in the stream is deterministic across all
copies of the stream, if all copies have the same event order.
This means reading can be resumed across copies of the same stream, since
positions are deterministic. For eg: if a reader on copy X says "I am at
ticker (a2)", then the reader's position is at (a2) in every other copy, Y or
Z. The reader can stop reading at (a8) in copy X and resume at (a8) in copy
Y, and do so without loss of events.

Second, consider a merge operator that merges 'n' _ordered_ input
streams and produces one output stream.

The operator can be modeled as being fed by 'n' input readers and
emitting one output. There is no buffering: if the operator gets an input,
it has to write it to the output before it accepts another input from any
of its feeders.

This merge operator has 2 properties:
  (1) Input order: the merge operator maintains input order in the output,
      i.e. if input A had (a-x) preceding (a-y), then the output of the
      merge operator has (a-x) preceding (a-y).
  (2) No output order: different merge operators can produce arbitrary
      output orders across the same input feeds, i.e. no assumptions can be
      made that, in the output, (a-x) will precede (b-y) [..or (b-z) or
      (c-y) or (c-z) or ..]

Assertion 2: The merge output can then be  represented as  an n-tuple of
'n' individual input  positions;  Since each input is an ordered stream,
the position  within that input sub-stream  is deterministic, and a
combination of positions on all inputs is deterministic.

It follows that (1) the set of input positions can be transferred from one
operator to another, and (2) the output will not lose events across such
transfers  and (3) output order may change across such transfer

Note that there is no assumption or assertions about the _output_ of the
merge operator. We are only asserting this about the input.

Example
--
For eg: readers P, Q R, are each  reading output of different merge
operators. They all process the same three event streams, one generated
from A, one from B, and one from C.

Then P's merge operator can be represented as three input stream readers
Pa, Pb, Pc who feed into  the merge operator P.   The operator for P may
produce a different output than the operator for Q,  (say because input
readers may progress at different speeds in each operator), but each  input
stream reader position is deterministic (by Assertion 1)

If P has a position at [Pa(a8), Pb(b1), Pc(c3)], Q has an equivalent
position [Qa(a8), Qb(b1), Qc(c3)] for its input readers.
  A reader on Q can set up input readers Qa to (a8), Qb to (b1),
and Qc to (c3), to start feeding into Q

---
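The resume step in the example above can be sketched in plain Java. Assuming each replicated event carries its source cluster and a per-source sequence number (the ticker position), a cursor is the map source -> last consumed sequence, and resuming on another copy is per-source filtering; all names here are illustrative, not Pulsar APIs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// An n-tuple cursor over a merged stream: one last-seen sequence per
// source. Because each source sub-stream is ordered identically in every
// copy, the cursor is valid on any copy regardless of interleaving.
class NTupleCursor {
    static final class Event {
        final String source;
        final long seq;
        Event(String source, long seq) { this.source = source; this.seq = seq; }
    }

    private final Map<String, Long> lastSeq = new HashMap<>();

    // Record consumption of an event on the current copy.
    void advance(Event e) { lastSeq.put(e.source, e.seq); }

    // Replay a (possibly differently ordered) copy, filtering out events
    // already covered by the per-source positions in the cursor.
    List<Event> resume(List<Event> copy) {
        List<Event> remaining = new ArrayList<>();
        for (Event e : copy) {
            long seen = lastSeq.getOrDefault(e.source, -1L);
            if (e.seq > seen) remaining.add(e);
        }
        return remaining;
    }
}
```

This filtering is the "positioning _and_ filtering" step: the cursor transfers between copies without loss of events, though the output order after resuming may differ.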

These two assertions are the invariants in my suggestion. The rest is about
solving two implementation issues,

(1)  How to map the  input n-tuple of the merge operator into  a  specific
position in the output of the merge operator
eg: how to map [Pa(x), Pb(y) Pc(z)]  ===>  P(j)

(2) How to resume a reader across the _outputs_ of two merge operators,
without loss of events (and with as little duplication as possible), when
they are fed with the same input, but at different rates.
     eg: If [Pa(x), Pb(y), Pc(z)] ==> P(j), then find
     position Q(r) < [Pa(x), Pb(y), Pc(z)]

And my thinking is that these two things can be solved similar to the
existing proposal.

On Wed, May 1, 2019 at 4:10 PM Matteo Merli  wrote:

> On Mon, Apr 29, 2019 at 1:57 PM Joe F  wrote:
> >
> > I have suggestions for an alternate solution.
> >
> > If source message-ids were known for replicated messages, a composite
> > cursor can be maintained for replicated subscriptions as an n-tuple.
> Since
> > messages are ordered from a source, it would be possible to restart from
> a
> > known cursor n-tuple in any cluster by  a combination of cursor
> > positioning  _and_ filtering
>
> Knowing the source message id alone is not enough to establish the
> order relationship across all the clusters. I think that would only
> work in the 2 clusters scenario.
>
> In gene

Re: PIP 36: Max message size

2019-05-09 Thread Joe F
I think 5MB is too large and it should be reduced. :-) I am with Kafka
on this one: 1MB is good enough.

Have we completely ruled out a split/join abstraction on the Pulsar
producer/consumer APIs? No backward-compatibility issues, no client-server
capability mismatches, and now that we have dedup, split/join should even be
resumable on a network interruption.
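A split/join layer of the kind suggested could be sketched like this: the producer side splits an oversized payload into fixed-size chunks and the consumer side reassembles them. This is an illustrative helper, not an actual Pulsar API; real chunk messages would also need a message key and chunk metadata (index, total count) to group and order them across the topic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Split a payload larger than the max message size into maxChunk-sized
// pieces, and join them back. Each chunk would be published as its own
// message; with dedup enabled, a resumed send would not duplicate chunks.
class SplitJoin {
    static List<byte[]> split(byte[] payload, int maxChunk) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < payload.length; off += maxChunk) {
            int end = Math.min(off + maxChunk, payload.length);
            chunks.add(Arrays.copyOfRange(payload, off, end));
        }
        return chunks;
    }

    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        byte[] out = new byte[total];
        int off = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, off, c.length);
            off += c.length;
        }
        return out;
    }
}
```

The attraction is that the broker's max message size stays small while large payloads remain possible, at the cost of client-side reassembly state.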


Joe

On Thu, May 9, 2019 at 10:43 AM Matteo Merli  wrote:

> Thanks  Yong for starting the work on this.
>
> I think there should be more conditions specified in the PIP and we should
> have
> a clear documentation on what the behavior will be and what kind of errors
> an
> application will see. Especially we should cover how this will
> interact with proxy.
>
> > On client side, client should set max message size in configuration and
> it should be smaller than server support.
>
> If we just let the broker/proxy decide the max size, then we can avoid
> the configuration
> in the client. That removes a lot of complexity in how to handle
> situation when client max-size
> is smaller than broker max-size and broker needs to deliver a message
> to that consumer.
>
> Also, it becomes much easier to make the change from ops perspective,
> 1 single system instead of having to
> configure all the client applications.
>
> --
> Matteo Merli
> 
>
>
> On Wed, May 8, 2019 at 8:34 PM Sijie Guo  wrote:
> >
> > Thanks Yong for the PIP.
> >
> > I moved your gist to Pulsar wiki page:
> > https://github.com/apache/pulsar/wiki/PIP-36%3A-Max-Message-Size
> >
> > The proposal looks good to me. +1
> >
> > - Sijie
> >
> > On Mon, May 6, 2019 at 6:04 PM Yong Zhang 
> > wrote:
> >
> > > hi all,
> > >
> > >
> > > Currently `MaxMessageSize` is hardcoded in Pulsar and it can’t be
> modified
> > > in server configuration. So there is no way when user want to modify
> the
> > > limit to transfer larger size message.
> > >
> > > Hence i propose adding a `MaxMessageSize` config in `broker.conf` to
> solve
> > > this problem. Because broker server will decide how much message size
> will
> > > be received so client need know how much message client can be sent.
> > >
> > > for details track following pip:
> > > https://gist.github.com/zymap/08ec1bb688d2da16e9cd363780480e7a
> > >
>


Re: PIP 49: Permission levels and inheritance

2019-10-30 Thread Joe F
It is good that we are looking at revamping the system. But the proposal,
as it is, is thin.

First, I would like this proposal to be split into two: one for inheritance
and another for changes from existing controls. They are completely
orthogonal and independent issues. Second, both of them (inheritance and
changes) need to have clear rationales.

There are two key underlying principles in Pulsar. First, the cluster
operators, i.e. Pulsar admins (super-users), manage the system and system
resources: their role is operating the cluster, managing system resources
(CPU, storage, n/w bandwidth etc.) and keeping the system healthy, while
tenant admins manage the tenant resources allocated to them. Second, Pulsar
has a model of disintermediation: owning, producing and consuming are
separate concerns and are abstracted from each other.

Many changes in this proposal do not fit the model. So I would like to
see a rationale for each permission change from the existing permissions.

  For example, the proposed change in namespace permissions for
   set-clusters   tenant-admin ==>  super-user
breaks the resource model. A tenant is allocated resources in clusters
X, Y and Z, and it is up to the tenant to manage them: whether to use
resources in X, or in X & Y, etc., and for which resources. There is no
reason for the super-user to get involved.

Another example, unloading a namespace is a system operation that moves
load in the cluster around. Allowing namespace owners to interfere in
system operations is never a good idea.

I see many such changes here which do not fit the resource model.

I also see that this breaks the existing model of topic-level override of
permissions. Again, this does not seem to be well thought out on that side
either.

So I would like to see rationales that align with 1) resource management
principles, 2) the separation of concerns between owner, producer and
consumer, 3) zero trust (unless explicitly granted, nothing is given), and
4) zero discovery avenues: no information, however harmless it is, is
revealed to entities that don't need it (e.g. there is no reason for a
namespace owner to do get-dispatch-rate).
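The zero-trust principle in 3) above, in code form, is simply deny-by-default: a role can perform an operation only if it was explicitly granted. A minimal hypothetical sketch, not Pulsar's actual authorization plugin interface:

```java
import java.util.Map;
import java.util.Set;

// Deny-by-default ACL sketch: absence of an explicit grant means deny.
// Names and structure are invented for illustration.
class ZeroTrustAcl {
    private final Map<String, Set<String>> grants; // role -> granted operations

    ZeroTrustAcl(Map<String, Set<String>> grants) {
        this.grants = grants;
    }

    // Allowed only when the exact (role, operation) pair was granted;
    // nothing is implied or inherited.
    boolean isAllowed(String role, String operation) {
        return grants.getOrDefault(role, Set.of()).contains(operation);
    }
}
```

The "zero discovery" principle in 4) would extend this: even read-only queries like get-dispatch-rate stay denied unless explicitly granted.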

I agree that Pulsar can improve on what we have today, but this proposal
needs additional work and understanding of the complex underlying issue of
resource management related to granting control.

Joe

On Wed, Oct 30, 2019 at 4:49 AM xiaolong ran 
wrote:

> Hello all:
>
> When using pulsar-admin, I found that the current permission verification
> mechanism has some problems.
>
> We are starting a proposal about permission levels and inheritance in
> Apache Pulsar.
>
> The proposal is in:
>
> https://github.com/apache/pulsar/wiki/PIP-49%3A-Permission-levels-and-inheritance
> <
> https://github.com/apache/pulsar/wiki/PIP-49:-Permission-levels-and-inheritance
> >
>
> To ensure compatibility, these changes will be made in v4's admin api.
> About the v4 admin API, see:
>
> https://github.com/apache/pulsar/wiki/PIP-48%3A-hierarchical-admin-api <
> https://github.com/apache/pulsar/wiki/PIP-48:-hierarchical-admin-api>
>
> Looking forward to any feedback.
>


Re: PIP 49: Permission levels and inheritance

2019-11-07 Thread Joe F
This proposal has the same issues as the previous one. In general, it is
not correct to think of commands as being controlled by the owner of the
object on which the command operates. E.g., it would be an error to assign
control of all namespace commands to the namespace owner.

For example, set-subscription-dispatch-rate operates on a subscription's
rate, and set-max-producers-per-topic operates on a namespace. A naive
approach is to think that these would be controlled by the namespace owner.
But essentially both of these are resource management operations that
should be controlled by the system admin, unless you want to deal with
every namespace owner having the power to destroy your SLAs for the cluster.

I just picked 2 examples to indicate the drawback of approaching
permissions in the proposed  manner.

There are many such cases in the proposal, too many to list out here. I
suggest completely ditching this approach of equating objects with the
admin capability of the object owner.

Joe


On Wed, Nov 6, 2019 at 2:14 AM xiaolong ran 
wrote:

> Hello all committers:
>
> About this PIP, do you have any other good ideas or propose.
>
>
>  --
>
> Thanks
> Xioalong Ran
>
>
> > 在 2019年10月30日,下午7:49,xiaolong ran  写道:
> >
> > Hello all:
> >
> > When using pulsar-admin, I found that the current permission
> verification mechanism has some problems.
> >
> > We are starting a proposal about permission levels and inheritance in
> Apache Pulsar.
> >
> > The proposal is in:
> >
> https://github.com/apache/pulsar/wiki/PIP-49%3A-Permission-levels-and-inheritance
> <
> https://github.com/apache/pulsar/wiki/PIP-49:-Permission-levels-and-inheritance
> >
> >
> > To ensure compatibility, these changes will be made in v4's admin api.
> About the v4 admin API, see:
> >
> > https://github.com/apache/pulsar/wiki/PIP-48%3A-hierarchical-admin-api <
> https://github.com/apache/pulsar/wiki/PIP-48:-hierarchical-admin-api>
> >
> > Looking forward to any feedback.
>
>


Re: PIP 49: Permission levels and inheritance

2019-11-07 Thread Joe F
There are no simple answers here other than to understand the effect of the
command.

The resource limit/control command always addresses an object. But giving
control of that command to the object owner, just because the command
addresses an object, will break all controls on resource utilization in
a shared system. A filesystem-like model may look simple and elegant, but
it is fraught with all kinds of unintended consequences if implemented in
Pulsar. (Even in the filesystem, rm, mv and chown, while all addressing a
file, treat the file owner differently.)

This is a complex and complicated effort, and my concern is that this PIP
does not even come close to the scope of the work that is required. Some
of it may be impossible to do (such as allocating certain resources to a
namespace as compared to a tenant) until such capabilities are added to the
system. And some of it begs fundamental questions: when you start managing
a namespace like a tenant, does it even make sense to have that namespace?
Shouldn't that namespace be a tenant in the first place? Why have a
namespace admin at all?

Joe



On Thu, Nov 7, 2019 at 8:16 PM Dave Fisher  wrote:

> I’m not diving in but thinking about the logical implication of this
> dichotomy. For any object’s attributes some ought to be controlled by
> object level permissions and others by sysadmin permissions. How can
> developers tell?
>
> Best Regards,
> Dave
>
>
> Sent from my iPhone
>
> > On Nov 7, 2019, at 8:02 PM, Joe F  wrote:
> >
> > This proposal has the same issues as the previous one . In general it is
> > not correct to think of commands  as being controlled by owner of the
> > object  on which the command operates. Eg: It will be an error to assing
> > control of all namespace commands to the namespace owner
> >
> > For eg:  set subscription-dispatch-rate operates on a subscription rate
> and
> > set-max-producers-per-topic These operate on namespaces. A naive approach
> > is to think that this would be controlled by the namespace owner.   But
> > essentially both these are resource management  operations. That should
> be
> > controlled by the system admin, unless you want to deal with every
> > namespace owner having the power to destroy your SLAs fo the cluster.
> >
> > I just picked 2 examples to indicate the drawback of approaching
> > permissions in the proposed  manner.
> >
> > There are many such cases in the proposal, too many to list out here. I
> > suggest completely ditching this approach of equating objects with  admin
> > capability of object owner
> >
> > Joe
> >
> >
> >> On Wed, Nov 6, 2019 at 2:14 AM xiaolong ran 
> >> wrote:
> >>
> >> Hello all committers:
> >>
> >> About this PIP, do you have any other good ideas or propose.
> >>
> >>
> >> --
> >>
> >> Thanks
> >> Xioalong Ran
> >>
> >>
> >>>> 在 2019年10月30日,下午7:49,xiaolong ran  写道:
> >>>
> >>> Hello all:
> >>>
> >>> When using pulsar-admin, I found that the current permission
> >> verification mechanism has some problems.
> >>>
> >>> We are starting a proposal about permission levels and inheritance in
> >> Apache Pulsar.
> >>>
> >>> The proposal is in:
> >>>
> >>
> https://github.com/apache/pulsar/wiki/PIP-49%3A-Permission-levels-and-inheritance
> >> <
> >>
> https://github.com/apache/pulsar/wiki/PIP-49:-Permission-levels-and-inheritance
> >>>
> >>>
> >>> To ensure compatibility, these changes will be made in v4's admin api.
> >> About the v4 admin API, see:
> >>>
> >>> https://github.com/apache/pulsar/wiki/PIP-48%3A-hierarchical-admin-api
> <
> >> https://github.com/apache/pulsar/wiki/PIP-48:-hierarchical-admin-api>
> >>>
> >>> Looking forward to any feedback.
> >>
> >>
>
>


Re: PIP 49: Permission levels and inheritance

2019-11-11 Thread Joe F
>Is there any appetite for considering a "namespace admin"?

I think this is where there is an issue. As Sijie aptly pointed out, this
namespace "admin", as a concept, does not exist in Pulsar for resource
allocation. It will upend the whole design of how resources are managed
in Pulsar. While we started this PIP as a discussion on permissions, it is
not possible to do many of these asks simply as permissions. Implicitly,
it's reworking the whole resource management model in Pulsar. And I think
that if we are attempting to rework resource management, it should be
discussed as such, and not occur as an unnoticed side effect of attempting
to manage permissions.

>I think that set of permissions (with some deletions or adjustments) could
go a long way to making the current permissions model work a fair bit
better for orgs trying to deploy pulsar.

Ultimately, when you give someone access to a resource, they are going to
use it. And that use has a cost. Who pays the cost, and how is it managed
and accounted for? Power to manage access is power to spend. To what level
do we do resource controls?

I am a little lost on the idea of sub-tenants (namespace admins). This is
the point where I would think the system model would be better off having
those namespaces as first-class tenants, and not as namespaces. Pulsar, as
it is, is not designed to have sub-tenants.

Joe




On Mon, Nov 11, 2019 at 9:35 AM Addison Higham  wrote:

> Is there any appetite for considering a "namespace admin"? We could really
> have use of this as currently we have to give more people than we would
> want tenant admin to ease administration, when many users basically just
> need permissions to create topics and grant permissions to those topics.
>
> From my reading of this chain, it is obvious that it probably won't be
> given a ton of permissions and it may be complex to add, but it seems like
> starting with a conservative approach of permissions for a namespace admin
> initially could be done fairly quickly and safely.
>
> I would say the following could make a a lot of sense to be in a namespace
> admin permissions:
>
> Topics
> - compaction/compaction status
> - offload/offload-status
> - create/delete non partitioned topics
> - create/delete partitioned topics
> - list/grant/revoke permissions
> - create/delete/peek/seek subscriptions
> - stats calls
>
> Namespaces
> - list/grant/revoke permissions
> - set/get clusters (maybe not?)
> - get/set ttl
> - get/set persistence (maybe not?)
>
> I think that set of permissions (with some deletions or adjustments) could
> go a long way to making the current permissions model work a fair bit
> better for orgs trying to deploy pulsar.
>
> Long term, it feels like the correct answer to this problem is probably to
> introduce some sort of policy language with a role being tied to some
> policy that can grant a wide range of individual permissions, with the
> existing superuser/tenant-admin/etc being redefined in terms of some of
> these policies (perhaps using something like https://casbin.org/en/). If
> it
> seems like the focus should be on the more general solution, then I am fine
> leaving this for now, but this will be something we need to tackle in the
> next 6 months or so for my org, so I would love to see any sort of
> direction of where things should be heading so we can plan to that and
> perhaps help where possible!
>
>
>
> On Mon, Nov 11, 2019 at 2:24 AM xiaolong ran 
> wrote:
>
> > Hello everyone:
> >
> > I reorganize the relevant content of this PIP. PTAL again.
> >
> >
> https://github.com/apache/pulsar/wiki/PIP-49:-Permission-levels-and-inheritance
> > <
> >
> https://github.com/apache/pulsar/wiki/PIP-49:-Permission-levels-and-inheritance
> > >
> >
> > In this PIP, I only fix the unreasonable permissions in the current
> > command and shown in bold.
> > About specific permission of command, if there are different opinions
> > about this, pls let me know, thanks.
> >
> > --
> >
> > Thanks
> > Xiaolong Ran
> >
> >
> > > 在 2019年11月8日,下午4:41,xiaolong ran  写道:
> > >
> > > Thanks Sijie, Joe F and Addison Highham feedback.
> > >
> > > As Joe said:
> > >
> > > > There are no simple answers here other than to understand the effect
> > of the command.
> > >
> > > So in here, we can use sijie proposal, we only need to fix the
> > unreasonable permissions in the current command.
> > >
> > > I will reorganize this PIP, what do you think about this?
> > >
> > > --
> > >
> > > Thanks
> > > Xiaolong Ran
> > 

Re: [DISCUSS] PIP-321 Split the responsibilities of namespace replication-clusters

2023-12-03 Thread Joe F
>if users want to change the replication policy for
topic-n and do not change the replication policy of other topics, they need
to change all the topic policy under this namespace.

This PIP unfortunately flows from attempting to solve bad application
design.

A namespace is supposed to represent an application, and the namespace
policy is an umbrella for a similar set of policies that applies to all
topics. The exception would be if a topic had a need for a deficit. The
case of one topic in the namespace sticking out of the namespace policy
umbrella is bad application design, in my opinion.
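For reference, the constraint debated elsewhere in this thread (that a topic-level replication-cluster list is only valid as a subset of the namespace-level list) can be sketched as a simple validation. This is a hypothetical illustration, not the actual Pulsar admin code:

```java
import java.util.Set;

// Sketch: reject a topic-level replication-cluster list that names any
// cluster not already in the namespace-level list, since the namespace
// list bounds where topics may be created or loaded.
class ReplicationClusterValidator {
    static void validate(Set<String> topicClusters, Set<String> namespaceClusters) {
        if (!namespaceClusters.containsAll(topicClusters)) {
            throw new IllegalArgumentException(
                "Topic replication clusters " + topicClusters
                + " must be a subset of namespace clusters " + namespaceClusters);
        }
    }
}
```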

-Joe.



On Sun, Dec 3, 2023 at 6:00 PM Xiangying Meng  wrote:

> Hi Rajan and Girish,
> Thanks for your reply. About the question you mentioned, there is some
> information I want to share with you.
> >If anyone wants to setup different replication clusters then either
> >those topics can be created under different namespaces or defined at topic
> >level policy.
>
> >And users can anyway go and update the namespace's cluster list to add the
> >missing cluster.
> Because the replication clusters also mean the clusters where the topic can
> be created or loaded, the topic-level replication clusters can only be the
> subset of namespace-level replication clusters.
> Just as Girish mentioned, the users can update the namespace's cluster list
> to add the missing cluster.
> But there is a problem because the replication clusters as the namespace
> level will create a full mesh replication for that namespace across the
> clusters defined in
> replication-clusters if users want to change the replication policy for
> topic-n and do not change the replication policy of other topics, they need
> to change all the topic policy under this namespace.
>
> > Pulsar is being used by many legacy systems and changing its
> >semantics for specific usecases without considering consequences are
> >creating a lot of pain and incompatibility problems for other existing
> >systems and let's avoid doing it as we are struggling with such changes
> and
> >breaking compatibility or changing semantics are just not acceptable.
>
> This proposal will not introduce an incompatibility problem, because the
> default value of the namespace policy of allowed-clusters and
> topic-policy-synchronized-clusters are the replication-clusters.
>
> >Allowed clusters defined at tenant level
> >will restrict tenants to create namespaces in regions/clusters where they
> >are not allowed.
> >As Rajan also mentioned, allowed-clusters field has a different
> meaning/purpose.
>
> Allowed clusters defined at the tenant level will restrict tenants from
> creating namespaces in regions/clusters where they are not allowed.
> Similarly, the allowed clusters defined at the namespace level will
> restrict the namespace from creating topics in regions/clusters where they
> are not allowed.
> What's wrong with this?
>
> Regards,
> Xiangying
>
> On Fri, Dec 1, 2023 at 2:35 PM Girish Sharma 
> wrote:
>
> > Hi Xiangying,
> >
> > Shouldn't the solution to the issue mentioned in #21564 [0] mostly be
> > around validating that topic level replication clusters are subset of
> > namespace level replication clusters?
> > It would be a completely compatible change as even today the case where a
> > topic has a cluster not defined in namespaces's replication-clusters
> > doesn't really work.
> > And users can anyway go and update the namespace's cluster list to add
> the
> > missing cluster.
> >
> > As Rajan also mentioned, allowed-clusters field has a different
> > meaning/purpose.
> > Regards
> >
> > On Thu, Nov 30, 2023 at 10:56 AM Xiangying Meng 
> > wrote:
> >
> > > Hi, Pulsar Community
> > >
> > > I drafted a proposal to make the configuration of clusters at the
> > namespace
> > > level clearer. This helps solve the problem of geo-replication not
> > working
> > > correctly at the topic level.
> > >
> > > https://github.com/apache/pulsar/pull/21648
> > >
> > > I'm looking forward to hearing from you.
> > >
> > > BR
> > > Xiangying
> > >
> >
> >
> > --
> > Girish Sharma
> >
>


Re: [DISCUSS] PIP-321 Split the responsibilities of namespace replication-clusters

2023-12-05 Thread Joe F
Girish,

Thank you for making my point much better than I did ..

-Joe

On Tue, Dec 5, 2023 at 1:45 AM Girish Sharma 
wrote:

> Hello Xiangying,
>
> I believe what Joe here is referring to as "application design" is not the
> design of pulsar or namespace level replication but the design of your
> application and the dependency that you've put on topic level replication.
>
> In general, I am aligned with Joe from an application design standpoint. A
> namespace is supposed to represent a single application use case, topic
> level override of replication clusters helps in cases where there are a few
> exceptional topics which do not need replication in all of the namespace
> clusters. This helps in saving network bandwidth, storage, CPU, RAM etc
>
> But the reason why you've raised this PIP is to bring down the actual
> replication semantics at a topic level. Yes, namespace level still exists
> as per your PIP as well, but is basically left only to be a "default in
> case topic level is missing".
> This brings me to a very basic question - What's the use case that you are
> trying to solve that needs these changes? Because, then what's stopping us
> from bringing every construct that's at a namespace level (bundling,
> hardware affinity, etc) down to a topic level?
>
> Regards
>
> On Tue, Dec 5, 2023 at 2:52 PM Xiangying Meng 
> wrote:
>
> > Hi Joe,
> >
> > You're correct. The initial design of the replication policy leaves room
> > for improvement. To address this, we aim to refine the cluster settings
> at
> > the namespace level in a way that won't impact the existing system. The
> > replication clusters should solely be used to establish full mesh
> > replication for that specific namespace, without having any other
> > definitions or functionalities.
> >
> > BR,
> > Xiangying
> >
> >
> > On Mon, Dec 4, 2023 at 1:52 PM Joe F  wrote:
> >
> > > >if users want to change the replication policy for
> > > topic-n and do not change the replication policy of other topics, they
> > need
> > > to change all the topic policy under this namespace.
> > >
> > > This PIP unfortunately  flows from  attempting to solve bad application
> > > design
> > >
> > > A namespace is supposed to represent an application, and the namespace
> > > policy is an umbrella for a similar set of policies  that applies to
> all
> > > topics.  The exceptions would be if a topic had a need for a deficit,
> The
> > > case of one topic in the namespace sticking out of the namespace policy
> > > umbrella is bad  application design in my opinion
> > >
> > > -Joe.
> > >
> > >
> > >
> > > On Sun, Dec 3, 2023 at 6:00 PM Xiangying Meng 
> > > wrote:
> > >
> > > > Hi Rajan and Girish,
> > > > Thanks for your reply. About the question you mentioned, there is
> some
> > > > information I want to share with you.
> > > > >If anyone wants to setup different replication clusters then either
> > > > >those topics can be created under different namespaces or defined at
> > > topic
> > > > >level policy.
> > > >
> > > > >And users can anyway go and update the namespace's cluster list to
> add
> > > the
> > > > >missing cluster.
> > > > Because the replication clusters also mean the clusters where the
> topic
> > > can
> > > > be created or loaded, the topic-level replication clusters can only
> be
> > > the
> > > > subset of namespace-level replication clusters.
> > > > Just as Girish mentioned, the users can update the namespace's
> cluster
> > > list
> > > > to add the missing cluster.
> > > > But there is a problem because the replication clusters as the
> > namespace
> > > > level will create a full mesh replication for that namespace across
> the
> > > > clusters defined in
> > > > replication-clusters if users want to change the replication policy
> for
> > > > topic-n and do not change the replication policy of other topics,
> they
> > > need
> > > > to change all the topic policy under this namespace.
> > > >
> > > > > Pulsar is being used by many legacy systems and changing its
> > > > >semantics for specific usecases without considering consequences are
> > > > >creating a lot of pain and incompatibility problems for other
> existin

Re: (Apache committer criteria) [ANNOUNCE] New Committer: Asaf Mesika

2024-03-06 Thread Joe F
I have not seen any specific asks or examples that give rise to these
gripes, so it's all hypothetical as of now. Please, as Enrico said, bring
up specific issues.

(1) How many reviewers (less than 1) have been involved in PIP review
outside of the streamnative provider and how many of them have experience
with Pulsar for more than 2 years (less than 1 or 2)?
This means that not all of the reviewers are aware of the entire system or
its history. If their opinions do not match the proposed solutions, then it
means that a contributor who is not a streamnative provider cannot
contribute to Pulsar and have the desired feature


(2) How long it takes for streamnative contributors to move forward with
improvements (less than 2 weeks) and how long it takes for other
contributors (more than a couple of months or even forever). ..


If I were to summarize your listed issues, they are about
(1)meritocracy and (2) community.

(1) People who have worked more on something know more about it. There is
simply nothing that can be done to put a newcomer on the same footing as a
veteran when it comes to expertise, unless the newcomer is willing to put
in the time to learn and gain the understanding. There is a big difference
between every idea being considered vs. every idea being accepted. If one
has an idea/contribution that can stand on its own merits, and has the
technical justifications to back it up, then it would get accepted. I
would look for other reasons before accusing people.

(2) This is a community. Nobody owes anyone anything, including providing
help or support. I do not think "that guy/those guys are not helping me"
or "I am not getting timely help with reviews" is grounds for griping.
Everyone volunteers their time, and it is not infinite. Would it be nice
if everyone helped everyone else out and had the time to do everything?
Yes. Does it work that way? Not perfectly. Can it be better? Yes. But
you are not going to get there by blaming people. One needs to build
relationships and collaboration with one's peers. That is not something
that can be ordered from your peers in the community. They are all
volunteers. There are ASF community rules, but rules never built a
community. People and relationships do.

>There are many such examples
> that we can see, where contributors have requested pluggable features such
> as security, rate limiting, etc. These requests were legitimate, but only
> streamnative contributors will have the right to control Pulsar.
>

As someone who has repeatedly said no to poorly thought-out features in
Pulsar, what one user considers "legitimate" is debatable. Due to a lack
of a clearly spelled-out architectural vision, it is common to get feature
requests that don't align well with Pulsar, or are not well thought out, or
are just a niche use case for some specific user which does not fit with a
general streaming platform. And for that matter, I have been ignored too -
that's community for you.

Again, I don't know what your PIPs were, but I would not automatically
assume that everything submitted as a PIP should be accepted, nor is it
incumbent upon the community to make every PIP evolve, mature and get
released. People volunteer time and talent, and for what interests them
in Pulsar. It is entirely up to them to prioritize as they wish.

-joe

On Wed, Mar 6, 2024 at 12:49 PM Kalwit S  wrote:

> Thanks for your reply. But I wasn’t going to go after a specific person. I
> just wanted to point out this is an example of what we’ve been seeing for a
> while now. And I wanted to point out why we’re not going to go with Pulsar.
> Because if Streamnative is on the same trajectory as Confluent, then
> there’s not a lot of value for either of us to put in a lot of time and
> energy into migrating our systems.
> I’ll share what we’ve seen since we started using Pulsar. Other
> contributors can comment if they don’t agree.
>
> (1) How many reviewers (less than 1) have been involved in PIP review
> outside of the streamnative provider and how many of them have experience
> with Pulsar for more than 2 years (less than 1 or 2)?
> This means that not all of the reviewers are aware of the entire system or
> its history. If their opinions do not match the proposed solutions, then it
> means that a contributor who is not a streamnative provider cannot
> contribute to Pulsar and have the desired feature. On the other hand, if
> the same feature is provided by a streamnative provider, then the same
> enhancement could easily be added to Pulsar. There are many such examples
> that we can see, where contributors have requested pluggable features such
> as security, rate limiting, etc. These requests were legitimate, but only
> streamnative contributors will have the right to control Pulsar.
>
> (2) How long it takes for streamnative contributors to move forward with
> improvements (less than 2 weeks) and how long it takes for other
> contributors (more than a couple of months or ev

Re: (Apache committer criteria) [ANNOUNCE] New Committer: Asaf Mesika

2024-03-06 Thread Joe F
>
> know I’m not trying to be disrespectful, but it’s not respectful to be
> biased and act like an expert during the reviews, while you’ve contributed
> just for documentation PRs. When I talk about experience, I’m talking about
> reviewers who don’t contribute to the project, they ask questions to get to
> know Pulsar’s internals during the PIP, and then they give judgment based
> on their limited understanding, which is rude.


This is a very negative and corrosive way of looking at things. Anyone -
anyone - who takes the time and effort to review a change or PIP is
helping you. Here you are, complaining about your PIPs and PRs not
getting support, and at the same time belittling someone who does take
the time to help you in moving things forward.

Asking questions, understanding the changes, seeking explanations... is
all part of the process. One hopes that a reviewer does poke some holes
and find weaknesses, so that the end result is better than the original,
because of the process, not just the person. By no means is that process
'rude'.

I can even cite a few examples from recent times from different users
> (PIP-337, PIP-338, PIP-332, PIP-310, etc) to illustrate how some
> improvements are simply ignored


This is strange, as all of those PIPs have comments and questions.
Discussions and voting need to be championed by the proposer. People have
multiple claims on their time. There is no uber-person dictating what gets
attention or who should do what. You need to canvass people and push for
your changes all the way through.

  There are many examples
> (PIP-321) where it was developed by SN contributors, and while there is no
> consensus, they will still be a part of the system.


As for PIP-321 getting in without a consensus: I was one who had concerns
with it (and still think poorly of it), but I don't think it was decided
in violation of the rules.



On Wed, Mar 6, 2024 at 10:14 PM Girish Sharma 
wrote:

> On Thu, Mar 7, 2024 at 11:38 AM Yunze Xu  wrote:
>
> > Regarding PIP-332 and PIP 310, similar to PIP-337, there is no
> > discussion mail in the dev mail list. David left a comment [1] in
> >
>
> There is for 310 -
> https://lists.apache.org/thread/13ncst2nc311vxok1s75thl2gtnk7w1t
>
>
> Regards
> --
> Girish Sharma
>


Re: [DISCUSS] Planning for Apache Pulsar 3.0

2022-10-10 Thread Joe F
I would prefer that we avoid using the term “breaking changes”, which is
too vague to convey any specific meaning. So let me try to bring some
clarity.


There have been many changes to implementations, APIs and data storage
formats in Pulsar (and BookKeeper too). I have deployed many of these
changes to production. And I know that Matteo and Rajan (and others too,
about whom I'm not up to date) have implemented and deployed many such
changes. But none of those changes ever required taking the system
offline. NONE.


Pulsar was developed as a 24x7x365 system, and rolling upgrades and
rollbacks were a given. Like "this is water", there was no special callout
needed for declaring this reality. No change, including enhancements to
wire protocols, broke client compatibility. Existing clients continued to
work; they may not have been able to use all the new features. Use of new
features would require the app to be rebuilt anyway. (Checksums and e2e
encryption are examples.)


We have even succeeded in getting Pulsar adopted for some use cases, just
because the complexity of upgrading from K's old clients to new ones was
costly enough to allow consideration of an alternative like Pulsar. The
business cost of forcing a client upgrade can be significant, to the point
of being unviable for a business. That just cannot be hand-waved over.


There have also been changes in storage formats (the ZK metadata change from
text to binary is an example). But through all such changes, compatibility
and upgradeability have been a given. There has never been a situation where
a live Pulsar upgrade was not possible and a coordinated client upgrade
was mandatory.


So the question should not be about whether "significant" changes should
be made or not. Changes can be made and released in a way that breaks
*business*, or they can be made in a way that lets businesses sail
smoothly through that change. So the question is about how such changes
get rolled out.


And to that question, my strong opinion is that any change that does not
allow a live/rolling upgrade or rollback, or anything that forces a client
to upgrade just to continue functioning, is a non-starter. All changes
can be made in a compatible, phased manner, and in a way that does not
penalise older versions (older versions doing worse on new releases is
also not an acceptable way of making changes). Changes can be made in a
manner that makes A/B testing possible for the user, with limited risk,
with the option of then choosing not to go back. It has all been done in
Pulsar before.


Would that be harder than just breaking stuff? Yes. But it is far
preferable to forcing users to take a hit.


-joe

On Sat, Oct 8, 2022 at 1:25 PM Rajan Dhabalia  wrote:

> I would say first we should gather a list of changes which we want to
> target and find out which improvements really need major version release.
> We can take the Pulsar-1.0 to Pulsar-2.0 upgrade example to avoid major
> interruption and impact on existing systems and still achieve our goal. So,
> the first step is discovery of such features and then we can discuss how to
> introduce them in Pulsar with minimum impact on existing systems.
>
> Thanks,
> Rajan
>
> On Sat, Oct 8, 2022 at 1:05 PM Devin Bost  wrote:
>
> > I'm noticing some pushback on the idea of pre-emptively proposing any
> kind
> > of breaking upgrade that would necessitate cutting a 3.0 release.
> > I do understand the concern about introducing a breaking change... For a
> > distributed messaging application like Pulsar, if clients needed to be
> > simultaneously upgraded with brokers, that could be extremely difficult
> or
> > infeasible for companies to coordinate without treating it like a
> migration
> > to a new technology.
> >
> > At the same time, do we want to be completely closed to the possibility
> > that a breaking change could be required at some point in the future? If
> a
> > circumstance like that appears, those are the kinds of situations that
> can
> > lead to a fork. Are there certain kinds of breaking changes that are more
> > acceptable than others?
> >
> > Also, if the forward looking plan is to never introduce breaking changes,
> > when *would* we ever cut a Pulsar 3.x release?  Do we have any criteria
> on
> > what kinds of changes would necessitate cutting a new major release but
> > would still be considered acceptable by the community?
> >
> > --
> > Devin Bost
> > Sent from mobile
> > Cell: 801-400-4602
> >
> > On Sat, Oct 8, 2022, 2:14 PM Rajan Dhabalia 
> wrote:
> >
> > > This sounds like the current state of Apache Pulsar has a lot of issues
> > > and it requires fundamental design changes to make it promising, which is
> > > definitely not true, and I disagree with it. And I would be careful
> > > comparing with Kafka, as I still don't think the Kafka release has
> > > anything to do with Pulsar's improvement. I would still recommend
> > > listing all the changes in one place so we can b

Re: [DISCUSSION] Redesign the MessageId interface

2022-11-08 Thread Joe F
>Maybe this design is to hide some details, but if
users don't know the details like ledger id and entry id, how could
you know what "0:0:-1:0" means?

 Abstractions exist for a reason. Ledger id and entry id are implementation
details, and an application should not be interpreting them at all.
-j


On Tue, Nov 8, 2022 at 3:43 AM Yunze Xu 
wrote:

> I didn't look into these two methods at the moment. But I think it's
> possible to
> retain only the `fromByteArray`.
>
> Thanks,
> Yunze
>
> On Tue, Nov 8, 2022 at 7:02 PM Enrico Olivelli 
> wrote:
> >
> > On Tue, Nov 8, 2022 at 11:52 AM Yunze Xu
> > wrote:
> > >
> > > Hi Enrico,
> > >
> > > > We also need a way to represent this as a String or a byte[]
> > >
> > > We already have the `toByteArray` method, right?
> >
> > Yes, correct. So we are fine. I forgot about it and I answered too
> > quickly.
> >
> > I am not sure if this can be in the scope of this initiative, but we
> > should somehow get rid of
> > stuff like "fromByteArrayWithTopic" vs "fromByteArray".
> >
> > Thanks
> > Enrico
> >
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Tue, Nov 8, 2022 at 6:43 PM Enrico Olivelli 
> wrote:
> > > >
> > > > On Tue, Nov 8, 2022 at 11:25 AM Yunze Xu
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Currently we have the following 5 implementations of MessageId:
> > > > >
> > > > > - MessageIdImpl: (ledger id, entry id, partition index)
> > > > >   - BatchMessageIdImpl: adds (batch index, batch size, acker),
> > > > >     where acker is a wrapper of a BitSet.
> > > > >   - ChunkMessageIdImpl: adds another MessageIdImpl that represents
> > > > >     the MessageIdImpl of the first chunk.
> > > > >   - MultiMessageIdImpl: adds a map that maps the topic name to the
> > > > > MessageId.
> > > > > - TopicMessageIdImpl: adds the topic name and the partition name
> > > > >
> > > > > These implementations are such a mess. For example, when users get
> > > > > a MessageId from `Producer#send`:
> > > > >
> > > > > ```java
> > > > > var id = producer.send("msg");
> > > > > ```
> > > > >
> > > > > There is no getter for specific fields like the ledger id. You can
> > > > > only see a representation from the `toString` method and get some
> > > > > output like "0:0:-1:0". Maybe this design is meant to hide some
> > > > > details, but if users don't know the details like ledger id and
> > > > > entry id, how could they know what "0:0:-1:0" means? What if
> > > > > `MessageId#toString`'s implementation changed? Should it be treated
> > > > > as a breaking change?
> > > > >
> > > > > The original definition of the underlying MessageIdData is much
> > > > > clearer:
> > > > >
> > > > > ```proto
> > > > > message MessageIdData {
> > > > >     required uint64 ledgerId = 1;
> > > > >     required uint64 entryId  = 2;
> > > > >     optional int32 partition = 3 [default = -1];
> > > > >     optional int32 batch_index = 4 [default = -1];
> > > > >     repeated int64 ack_set = 5;
> > > > >     optional int32 batch_size = 6;
> > > > >
> > > > >     // For the chunk message id, we need to specify the first
> > > > >     // chunk message id.
> > > > >     optional MessageIdData first_chunk_message_id = 7;
> > > > > }
> > > > > ```
> > > > >
> > > > > IMO, MessageId should be a wrapper of MessageIdData. It's more
> > > > > natural to have an interface like:
> > > > >
> > > > > ```java
> > > > > interface MessageId {
> > > > >     long ledgerId();
> > > > >     long entryId();
> > > > >     Optional<Integer> partition();
> > > > >     Optional<Integer> batchIndex();
> > > > >     // ...
> > > > > ```
> > > >
> > > > This is very good for client applications.
> > > > We also need a way to represent this as a String or a byte[]; this
> > > > way client applications have a standard way to store
> > > > message offsets into an external system (for instance when you want
> > > > to use the Reader API and keep track of the position by yourself)
> > > >
> > > > Enrico
> > > >
> > > > >
> > > > > Additionally, there are many places that use only the triple of
> > > > > (ledger id, entry id, batch index) as the key to represent the
> > > > > position. Currently, they are done by adding a conversion from
> > > > > BatchMessageIdImpl to MessageIdImpl. However, it's more intuitive
> > > > > to write something like:
> > > > >
> > > > > ```java
> > > > > class MessageIdPosition implements Comparable<MessageIdPosition> {
> > > > >     private final MessageId messageId;
> > > > >     // TODO: compare only the triple (ledger, entry, batch)
> > > > > ```
> > > > >
> > > > > Therefore, I'm going to write a proposal to redesign the MessageId
> > > > > interface only by adding some getters. Regarding the 5 existing
> > > > > implementations, I think we can drop them because they are a part
> > > > > of `pulsar-client`, not `pulsar-client-api`.
> > > > >
> > > > > Please feel free to share your points.
> > > > >
> > > > > Thanks,
> > > > > Yunze
>
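[Editor's note] The getter-based interface and triple-based position comparison discussed in this thread could look roughly like the sketch below. This is a hedged illustration only, not Pulsar API: `MessageIdSketch`, `SimpleMessageId` and `POSITION_ORDER` are invented names.

```java
import java.util.Comparator;
import java.util.Optional;

// Hypothetical sketch of a getter-based MessageId API, as proposed above.
// NOT the actual Pulsar interface; all names are illustrative.
public class MessageIdSketch {

    interface MessageId {
        long ledgerId();
        long entryId();
        Optional<Integer> partition();
        Optional<Integer> batchIndex();
    }

    // A record implicitly provides the accessor methods the interface declares.
    record SimpleMessageId(long ledgerId, long entryId,
                           Optional<Integer> partition,
                           Optional<Integer> batchIndex) implements MessageId {}

    // Position comparison using only the (ledger, entry, batch index) triple,
    // as suggested in the thread; a missing batch index sorts as -1.
    static final Comparator<MessageId> POSITION_ORDER =
            Comparator.comparingLong(MessageId::ledgerId)
                      .thenComparingLong(MessageId::entryId)
                      .thenComparingInt(m -> m.batchIndex().orElse(-1));

    public static void main(String[] args) {
        MessageId a = new SimpleMessageId(0, 0, Optional.empty(), Optional.empty());
        MessageId b = new SimpleMessageId(0, 1, Optional.empty(), Optional.of(2));
        System.out.println(POSITION_ORDER.compare(a, b) < 0); // prints true
    }
}
```

Compared to parsing a `toString()` form like "0:0:-1:0", explicit getters make the fields discoverable while leaving the string representation free to change.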


Re: [DISCUSSION] Redesign the MessageId interface

2022-11-09 Thread Joe F
A MessageId is an identifier that identifies a message. How that id is
constructed, or what it contains, should not matter to an application, and
an application should not assume anything about the implementation of that
id.

>What about the partition index? We have a `TopicMetadata` interface that
returns the number of partitions.

Partitioning is a first-class concept, and is designed to be used by
applications. How a partition is implemented should not be used by the
application.

 [People violate this all the time, and I regret that Pulsar did not
provide get_Nth_topic_partition(), which led to people hardcoding it as
topicname-N and using that directly. Now we are stuck with it.]

 Similarly for batch index and batch size: those are all logical concepts
exposed to the user. For example, batch size is something the app is
allowed to tune.

>Even for ledger id and entry id, this pair represents a logic storage
position like the offset concept in Kafka
These are not equivalent. In Pulsar these are implementation details,
while in Kafka those are logical concepts.

One might think that these are logical concepts in Pulsar, because if you
reverse engineer the current msgid implementation, you observe some
"properties".

Ledger id/entry id are logical concepts in __BookKeeper__, not in Pulsar.
There is the Managed Ledger abstraction on top of BK, and then there is
Pulsar on top of ML. You will break two levels of abstraction to expose
ledger/entry id to an application.

An application should only care about the operations that can be done
with a messageId:

- getmsgid() to return the message id as an opaque object

[Operators using one messageId]
- serde, like tostring(), for storage/retrieval of the message identifier
- getters/setters on logical properties of the message (partition id etc.)
- increment/decrement

[Operators that take multiple messageIds]
- comparator
- range

Those are the kinds of operators Pulsar should provide to a user.
Applications should not implement these operators on their own by reverse
engineering the msgId. No application should be directly using ledger id or
entry id for doing anything (math or logic).

  As long as Pulsar provides these operations on the msgId to the
application, it should not care whether it's represented as "0:1:-1:-1"
or "a:b:-b-b", or "#xba4231!haxcy1826923f", or as a serialized binary
object, or whatever it may be.
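[Editor's note] As a hedged sketch of this operator-only view (not Pulsar code; `OpaqueId` and its methods are invented for illustration), an application working purely through provided serde and comparison operators never touches the id's internal fields:

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Illustrative sketch: the application sees only an opaque id plus the
// operators the system provides (serde, comparator). Internal layout can
// change freely without any application being rewritten.
public class OpaqueIdSketch {

    static final class OpaqueId {
        private final byte[] payload; // internal representation, never exposed
        OpaqueId(byte[] payload) { this.payload = payload.clone(); }

        // serde: store/retrieve a position in an external system
        byte[] toByteArray() { return payload.clone(); }
        static OpaqueId fromByteArray(byte[] bytes) { return new OpaqueId(bytes); }
    }

    // comparator provided by the system, not reverse-engineered by the app
    static final Comparator<OpaqueId> ORDER =
            Comparator.comparing(id -> ByteBuffer.wrap(id.payload));

    public static void main(String[] args) {
        OpaqueId earlier = new OpaqueId(new byte[]{0, 0, 1});
        OpaqueId later = new OpaqueId(new byte[]{0, 0, 2});
        // round-trip through serde, then compare: no internal field is read
        OpaqueId restored = OpaqueId.fromByteArray(earlier.toByteArray());
        System.out.println(ORDER.compare(restored, later) < 0); // prints true
    }
}
```

If the internal payload format changes, only the system-side comparator and serde change; the application code above is untouched.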

>>But it would be harder to know what a tuple like "0:1:-1:-1" means.

A user shouldn't have to know what this means. That's the point.

Pulsar itself changed the messageId multiple times as it added
partitioning, batching and so on, and it might do so again. And BookKeeper
could change its representation of ledgers (for example, to UUIDs and byte
offsets); ML could replace BK with something else (for example, a table in
a DB). Anything is possible - Pulsar would then just have to change the
implementation of the operator functions, and no application would need to
be rewritten.

-j

On Tue, Nov 8, 2022 at 6:05 PM Yunze Xu 
wrote:

> Hi Joe,
>
> Then what would we expect users to do with the MessageId? It should only
> be passed to Consumer#seek or ReaderBuilder#startMessageId?
>
> What about the partition index? We have a `TopicMetadata` interface that
> returns
> the number of partitions. If the partition is also "implementation
> details", should we expose
> this interface? Or should we support customizing a MessageRouter because it
> returns the partition index?
>
> What about the batch index and batch size? For example, we have an
> enableBatchIndexAcknowledgment method to enable batch index ACK. If batch
> index is also "implementation details", how could users know what
> "batch index ack" means?
>
> Even for ledger id and entry id, this pair represents a logic storage
> position like the offset
> concept in Kafka (though each offset represents a message while each
> entry represents
> a batch). If you see the Message API, it also exposes many attributes.
> IMO, for the
> MessageIdData, only the ack_set (a long array serialized from the
> BitSet) is the implementation
> detail.
>
> The MessageId API should be flexible, not an abstract one. If not, why
> do we still implement
> the toString() method? We should not encourage users to print the
> MessageId. It would
> be easy to know what "ledger is 0, entry id is 1" means, users only
> need to know the concepts
> of ledger id and entry id. But it would be harder to know what a tuple
> like "0:1:-1:-1" means.
>
> Thanks,
> Yunze
>
> On Tue, Nov 8, 2022 at 11:16 PM Joe F  wrote:
> >
> > >Maybe this design is to hidden some details, but if
> > users don't know the details like ledger id and entry id, how could
> > you know 

Re: [DISCUSS] PIP-221: Make TableView support read the non-persistent topic

2022-11-14 Thread Joe F
I am not sure about the semantics of TableView on a non-persistent topic.

 Exactly how does this work?

 What happens if the client crashes?  What is the base state for the table?

What exactly can I expect as a user from this?

Joe

On Sun, Nov 13, 2022 at 8:57 PM Kai Wang  wrote:

> Hi, pulsar-dev community,
>
> Since the non-persistent topic support doesn't require API changes. I have
> pushed a PR to implement it, which has already been merged.
>
> See: https://github.com/apache/pulsar/pull/18375
>
> And this PIP title has been changed to `Make TableView support TTL`.
>
> PIP link: https://github.com/apache/pulsar/issues/18229
>
> Thanks,
> Kai
>
> On 2022/11/04 02:28:41 Kai Wang wrote:
> > Hi, pulsar-dev community,
> >
> > I’ve opened a PIP to discuss : PIP-221: Make TableView support read the
> non-persistent topic.
> >
> > PIP link: https://github.com/apache/pulsar/issues/18229
> >
> > Thanks,
> > Kai
> >
>


Re: [DISCUSS] PIP-221: Make TableView support read the non-persistent topic

2022-11-15 Thread Joe F
Introducing new system concepts as a code-change PR is really not the way
to go about making these changes. The semantics and use case discussion
should come first, so that they can be discussed and the implications
understood. It's not a good experience to put this out there without
providing a meaningful explanation of the general concept, semantics and
use cases that apply to a general Pulsar user. And how was this put out
as a change with doc-not-needed?

>The current use case is to use the non-persistent topic to store the load
data used by the new load manager.

How will Pulsar users understand this? Non-persistent topics are a
well-defined concept - lossy, totally unreliable, and with no guarantee of
anything. TableView is another well-defined concept - a much more
reliable construct with compacted keys. As a general Pulsar user, what
exactly does a TableView on a non-persistent topic conceptually mean? What
are the semantics? And then there is the TTL - how does this all fit
together?

> The current use case is to use the non-persistent topic to store the load
data used by the new load manager.
So is the idea that no external user should use this new NP-TableView?
This is a public API.

 A table with a totally random selection of events in it? How does this
work for a general use case? With this construct it is entirely possible
that two clients A & B will have totally different data in their table
views at the same time - what does that mean? You have introduced state
into what is essentially a stateless concept (an N-P topic) in Pulsar.

I am not debating the merits/demerits of the change. It's really about how
we go about doing things.

Ideally you want to make a case for N-P TableViews: what are the semantics,
use cases, limitations and anti-patterns for this change? The
community then can provide meaningful feedback, discuss the merits and
suggest improvements. You will get a better feature, with good user
documentation and clarity about how and when applications should
adopt/reject it in their solution design.

That benefits everyone - the contributor, the community, Pulsar users, and
the Pulsar architecture.


On Mon, Nov 14, 2022 at 8:59 PM Kai Wang  wrote:

> Hi Joe,
>
> > I am not sure about the semantics of TableView on a non-persistent topic.
> > What happens if the client crashes?  What is the base state for the
> table?
>
> If users use a non-persistent topic as the TableView topic, then when the
> client crashes,
> the TableView's data will be lost.
>
> The current use case is to use the non-persistent topic to store the load
> data used by the new load manager. It doesn't require strong consistency
> guarantees, and doesn't need persistence.
>
>
> Thanks,
> Kai
>
> On 2022/11/14 23:03:13 Joe F wrote:
> > I am not sure about the semantics of TableView on a non-persistent topic.
> >
> >  Exactly how does this work?
> >
> >  What happens if the client crashes?  What is the base state for the
> table?
> >
> > What exactly can I expect as a user from this?
> >
> > Joe
> >
> > On Sun, Nov 13, 2022 at 8:57 PM Kai Wang  wrote:
> >
> > > Hi, pulsar-dev community,
> > >
> > > Since the non-persistent topic support doesn't require API changes. I
> have
> > > pushed a PR to implement it, which has already been merged.
> > >
> > > See: https://github.com/apache/pulsar/pull/18375
> > >
> > > And this PIP title has been changed to `Make TableView support TTL`.
> > >
> > > PIP link: https://github.com/apache/pulsar/issues/18229
> > >
> > > Thanks,
> > > Kai
> > >
> > > On 2022/11/04 02:28:41 Kai Wang wrote:
> > > > Hi, pulsar-dev community,
> > > >
> > > > I’ve opened a PIP to discuss : PIP-221: Make TableView support read
> the
> > > non-persistent topic.
> > > >
> > > > PIP link: https://github.com/apache/pulsar/issues/18229
> > > >
> > > > Thanks,
> > > > Kai
> > > >
> > >
> >
>


Re: [DISCUSS] The use of consumer redeliverUnacknowledgedMessages method

2022-11-23 Thread Joe F
I am not familiar with all the changes since its original implementation,
nor can I speak for all the changes that went in after.

The original concept was simple and rigorous. For a Shared sub, all unacked
messages will be redelivered; for Exclusive subs, the cursor was
rewound and everything after the rewind point was redelivered, to preserve
order.

>but in failover and exclusive subType, if we don't get the response,
the user will receive messages from the `incomingQueue` and then the
order of the messages will be broken.

This "brokenness" is not clear to me. The sequence 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
does not break the ordering guarantees of Pulsar.

On Tue, Nov 22, 2022 at 5:47 PM PengHui Li  wrote:

> Hi, Bo
>
> Thanks for starting the discussion.
>
> I have no idea about the initial motivation for supporting message
> redelivery for
> Failover or Exclusive subscription. The redelivered messages will go to the
> same
> consumer under a single active consumer subscription mode.
>
> Or maybe it is only designed for the Shared subscription?
>
> It's better to get some feedback from Matteo, Joe, or anyone who knows the
> background
> about this part.
>
> Thanks,
> Penghui
>
> On Tue, Nov 22, 2022 at 7:36 PM Yubiao Feng
>  wrote:
>
> > Hi Congbo
> >
> > I think it is a good idea.
> >
> > Thanks
> > Yubiao
> > Yu
> >
> > On Mon, Nov 21, 2022 at 9:04 PM 丛搏  wrote:
> >
> > > Hello, Pulsar community:
> > >
> > > Now client consumer `void redeliverUnacknowledgedMessages();` is an
> > > async interface, but it doesn't have the return value. only
> > > `writeAndFlush` the redeliver command then finishes.
> > >
> > > `ConsumerImpl`:
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java#L1907-L1909
> > >
> > > `MultiTopicsConsumerImpl`:
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MultiTopicsConsumerImpl.java#L667-L677
> > >
> > > in the shared subType, I think it doesn't need the response of the
> > > `void redeliverUnacknowledgedMessages()`, and the naming of
> > > `redeliverUnacknowledgedMessages` is OK.
> > >
> > > but in failover and exclusive subType, if we don't get the response,
> > > the user will receive messages from the `incomingQueue` and then the
> > > order of the messages will be broken. If
> > > `redeliverUnacknowledgedMessages` times out, we should try again, but
> > > `redeliverUnacknowledgedMessages` doesn't throw any exception or
> > > retry. And the `redeliverUnacknowledgedMessages` name is not accurate
> > > for failover and exclusive subType; the name `rewind` is more
> > > suitable.
> > >
> > > So I suggest `redeliverUnacknowledgedMessages` be deprecated under
> > > failover and exclusive subType and add a new similar async and sync
> > > method called `rewind` for failover and exclusive subType.
> > >
> > > Please leave your comments or suggestions, thanks!
> > >
> > > Thanks,
> > > bo
> > >
> >
>


Re: [DISCUSS] PIP-240 A new API to unload subscriptions

2023-01-17 Thread Joe F
Inclined to agree with Enrico.  If it's a hard problem, it will repeat, and
this is not helping.  If it's some race on the client, it will occur
randomly and rarely, and this unload sub will get programmed in as a way of
life.

>If you think unloading the subscription can't help anything, unloading
the topic should be the same. From my experience, most of the unloading
topic operations are to mitigate the problems related to message
consumption.

Comparisons with unloading a topic are not the bar here, as that is a
first-class broker utility that is needed for operational reasons outside of
"fixing" consumer-side issues. The side effect of using "unload topic" is
a loss of transient topic state. I will fully agree that this side effect
has been pervasively abused for fixing problems (à la Ctrl-Alt-Del), but
that's not the rationale for having an unload-topic utility.

What kind of problems is this trying to fix?
And why cannot that be solved by client-side fixes?

In shared-sub issues, it's hard to pinpoint which consumer, or where,
the problem lies, and to reset that one at the client. The totality of
state spread between the brokers and all the consumers of the shared sub
needs to be put together. Is that why we are doing this?


On Tue, Jan 17, 2023 at 5:30 PM PengHui Li  wrote:

> I agree that if we encounter a stuck consumption issue, we should continue
> to find the root cause of the problem.
>
> Subscription unloading is just an option to mitigate the impact first.
> Maybe it can mitigate the issue for 1 hour sometimes. Especially in
> key_shared subscription. Sometimes it's not a BUG from Pulsar.
> But users need time to fix the issue. But it doesn't make sense to let
> the impact continue until the fix is applied.
>
> I also helped many people to troubleshoot the stuck consumption
> issue related to key_shared subscriptions and transactions etc.
> In most cases, unloading the topic can mitigate the impact.
> For example, due to an uncaught exception, the dispatch thread
> stopped reading messages from the managed ledger. The exception
> is a very infrequent occurrence. Unloading the topic is the best choice for
> now, right?
>
> If you think unloading the subscription can't help anything, unloading
> the topic should be the same. From my experience, most of the unloading
> topic operations are to mitigate the problems related to message
> consumption.
>
> Best,
> Penghui
>
> On Tue, Jan 17, 2023 at 11:09 PM Enrico Olivelli 
> wrote:
>
> > On Mon, Jan 16, 2023 at 11:58 AM r...@apache.org
> > wrote:
> > >
> > > I agree with @Enrico @Bo: if we encounter a stuck-subscription
> > > situation, we must continue to spend more time to locate and fix this
> > > problem, which is what we have been doing.
> > >
> > > But let's think about this problem from another angle. At this time, a
> > > user in the production environment encounters a stuck-consumer
> > > situation; what should we do? For a user in a production environment,
> > > our first reaction when encountering a problem is how to quickly
> > > recover and how to quickly reduce user losses. At this point in time,
> > > we don't yet think about whether this is a bug on the broker side, a
> > > bug on the SDK side, or a bug in the user's own usage. In the process
> > > of fast recovery, our most common method is to quickly re-establish
> > > the connection between the broker and the client through the topic
> > > specified by unload. In this process, we try to retain as much context
> > > as possible to assist us in the subsequent continued diagnosis and
> > > repair of this problem.
> > >
> > > So I don't think these two things conflict. The reason we expose the
> > > admin CLI for unloading a topic is the same reason we expect to expose
> > > unloading a subscription. If we stand from the perspective of a
> > > developer, we definitely want to completely fix the problem that
> > > caused the stall. If we think about this issue from the perspective of
> > > the user, when a scenario such as a stuck consumer occurs, the user
> > > does not care about the specific cause of the problem, but expects the
> > > business to recover quickly, in the shortest possible time, to avoid
> > > further loss.
> > >
> > > I admit that this is a relatively hacky way, but it can indeed solve
> > > the problems we are currently encountering, and at the same time it
> > > will not cause a major conflict with Pulsar's existing logic. So I
> > > still insist on agreeing with yubiao's point of view.
> >
> >
> >
> > Usually when a subscription is "stuck" even if you unload the topic
> > it returns to the "stuck" state again if you don't solve the problem.
> >
> > This is a very common issue with Pulsar users, I am spending much time
> > helping users to troubleshoot their production problems and unloading the
> > topic
> > is never a solution, it can give you seconds, minutes or hours of
> > "working state",
> > then th

Re: [DISCUSS] PIP-247: Notifications for partitions update

2023-02-23 Thread Joe F
Why is this needed when we have notifications on regex sub changes? Aren't
the partition names a well-defined regex?

Joe

On Thu, Feb 23, 2023 at 8:52 PM houxiaoyu  wrote:

> Hi Asaf,
> thanks for your reminder.
>
> ## Changing
> I have made the following changes to make sure the notification arrives
> successfully:
> 1. The watch success response `CommandWatchPartitionUpdateSuccess` will
> contain all the concerned topics of this watcher
> 2. The notification `CommandPartitionUpdate` will always contain all the
> concerned topics of this watcher.
> 3. The notification `CommandPartitionUpdate` contains a monotonically
> increased version.
> 4. A map `PartitonUpdateWatcherService#inFlightUpdate` (a `Map<Long
> /*watcherId*/, Pair<Long /*version*/, Long /*timestamp*/>>`) will keep
> track of the in-flight updates
> 5. A timer will check for update timeouts through `inFlightUpdate`
> 6. The client acks `CommandPartitionUpdateResult` to broker when it
> finishes updating.
>
> ## Details
>
> The following mechanism could make sure the newest notification arrives
> successfully (copying the description from GH):
>
> A new class, `org.apache.pulsar.PartitonUpdateWatcherService` will keep
> track of watchers and will listen to the changes in the metadata. Whenever
> a topic's partitions update, it checks whether any watchers should be
> notified and
> sends an update for all topics the watcher concerns through the ServerCnx.
> Then we will record this request into a map,
> `PartitonUpdateWatcherService#inFlightUpdate` (a `Map<Long /*watcherId*/,
> Pair<Long /*version*/, Long /*timestamp*/>>`). A timer will check for
> update timeouts through `inFlightUpdate`. We will query all the concerned
> topics' partitions if an update sent to this watcher has timed out, and
> will resend it.
>
> The client acks `CommandPartitionUpdateResult` to the broker when it
> finishes updating. The broker handles the `CommandPartitionUpdateResult`
> request:
>  - If CommandPartitionUpdateResult#version <
> PartitonUpdateWatcherService#inFlightUpdate.get(watcherID).version, broker
> ignores this ack.
>  -  If CommandPartitionUpdateResult#version ==
> PartitonUpdateWatcherService#inFlightUpdate.get(watcherID).version
> - If CommandPartitionUpdateResult#success is true,  broker just removes
> the watcherID from inFlightUpdate.
> - If CommandPartitionUpdateResult#success is false,  broker removes the
> watcherId from inFlightUpdate, queries all the concerned topics'
> partitions, and resends.
>  - If CommandPartitionUpdateResult#version >
> PartitonUpdateWatcherService#inFlightUpdate.get(watcherID).version, this
> should not happen.
>
>  ## Edge cases
> - Broker restarts or crashes
> The client will reconnect to another broker, and the broker responds with
> `CommandWatchPartitionUpdateSuccess` containing the watcher's concerned
> topics' partitions. We will call `PartitionsUpdateListener` when the
> connection opens.
> - Client acks fail or timeout
> The broker will resend the watcher's concerned topics' partitions if the
> client's ack fails or times out.
> - Partition updates before client acks.
> `CommandPartitionUpdate#version` monotonically increases every time it is
> updated. If Partition updates before client acks, a greater version will be
> put into `PartitonUpdateWatcherService#inFlightUpdate`. The previous acks
> will be ignored because their version is less than the current version.
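
[Editor's note] The version-based ack handling described above (stale acks ignored, failed acks triggering a resend) can be modeled roughly as follows. This is an illustrative sketch only; class and method names such as `PartitionUpdateAckSketch` are invented, not actual broker code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of the version-based ack handling described in this
// thread for PIP-247. All names here are hypothetical, not broker code.
public class PartitionUpdateAckSketch {

    record InFlight(long version, long timestampMs) {}

    final Map<Long, InFlight> inFlightUpdate = new ConcurrentHashMap<>();

    enum Outcome { IGNORED, COMPLETED, RESEND, UNEXPECTED }

    // Broker-side handling of a CommandPartitionUpdateResult ack:
    // stale acks are ignored; a failed ack for the current version
    // triggers a re-query of the partitions and a resend.
    Outcome handleAck(long watcherId, long ackVersion, boolean success) {
        InFlight current = inFlightUpdate.get(watcherId);
        if (current == null || ackVersion < current.version()) {
            return Outcome.IGNORED;            // ack for an older (or unknown) update
        }
        if (ackVersion == current.version()) {
            inFlightUpdate.remove(watcherId);
            return success ? Outcome.COMPLETED // client applied the update
                           : Outcome.RESEND;   // query partitions again and resend
        }
        return Outcome.UNEXPECTED;             // ack version ahead of broker state
    }

    public static void main(String[] args) {
        PartitionUpdateAckSketch broker = new PartitionUpdateAckSketch();
        broker.inFlightUpdate.put(1L, new InFlight(5, System.currentTimeMillis()));
        System.out.println(broker.handleAck(1L, 4, true)); // prints IGNORED
        System.out.println(broker.handleAck(1L, 5, true)); // prints COMPLETED
    }
}
```

The monotonically increasing version is what makes the "partition updates before client acks" edge case safe: the newer in-flight entry simply outranks the late ack.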
>
>
> On Wed, Feb 22, 2023 at 9:33 PM Asaf Mesika wrote:
>
> > How about edge cases?
> > In Andra's PIP he took into account cases where updates were lost, so he
> > created a secondary poll. Not saying it's the best situation for your
> case
> > of course.
> > I'm saying that when a broker sends a CommandPartitionUpdate update, how
> > do you know it arrived successfully? From my memory, there is no ACK in
> > the protocol saying "I'm the client, I got the update successfully", and
> > only then is the "dirty" flag removed for that topic, for this watcher ID.
> >
> > Are there any other edge cases we can have? Let's be exhaustive.
> >
> >
> >
> > On Wed, Feb 22, 2023 at 1:14 PM houxiaoyu  wrote:
> >
> > > Thanks for your great suggestion Enrico.
> > >
> > > I agree with you. It's more reasonable to add a
> > > `supports_partition_update_watchers` flag in `FeatureFlags` to detect
> > > that the connected broker supports this feature, and to add a new broker
> > > configuration property `enableNotificationForPartitionUpdate` with a
> > > default value of true, which is much like PIP-145.
> > >
> > > I have updated the descriptions.
> > >
> > > On Wed, Feb 22, 2023 at 5:26 PM Enrico Olivelli wrote:
> > >
> > > > I support this proposal.
> > > > Coping here my comments from GH:
> > > >
> > > > can't we enable this by default in case we detect that the connected
> > > > Broker supports it?
> > > > I can't find any reason for not using this mechanism if it is
> > available.
> > > >
> > > > Maybe we can set the default to "true" and allow users to disable it
> > > > in case it impacts their systems in an unwanted way.
> > > >
> > > > Maybe it would be useful to have a way to disable the mechanism on
> > > > the broker side as well.
> > > >
> > > > Enrico
>