Re: [DISCUSS] PIP-267: Support multi-topic messageId deserialization to ack messages

2023-06-21 Thread Asaf Mesika
I'll continue this on Slack #dev and write the summary here.

Just to clarify any misunderstanding: my intention is to make Pulsar PIPs
readable by anyone, which means adding the required background information
and explaining your idea in a way people can understand.

In light of this goal, I've introduced a PIP template that makes it clear what
is missing, and I've also switched to PRs, which make discussion easier than
the mailing list does. This makes participation easier for everyone, which means
more feedback ==> clearer proposals.



On Tue, Jun 20, 2023 at 11:17 PM Rajan Dhabalia 
wrote:

> Hi Asaf,
>
> I really don't know what your concern is, but it seems you either don't have
> much understanding of the Pulsar client/server protocol or you really would
> like to block the PIP. I tried to answer your concerns, but let me try again
> to add more context about the implementation in case that helps:
> this PIP changes only the protobuf definition of the message ID, which in the
> implementation is named MessageIdData and is used to serialize and
> deserialize MessageIds for users. This PIP adds a new field to support
> MessageId deserialization for partitioned topics and multi-topic consumers.
> Now, does it impact the wire protocol, and will the client start sending the
> newly added topic-name field to the broker? The answer is no, because when
> sending an ack command to the broker, the client creates a message ID in
> which it doesn't set this field [1]. The new field is only used during
> MessageId serialization/deserialization, when the client application calls
> the toByteArray()/fromByteArray() methods. So this should not add any network
> overhead to the payload when the client sends an ack command to the broker.
> [1]
>
> https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/protocol/Commands.java#L1018
>
> I am not sure whether that answers your question or whether I should go into
> the Pulsar client-server protocol implementation here, but we can help you
> in the #dev Slack channel if you have more implementation questions.
>
> Thanks,
> Rajan
>
>
>
> On Tue, Jun 20, 2023 at 11:31 AM Asaf Mesika 
> wrote:
>
> > On Tue, Jun 20, 2023 at 9:39 AM Rajan Dhabalia 
> > wrote:
> >
> > > > So you say in that sentence that you will add the topic name into
> > > MessageIdData. MessageIdData is defined in PulsarApi.proto and is
> > > transferred over the wire, so how can you add the topic to this class
> > > without changing the wire protocol?
> > > Yes, the client creates a separate MessageId when building the serialized
> > > payload for acking, in which it doesn't set or send the topic name, so it
> > > won't change the payload.
> > >
> > >
> > But it contradicts what you wrote in the design doc. I'm sorry, but I don't
> > get it.
> > Can you please help me understand this by elaborating so anyone, including
> > me, can fully understand it?
> > Preferably all your answers should be injected into the document, of
> > course.
> >
> > Thanks!
> >
> > Asaf
> >
> >
> >
> > > Thanks,
> > > Rajan
> > >
> > > On Mon, Jun 19, 2023 at 5:45 AM Asaf Mesika 
> > wrote:
> > >
> > > > First, let me add some data that should be added to the Background section
> > > > of the PIP, since I had to reverse engineer the code to understand that,
> > > > which is the opposite of the goal of a design document.
> > > >
> > > > 
> > > > The Pulsar broker has a binary protocol, which allows the client to consume
> > > > messages, acknowledge them, and much more. The protocol comprises Commands
> > > > containing the data needed to apply that Command on the broker side. Many
> > > > commands carry a message ID, among them CommandSendReceipt, CommandSend,
> > > > CommandAck, and more. All those commands use the message type
> > > > MessageIdData to specify the message they refer to.
> > > >
> > > > Here's what this data structure looks like:
> > > > message MessageIdData {
> > > >   required uint64 ledgerId = 1;
> > > >   required uint64 entryId = 2;
> > > >   optional int32 partition = 3 [default = -1];
> > > >   optional int32 batch_index = 4 [default = -1];
> > > >   repeated int64 ack_set = 5;
> > > >   optional int32 batch_size = 6;
> > > >
> > > >   // For the chunk message id, we need to specify the first chunk message id.
> > > >   optional MessageIdData first_chunk_message_id = 7;
> > > > }
> > > >
> > > > The key fields are ledgerId, the ledger that contains the message, and
> > > > entryId, which indicates the offset inside that ledger (the entry number).
> > > >
> > > > The client uses a class named MessageIdData, which is the auto-generated
> > > > code representing the protobuf message MessageIdData.
> > > > -
> > > >
> > > > Now, in the design, you wrote:
> > > >
> > > > > Therefore, we need to add topic-name into MessageIdData and allow
> > > > > multi-topic/partitioned topic to deserialize message correctly so, API like
> > > > > acknowledge can perfor
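Rajan's point in this thread (the new topic field is used only by toByteArray()/fromByteArray(), while the ack command is built from a separate message ID that never sets it) can be illustrated with a deliberately simplified Java model. Everything in it is hypothetical: `MessageIdSketch`, `toAckPayload()`, and the byte layout (a plain `DataOutputStream` encoding) are made up for illustration and are not Pulsar's protobuf `MessageIdData` or its client API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/**
 * Hypothetical, simplified stand-in for a message ID. NOT Pulsar's MessageIdData
 * encoding: it only models an optional topic field that application-level
 * serialization writes but the (hypothetical) ack payload leaves out.
 */
final class MessageIdSketch {
    final long ledgerId;
    final long entryId;
    final int partition;
    final String topic; // null when unset (hypothetical optional field)

    MessageIdSketch(long ledgerId, long entryId, int partition, String topic) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
        this.partition = partition;
        this.topic = topic;
    }

    /** Application-level serialization: writes the topic only when it is set. */
    byte[] toByteArray() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(ledgerId);
            out.writeLong(entryId);
            out.writeInt(partition);
            out.writeBoolean(topic != null);
            if (topic != null) {
                out.writeUTF(topic);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restores an ID, including the topic if one was serialized. */
    static MessageIdSketch fromByteArray(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long ledgerId = in.readLong();
            long entryId = in.readLong();
            int partition = in.readInt();
            String topic = in.readBoolean() ? in.readUTF() : null;
            return new MessageIdSketch(ledgerId, entryId, partition, topic);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Ack path: copies the position into a fresh ID without the topic, then encodes it. */
    byte[] toAckPayload() {
        return new MessageIdSketch(ledgerId, entryId, partition, null).toByteArray();
    }

    public static void main(String[] args) {
        MessageIdSketch withTopic =
            new MessageIdSketch(42L, 7L, 3, "persistent://public/default/orders-partition-3");
        MessageIdSketch withoutTopic = new MessageIdSketch(42L, 7L, 3, null);

        // The topic survives an application-level round trip.
        System.out.println(fromByteArray(withTopic.toByteArray()).topic);
        // The ack payload is byte-for-byte identical with or without a topic: prints true.
        System.out.println(Arrays.equals(withTopic.toAckPayload(), withoutTopic.toAckPayload()));
    }
}
```

Under these assumptions, the optional field changes what an application can round-trip, but not the bytes produced for an ack, which is the property being debated here.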

Re: Has anyone EVER gotten a Python function to work with Avro??

2023-06-21 Thread Pengcheng Jiang
Hello Devin,

Support for the Avro schema in Python functions was just added and
released in v3.0.0.

Here is an example of using Avro in a Python function:
https://github.com/apache/pulsar/blob/660525e57ed35b74cb9204521d1fba02cc08c542/pulsar-functions/python-examples/avro_schema_test_function.py

You can submit the test function with the following command:

```
bin/pulsar-admin functions create --name test-avro-py --tenant public \
--namespace default \
--inputs persistent://public/default/test-input \
--output persistent://public/default/test-output \
--py avro_schema_test_function.py \
--className avro_schema_test_function.AvroSchemaTestFunction \
--schema-type avro \
--input-type-class-name avro_schema_test_function.AvroTestObject \
--output-type-class-name avro_schema_test_function.AvroTestObject
```

Sincerely
Pengcheng Jiang

Devin Bost wrote on Wed, Jun 21, 2023 at 08:35:

> After many of my own attempts, research, digging through source code, and
> speaking with folks in various channels in the community, I'm starting to
> wonder if *anyone* has *ever* successfully gotten Avro to work with Python
> Functions.
>
> (I don't just mean ingesting a byte array with fastavro but actually using
> the built-in schema support that Pulsar Python functions are intended to
> support - hence the purpose of combining built-in Avro internals with
> multi-language support.) This capability is a core part of the community
> offering to support Python, and as we've standardized on Avro internals,
> I'm concerned we may have a gap in our ability to support this combination
> of technologies, which can impact adoption in organizations that have a
> heavy investment in both Python and Java (such as for different teams) when
> Avro has already been standardized on.
>
> I've brought this question up in various places/groups for almost 3 years
> now, and I'm starting to wonder if *nobody* has actually done it.
>
> I've seen examples of using Python producers and consumers with Avro, but
> the interaction is different because those interfaces allow the Schema to
> be explicitly specified. It's not clear from the source code how (or if)
> this can be done currently with the Python Functions API.
>
> If there's a feature gap here, then we need to decide if it's a priority to
> address. This is becoming increasingly important as the Python userbase is
> growing significantly, but I'd like to hear thoughts from others,
> especially since Lari recently asked if we should be considering wider
> changes to the Function API internals.
>
> Devin G. Bost
>


Re: [DISCUSS] Pluggable Pulsar Functions runtime to support new runtimes

2023-06-21 Thread Asaf Mesika
Lari, would it be possible to explain in more detail the pain points
you're describing?

You say processing messages individually is slow; hence, processing them in
batches is better. I guess it's especially useful if you need to group a
batch based on a key. What I don't understand is how the framework today
limits you from using something like a reactive client which does the
batching inside.

On Tue, Jun 20, 2023 at 10:33 AM Lari Hotari  wrote:

> Dear Pulsar Community Members,
>
> I would like to initiate a discussion on making the Pulsar Functions
> runtime "pluggable". In doing so, we can ensure that the addition of new
> runtime types becomes more straightforward.
>
> This use case will allow us to add support for Pulsar Functions based on
> various platforms such as:
>
> * Pulsar Client Reactive
> * Node.js / JavaScript
> * WebAssembly (WASM)
> * Spring Pulsar & Reactive Spring
>
> One of the weak points in the current Pulsar Functions runtime is the
> default handling of messages individually. Individual message processing
> can be slow and inefficient in cases where the main function of the
> Pulsar Function (or Sink) is to do backend API calls.
>
> Although pipelining (processing multiple in-flight messages) is possible
> in current Pulsar Functions and Sinks, it often leads to complex and
> error-prone solutions, especially when there's a need to combine
> key-based ordered processing with retry and backoff implementations.
>
> The Reactive Pulsar Client provides an inbuilt solution for implementing
> pipelining. With its ReactiveMessagePipelineBuilder, we can configure
> concurrency levels with key-ordered processing support. This capability
> could potentially eliminate the need to use key-shared subscriptions to
> scale Pulsar processing. If a reactive Pulsar Function were primarily to
> serve as a router for API calls, we could adjust the concurrency level
> to hundreds or even thousands, provided the backend could handle the
> load.
>
> With a pluggable Pulsar Functions runtime, we could introduce new
> runtime types without the need for implementing each type in the
> upstream project. This strategy could likely lead to new opportunities
> for innovative ideas and contributions in this field.
>
> I am interested to know your thoughts on making the Pulsar Functions
> runtime pluggable so that we can add new runtime types.
>
> Best Regards,
>
> -Lari
>
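The key-ordered concurrency described in Lari's proposal can be illustrated without the Reactive Pulsar client. The Java sketch below is hypothetical and does not use or reproduce the ReactiveMessagePipelineBuilder API; `KeyOrderedProcessor` is a made-up name. Each message key is hashed to one of a fixed number of single-threaded lanes, so messages with the same key are processed in arrival order while different keys proceed in parallel.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of key-ordered concurrency (not the Reactive Pulsar API).
 * Each key is hashed to one of `concurrency` single-threaded lanes: messages
 * that share a key run in arrival order, different keys can run in parallel.
 */
final class KeyOrderedProcessor implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();

    KeyOrderedProcessor(int concurrency) {
        for (int i = 0; i < concurrency; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Routes the task to the lane that owns this key. */
    void submit(String key, Runnable task) {
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(task);
    }

    /** Stops accepting work and waits for queued tasks to finish. */
    @Override
    public void close() {
        lanes.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> seenByKey = new ConcurrentHashMap<>();
        try (KeyOrderedProcessor processor = new KeyOrderedProcessor(4)) {
            for (int i = 0; i < 20; i++) {
                String key = "key-" + (i % 3);
                int sequence = i;
                processor.submit(key, () -> seenByKey
                    .computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<>()))
                    .add(sequence));
            }
        }
        // Per-key arrival order is preserved.
        System.out.println(seenByKey.get("key-0")); // [0, 3, 6, 9, 12, 15, 18]
    }
}
```

A thread per lane is fine for a sketch, but it would not scale to the hundreds or thousands of concurrent backend calls mentioned above; a reactive pipeline is expected to keep the same per-key ordering while relying on non-blocking I/O rather than dedicated threads.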


Re: New pip process reminder

2023-06-21 Thread Zixuan Liu
I think we can reference https://www.apache.org/foundation/voting.html

> Votes on code modifications follow a different model. In this scenario, a 
> negative vote constitutes a veto, which the voting group (generally the PMC
> of a project) cannot override. Again, this model may be modified by a lazy 
> consensus declaration when the request for a vote is raised, but the 
> full-stop nature of a negative vote does not change. Under normal (non-lazy 
> consensus) conditions, the proposal requires three positive votes and no 
> negative votes in order to pass; if it fails to garner the requisite amount 
> of support, it doesn't. Then the proposer either withdraws the proposal or 
> modifies the code and resubmits it, or the proposal simply languishes as an 
> open issue until someone gets around to removing it.

It seems that there is no need for three binding votes for code
modifications. If I am wrong, please point it out.

Thanks,
Zixuan

Asaf Mesika wrote on Wed, Jun 21, 2023 at 14:59:
>
> I'm not a committer or PMC member, so I can't comment on this.
>
> I am curious to know the difference between other Apache projects and other
> foundation projects, such as CNCF, if you know about it.
> Do you think the Apache Foundation's view on individuals, not part of a
> commercial entity, does not live up to today's state of affairs?
>
> On Tue, Jun 20, 2023 at 10:40 PM Rajan Dhabalia 
> wrote:
>
> > Hi,
> >
> > > (" a lazy majority of at least 3 binding +1s votes")
> >
> > I don't think it's fair at this stage, where the majority of Pulsar committers are
> > mostly part of one enterprise and only their PIP/PRs are moving forward, while
> > PRs/PIPs created by other community members get blocked or not reviewed
> > without any major reasons. I could list many different examples, but I
> > don't want to start that destructive discussion for now. I strongly ask
> > to help other community members to let them contribute to Pulsar so we can
> > grow the Pulsar community and bring Pulsar to a stage where it has
> > committers from many different institutions and enough reviewers to
> > review PIPs/PRs on time.
> > Right now, there are many examples where PRs are sitting unreviewed for a
> > long time, and we have to fix that first by encouraging and bringing in more
> > committers/reviewers across multiple organizations as part of the Pulsar
> > community. So this is not the right time to add restrictions; doing so is
> > indirectly making it difficult for many Pulsar committers and contributors
> > who don't belong to specific enterprises.
> >
> > Thanks,
> > Rajan
> >
> >
> >
> >
> > On Tue, Jun 20, 2023 at 12:14 PM Asaf Mesika 
> > wrote:
> >
> > > Hi,
> > >
> > > This is just a reminder that PMC members/committers can only merge the PIP PR
> > > when the vote thread has concluded, and concluded positively, as described
> > > ("a lazy majority of at least 3 binding +1s votes").
> > >
> > > So please, before clicking that merge button, double-check those two
> > > conditions.
> > >
> > > Thanks!
> > >
> > > Asaf
> > >
> >


Re: New pip process reminder

2023-06-21 Thread Asaf Mesika
On Wed, Jun 21, 2023 at 10:27 AM Zixuan Liu  wrote:

> I think we can reference https://www.apache.org/foundation/voting.html
>
> > Votes on code modifications follow a different model. In this scenario,
> a negative vote constitutes a veto, which the voting group (generally the
> PMC of a project) cannot override. Again, this model may be modified by a
> lazy consensus declaration when the request for a vote is raised, but the
> full-stop nature of a negative vote does not change. Under normal (non-lazy
> consensus) conditions, the proposal requires three positive votes and no
> negative votes in order to pass; if it fails to garner the requisite amount
> of support, it doesn't. Then the proposer either withdraws the proposal or
> modifies the code and resubmits it, or the proposal simply languishes as an
> open issue until someone gets around to removing it.
>
> It seems that there is no need for three binding votes for code
> modifications. If I am wrong, please point it out.
>
I believe you may be wrong.

Lazy Consensus is described here as:

> Lazy consensus is simply an announcement of 'silence gives assent.' When
> someone wants to determine the sense of the community this way, they might
> do so with a mail message such as:
> "The patch below fixes bug #8271847; if no-one objects within three
> days, I'll assume lazy consensus and commit it."
> You cannot apply lazy consensus to code changes when the
> review-then-commit policy is in effect.


My understanding is that for the PIP process, we are using a
review-then-commit policy, which actually means we can't use lazy consensus.

The definition of Lazy Consensus given here is:

> A decision-making policy which assumes general consent if no responses are
> posted within a defined period. For example, "I'm going to commit this by
> lazy consensus if no-one objects within the next three days." Also see
> Consensus Approval, Majority Approval, and the description of the voting
> process.



So, to summarize, a PIP needs to follow the rule "the proposal requires three
positive votes and no negative votes in order to pass."
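Asaf's summary can be read as a small decision rule. The Java sketch below encodes only that quoted condition (at least three binding +1 votes and no binding -1); the `PipVoteTally` and `Vote` names are hypothetical, and whether a given vote is binding is left to the caller rather than tied to any role. It does not model the voting period or the ASF requirement that a veto come with a technical justification.

```java
import java.util.List;

/**
 * Hypothetical tally for the rule quoted above: a proposal passes with at
 * least three binding +1 votes and no binding -1 (a binding -1 is a veto).
 * Who holds a binding vote is deliberately left to the caller.
 */
final class PipVoteTally {
    /** One vote: +1, 0 (abstain), or -1, and whether it is binding. */
    record Vote(String voter, boolean binding, int value) {
        Vote {
            if (value < -1 || value > 1) {
                throw new IllegalArgumentException("vote must be -1, 0, or +1");
            }
        }
    }

    static boolean passes(List<Vote> votes) {
        int bindingPlusOnes = 0;
        for (Vote vote : votes) {
            if (!vote.binding()) {
                continue; // non-binding votes are advisory only
            }
            if (vote.value() < 0) {
                return false; // any binding -1 blocks the proposal
            }
            if (vote.value() > 0) {
                bindingPlusOnes++;
            }
        }
        return bindingPlusOnes >= 3;
    }

    public static void main(String[] args) {
        List<Vote> votes = List.of(
            new Vote("a", true, 1),
            new Vote("b", true, 1),
            new Vote("c", true, 1),
            new Vote("d", false, -1)); // non-binding, so not a veto
        System.out.println(passes(votes)); // true
    }
}
```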




Re: New pip process reminder

2023-06-21 Thread tison
> mostly part of one enterprise and only their PIP/PRs are moving forward

No. PIPs are processed in a vendor-neutral way. In any case, the difference
between lazy consensus and at least 3 binding +1 votes won't change that.

> help other community members to let them contribute to Pulsar so

I'm doing so, and you can do so too. As you may know, I've helped nudge some
of your PRs, and in the last day I sent an email to the PMC list about
promoting client contributions. That is a separate topic that community
members can contribute to. Someone spending time trying to formalize the PIP
process and make these proposals clear and workable won't block "help other
community members". If you ask a person _not_ to do A, it doesn't mean they
_will_ do B.

> there are many examples where PRs are sitting unreviewed for a long time

At the end of last year, there were over 400 open PRs; now there are a little
over 250. I'm actively handling them, and from several discussions with other
committers I know it's not easy to _ask_ people to spend time reviewing
others' patches. But following a tit-for-tat strategy, I'm now happy to
cooperate with Jiwei, Yunzu, Lari, Michael, Zixuan, and others.
There should not be any blockers; if there are, you can discuss them on the
private@ mailing list. That is certainly an issue we should handle for a
better community.

> indirectly making it difficult for many Pulsar committers and contributors

From my observation, Asaf is actively reviewing almost all PIPs under the
rule he proposed, so review bandwidth effectively grows instead of
decreasing. Requiring 3 binding +1 votes is not a far stricter rule than lazy
consensus for any proposer. The proposal you made requires spending more time
on community traffic to help contributions make progress, which goes in the
same direction as the current PIP process (more eyes). "Let's skip voting and
let it pass" won't help deliver the high-quality designs and implementations
that PIPs aim for.

Best,
tison.



Re: [DISCUSS] Pluggable Pulsar Functions runtime to support new runtimes

2023-06-21 Thread Lari Hotari
On 2023/06/20 09:12:28 Enrico Olivelli wrote:
> > I am interested to know your thoughts on making the Pulsar Functions
> > runtime pluggable so that we can add new runtime types.
> 
> I see that RuntimeFactory [1] is already customizable.
> What can we do more ?
> Are you talking about providing alternative implementations for
> JavaInstanceRunnable [2] ?

My intention was to focus on the use case first, before getting into the details
of exactly how it would be implemented. By a pluggable solution, I mean
having a solution in place where you could add .nar files to some
directory and add support for new runtime types by implementing some plugin
specification. The current solution doesn't have this property.
A pluggable solution would make it easier to contribute new runtime types.
For example, say we wanted to add support for these technologies:
* functions written in Node.js / JavaScript
* functions using WebAssembly (WASM), for example implemented in Rust
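Lari leaves the mechanism open, so the following is only one possible shape for "implementing some plugin specification", sketched in Java under stated assumptions: `FunctionRuntimePlugin`, `RuntimeRegistry`, `UppercaseRuntime`, and the String-to-String handler are hypothetical and are not existing Pulsar APIs, and discovery uses the standard `java.util.ServiceLoader` as a stand-in for loading .nar files from a directory.

```java
import java.util.Locale;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

/**
 * Hypothetical registry mapping a runtime type name (e.g. "nodejs", "wasm")
 * to a plugin implementing the contract below. Not an existing Pulsar API.
 */
final class RuntimeRegistry {
    private final Map<String, FunctionRuntimePlugin> plugins = new TreeMap<>();

    /** Discovers plugins via ServiceLoader, e.g. from plugin jars on the class path. */
    static RuntimeRegistry discover() {
        RuntimeRegistry registry = new RuntimeRegistry();
        for (FunctionRuntimePlugin plugin : ServiceLoader.load(FunctionRuntimePlugin.class)) {
            registry.register(plugin);
        }
        return registry;
    }

    void register(FunctionRuntimePlugin plugin) {
        if (plugins.putIfAbsent(plugin.runtimeType(), plugin) != null) {
            throw new IllegalStateException("duplicate runtime type: " + plugin.runtimeType());
        }
    }

    FunctionRuntimePlugin forType(String runtimeType) {
        FunctionRuntimePlugin plugin = plugins.get(runtimeType);
        if (plugin == null) {
            throw new IllegalArgumentException("no plugin for runtime type: " + runtimeType);
        }
        return plugin;
    }

    public static void main(String[] args) {
        RuntimeRegistry registry = discover(); // empty unless provider jars are present
        registry.register(new UppercaseRuntime());
        UnaryOperator<String> handler = registry.forType("uppercase-demo").load("demo-function.pkg");
        System.out.println(handler.apply("hello")); // HELLO
    }
}

/** Hypothetical plugin contract; a Node.js or WASM runtime would implement this. */
interface FunctionRuntimePlugin {
    /** Name that a function's configuration would refer to. */
    String runtimeType();

    /** Loads a function package and returns a handler (simplified to String -> String). */
    UnaryOperator<String> load(String functionPackage);
}

/** Toy in-process plugin used only for the demo. */
final class UppercaseRuntime implements FunctionRuntimePlugin {
    @Override
    public String runtimeType() {
        return "uppercase-demo";
    }

    @Override
    public UnaryOperator<String> load(String functionPackage) {
        return input -> input.toUpperCase(Locale.ROOT);
    }
}
```

With a registry keyed by runtime type name, a function's configuration could select a runtime such as Node.js or WASM without the core code knowing about it in advance.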

Makes sense?

-Lari


Re: [DISCUSS] Pluggable Pulsar Functions runtime to support new runtimes

2023-06-21 Thread Lari Hotari
On 2023/06/21 07:21:31 Asaf Mesika wrote:
> Lari, would it be possible to explain in more detail the pain points
> you're describing?
 
Well, the point of pluggable Function runtime types is to support other
technologies. Let's forget the reactive messaging solution for a moment.
By a pluggable solution, I mean having a solution in place where you could
add .nar files to some directory and add support for new runtime types
by implementing some plugin specification. The current solution doesn't have
this property.
A pluggable solution would make it easier to contribute new runtime types.
For example, say we wanted to add support for these technologies:
* functions written in Node.js / JavaScript
* functions using WebAssembly (WASM), for example written in Rust and
compiled to WASM.

> You say processing messages individually is slow; hence, processing them in
> batches is better. I guess it's especially useful if you need to group a
> batch based on a key. What I don't understand is how the framework today
> limits you from using something like a reactive client which does the
> batching inside.

I didn't say anything about batches. It's about pipelining, which means that you
have multiple messages "in flight". That is different from batching. The most
well-known example of pipelining is HTTP pipelining [1].
Pulsar Functions already supports async functions, which are functions that have
a method that returns a CompletableFuture. To limit the number of messages
"in flight", the worker config includes a setting "maxPendingAsyncRequests" [2],
which defaults to 1000. It is odd that the setting is at the worker config level
and not at the function level.
Reactive Streams is not about batching. One of its clear benefits over plain
async programming is that there's a well-defined way of handling backpressure.
For any high-scale system, handling backpressure (== flow control) is one of the
core concerns.
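The difference between pipelining and batching, and the role of a cap like maxPendingAsyncRequests, can be sketched in Java with plain `CompletableFuture` and a `Semaphore`. This is a hypothetical illustration (`BoundedPipeline` and its methods are made-up names), not how the Pulsar Functions worker enforces that setting: each input is still processed individually, several calls overlap in flight, and the caller blocks once the cap is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch of pipelining: several async calls are "in flight" at once,
 * but never more than maxPendingAsyncRequests. Not Pulsar's implementation.
 */
final class BoundedPipeline<I, O> {
    private final Semaphore permits;
    private final Function<I, CompletableFuture<O>> asyncFunction;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObservedInFlight = new AtomicInteger();

    BoundedPipeline(int maxPendingAsyncRequests, Function<I, CompletableFuture<O>> asyncFunction) {
        this.permits = new Semaphore(maxPendingAsyncRequests);
        this.asyncFunction = asyncFunction;
    }

    /** Starts one call; blocks the caller while the limit is reached (crude backpressure). */
    CompletableFuture<O> submit(I input) {
        permits.acquireUninterruptibly();
        maxObservedInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        CompletableFuture<O> call;
        try {
            call = asyncFunction.apply(input);
        } catch (RuntimeException e) {
            inFlight.decrementAndGet();
            permits.release();
            throw e;
        }
        // Free the slot when the call finishes, whether it succeeded or failed.
        return call.whenComplete((value, error) -> {
            inFlight.decrementAndGet();
            permits.release();
        });
    }

    int maxObservedInFlight() {
        return maxObservedInFlight.get();
    }

    public static void main(String[] args) {
        ExecutorService backend = Executors.newFixedThreadPool(8);
        BoundedPipeline<Integer, Integer> pipeline = new BoundedPipeline<>(3,
            n -> CompletableFuture.supplyAsync(() -> slowDouble(n), backend));
        List<CompletableFuture<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            results.add(pipeline.submit(i)); // up to 3 calls overlap
        }
        CompletableFuture.allOf(results.toArray(new CompletableFuture<?>[0])).join();
        System.out.println("max in flight: " + pipeline.maxObservedInFlight()); // at most 3
        backend.shutdown();
    }

    /** Stands in for a backend API call. */
    private static int slowDouble(int n) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return n * 2;
    }
}
```

Blocking the submitting thread is the crudest form of backpressure. In Reactive Streams, the subscriber instead signals demand via `Subscription.request(n)`, so no thread has to block, which is the advantage over plain async programming described above.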

In this case, if there were a pluggable Pulsar Functions runtime, it would be
possible to add a runtime type optimized for Reactive Pulsar. That could also
enable using Spring Pulsar in Reactive mode with the rest of Reactive Spring.

The current .nar plugin packaging is a mess. If you take a look at what goes
inside a .nar file, there are classes that shouldn't be there. Creating .nar
plugins is also very slow and inefficient. I can provide details if you are
interested.

With pluggable Pulsar Functions runtime, it would also be possible to create a 
cleaner packaging for JVM functions. Packaging for different ecosystems like 
Quarkus and Spring Boot could be optimized for those ecosystems and not the 
other way around where Pulsar's outdated .nar packaging is dictating the 
options.

In addition, Pulsar Functions has a missing piece in how functions are
mapped to instances. It's not very efficient to run each and every
function as a separate deployable entity. The cost of each independent JVM is
high. It would also be better to have a model where a group of
functions is provided by one instance and always runs together. Having
this option could bring down the cost and also improve the developer
experience. The framework shouldn't require developers to deploy each individual
function in a separate .jar file that runs in a separate JVM.

So you asked if there is pain with Pulsar Functions. There definitely is. 
Instead of causing more fragmentation in the ecosystem with multiple pluggable 
infrastructure layers, we should make the core upstream offering better. 

I'd also like to see a deployment option for Pulsar Functions where you could 
choose to not deploy Pulsar Functions with pulsar-admin and instead package the 
functions in an application that you deploy in Kubernetes with helm or whatever 
way you choose to do that.
This could also be taken into account when designing the pluggable Pulsar 
Functions runtime. 

StreamNative's Function Mesh [3] takes a different approach to Pulsar Function 
life cycle management. That might be a good fit in many cases. 
However, we should have a way where Pulsar Functions could be deployed without 
any central management solution, as ordinary applications. 

Perhaps everyone is happy with the current way Pulsar Functions are. If 
everyone is already satisfied, things won't improve. Do we want to make Pulsar 
more popular and easier for our users? Do we care about supporting node.js / 
JavaScript / TypeScript or new languages like Rust? If we do, we had better start
thinking of adding that support. I would like to propose that we make adding 
new runtime types easy by making it "pluggable". That could mean multiple 
things and that's why we are having this discussion. I hope others could also 
chime in.

-Lari

[1] https://en.wikipedia.org/wiki/HTTP_pipelining
[2] 
https://github.com/apache/pulsar/blob/f7c0b3c49c9ad8c28d0b00aa30d727850eb8bc04/pulsar-func

Re: New pip process reminder

2023-06-21 Thread tison
Looking into the discussion and reviewing our membership list from the
perspective of how module experts are distributed, I agree that specific
modules can have fewer PMC members overseeing them.

Based on my experience in the Flink community [1], and given that
improvement proposals are mainly about development, I suggest we count the
votes of committers (PMC members are also committers) as binding votes for a PIP.

In this way, we can balance the goal of having more eyes on each proposal
against keeping development moving.

Best,
tison.

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws#FlinkBylaws-Actions


tison  于2023年6月21日周三 18:41写道:

> > mostly part of one enterprise and only their PIP/PRs are moving forward
>
> No. PIPs are processed in a vendor natural bias. At least the difference
> between lazy consensus and at least 3 +1 binding votes won't change it.
>
> > help other community members to let them contribute to Pulsar so
>
> I'm doing so and you can do so. As you may know I help with nudging some
> of your PRs and send an email about promoting client contribution to the
> PMC list in the last day. It's a separate topic that community members can
> contribute to. Someone spends time to try to formalize the PIP process and
> make these proposals clear and workable. It won't block "help other
> community members". If you ask a person _not_ to do A, it doesn't mean they
> _will_ do B.
>
> > there are many examples where PRs are sitting unreviewed for a long
> time
>
> At the end of the last year, there are over 400 open PRs. And now it's
> about 250+. I'm actively handling them and with several discussions with
> other committers I know it's not easy to _ask_ people to spend time
> reviewing others' patches. But following a tit-for-tat strategy I'm now
> happy to cooperate with Jiwei, Yunzu, Lari, Michael, Zixuan, and so on.
> There should not be a blocker; if so, you can discuss it on the private@
> mailing list - that's a certain issue we should handle for a better
> community.
>
> > indirectly making it difficult for many Pulsar committers and
> contributors
>
> From my observation, Asaf is actively reviewing almost all PIPs following
> the rule he proposed so the bandwidth effectively grows instead of
> decreases. 3 binding +1 votes are not a far more strict rule than lazy
> consensus with any proposer. According to the proposal you made, it
> requires spending more time on community traffic to help contributions make
> progress and it should be in the same way as the current PIP process
> direction (more eyes). "Let's skip voting and let it pass" won't help in
> delivering high-quality design implementation which are PIPs aimed at.
>
> Best,
> tison.
>
>
> Asaf Mesika  于2023年6月21日周三 15:59写道:
>
>> On Wed, Jun 21, 2023 at 10:27 AM Zixuan Liu  wrote:
>>
>> > I think we can reference https://www.apache.org/foundation/voting.html
>> >
>> > > Votes on code modifications follow a different model. In this scenario,
>> > > a negative vote constitutes a veto, which the voting group (generally the
>> > > PMC of a project) cannot override. Again, this model may be modified by a
>> > > lazy consensus declaration when the request for a vote is raised, but the
>> > > full-stop nature of a negative vote does not change. Under normal
>> > > (non-lazy consensus) conditions, the proposal requires three positive
>> > > votes and no negative votes in order to pass; if it fails to garner the
>> > > requisite amount of support, it doesn't. Then the proposer either
>> > > withdraws the proposal or modifies the code and resubmits it, or the
>> > > proposal simply languishes as an open issue until someone gets around to
>> > > removing it.
>> >
>> > It seems that there is no need for three binding votes for code
>> > modifications. If I am wrong, please point it out.
>> >
>> > I believe you may be wrong.
>>
>> Lazy Consensus is described here as:
>>
>> > Lazy consensus is simply an announcement of 'silence gives assent.' When
>> > someone wants to determine the sense of the community this way, they might
>> > do so with a mail message such as:
>> > "The patch below fixes bug #8271847; if no-one objects within three
>> > days, I'll assume lazy consensus and commit it."
>> > You cannot apply lazy consensus to code changes when the
>> > review-then-commit policy is in effect.
>>
>>
>> My understanding is that for the PIP process, we are using a
>> review-then-commit policy, which actually means we can't use lazy
>> consensus.
>>
>> The definition of Lazy Consensus given here is:
>>
>> > A decision-making policy which assumes general consent if no responses are
>> > posted within a defined period. For example, "I'm going to commit this by
>> > lazy consensus if no-one objects within the next three days." Also see
>> > Consensus Approval
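To make the difference between the two policies quoted above concrete, here is a minimal Java sketch of how a vote might be tallied under each. It is a simplification, not an Apache tool: it assumes that review-then-commit needs at least three binding +1 votes and that only a binding -1 counts as a veto, and it treats lazy consensus as passing unless anyone objects with a -1. The class and method names are hypothetical; records need Java 16 or later.

```java
import java.util.List;

public class VoteTally {
    // One vote: +1, 0 or -1, and whether the voter is binding (assumed: PMC member).
    record Vote(int value, boolean binding) {}

    // Review-then-commit (assumed rule): at least three binding +1 votes
    // and no binding -1, since a binding -1 is a veto.
    static boolean passesReviewThenCommit(List<Vote> votes) {
        long bindingPlusOnes = votes.stream()
                .filter(v -> v.binding() && v.value() == 1)
                .count();
        boolean vetoed = votes.stream()
                .anyMatch(v -> v.binding() && v.value() == -1);
        return bindingPlusOnes >= 3 && !vetoed;
    }

    // Lazy consensus: "silence gives assent", so it passes unless anyone objects.
    static boolean passesLazyConsensus(List<Vote> votes) {
        return votes.stream().noneMatch(v -> v.value() == -1);
    }

    public static void main(String[] args) {
        List<Vote> votes = List.of(
                new Vote(1, true), new Vote(1, true), new Vote(1, true),
                new Vote(-1, false));
        System.out.println(passesReviewThenCommit(votes)); // true: three binding +1, no binding veto
        System.out.println(passesLazyConsensus(votes));    // false: someone objected
    }
}
```

The example in `main` shows why the distinction matters here: the same set of votes can satisfy review-then-commit while failing the "no-one objects" test of lazy consensus, and an empty set of votes passes lazy consensus but not review-then-commit.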

Re: [VOTE] PIP-267: Support multi-topic messageId deserialization to ack messages

2023-06-21 Thread 徐昀泽
+1 (binding)

Though I agree with Asaf that the proposal itself is not clear, I think the
design is easy to understand from the PR: a new field is added only for
serializing and deserializing a MessageId.

Thanks,
Yunze

> On Jun 21, 2023, at 03:08, Asaf Mesika  wrote:
> 
> -1 (non-binding)
> 
> The reason I'm asking all these questions in the DISCUSS thread is that I
> still haven't managed to understand how you plan to solve the pain
> described, not to mention the lack of information in the design document
> that I mentioned in my replies to the discussion.
> 
> This DISCUSS thread is not resolved yet from my point of view.
> The design document is not clear to me at all.
> 
> Hence I would like to continue to understand it in the discussion thread.
> 
> 
> On Tue, Jun 20, 2023 at 10:00 AM Rajan Dhabalia 
> wrote:
> 
>> Hi.
>> 
>> The Pulsar API provides the MessageId interface, which producer and
>> consumer applications generally use to manage topic offsets. Sometimes these
>> applications would like to serialize and deserialize message IDs,
>> specifically consumer apps that persist a messageId and later ack with it
>> after deserializing it. However, Pulsar currently doesn't support correct
>> deserialization of message IDs for multi-topic or partitioned topics, and
>> because of that the `acknowledge` API call fails for those topics with the
>> error below:
>> "Only TopicMessageId is allowed to acknowledge for a multi-topics consumer"
>> 
>> Please visit PIP for any suggestions:
>> https://github.com/apache/pulsar/issues/20221
>> 
>> This PIP is created with PR: https://github.com/apache/pulsar/pull/19944
>> 
>> Thanks,
>> Rajan
>> 
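As described in this thread, the change adds a field that is used only when an application serializes and deserializes a MessageId. The minimal Java sketch below illustrates that idea with a hypothetical `PortableMessageId` record; it is not Pulsar's `MessageIdData` protobuf or the real `toByteArray()`/`fromByteArray()` implementation. It assumes the topic name is written into the serialized bytes when present, so a deserialized ID carries enough information for a multi-topics consumer to route the ack; without it, the hypothetical `routeAck` check fails with the error quoted above. Records need Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Optional;

public class PortableMessageIdDemo {
    // Hypothetical stand-in for a message ID; NOT Pulsar's MessageIdData.
    record PortableMessageId(long ledgerId, long entryId, int partition, Optional<String> topic) {

        // Serialize for the application to persist; the topic is written only if present.
        byte[] toByteArray() throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeLong(ledgerId);
                out.writeLong(entryId);
                out.writeInt(partition);
                out.writeBoolean(topic.isPresent());
                if (topic.isPresent()) {
                    out.writeUTF(topic.get());
                }
            }
            return bytes.toByteArray();
        }

        // Read the fields back in the same order they were written.
        static PortableMessageId fromByteArray(byte[] data) throws IOException {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
                long ledgerId = in.readLong();
                long entryId = in.readLong();
                int partition = in.readInt();
                Optional<String> topic = in.readBoolean() ? Optional.of(in.readUTF()) : Optional.empty();
                return new PortableMessageId(ledgerId, entryId, partition, topic);
            }
        }
    }

    // Hypothetical check modelled on the quoted error: a multi-topics consumer
    // needs to know which topic an ID belongs to before it can route the ack.
    static String routeAck(PortableMessageId id) {
        return id.topic().orElseThrow(() -> new IllegalArgumentException(
                "Only TopicMessageId is allowed to acknowledge for a multi-topics consumer"));
    }

    public static void main(String[] args) throws IOException {
        PortableMessageId original = new PortableMessageId(
                12L, 34L, 1, Optional.of("persistent://public/default/my-topic-partition-1"));
        PortableMessageId restored = PortableMessageId.fromByteArray(original.toByteArray());
        System.out.println(restored.equals(original)); // true
        System.out.println(routeAck(restored));        // prints the topic name
    }
}
```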



Re: [VOTE] PIP-267: Support multi-topic messageId deserialization to ack messages

2023-06-21 Thread PengHui Li
Hi Rajan,

I think we discussed the newly added field in PulsarApi.proto at
https://github.com/apache/pulsar/pull/19944#discussion_r1153963425
but the proposal doesn't mention it.

I know why we need to add that field to the proto file: it avoids
introducing many changes on the client side, since the client mixes the
public API and internal data structures. But for everyone who wants to know
in the future why this field was added, it should be clearly explained. The
PIP is not only for Pulsar's experienced veterans. Only when we provide the
context comprehensively and keep it transparent enough will contributors get
involved in the PIP review, rather than waiting until they feel familiar
enough with Pulsar to participate.

I will cast +0 here. I agree with the motivation and the solution because I
have the context from the previous discussion under the pull request, but the
proposal could be more friendly to community users and contributors.

Thanks,
Penghui



Re: [VOTE] PIP-268: Add support of topic stats/stats-internal using

2023-06-21 Thread PengHui Li
> However, stats retrieval over the HTTP API doesn't work well in use cases
> where users would like to access this API at a higher scale: when a large
> number of application nodes use it over HTTP, they could overload brokers,
> sometimes making brokers unresponsive and impacting admin API performance.
> It also becomes difficult when Pulsar is deployed in the cloud behind an SNI
> proxy and applications want to access large-scale stats information
> periodically over a different HTTP port. Instead, it would be better if
> applications could fetch stats over the same binary protocol for scalability
> and accessibility reasons.

From the motivation of the proposal, the goal is to resolve the performance
issue with the REST API. Sorry, I haven't run a benchmark for the REST API.
Have you tested it, or could you please figure out where the bottleneck is in
the REST API solution? And if users adopt the new approach, what is the
performance expectation here: a 30% or a 50% improvement?

And I'm not sure why it is difficult to fetch the stats behind the SNI
proxy. Could you please explain more? It would help the reviewers
understand the issue we want to resolve.

For the client-side API changes: you have InternalStatsOption as a param of
the method, but I don't see any definition for it. Should it be an enum? And
why not have a POJO for it? It would make it easier for users to know what
should be set and what should not.

For the response data: a JSON string is not good for compatibility. I saw
the discussion between you and Enrico in the DISCUSS thread, but it's not a
guaranteed solution from the API's perspective. And any other clients (C++,
Go, Rust) must build their own POJOs based on the REST API.

Thanks,
Penghui

On Tue, Jun 20, 2023 at 3:06 PM Rajan Dhabalia  wrote:

> Hi,
>
> I would like to start a VOTE for:
> https://github.com/apache/pulsar/issues/20265
>
> Thanks,
> Rajan
>


Re: [VOTE] PIP-268: Add support of topic stats/stats-internal using

2023-06-21 Thread Rajan Dhabalia
Please find my responses inline.

On Wed, Jun 21, 2023 at 5:53 PM PengHui Li  wrote:

> > However, stats retrieval over the HTTP API doesn't work well in use cases
> > where users would like to access this API at a higher scale: when a large
> > number of application nodes use it over HTTP, they could overload brokers,
> > sometimes making brokers unresponsive and impacting admin API performance.
> > It also becomes difficult when Pulsar is deployed in the cloud behind an
> > SNI proxy and applications want to access large-scale stats information
> > periodically over a different HTTP port. Instead, it would be better if
> > applications could fetch stats over the same binary protocol for
> > scalability and accessibility reasons.
>
> From the motivation of the proposal, the goal is to resolve the performance
> issue with the REST API. Sorry, I haven't run a benchmark for the REST API.
> Have you tested it, or could you please figure out where the bottleneck is
> in the REST API solution? And if users adopt the new approach, what is the
> performance expectation here: a 30% or a 50% improvement?
>
The main reasons are performance (you will see an improvement because of the
HTTP overhead compared to TCP, the limited threads on the HTTP server side,
and the performance of a Netty TCP server compared to a Jetty HTTP server),
user feasibility (the same reason we have producer/consumer stats over the
binary protocol: users can get the stats from the same producer/consumer
entity without creating an additional client for the stats), and scale
(serving a large number of requests over the binary protocol is more
scalable than over HTTP).


> And I'm not sure why it is difficult to fetch the stats behind the SNI
> proxy. Could you please explain more? It would help the reviewers
> understand the issue we want to resolve.
>
SNI is a layer-4 (TCP layer) protocol and does not work over HTTP.


> For the client-side API changes: you have InternalStatsOption as a param of
> the method, but I don't see any definition for it. Should it be an enum? And
> why not have a POJO for it? It would make it easier for users to know what
> should be set and what should not.
>
I added that param map after our last discussion; it will use an enum to
pass additional flags.


> For the response data: a JSON string is not good for compatibility. I saw
> the discussion between you and Enrico in the DISCUSS thread, but it's not a
> guaranteed solution from the API's perspective. And any other clients (C++,
> Go, Rust) must build their own POJOs based on the REST API.
>
Well, JSON would be the better option to avoid data conversion and copying
of the complex stats/internal-stats data structures on both the broker and
client side, to maintain consistency with the stats/internal-stats POJO
schema, and to access stats with a consistent POJO definition.
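The two ways of passing stats options discussed in this exchange can be compared in a short Java sketch: an enum-keyed parameter map, as in Rajan's reply, and a POJO, as PengHui suggested. The enum name `InternalStatsOption` comes from the PIP, but its constants (`INCLUDE_METADATA`, `INCLUDE_LEDGER_INFO`), the `InternalStatsRequest` class, and the encoding into string parameters are all hypothetical, chosen only for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

public class StatsOptionsSketch {
    // Hypothetical constants; the thread does not define InternalStatsOption's values.
    enum InternalStatsOption { INCLUDE_METADATA, INCLUDE_LEDGER_INFO }

    // Approach 1 (enum-keyed param map): callers set only the flags they care
    // about, and any flag missing from the map is treated as false.
    static Map<String, String> encode(Map<InternalStatsOption, Boolean> options) {
        Map<String, String> params = new TreeMap<>();
        for (InternalStatsOption option : InternalStatsOption.values()) {
            params.put(option.name(), String.valueOf(options.getOrDefault(option, false)));
        }
        return params;
    }

    // Approach 2 (POJO): every option is a named, typed field with a visible
    // default, so users can see from the type what can be set.
    static final class InternalStatsRequest {
        private boolean includeMetadata = false;
        private boolean includeLedgerInfo = false;

        InternalStatsRequest includeMetadata(boolean value) { includeMetadata = value; return this; }
        InternalStatsRequest includeLedgerInfo(boolean value) { includeLedgerInfo = value; return this; }

        Map<String, String> encode() {
            Map<String, String> params = new TreeMap<>();
            params.put(InternalStatsOption.INCLUDE_METADATA.name(), String.valueOf(includeMetadata));
            params.put(InternalStatsOption.INCLUDE_LEDGER_INFO.name(), String.valueOf(includeLedgerInfo));
            return params;
        }
    }

    public static void main(String[] args) {
        Map<InternalStatsOption, Boolean> flags = new EnumMap<>(InternalStatsOption.class);
        flags.put(InternalStatsOption.INCLUDE_METADATA, true);
        Map<String, String> fromMap = encode(flags);
        Map<String, String> fromPojo = new InternalStatsRequest().includeMetadata(true).encode();
        System.out.println(fromMap.equals(fromPojo)); // true
        System.out.println(fromMap); // {INCLUDE_LEDGER_INFO=false, INCLUDE_METADATA=true}
    }
}
```

Both produce the same request here. The map keeps the method signature stable as new flags are added, while the POJO makes each option and its default visible in the type, which is the discoverability point PengHui raised.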

