[GitHub] [pulsar-adapters] eolivelli merged pull request #40: Update to Pulsar 2.11.0

2023-03-01 Thread via GitHub


eolivelli merged PR #40:
URL: https://github.com/apache/pulsar-adapters/pull/40


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pulsar-adapters] cbornet merged pull request #39: enable Reproducible Builds

2023-03-01 Thread via GitHub


cbornet merged PR #39:
URL: https://github.com/apache/pulsar-adapters/pull/39





[GitHub] [pulsar-adapters] cbornet merged pull request #37: [improve] Kafka adaptor - Handle partition topic.

2023-03-01 Thread via GitHub


cbornet merged PR #37:
URL: https://github.com/apache/pulsar-adapters/pull/37





[GitHub] [pulsar-adapters] cbornet merged pull request #36: [pulsar-kafka] Fixed blockIfQueueFull config

2023-03-01 Thread via GitHub


cbornet merged PR #36:
URL: https://github.com/apache/pulsar-adapters/pull/36





[GitHub] [pulsar-adapters] RobertIndie opened a new pull request, #41: Only send notifications to commits@ mailing list

2023-03-01 Thread via GitHub


RobertIndie opened a new pull request, #41:
URL: https://github.com/apache/pulsar-adapters/pull/41

   
   
   ### Motivation
   
   See https://github.com/apache/pulsar-client-cpp/pull/42 and
   https://github.com/apache/pulsar-client-go/pull/861. The goal is to reduce
   unnecessary notifications on the dev@ mailing list.
   
   
   ### Modifications
   
   * Update the notifications to match the apache/pulsar repo
   
   





[GitHub] [pulsar-adapters] tisonkun commented on pull request #40: Update to Pulsar 2.11.0

2023-03-01 Thread via GitHub


tisonkun commented on PR #40:
URL: https://github.com/apache/pulsar-adapters/pull/40#issuecomment-1449628442

   > That's weird. It compiles fine on my machine... The Spark version used is 
very old and there could be an incompatibility with JDK17
   
   @cbornet May I ask how this issue was finally resolved? Since the change
   was force-pushed, it can no longer be referenced.





[GitHub] [pulsar-adapters] tisonkun merged pull request #41: Only send notifications to commits@ mailing list

2023-03-01 Thread via GitHub


tisonkun merged PR #41:
URL: https://github.com/apache/pulsar-adapters/pull/41





Re: [DISCUSS] Why not split `memoryLimit` into `consumerMemoryLimit ` and `producerMemoryLimit `

2023-03-01 Thread Zike Yang
Hi, Jiaqi

Thanks for initiating this discussion.

Do you have any cases that need separate limit control for producers/consumers?

We have max pending queue size and max message size for the producer,
and receiver queue size for the consumer. Although they do not control
the limit at byte granularity, they seem to be sufficient for
current needs.

If there is a strong case, I think we can consider this feature.
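
To make the trade-off in this thread concrete, here is a minimal sketch contrasting one byte budget shared by a whole client with separate producer and consumer budgets. It is not Pulsar code: `MemoryBudget`, `try_reserve`, and `release` are hypothetical names chosen for illustration, and the byte counts are arbitrary.

```python
import threading


class MemoryBudget:
    """Hypothetical byte budget; not Pulsar's MemoryLimitController."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self._lock = threading.Lock()

    def try_reserve(self, size):
        """Reserve `size` bytes if the budget allows it; return True on success."""
        with self._lock:
            if self.used_bytes + size > self.limit_bytes:
                return False
            self.used_bytes += size
            return True

    def release(self, size):
        """Return `size` bytes to the budget."""
        with self._lock:
            self.used_bytes = max(0, self.used_bytes - size)


# Option A: one budget shared by everything created from the same client.
shared = MemoryBudget(limit_bytes=64)
producer_ok = shared.try_reserve(48)  # producer buffers 48 bytes
consumer_ok = shared.try_reserve(32)  # consumer prefetch no longer fits

# Option B: separate budgets, as the thread proposes.
producer_budget = MemoryBudget(limit_bytes=32)
consumer_budget = MemoryBudget(limit_bytes=32)
producer_split_ok = producer_budget.try_reserve(48)  # exceeds the producer's own cap
consumer_split_ok = consumer_budget.try_reserve(32)  # unaffected by the producer
```

With the shared budget, a busy producer can crowd out the consumer; with separate budgets each side is capped on its own, at the cost of one more setting to tune.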

Thanks,
Zike Yang


Zike Yang

On Tue, Feb 28, 2023 at 10:25 AM Jiaqi Shen  wrote:
>
> Context.
> - PIP-74
> https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits
> - PR-8965 https://github.com/apache/pulsar/pull/8965
> - PR-15216 https://github.com/apache/pulsar/pull/15216
>
> Hello, community:
>
> There are some questions about PIP-74 I want to figure out.
>
> PIP-74 and its implementation specify how to limit *client* memory. But in
> our scenario, the *client* is usually reused. It is more intuitive to limit
> a single *producer* or *consumer*. So why not let the producer/consumer
> have their own MemoryLimitController? And should we split the memoryLimit
> setting into consumerMemoryLimit and producerMemoryLimit? Are there any other
> considerations behind limiting *client* memory?
>
> If you know why it needs to be designed like this, please leave your
> comment. Thanks!
>
> Thanks,
> Jiaqi Shen


Re: [DISCUSS] Why not split `memoryLimit` into `consumerMemoryLimit ` and `producerMemoryLimit `

2023-03-01 Thread ZhangJian He
Producers and consumers share many things, including the memory limit. If
there is a strong need for separate limits, using two Pulsar clients, one for
producers and the other for consumers, might be a good choice.

Thanks
ZhangJian He


On Wed, 1 Mar 2023 at 17:15, Zike Yang  wrote:

> Hi, Jiaqi
>
> Thanks for initiating this discussion.
>
> Do you have any cases that need separate limit control for
> producers/consumers?
>
> We have max pending queue size and max message size for the producer,
> and receiver queue size for the consumer. Although they do not control
> the limit at byte granularity, they seem to be sufficient for
> current needs.
>
> If there is a strong case, I think we can consider this feature.
>
> Thanks,
> Zike Yang
>
>
> Zike Yang
>
> On Tue, Feb 28, 2023 at 10:25 AM Jiaqi Shen 
> wrote:
> >
> > Context.
> > - PIP-74
> > https://github.com/apache/pulsar/wiki/PIP-74%3A-Pulsar-client-memory-limits
> > - PR-8965 https://github.com/apache/pulsar/pull/8965
> > - PR-15216 https://github.com/apache/pulsar/pull/15216
> >
> > Hello, community:
> >
> > There are some questions about PIP-74 I want to figure out.
> >
> > PIP-74 and its implementation specify how to limit *client* memory. But in
> > our scenario, the *client* is usually reused. It is more intuitive to limit
> > a single *producer* or *consumer*. So why not let the producer/consumer
> > have their own MemoryLimitController? And should we split the memoryLimit
> > setting into consumerMemoryLimit and producerMemoryLimit? Are there any other
> > considerations behind limiting *client* memory?
> >
> > If you know why it needs to be designed like this, please leave your
> > comment. Thanks!
> >
> > Thanks,
> > Jiaqi Shen
>


Re: [DISCUSS] Apache Pulsar Adapters 2.11.0 release

2023-03-01 Thread Christophe Bornet
Thanks for your feedback.
I'll proceed with the release candidate.

On Fri, Feb 24, 2023 at 21:39, Dave Fisher wrote:
>
> There is a pulsar-connectors repository, but it was last changed in Nov 2020. 
> I think that Sijie intended to move out the connectors to there … that effort 
> should be either restarted or it should be made read only after leaving a 
> README.
>
> There is a pulsar-presto repository which was abandoned at the same time.
>
> This is turning into a different DISCUSSION
>
> Best,
> Dave
>
> Sent from my iPhone
>
> > On Feb 24, 2023, at 12:22 PM, Enrico Olivelli  wrote:
> >
> > +1
> > thanks
> >
> > I wonder if there are other sub-projects that we should release
> > together with 2.10.0
> >
> > Enrico
> >
> >> On Fri, Feb 24, 2023 at 17:33, Christophe Bornet wrote:
> >>
> >> Hi everyone,
> >>
> >> The last release of the Pulsar Adapters was 2.8.0 in July 2021.
> >> Even though there is not much activity on this repo, there have been
> >> some bug fixes since then, including one for the famous Log4Shell CVE.
> >> So I'd like to propose myself as a release manager for a Pulsar
> >> Adapters 2.11.0 release. I'll also update the dependency on Pulsar to
> >> use 2.11.
> >> Please let me know if you have things you want to do, issues you want to
> >> see fixed or PRs you'd like to see merged before this release.
> >>
> >> Best regards.
> >>
> >> Christophe Bornet


[DISCUSS] PIP-252: Configurable compact topic retention

2023-03-01 Thread Elliot West
https://github.com/apache/pulsar/issues/19665

-- 

Elliot West

Senior Platform Engineer

elliot.w...@streamnative.io

streamnative.io






Re: [DISCUSS] Release Pulsar 2.10.4

2023-03-01 Thread Xiangying Meng
Hello, Pulsar community:

The cherry-picking for 2.10.4 is mostly complete.
It contains 92 PRs [0].
If you have PRs that must be included in release 2.10.4, please reply
to this email.
I will wait for those PRs to be completed before releasing 2.10.4.
If you have a PR that urgently needs to go into 2.10.4 but have no time to
cherry-pick it, you can also ping me and I will cherry-pick it for you.

Sincerely,
Xiangying
[0] -
https://github.com/apache/pulsar/pulls?q=is%3Amerged+is%3Apr+label%3Arelease%2F2.10.4+label%3Acherry-picked%2Fbranch-2.10
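
The link in [0] packs its search filters into a URL-encoded `q` parameter, which is hard to read at a glance. As a small sketch using only the Python standard library, the filters can be decoded like this (the variable names are illustrative):

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "https://github.com/apache/pulsar/pulls"
    "?q=is%3Amerged+is%3Apr+label%3Arelease%2F2.10.4"
    "+label%3Acherry-picked%2Fbranch-2.10"
)

# parse_qs percent-decodes the value and turns '+' into spaces.
query = parse_qs(urlsplit(url).query)["q"][0]
filters = query.split()

for search_filter in filters:
    print(search_filter)
```

The decoded filters select merged PRs that carry both the `release/2.10.4` and `cherry-picked/branch-2.10` labels. The links in the original proposal quoted below use `-label:` instead, which excludes PRs that have already been cherry-picked.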


On Fri, Feb 3, 2023 at 5:28 PM Nicolò Boschi  wrote:

> +1
>
> There will be ~70 commits compared to 2.10.3, which I think is a good
> amount of changes.
>
> Thanks,
> Nicolò Boschi
>
>
> On Thu, Feb 2, 2023 at 07:53, Haiting Jiang <
> jianghait...@gmail.com> wrote:
>
> > +1, It's about 3 months since the discussion of the 2.10.3 release.
> >
> > Haiting
> >
> > On Wed, Feb 1, 2023 at 11:26 AM Xiangying Meng 
> > wrote:
> > >
> > > Hello, Pulsar community:
> > >
> > > I'd like to propose releasing Apache Pulsar 2.10.4. It's been about one
> > > month since 2.10.3 was released.
> > >
> > > There are 45 PRs [0] that need to be cherry-picked into branch-2.10. I
> > > will cherry-pick these PRs, excluding those that were merged directly
> > > into branch-2.10.
> > >
> > > There are 21 open PRs [1]. I'll follow up on each of those PRs to see
> > > if they will be completed soon or will need to be pushed to 2.10.4
> > >
> > > If you have any important fixes or any questions, please reply to this
> > > email, and we will evaluate whether to include them in 2.10.4
> > >
> > > Thanks,
> > > Xiangying
> > > [0] -
> > > https://github.com/apache/pulsar/pulls?q=is%3Amerged+is%3Apr+label%3Arelease%2F2.10.4+-label%3Acherry-picked%2Fbranch-2.10+
> > > [1] -
> > > https://github.com/apache/pulsar/pulls?q=is%3Aopen+is%3Apr+label%3Arelease%2F2.10.4+-label%3Acherry-picked%2Fbranch-2.10+
> >
>


[DISCUSS] Release Pulsar Go Client 0.10.0

2023-03-01 Thread Zike Yang
Hi everyone,

I would like to propose releasing the Pulsar Go Client 0.10.0.

It has been several months since the last release. And there are
several new features and bug fixes in the master branch[0]. It’s time
to release a new version.

Please let me know if you have any PRs that need to be included in 0.10.0

[0] https://github.com/apache/pulsar-client-go/compare/v0.9.0...master

BR,
Zike Yang


Re: [DISCUSS] Release Pulsar Go Client 0.10.0

2023-03-01 Thread Zike Yang
I will include this PR
https://github.com/apache/pulsar-client-go/pull/968 in this release
since it's an important performance improvement.

BR,
Zike Yang

On Wed, Mar 1, 2023 at 8:25 PM Zike Yang  wrote:
>
> Hi everyone,
>
> I would like to propose releasing the Pulsar Go Client 0.10.0.
>
> It has been several months since the last release. And there are
> several new features and bug fixes in the master branch[0]. It’s time
> to release a new version.
>
> Please let me know if you have any PRs that need to be included in 0.10.0
>
> [0] https://github.com/apache/pulsar-client-go/compare/v0.9.0...master
>
> BR,
> Zike Yang


Re: [DISCUSS] Release Pulsar Go Client 0.10.0

2023-03-01 Thread Baodi Shi
Hi, Zike.

The current pulsar-client-go master branch has some flaky tests. There may be
some internal bugs; I think we need to wait for them to be fixed.

   - https://github.com/apache/pulsar-client-go/issues/971


Thanks,
Baodi Shi


On March 1, 2023 at 20:26:10, Zike Yang wrote:

> I will include this PR
> https://github.com/apache/pulsar-client-go/pull/968 to this release
> since it's an important performance improvement.
>
> BR,
> Zike Yang
>
> On Wed, Mar 1, 2023 at 8:25 PM Zike Yang  wrote:
> >
> > Hi everyone,
> >
> > I would like to propose releasing the Pulsar Go Client 0.10.0.
> >
> > It has been several months since the last release. And there are
> > several new features and bug fixes in the master branch[0]. It’s time
> > to release a new version.
> >
> > Please let me know if you have any PRs that need to be included in 0.10.0
> >
> > [0] https://github.com/apache/pulsar-client-go/compare/v0.9.0...master
> >
> > BR,
> > Zike Yang


Re: [DISCUSS] Release Pulsar Go Client 0.10.0

2023-03-01 Thread Yunze Xu
Please wait for a performance fix for the case when batch index ACK is
enabled; I'm working on it. Currently, the throughput cannot even exceed
20MB/s when it's enabled.

Thanks,
Yunze

On Wed, Mar 1, 2023 at 8:39 PM Baodi Shi  wrote:
>
> Hi, zike.
>
> The current pulsar-client-go master branch has some flaky tests. There may be
> some internal bugs; I think we need to wait for them to be fixed.
>
>    - https://github.com/apache/pulsar-client-go/issues/971
>
>
> Thanks,
> Baodi Shi
>
>
> On March 1, 2023 at 20:26:10, Zike Yang wrote:
>
> > I will include this PR
> > https://github.com/apache/pulsar-client-go/pull/968 to this release
> > since it's an important performance improvement.
> >
> > BR,
> > Zike Yang
> >
> > On Wed, Mar 1, 2023 at 8:25 PM Zike Yang  wrote:
> > >
> > > Hi everyone,
> > >
> > > I would like to propose releasing the Pulsar Go Client 0.10.0.
> > >
> > > It has been several months since the last release. And there are
> > > several new features and bug fixes in the master branch[0]. It’s time
> > > to release a new version.
> > >
> > > Please let me know if you have any PRs that need to be included in 0.10.0
> > >
> > > [0] https://github.com/apache/pulsar-client-go/compare/v0.9.0...master
> > >
> > > BR,
> > > Zike Yang


Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-01 Thread Asaf Mesika
On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu  wrote:

> >
> > I read it and they look identical. What's the difference between them?
>
> Current avro,json, and protobuf schemas are all implemented based on AVRO.
> > What do you mean, they are all implemented based on Avro? You mean the
> > protobuf schema is converted into an Avro Schema, and then you use Avro
> > compatibility validation?
>
>
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> They all extend `AvroSchemaBasedCompatibilityCheck`, so
> `checkCompatible()` uses the same implementation as `AVRO`.
>

Can you please explain how a Protobuf schema descriptor can be checked
for backward compatibility using Avro-based compatibility rules?
Doesn't the check expect the schema to be Avro, when it is actually a Protobuf
descriptor?
Is there some translation happening?



>
>
> I think you should structure the validation rules differently:
>
>
> The Compatibility check strategy is described on the website
>
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can process
> data written by producers using the last schema version V2. So V2 is
> "writtenSchema" and V3 is "readSchema".
> 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last schema
> version V2 can process data written by producers using a new schema V3,
> even though they may not be able to use the full capabilities of the new
> schema. So V3 is "writtenSchema" and V2 is "readSchema".
> 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and forward
> compatible.
> Schema can evolve. The old version schema and the new version schema should
> be well understood.
>
>
I'm sorry - I don't understand.
I understand the different compatibility check strategies.
If you are just spelling them out here, then, as you say, you would just
translate the Protobuf Descriptor into an Avro schema and run the Avro
compatibility validation, no?
I believe the answer is no, since you may want to verify different things
for Protobuf than for Avro.

At the current state, I can't understand your design at all. Please help
clarify that.





>
> So each strategy should have its own section.
>
>
> The arguments of `canRead()` are writtenSchema and readSchema. As we've
> just described, we only need to change the order of arguments we pass to
> `canRead()`.
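
The argument-order convention described in this reply can be sketched as follows. This is only an illustration under stated assumptions: `schema_pairs`, `is_compatible`, and `toy_can_read` are hypothetical names, and the toy rule (a schema is the set of its required field names) stands in for the real Protobuf rules, which the thread does not spell out here.

```python
def schema_pairs(strategy, existing_schema, new_schema):
    """Return the (written_schema, read_schema) pairs a strategy must check."""
    backward = (existing_schema, new_schema)  # new readers, old data
    forward = (new_schema, existing_schema)   # old readers, new data
    if strategy == "BACKWARD":
        return [backward]
    if strategy == "FORWARD":
        return [forward]
    if strategy == "FULL":
        return [backward, forward]
    raise ValueError(f"unknown strategy: {strategy}")


def is_compatible(strategy, existing_schema, new_schema, can_read):
    """Apply one can_read(written, read) predicate to every required pair."""
    return all(
        can_read(written, read)
        for written, read in schema_pairs(strategy, existing_schema, new_schema)
    )


def toy_can_read(written_schema, read_schema):
    # Toy rule (an assumption, not the PIP's Protobuf rules): a schema is the
    # set of its required field names, and a reader needs all of them written.
    return read_schema <= written_schema


v2 = {"id"}
v3 = {"id", "name"}  # V3 adds a required field

backward_ok = is_compatible("BACKWARD", v2, v3, toy_can_read)
forward_ok = is_compatible("FORWARD", v2, v3, toy_can_read)
full_ok = is_compatible("FULL", v2, v3, toy_can_read)
```

Under this toy rule, adding a required field passes the FORWARD check (V2 readers ignore the extra field) but fails BACKWARD and therefore FULL, using a single `can_read` predicate with swapped arguments.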
>
>
>
> Thanks,
> sinan
>
>
> On Mon, Feb 27, 2023 at 20:49, Asaf Mesika wrote:
>
> > >
> > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > >
> > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > >
> > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > >
> >  I read it and they look identical. What's the difference between them?
> >
> > Current avro,json, and protobuf schemas are all implemented based on
> AVRO.
> >
> > What do you mean, they are all implemented based on Avro? You mean the
> > protobuf schema is converted into an Avro Schema, and then you use Avro
> > compatibility validation?
> >
> >
> > > *Here are the basic compatibility rules we've defined:*
> >
> >
> > I think you should structure the validation rules differently:
> >
> > * Backward checks
> > ** List down rules, where use newSchema (the schema used by producer or
> > consumer) and existingSchema (last schema used)
> > * Forward
> > ** List down rules, where use newSchema (the schema used by producer or
> > consumer) and existingSchema (last schema used)
> >
> > So each strategy should have its own section.
> >
> > I'm saying this since you used the word "writtenSchema", but it represents
> > something completely different depending on whether it's a backward or
> > forward check.
> >
> > Once you have a structure like that, I personally will be able to
> > read and understand it.
> >
> >
> > The motivation and problem statement are now good - thanks for improving
> > it.
> >
> > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu 
> wrote:
> >
> > > Hi! I updated the PIP issue again. This time I've added some background
> > and
> > > some explanations.
> > >
> > > The compatibility check rules are already written in the
> Implementation.
> > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > It does this by checking whether the schema for writing and reading is
> > > compatible. I also indicate whether the writtenSchema and readSchema of
> > the
> > > Backward, Forward, and Full strategies are the old or the new version
> of
> > > the schema.
> > >
> > > Thanks,
> > > sinan
> > >
> > > On Sun, Feb 26, 2023 at 23:24, Asaf Mesika wrote:
> > >
> > > > I'm sorry, but this PIP lacks a lot of background knowledge, so you
> > need
> > > to
> > > > add IMO for people to understand it. You don't need to explain the
> > entire
> > > > pulsar in this PIP, but at t

Re: [DISCUSS] Using bouncycastle fips instead bouncycastle non-fips

2023-03-01 Thread Asaf Mesika
On Mon, Feb 27, 2023 at 4:35 PM Zixuan Liu  wrote:

> > users might get exceptions if they don't use specific algorithms or
> encryption schemes?
>
> Could you share more info about this?
>

Actually I was expecting that part of the discussion will specify the
difference between using FIPS compared with non-FIPS, in each BouncyCastle
usage: TLS and message encryption.

 I imagined that FIPS has a shorter list of ciphers it supports.



> On Mon, Feb 27, 2023 at 18:01, Asaf Mesika wrote:
>
> > So if I understand you correctly, once you switch to the FIPS version of
> > Bouncy Castle, users might get exceptions if they don't use specific
> > algorithms or encryption schemes?
> > Potentially a breaking change?
> > You can't switch it off via config?
> >
> > On Wed, Feb 22, 2023 at 3:56 PM Zixuan Liu  wrote:
> >
> > > > 1. What is FIPS?
> > >
> > > FIPS (Federal Information Processing Standards) are a set of standards
> > that
> > > describe document processing, encryption algorithms and other
> information
> > > technology standards for use within non-military government agencies
> and
> > by
> > > government contractors and vendors who work with the agencies.
> > >
> > > > 2. Why is the FIPS version safer exactly?
> > >
> > > FIPS standard is strict. When using the FIPS version, this is also very
> > > strict and standard.
> > >
> > > > 3. What is bouncycastle used exactly in Pulsar?
> > >
> > > We use bouncycastle as the TLS provider and for end-to-end
> > > message encryption.
> > >
> > > Thanks,
> > > Zixuan
> > >
> > > On Wed, Feb 22, 2023 at 21:23, Asaf Mesika wrote:
> > >
> > > > Can you elaborate a bit:
> > > > 1. What is FIPS?
> > > > 2. Why is the FIPS version safer exactly?
> > > > 3. What is bouncycastle used exactly in Pulsar?
> > > >
> > > >
> > > >
> > > > On Wed, Feb 22, 2023 at 11:58 AM Zixuan Liu 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I would like to discuss using the bouncycastle fips instead of the
> > > > > bouncycastle non-fips.
> > > > >
> > > > > The bouncycastle is a Java library that complements the default
> Java
> > > > > Cryptographic Extension (JCE), which has two versions: fips version
> > and
> > > > > non-fips version.
> > > > >
> > > > > > The fips version is safer than non-fips. When the security level is
> > > > > > very high, many policies require the fips version, but Pulsar uses
> > > > > > the non-fips version by default. Switching is complex, because
> > > > > > the `pulsar-client-messagecrypto-bc` module and the root project
> > > > > > depend on non-fips, so I suggest we switch from the non-fips
> > > > > > version to the fips version.
> > > > >
> > > > > Reference:
> > > > > - https://www.bouncycastle.org/
> > > > > - https://www.bouncycastle.org/fips_faq.html
> > > > > -
> > > https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards
> > > > >
> > > > > Thanks,
> > > > > Zixuan
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Support processingGuarantees "EFFECTIVELY_ONCE" in python function

2023-03-01 Thread Enrico Olivelli
Perfect

Thanks
Enrico

On Wed, Mar 1, 2023 at 04:17, Rui Fu wrote:
>
> +1
>
> Best,
>
> Rui Fu
> On Feb 28, 2023 at 05:35 +0800, laminar , wrote:
> > Hi all,
> >
> > I would like to discuss supporting the processingGuarantees
> > `EFFECTIVELY_ONCE` in the Python function runtime.
> >
> > In the discussion on this PR (https://github.com/apache/pulsar/pull/18929),
> > we concluded that to achieve the `EFFECTIVELY_ONCE` processing guarantee,
> > the user needs to ensure that the following
> > prerequisites are met.
> >
> > 1. deduplication is enabled
> > 2. set ProcessingGuarantees to EFFECTIVELY_ONCE
> > 3. the function has only one source topic and one sink topic (both are 
> > non-partitioned)
> > 4. if partitioned topic is enabled, ensure that the number of partitions 
> > (of both source and sink topics) is the same
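
The four prerequisites above can be read as a simple checklist. The sketch below encodes them as a validation helper; `FunctionSetup` and `unmet_prerequisites` are hypothetical names for illustration and are not part of the Pulsar Functions API. A partition count of 0 is used here, as an assumption, to mean a non-partitioned topic.

```python
from dataclasses import dataclass


@dataclass
class FunctionSetup:
    """Hypothetical description of a function deployment; not a Pulsar API."""
    deduplication_enabled: bool
    processing_guarantees: str
    source_partitions: list  # partition count per source topic, 0 = non-partitioned
    sink_partitions: list    # partition count per sink topic, 0 = non-partitioned


def unmet_prerequisites(setup):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if not setup.deduplication_enabled:
        problems.append("deduplication is not enabled")
    if setup.processing_guarantees != "EFFECTIVELY_ONCE":
        problems.append("ProcessingGuarantees is not EFFECTIVELY_ONCE")
    if len(setup.source_partitions) != 1 or len(setup.sink_partitions) != 1:
        problems.append("exactly one source topic and one sink topic are required")
    elif setup.source_partitions[0] != setup.sink_partitions[0]:
        problems.append("source and sink partition counts differ")
    return problems


good = FunctionSetup(True, "EFFECTIVELY_ONCE", [0], [0])
bad = FunctionSetup(False, "ATLEAST_ONCE", [4], [2])

good_problems = unmet_prerequisites(good)
bad_problems = unmet_prerequisites(bad)
```

Returning the list of unmet conditions, rather than a single boolean, makes it easier to tell a user which prerequisite to fix.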
> >
> > Currently, neither the python function runtime nor the java function 
> > runtime can support the `EFFECTIVELY_ONCE` processing guarantee when using 
> > partitioned topics.
> >
> > So in order to make python functions support `EFFECTIVELY_ONCE` processing 
> > guarantee, I think we can introduce this feature incrementally, i.e. 
> > support the `EFFECTIVELY_ONCE` processing guarantee for non-partitioned 
> > topics first.
> >
> > Then follow up with Rui’s 
> > suggestion(https://github.com/apache/pulsar/pull/18929#issuecomment-1445977320)
> >  to improve this feature.
> >
> >


Re: [DISCUSS] Change PIP template

2023-03-01 Thread Asaf Mesika
Ok.

I'll draft a PR and link it here when I'm done. Thanks!

On Tue, Feb 28, 2023 at 7:08 AM PengHui Li  wrote:

> +1
>
> Penghui
>
> On Mon, Feb 27, 2023 at 9:24 PM Asaf Mesika  wrote:
>
> > Mails don't support things like markdown diagrams or images and are
> > generally less easy to read.
> > My proposal includes a required section called Links in which you need to
> > fill in the discussion thread in DEV mailing list and vote thread.
> >
> >
> > On Mon, Feb 27, 2023 at 3:08 PM Girish Sharma 
> > wrote:
> >
> > >  Hi Asaf,
> > > I was referring to the PIP process, as a whole, as explained in
> > > https://github.com/apache/pulsar/blob/master/wiki/proposals/PIP.md
> > > Someone looking at the GitHub ticket would find an almost empty PIP GH
> > > issue, while the same PIP has had many discussions over here in the ML.
> > > There is scope for improvement in the process: we either remove the
> > > first step of creating the PIP over at GitHub and directly present the
> > > PIP in the first mail of the thread here, or we do all discussions in GH.
> > > Both the ML and GH are searchable and linkable for tracking purposes.
> > >
> > > Regards
> > >
> > > On Mon, Feb 27, 2023 at 6:23 PM Asaf Mesika 
> > wrote:
> > >
> > > > On Sun, Feb 26, 2023 at 2:49 PM Girish Sharma <
> scrapmachi...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Good proposal Asaf.
> > > > > I've also wondered why the PIP creation and discussion process is
> so
> > > > > separated. The PIP discussion and voting starts off as a GitHub
> > issue,
> > > > but
> > > > > all of its discussion happens here on the mailing list. Is there
> > scope
> > > of
> > > > > improvement in that process as well?
> > > > >
> > > >
> > > > Not sure I follow. Can you outline the problem exactly?
> > > >
> > > >
> > > > >
> > > > > Regards
> > > > >
> > > > > On Sun, Feb 26, 2023 at 6:16 PM tison  wrote:
> > > > >
> > > > > > Hi Asaf,
> > > > > >
> > > > > > > I agree that, generally, a PIP is written as a whole and pasted as
> > > > > > > the body.
> > > > > > So +1 for your proposal.
> > > > > >
> > > > > > Additionally, I'm thinking of moving the doc of procedure
> > > (wiki/PIP.md)
> > > > > to
> > > > > > the contributions guide and use the new markdown template to
> > > supersede
> > > > > the
> > > > > > wiki/PIP-template.md. Then we don't need to hold the wiki folder.
> > > > > >
> > > > > > It can be an extended version to your proposal, so let's keep on
> > your
> > > > > > proposal in this thread. Just for your reference.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > > On Sun, Feb 26, 2023 at 19:18, Asaf Mesika wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I would like to suggest two changes I'd like to make to the PIP
> > > > design
> > > > > > > template:
> > > > > > > 1. Remove the form - just have a markdown template fill the
> issue
> > > > body
> > > > > as
> > > > > > > it is created.
> > > > > > > 2. Change the PIP template structure
> > > > > > >
> > > > > > > == Removing the form
> > > > > > >
> > > > > > > Today, when you want to submit a PIP, you are required to fill
> > out
> > > a
> > > > > form
> > > > > > > with boxes composed of 3-4 lines length.
> > > > > > > It's not good because:
> > > > > > > * It broadcasts to the author: we want a very small PIP,
> > something
> > > > that
> > > > > > > fits those small boxes.
> > > > > > > * It makes the PIP look like a bug, where you fill out fields.
> > > > > > > * It doesn't allow having H2 headings, only H1 headings, thus
> > > > limiting
> > > > > > the
> > > > > > > structure.
> > > > > > >
> > > > > > > A PIP is a design essentially, something 1-3 pages long. Thus,
> > > > > > > people take the time to write it down. Preferably, they copy
> > paste
> > > > the
> > > > > > body
> > > > > > > of the PIP issue, and use it to fill in sections.
> > > > > > >
> > > > > > > My suggestion is to define an issue template using only
> markdown,
> > > > > > without a
> > > > > > > form.
> > > > > > >
> > > > > > > == Changing PIP Structure
> > > > > > >
> > > > > > > Today the structure of the PIP doc (pasted below), is missing a
> > > > section
> > > > > > and
> > > > > > > generally aims to jump directly into API changes / code /
> > > > > implementation.
> > > > > > > This results in lots of back and forth emails in an attempt to
> > get
> > > > the
> > > > > > > following essentials:
> > > > > > > * All required background knowledge to understand the proposal
> > > > > > > * A high level overview of the proposed solution
> > > > > > > * Understanding how this proposal will be monitored
> > > > > > > * What steps exactly I need to take if I revert to the previous
> > > > > version.
> > > > > > >
> > > > > > > The structure I propose below aims to reduce that friction and
> > get
> > > > all
> > > > > > PIP
> > > > > > > aligned to provide that information.
> > > > > > >
> > > > > > > === Today's structure
> > > > > > >
> > > > > > > # Motivation
> >

Re: [DISCUSS] Change PIP template

2023-03-01 Thread Elliot West
+1 Asaf

I'd also suggest that we encourage the submission of relevant diagrams.
This is trivial to do with the GitHub markdown editor, but I suspect is
often neglected because users do not know the feature exists.

On Wed, 1 Mar 2023 at 13:22, Asaf Mesika  wrote:

> Ok.
>
> I'll draft a PR and link it here when I'm done. Thanks!
>
> On Tue, Feb 28, 2023 at 7:08 AM PengHui Li  wrote:
>
> > +1
> >
> > Penghui
> >
> > On Mon, Feb 27, 2023 at 9:24 PM Asaf Mesika 
> wrote:
> >
> > > Mails don't support things like markdown diagrams or images and are
> > > generally less easy to read.
> > > My proposal includes a required section called Links in which you need
> to
> > > fill in the discussion thread in DEV mailing list and vote thread.
> > >
> > >
> > > On Mon, Feb 27, 2023 at 3:08 PM Girish Sharma  >
> > > wrote:
> > >
> > > >  Hi Asaf,
> > > > I was referring to the PIP process, as a whole, as explained in
> > > > https://github.com/apache/pulsar/blob/master/wiki/proposals/PIP.md
> > > > Someone looking at the GitHub ticket would find an almost empty PIP GH
> > > > issue, while the same PIP has had many discussions over here in the ML.
> > > > There is scope for improvement in the process: we either remove the
> > > > first step of creating the PIP over at GitHub and directly present the
> > > > PIP in the first mail of the thread here, or we do all discussions in GH.
> > > > Both the ML and GH are searchable and linkable for tracking purposes.
> > > >
> > > > Regards
> > > >
> > > > On Mon, Feb 27, 2023 at 6:23 PM Asaf Mesika 
> > > wrote:
> > > >
> > > > > On Sun, Feb 26, 2023 at 2:49 PM Girish Sharma <
> > scrapmachi...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Good proposal Asaf.
> > > > > > I've also wondered why the PIP creation and discussion process is
> > so
> > > > > > separated. The PIP discussion and voting starts off as a GitHub
> > > issue,
> > > > > but
> > > > > > all of its discussion happens here on the mailing list. Is there
> > > scope
> > > > of
> > > > > > improvement in that process as well?
> > > > > >
> > > > >
> > > > > Not sure I follow. Can you outline the problem exactly?
> > > > >
> > > > >
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > On Sun, Feb 26, 2023 at 6:16 PM tison  wrote:
> > > > > >
> > > > > > > Hi Asaf,
> > > > > > >
> > > > > > > > I agree that, generally, a PIP is written as a whole and pasted as
> > > > > > > > the body.
> > > > > > > So +1 for your proposal.
> > > > > > >
> > > > > > > Additionally, I'm thinking of moving the doc of procedure
> > > > (wiki/PIP.md)
> > > > > > to
> > > > > > > the contributions guide and use the new markdown template to
> > > > supersede
> > > > > > the
> > > > > > > wiki/PIP-template.md. Then we don't need to hold the wiki
> folder.
> > > > > > >
> > > > > > > It can be an extended version to your proposal, so let's keep
> on
> > > your
> > > > > > > proposal in this thread. Just for your reference.
> > > > > > >
> > > > > > > Best,
> > > > > > > tison.
> > > > > > >
> > > > > > >
> > > > > > > > On Sun, Feb 26, 2023 at 19:18 Asaf Mesika wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I would like to suggest two changes I'd like to make to the
> PIP
> > > > > design
> > > > > > > > template:
> > > > > > > > 1. Remove the form - just have a markdown template fill the
> > issue
> > > > > body
> > > > > > as
> > > > > > > > it is created.
> > > > > > > > 2. Change the PIP template structure
> > > > > > > >
> > > > > > > > == Removing the form
> > > > > > > >
> > > > > > > > Today, when you want to submit a PIP, you are required to
> fill
> > > out
> > > > a
> > > > > > form
> > > > > > > > with boxes composed of 3-4 lines length.
> > > > > > > > It's not good because:
> > > > > > > > * It broadcasts to the author: we want a very small PIP,
> > > something
> > > > > that
> > > > > > > > fits those small boxes.
> > > > > > > > * It makes the PIP look like a bug, where you fill out
> fields.
> > > > > > > > * It doesn't allow having H2 headings, only H1 headings, thus
> > > > > limiting
> > > > > > > the
> > > > > > > > structure.
> > > > > > > >
> > > > > > > > A PIP is a design essentially, something 1-3 pages long.
> Thus,
> > > > > > > > people take the time to write it down. Preferably, they copy
> > > paste
> > > > > the
> > > > > > > body
> > > > > > > > of the PIP issue, and use it to fill in sections.
> > > > > > > >
> > > > > > > > My suggestion is to define an issue template using only
> > markdown,
> > > > > > > without a
> > > > > > > > form.
> > > > > > > >
> > > > > > > > == Changing PIP Structure
> > > > > > > >
> > > > > > > > Today the structure of the PIP doc (pasted below), is
> missing a
> > > > > section
> > > > > > > and
> > > > > > > > generally aims to jump directly into API changes / code /
> > > > > > implementation.
> > > > > > > > This results in lots of back and forth emails in an attempt
> to
> > > get
> > > > > the
> > > > > > > > following essentials:
> 

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-01 Thread SiNan Liu
>
> Can you please explain how a Protobuf Schema descriptor can be validated
> for backward compatibility check using Avro based compatibility rules?
> Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> descriptor?
> Is there some translation happening?


1. *You can take a quick look at the previous design: the PROTOBUF schema
is stored as an Avro schema structure.*
https://github.com/apache/pulsar/pull/1954
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115

2. *On the broker side, for both protobuf and avro the `SchemaData` is
converted to `org.apache.avro.Schema`.*
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70



> I'm sorry - I don't understand.
> I understand the different compatibility check strategies.
> If you just spell them out here, then as you say, just translate the
> Protobuf Descriptor into an Avro schema and run the Avro
> compatibility validation, no?
> I believe the answer is no, since you may want to verify different things
> when it comes to Protobuf, which are different than Avro.


1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
avro-protobuf.*
https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
*ProtobufNativeSchema needs a native compatibility check, but there is no
official or third party implementation. So this PIP does not use
avro-protobuf for protobuf compatibility checking.*

2. *By the way, this is implemented in much the same way that Apache avro
does compatibility checking.*
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
`canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
*In `ValidateMutualRead.java`, the arguments of `canRead()` are
writtenSchema and readSchema. We only need to change the order of arguments
we pass to `canRead()`.*
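```java
// Applies the configured strategy by choosing the argument order for canRead().
private void validateWithStrategy(Descriptors.Descriptor toValidate,
        Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
    switch (strategy) {
        // The new schema must be able to read data written with the existing one.
        case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
        // The existing schema must be able to read data written with the new one.
        case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
        // Both directions must hold.
        case CanBeReadMutualStrategy -> {
            canRead(toValidate, fromDescriptor);
            canRead(fromDescriptor, toValidate);
        }
    }
}

// writtenSchema is the writer's schema; readSchema is the reader's schema.
private void canRead(Descriptors.Descriptor writtenSchema,
        Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
    ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
}
```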


Thanks,
sinan



On Wed, Mar 1, 2023 at 21:19 Asaf Mesika wrote:
>
> On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu  wrote:
>
> > >
> > > I read it and they look identical. What's the difference between them?
> >
> > Current avro, json, and protobuf schemas are all implemented based on
> > AVRO.
> > > What do you mean, they are all implemented based on Avro? You mean the
> > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > compatibility validation?
> >
> >
> > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > They all extend `AvroSchemaBasedCompatibilityCheck`, and their
> > `checkCompatible()` is the same implementation as for `AVRO`.
> >
>
> Can you please explain how a Protobuf Schema descriptor can be validated
> for backward compatibility check using Avro based compatibility rules?
> Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> descriptor?
> Is there some translation happening?
>
>
>
> >
> >
> > I think you should structure the validation rules differently:
> >
> >
> > The Compatibility check strategy is described on the website
> >
> >
https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > 1. BACKWARD(CanRe

Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-01 Thread Asaf Mesika
>
> Pulsar has 2 configurations for the backlog eviction:
> backlogQuotaDefaultLimitBytes and backlogQuotaDefaultLimitSecond.
> By default, backlog eviction is disabled, and also, there is a field named
> backlogQuotaMap in TopicPolicies/NamespaceSpacePolicies assists
> in controlling Topic/Namespace level backlog quota.
>
> If topic backlog reaches the threshold of any item, backlog eviction will
> be triggered, Pulsar will move subscription's cursor to skip unacknowledged
> messages.
>
> Before backlog eviction happens, we don't have a metric to monitor how
> long that it can reaches the threshold.
>

I think you should fix this explanation:

In Pulsar, a subscription maintains the acknowledgment state of its
messages. A subscription backlog is the set of messages which are
unacknowledged.
A subscription backlog size is the sum of the sizes of its unacknowledged
messages (in bytes).
A topic can have many subscriptions.
A topic backlog is defined as the backlog size of the subscription which
has the oldest unacknowledged message. Since acknowledged messages can be
interleaved with unacknowledged messages, calculating the exact backlog size
of that subscription can be expensive, as it requires I/O operations to read
the messages from the ledgers.
For that reason, the topic backlog is actually defined to be the estimated
backlog size of that subscription. It is estimated by summing the sizes of
all the ledgers, starting from the current active one, up to the ledger
which contains the oldest unacknowledged message (There is actually a
faster way to calculate it, but this is the definition of the estimation).
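
To make the estimation concrete, here is a minimal sketch of the
definition above. It is not the broker's actual code: the class name, the
method name, and the map of ledger ID to ledger size are hypothetical, and it
assumes ledger IDs increase over time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not Pulsar's implementation).
final class EstimatedBacklog {

    /**
     * Sums the sizes of all ledgers from the one holding the oldest
     * unacknowledged message up to the newest (active) ledger.
     * Assumes ledger IDs increase over time.
     */
    static long estimateBytes(Map<Long, Long> ledgerSizesById, long oldestUnackedLedgerId) {
        long total = 0;
        for (Map.Entry<Long, Long> entry : ledgerSizesById.entrySet()) {
            if (entry.getKey() >= oldestUnackedLedgerId) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Long, Long> ledgers = new LinkedHashMap<>();
        ledgers.put(10L, 1_000L); // fully acknowledged, not counted
        ledgers.put(11L, 2_000L); // holds the oldest unacknowledged message
        ledgers.put(12L, 500L);   // current active ledger
        System.out.println(estimateBytes(ledgers, 11L)); // prints 2500
    }
}
```

Note that the whole of ledger 11 is counted even if most of its messages are
already acknowledged, which is why this is only an estimate.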

A topic backlog age is the age of the oldest unacknowledged message (in any
subscription). If that message was written 30 minutes ago, its age is 30
minutes.

Pulsar has a feature called backlog quota (place link). It allows the user
to define a quota - in effect, a limit - which limits the topic backlog.
There are two types of quotas:
* Size based: The limit is for the topic backlog size (as we defined above).
* Time based: The limit is for the topic's backlog age (as we defined
above).

Once a topic backlog exceeds either one of those limits, one of the
following actions is taken:
* The producer write is placed on hold for a certain amount of time before
failing.
* The producer write fails.
* The subscriptions' oldest unacknowledged messages are acknowledged in
order until both the topic backlog size and age fall within the limit
(quota). This process is called backlog eviction (it happens every interval).

The quotas can be defined as a default value for any topic, by using the
following broker configuration keys: backlogQuotaDefaultLimitBytes and
backlogQuotaDefaultLimitSecond. They can also be specified directly for all
topics in a given namespace using the namespace policy, or for a specific
topic using a topic policy.

Today the user can calculate the quota used for the size-based limit, since
two metrics are exposed at the topic level:
"pulsar_storage_backlog_quota_limit" and "pulsar_storage_backlog_size". You
can just divide the two to get a percentage.
For the time-based limit, the only metric exposed today is the quota itself,
"pulsar_storage_backlog_quota_limit_time".



I would create two metrics:

`pulsar_backlog_size_quota_used_percentage`
`pulsar_backlog_time_quota_used_percentage`

You would like to know what triggered the alert, hence two.
It's not the quota percentage, it's the quota used percentage.
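
As a rough sketch of what "quota used percentage" means for both metrics
(the class and method names here are hypothetical, and treating a
non-positive limit as "no quota configured" is an assumption of this sketch):

```java
// Hypothetical helper, for illustration only.
final class BacklogQuotaUsage {

    /**
     * Returns how much of the quota is used, as a percentage.
     * Returns -1 when no limit is configured (limit <= 0, an assumption here).
     */
    static double usedPercentage(double used, double limit) {
        if (limit <= 0) {
            return -1.0;
        }
        return 100.0 * used / limit;
    }

    public static void main(String[] args) {
        // Size based: backlog size in bytes divided by the size limit in bytes.
        double sizePct = usedPercentage(512.0 * 1024 * 1024, 2048.0 * 1024 * 1024);
        // Time based: backlog age in seconds divided by the time limit in seconds.
        double timePct = usedPercentage(1800, 3600);
        System.out.println(sizePct); // prints 25.0
        System.out.println(timePct); // prints 50.0
    }
}
```

The size-based value can already be derived from the two existing metrics;
the time-based one needs the backlog age, which is not exposed today.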

--

> It checks if the backlog size exceeds the threshold
> (backlogQuotaDefaultLimitBytes), and it gets the current backlog size by
> calculating LedgerInfo, it will not lead to I/O.

This is not correct.
It checks against the topic / namespace policy, and if it doesn't exist, it
falls back on the default configuration key mentioned above.

> It checks if the backlog time exceeds the threshold
> (backlogQuotaDefaultLimitSecond). If preciseTimeBasedBacklogQuotaCheck is
> set to be true, it will read an entry from Bookkeeper, but the default
> value is false, which means it gets the backlog time by calculating
> LedgerInfo.
> So in general, we don't need to worry about it will lead to I/O.


I'm afraid of that.
Today the quota is checked periodically, right? So that's how the operator
knows the cost in terms of I/O is limited.
 Now you are adding one additional I/O per collection, every 1 min

Re: [DISCUSS] Using bouncycastle fips instead bouncycastle non-fips

2023-03-01 Thread Zixuan Liu
> Actually I was expecting that part of the discussion will specify the
> difference between using FIPS compared with non-FIPS, in each BouncyCastle
> usage: TLS and message encryption.

Good catch! I'll check this.

On Wed, Mar 1, 2023 at 21:19 Asaf Mesika wrote:

> On Mon, Feb 27, 2023 at 4:35 PM Zixuan Liu  wrote:
>
> > > users might get exceptions if they don't use specific algorithms or
> > encryption schemes?
> >
> > Could you share more info about this?
> >
>
> Actually I was expecting that part of the discussion will specify the
> difference between using FIPS compared with non-FIPS, in each BouncyCastle
> usage: TLS and message encryption.
>
>  I imagined that FIPS has a shorter list of ciphers it supports.
>
>
>
> > On Mon, Feb 27, 2023 at 18:01 Asaf Mesika wrote:
> >
> > > So if I understand you correctly, once you switch to the FIPS version
> of
> > > Bouncy Castle, users might get exceptions if they don't use specific
> > > algorithms or encryption schemes?
> > > Potentially a breaking change?
> > > You can't switch it off via config?
> > >
> > > On Wed, Feb 22, 2023 at 3:56 PM Zixuan Liu  wrote:
> > >
> > > > > 1. What is FIPS?
> > > >
> > > > FIPS (Federal Information Processing Standards) are a set of
> standards
> > > that
> > > > describe document processing, encryption algorithms and other
> > information
> > > > technology standards for use within non-military government agencies
> > and
> > > by
> > > > government contractors and vendors who work with the agencies.
> > > >
> > > > > 2. Why is the FIPS version safer exactly?
> > > >
> > > > FIPS standard is strict. When using the FIPS version, this is also
> very
> > > > strict and standard.
> > > >
> > > > > 3. What is bouncycastle used exactly in Pulsar?
> > > >
> > > > We use the bouncycastle as the TLS provider,  and used for the
> > end-to-end
> > > > message encryption.
> > > >
> > > > Thanks,
> > > > Zixuan
> > > >
> > > > On Wed, Feb 22, 2023 at 21:23 Asaf Mesika wrote:
> > > >
> > > > > Can you elaborate a bit:
> > > > > 1. What is FIPS?
> > > > > 2. Why is the FIPS version safer exactly?
> > > > > 3. What is bouncycastle used exactly in Pulsar?
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Feb 22, 2023 at 11:58 AM Zixuan Liu 
> > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I would like to discuss using the bouncycastle fips instead of
> the
> > > > > > bouncycastle non-fips.
> > > > > >
> > > > > > The bouncycastle is a Java library that complements the default
> > Java
> > > > > > Cryptographic Extension (JCE), which has two versions: fips
> version
> > > and
> > > > > > non-fips version.
> > > > > >
> > > > > > The fips version is safer than non-fips. When the security level
> is
> > > > very
> > > > > > high, many policies require the fips version, but the Pulsar
> > default
> > > > uses
> > > > > > the non-fips version. Switch this is complex, because
> > > > > > the `pulsar-client-messagecrypto-bc` module and root project
> > depends
> > > on
> > > > > the
> > > > > > non-fips, so I suggest we switch to fips version from non-fips.
> > > > > >
> > > > > > Reference:
> > > > > > - https://www.bouncycastle.org/
> > > > > > - https://www.bouncycastle.org/fips_faq.html
> > > > > > -
> > > >
> https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards
> > > > > >
> > > > > > Thanks,
> > > > > > Zixuan
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Using bouncycastle fips instead bouncycastle non-fips

2023-03-01 Thread YuWei Sung
The difference between BC and BC-FIPS is the set of supported cipher suites.
This is similar to TLS 1.1 vs 1.2 vs 1.3: some suites are deprecated (no
longer secure enough, given improvements in compute power).
In TLS 1.3, a client has no way to specify weak cipher suites when connecting
to a server and then exploit the weakness.
For a BC-FIPS hardened Pulsar cluster, brokers should reject connections from
clients using BC (clients must use Security.provider bc-fips).
For a BC non-FIPS cluster, it should be flexible: a client with either bc-fips
or bc should be able to connect to Pulsar (bc).
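
For illustration, a client could check which Bouncy Castle provider is
registered in its JVM along these lines. This is only a sketch: the class and
method names are hypothetical, it relies on the provider names "BC" and
"BCFIPS" that the two Bouncy Castle distributions register under, and it only
inspects the JCA provider list; it does not enforce anything at the TLS level.

```java
import java.security.Security;

// Hypothetical helper, for illustration only.
final class BouncyCastleProviderCheck {

    /** Pure classification, kept separate so it can be tested without the jars. */
    static String describe(boolean bcFipsPresent, boolean bcPresent) {
        if (bcFipsPresent && bcPresent) {
            return "both"; // mixing the two is usually a misconfiguration
        }
        if (bcFipsPresent) {
            return "bc-fips";
        }
        if (bcPresent) {
            return "bc";
        }
        return "none";
    }

    /** Looks up the registered JCA providers by their well-known names. */
    static String detect() {
        boolean fips = Security.getProvider("BCFIPS") != null;
        boolean bc = Security.getProvider("BC") != null;
        return describe(fips, bc);
    }

    public static void main(String[] args) {
        System.out.println("Registered Bouncy Castle provider: " + detect());
    }
}
```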




Yu Wei Sung
Sr. Solutions Engineer
streamnative.io

On Wed, Mar 1, 2023 at 10:28 AM Zixuan Liu  wrote:

> > Actually I was expecting that part of the discussion will specify the
> > difference between using FIPS compared with non-FIPS, in each
> BouncyCastle
> > usage: TLS and message encryption.
>
> Good catch! I'll check this.
>
> On Wed, Mar 1, 2023 at 21:19 Asaf Mesika wrote:
>
> > On Mon, Feb 27, 2023 at 4:35 PM Zixuan Liu  wrote:
> >
> > > > users might get exceptions if they don't use specific algorithms or
> > > encryption schemes?
> > >
> > > Could you share more info about this?
> > >
> >
> > Actually I was expecting that part of the discussion will specify the
> > difference between using FIPS compared with non-FIPS, in each
> BouncyCastle
> > usage: TLS and message encryption.
> >
> >  I imagined that FIPS has a shorter list of ciphers it supports.
> >
> >
> >
> > > On Mon, Feb 27, 2023 at 18:01 Asaf Mesika wrote:
> > >
> > > > So if I understand you correctly, once you switch to the FIPS version
> > of
> > > > Bouncy Castle, users might get exceptions if they don't use specific
> > > > algorithms or encryption schemes?
> > > > Potentially a breaking change?
> > > > You can't switch it off via config?
> > > >
> > > > On Wed, Feb 22, 2023 at 3:56 PM Zixuan Liu 
> wrote:
> > > >
> > > > > > 1. What is FIPS?
> > > > >
> > > > > FIPS (Federal Information Processing Standards) are a set of
> > standards
> > > > that
> > > > > describe document processing, encryption algorithms and other
> > > information
> > > > > technology standards for use within non-military government
> agencies
> > > and
> > > > by
> > > > > government contractors and vendors who work with the agencies.
> > > > >
> > > > > > 2. Why is the FIPS version safer exactly?
> > > > >
> > > > > FIPS standard is strict. When using the FIPS version, this is also
> > very
> > > > > strict and standard.
> > > > >
> > > > > > 3. What is bouncycastle used exactly in Pulsar?
> > > > >
> > > > > We use the bouncycastle as the TLS provider,  and used for the
> > > end-to-end
> > > > > message encryption.
> > > > >
> > > > > Thanks,
> > > > > Zixuan
> > > > >
> > > > > On Wed, Feb 22, 2023 at 21:23 Asaf Mesika wrote:
> > > > >
> > > > > > Can you elaborate a bit:
> > > > > > 1. What is FIPS?
> > > > > > 2. Why is the FIPS version safer exactly?
> > > > > > 3. What is bouncycastle used exactly in Pulsar?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 22, 2023 at 11:58 AM Zixuan Liu 
> > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I would like to discuss using the bouncycastle fips instead of
> > the
> > > > > > > bouncycastle non-fips.
> > > > > > >
> > > > > > > The bouncycastle is a Java library that complements the default
> > > Java
> > > > > > > Cryptographic Extension (JCE), which has two versions: fips
> > version
> > > > and
> > > > > > > non-fips version.
> > > > > > >
> > > > > > > The fips version is safer than non-fips. When the security
> level
> > is
> > > > > very
> > > > > > > high, many policies require the fips version, but the Pulsar
> > > default
> > > > > uses
> > > > > > > the non-fips version. Switch this is complex, because
> > > > > > > the `pulsar-client-messagecrypto-bc` module and root project
> > > depends
> > > > on
> > > > > > the
> > > > > > > non-fips, so I suggest we switch to fips version from non-fips.
> > > > > > >
> > > > > > > Reference:
> > > > > > > - https://www.bouncycastle.org/
> > > > > > > - https://www.bouncycastle.org/fips_faq.html
> > > > > > > -
> > > > >
> > https://en.wikipedia.org/wiki/Federal_Information_Processing_Standards
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Zixuan
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-01 Thread Enrico Olivelli
(I apologise for top posting)

Would it be possible to add a flag to fall back to the previous behaviour?
I know that adding such flags is a burden, but if the upgrade breaks
some workflows then users won't be able to upgrade.
We can add the flag in the next release and drop it in the next major release.

Enrico

On Wed, Mar 1, 2023 at 15:33 SiNan Liu wrote:
>
> >
> > Can you please explain how a Protobuf Schema descriptor can be validated
> > for backward compatibility check using Avro based compatibility rules?
> > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > descriptor?
> > Is there some translation happening?
>
>
> 1. *You can take a quick look at the previous design, the PROTOBUF uses
> avro struct to store.*
> https://github.com/apache/pulsar/pull/1954
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
>
> 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> to `org.apache.avro.Schema`.*
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
>
>
>
> I'm sorry - I don't understand.
> > I understand the different compatibility check strategies.
> > If you just spell them out here, then as you say, just translate the
> > Protobuf Descriptor into an Avro schema and run the Avro
> > compatibility validation, no?
> > I believe the answer is no, since you may want to verify different things
> > when it comes to Protobuf, which are different then Avro.
>
>
> 1.
> *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> avro-protobuf.*
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> *ProtobufNativeSchema needs a native compatibility check, but there is no
> official or third party implementation. So this PIP does not use
> avro-protobuf for protobuf compatibility checking.*
>
> 2. *By the way, this is implemented in much the same way that Apache avro
> does compatibility checking.*
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> writtenSchema and readSchema. We only need to change the order of arguments
> we pass to `canRead()`.*
> ```java
> private void validateWithStrategy(Descriptors.Descriptor toValidate,
> Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException
> {
> switch (strategy) {
> case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> case CanBeReadMutualStrategy -> {
> canRead(toValidate, fromDescriptor);
> canRead(fromDescriptor, toValidate);
> }
> }
> }
>
> private void canRead(Descriptors.Descriptor writtenSchema,
> Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
> readSchema);
> }
> ```
>
>
> Thanks,
> sinan
>
>
>
> On Wed, Mar 1, 2023 at 21:19 Asaf Mesika wrote:
> >
> > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu  wrote:
> >
> > > >
> > > > I read it and they look identical. What's the difference between them?
> > >
> > > Current avro,json, and protobuf schemas are all implemented based on
> AVRO.
> > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > protobuf schema is converted into an Avro Schema, and then you use
> Avro
> > > > compatibility validation?
> > >
> > >
> > >
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > They all extends `AvroSchemaBasedCompatibilityCheck`, the
> > > `checkCompatible()` is the same implementation with `AVRO

[VOTE] Apache Pulsar Adapters Release 2.11.0 Candidate 1

2023-03-01 Thread Christophe Bornet
This is the first release candidate for Apache Pulsar Adapters, version 2.11.0.

It fixes the following issues:
https://github.com/apache/pulsar-adapters/milestone/4?closed=1

*** Please download, test and vote on this release. This vote will
stay open for at least 72 hours ***

Note that we are voting upon the source (tag), binaries are provided
for convenience.

Source and binary files:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-adapters-2.11.0-candidate-1/

SHA-512 checksums:
c5fd43fbfbbc2a848f74f68da7ebcf9a958d7b327237ad7eff2f43e3b747ea5477b839a427cc89b3906ad9ed6f6d9e2268bcdda50815739b4024e311467fc943
 apache-pulsar-adapters-2.11.0-src.tar.gz

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachepulsar-1207/

The tag to be voted upon:
v2.11.0-candidate-1 (1b1104c023ff16848a69a7b1a290f4a3432cba6d)
https://github.com/apache/pulsar-adapters/releases/tag/v2.11.0-candidate-1

Pulsar's KEYS file containing PGP keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/pulsar/KEYS

Please download the source package, and follow the README to build the
Pulsar Adapters code.


Re: [Discuss] PIP-248: Add backlog eviction metric

2023-03-01 Thread PengHui Li
Ah, I forgot this one: "pulsar_storage_backlog_quota_limit".
As Asaf said, users can just divide the two to get a percentage.
I think we don't need to expose more metrics for the size-based backlog
quota, and exposing only the topic-level metrics looks good to me.
Users can get the alert and then use the Pulsar Admin to check which
subscription has a large backlog.

As for the estimated backlog size, it should be OK? The backlog quota policy
is also enforced based on the estimated backlog size.

> I'm afraid of that.
> Today the quota is checked periodically, right? So that's how the operator
> knows the cost in terms of I/O is limited.
> Now you are adding one additional I/O per collection, every 1 min by
> default. That's a lot perhaps. How long is the check interval today?
>
> Perhaps in the backlog quota check, you can persist the check result, and
> use it? Persist the age that is.

I think yes, we don't need to add additional costs here. The broker already
performs the backlog check when the backlog quota is enabled, so we can just
record the last checked value on the topic.
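
A minimal sketch of that idea, with hypothetical names (this is not existing
broker code): the periodic quota check stores the age it computed, and the
metrics collection only reads the stored value, so it adds no I/O.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-topic holder, for illustration only.
final class BacklogAgeCache {

    // -1 means the quota check has not run yet for this topic.
    private final AtomicLong lastCheckedAgeSeconds = new AtomicLong(-1);

    /** Called by the periodic backlog quota check after it computed the age. */
    void recordFromQuotaCheck(long ageSeconds) {
        lastCheckedAgeSeconds.set(ageSeconds);
    }

    /** Called on metrics collection; reads the cached value and does no I/O. */
    long ageForMetrics() {
        return lastCheckedAgeSeconds.get();
    }

    public static void main(String[] args) {
        BacklogAgeCache cache = new BacklogAgeCache();
        System.out.println(cache.ageForMetrics()); // prints -1
        cache.recordFromQuotaCheck(1800);
        System.out.println(cache.ageForMetrics()); // prints 1800
    }
}
```

The trade-off is that the exposed value is only as fresh as the last quota
check.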

In the same way, we can just expose the time-based lag metric, so that
users can divide the two to get a percentage.

> Regarding "slowest_subscription"
> I think the cost is too high, because the subscriptions will keep
> alternating, which can generate so many unique time series. Since
> Prometheus flush only every 2 hours, or any there TSDB, it will cost you
> too much.
>
> I suggest exposing the name via the topic stats. This way they can issue a
> REST call to grab that subscription name only when the alert fires.

Yes, I totally agree. And we already have this information today:
just get the subscription with the max backlog size.

@jiuming I think you'd better copy the context that Asaf provided into the
proposal.
It will help reviewers understand what problems we want to resolve,
and it will give more people the opportunity to join the discussion.

Regards
Penghui

On Wed, Mar 1, 2023 at 11:42 PM Asaf Mesika  wrote:

> >
> > Pulsar has 2 configurations for the backlog eviction
> > <
> https://pulsar.apache.org/docs/2.11.x/cookbooks-retention-expiry/#backlog-quotas
> >
> > : backlogQuotaDefaultLimitBytes and backlogQuotaDefaultLimitSecond.
> > By default, backlog eviction is disabled, and also, there is a field
> named
> > backlogQuotaMap in TopicPolicies
> > <
> https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/HierarchyTopicPolicies.java#L45
> >
> > /NamespaceSpacePolicies
> > <
> https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/common/policies/data/Policies.java#L41>
> assists
> > in controlling Topic/Namespace level backlog quota.
> >
> > If topic backlog reaches the threshold of any item, backlog eviction will
> > be triggered, Pulsar will move subscription's cursor to skip
> unacknowledged
> > messages.
> >
> > Before backlog eviction happens, we don't have a metric to monitor how
> > long that it can reaches the threshold.
> >
>
> I  think you should fix this explanation:
>
> In Pulsar, a subscription maintains a state of message acknowledged. A
> subscription backlog is the set of messages which are unacknowledged.
> A subscription backlog size is the sum of size of unacknowledged messages
> (in bytes).
> A topic can have many subscriptions.
> A topic backlog is defined as the backlog size of the subscription which
> has the oldest unacknowledged message. Since acknowledged messages can be
> interleaved with unacknowledged messages, calculating the exact size of
> that subscription can be expensive as it requires I/O operations to read
> from the messages from the ledgers.
> For that reason, the topic backlog is actually defined to be the estimated
> backlog size of that subscription. It does so by summarizing the size of
> all the ledgers, starting from the current active one, up to the ledger
> which contains the oldest unacknowledged message (There is actually a
> faster way to calculate it, but this is the definition of the estimation).
>
> A topic backlog age is the age of the oldest unacknowledged message (in any
> subscription). If that message was written 30 minutes ago, its age is 30
> minutes.
>
> Pulsar has a feature called backlog quota (place link). It allows the user
> to define a quota - in effect, a limit - which limits the topic backlog.
> There are two types of quotas:
> * Size based: The limit is for the topic backlog size (as we defined
> above).
> * Time based: The limit is for the topic's backlog age (as we defined
> above).
>
> Once a topic backlog exceeds either one of those limits, an action is taken
> upon messages written to the topic:
> * The producer write is placed on hold for a certain amount of time before
> failing.
> * The producer write is failed
> * The subscriptions oldest unacknowledged messages will be acknowledged in
> order until both the topic backlog size or age will fall insi

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

2023-03-01 Thread SiNan Liu
Hello Enrico. Thanks for your suggestion. Here is my understanding of the
"flag" you mentioned.
How about we add a configuration in the next release:

protoBufNativeSchemaValidatorClassName=org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl

If the configuration is empty, the broker uses the previous implementation
(which checks only the name of the root message). If a better third-party or
official solution appears in the future, we can develop a new
"ProtobufNativeSchemaBreakValidatorImpl" to give users a choice.
What do you think of this design? And if such a solution does appear, do you
think the implementation in the current PR should be retained or deleted?
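
The fallback described above could look roughly like the following Python sketch. Everything here is hypothetical: schemas are modelled as plain dicts, the registry stands in for loading a class by name, and the "strict" rule (a field number must keep its type) is only an illustrative placeholder, not the rule set proposed in the PIP.

```python
from typing import Callable, Dict

Schema = Dict[str, object]
Validator = Callable[[Schema, Schema], bool]


def root_name_only_validator(existing: Schema, new: Schema) -> bool:
    """Previous behaviour: compare only the root message name."""
    return existing["root"] == new["root"]


def strict_validator(existing: Schema, new: Schema) -> bool:
    """Illustrative stricter check: same root, and every field number that
    exists in both schemas keeps its type."""
    if existing["root"] != new["root"]:
        return False
    new_fields = new["fields"]
    return all(new_fields.get(number, field_type) == field_type
               for number, field_type in existing["fields"].items())


# Hypothetical registry keyed by the configured class name.
VALIDATORS: Dict[str, Validator] = {
    "org.apache.pulsar.broker.service.schema.validator."
    "ProtobufNativeSchemaBreakValidatorImpl": strict_validator,
}


def select_validator(configured_class_name: str) -> Validator:
    """Empty configuration falls back to the root-name-only check."""
    if not configured_class_name:
        return root_name_only_validator
    try:
        return VALIDATORS[configured_class_name]
    except KeyError:
        raise ValueError(f"unknown validator: {configured_class_name}")
```

Swapping in a future third-party or official checker would then only mean registering another entry and changing the configured name.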


Thanks,
sinan



On Thu, Mar 2, 2023 at 12:47 AM Enrico Olivelli wrote:

> (I apologise for top posting)
>
> Would it be possible to add a flag to fall back to the previous behaviour?
> I know that adding such flags is a burden but if the upgrade breaks
> some workflows then users won't be able to upgrade.
> We can add the flag in the next release and drop it in the next major
> release
>
> Enrico
>
> On Wed, Mar 1, 2023 at 15:33, SiNan Liu wrote:
> >
> > >
> > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > for backward compatibility check using Avro based compatibility rules?
> > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > descriptor?
> > > Is there some translation happening?
> >
> >
> > 1. *You can take a quick look at the previous design: PROTOBUF uses an
> > Avro structure to store the schema.*
> > https://github.com/apache/pulsar/pull/1954
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> >
> > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> > to `org.apache.avro.Schema`.*
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> >
> >
> >
> > > I'm sorry - I don't understand.
> > > I understand the different compatibility check strategies.
> > > If you just spell them out here, then as you say, just translate the
> > > Protobuf Descriptor into an Avro schema and run the Avro
> > > compatibility validation, no?
> > > I believe the answer is no, since you may want to verify different things
> > > when it comes to Protobuf, which are different than Avro.
> >
> >
> > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > avro-protobuf.*
> > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > official or third-party implementation. So this PIP does not use
> > avro-protobuf for protobuf compatibility checking.*
> >
> > 2. *By the way, this is implemented in much the same way that Apache Avro
> > does compatibility checking.*
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > writtenSchema and readSchema. We only need to change the order of the
> > arguments we pass to `canRead()`.*
> > ```java
> > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> >     switch (strategy) {
> >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> >         case CanBeReadMutualStrategy -> {
> >             canRead(toValidate, fromDescriptor);
> >             canRead(fromDescriptor, toValidate);
> >         }
> >     }
> > }
> >
> > private void canRead(Descriptors.Descriptor writtenSchema,
> > Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> >
> ProtobufNativeS

Re: [DISCUSS] Release Pulsar Go Client 0.10.0

2023-03-01 Thread Zike Yang
Hi, Baodi, Yunze

Thanks. Sure, I will wait for them.

BR,
Zike Yang

On Wed, Mar 1, 2023 at 9:02 PM Yunze Xu  wrote:
>
> Please wait for a performance fix for the case when batch index ACK is
> enabled, I'm working on it. Currently, the throughput cannot exceed
> even 20MB/s when it's enabled.
>
> Thanks,
> Yunze
>
> On Wed, Mar 1, 2023 at 8:39 PM Baodi Shi  wrote:
> >
> > Hi, Zike.
> >
> > The current pulsar-client-go master branch has some flaky tests. There may be
> > some internal bugs; I think we need to wait for them to be fixed.
> >
> >- https://github.com/apache/pulsar-client-go/issues/971
> >
> >
> > Thanks,
> > Baodi Shi
> >
> >
> > On March 1, 2023 at 20:26:10, Zike Yang wrote:
> >
> > > I will include this PR
> > > https://github.com/apache/pulsar-client-go/pull/968 in this release
> > > since it's an important performance improvement.
> > >
> > > BR,
> > > Zike Yang
> > >
> > > On Wed, Mar 1, 2023 at 8:25 PM Zike Yang  wrote:
> > >
> > >
> > > Hi everyone,
> > >
> > >
> > > I would like to propose releasing the Pulsar Go Client 0.10.0.
> > >
> > >
> > > It has been several months since the last release. And there are
> > >
> > > several new features and bug fixes in the master branch[0]. It’s time
> > >
> > > to release a new version.
> > >
> > >
> > > Please let me know if you have any PRs that need to be included in 0.10.0
> > >
> > >
> > > [0] https://github.com/apache/pulsar-client-go/compare/v0.9.0...master
> > >
> > >
> > > BR,
> > >
> > > Zike Yang
> > >
> > >


Re: [VOTE] Pulsar Client Python Release 3.1.0 Candidate 3

2023-03-01 Thread Yunze Xu
Hi Zike,

I've fixed this crash and explained the cause in
https://github.com/apache/pulsar-client-python/pull/99. PTAL. After
it's cherry-picked to branch-3.1, I will open candidate 4.

Thanks,
Yunze

On Wed, Mar 1, 2023 at 2:57 PM Yunze Xu  wrote:
>
> Hi Zike,
>
> I've reproduced this issue successfully with:
> - Python 3.8 on Ubuntu 20.04
> - Python 3.10 on macOS 12
>
> It seems to be a serious bug and I'm going to figure out the reason ASAP.
>
> Thanks,
> Yunze
>
> On Tue, Feb 28, 2023 at 5:17 PM Zike Yang  wrote:
> >
> > Hi, Yunze
> >
> > It raises an exception when I run the consumer example.
> >
> > Here are my environments:
> > * macos 12.06 x86_64
> > * python 3.7
> >
> > Here are my steps to reproduce:
> > * Start the pulsar standalone
> > * Start the consumer example
> > * Start the producer example
> > * The consumer can receive messages successfully.
> > * Stop the consumer, then it throws an exception:
> > ```
> > ➜  examples git:(main) ✗ python consumer.py
> > 2023-02-28 17:11:04.742 INFO  [0x113b19600] Client:87 | Subscribing on
> > Topic :my-topic
> > 2023-02-28 17:11:04.742 INFO  [0x113b19600] ClientConnection:190 |
> > [ -> pulsar://localhost:6650] Create ClientConnection,
> > timeout=1
> > 2023-02-28 17:11:04.742 INFO  [0x113b19600] ConnectionPool:97 |
> > Created connection for pulsar://localhost:6650
> > 2023-02-28 17:11:04.745 INFO  [0x73606000] ClientConnection:388 |
> > [127.0.0.1:49258 -> 127.0.0.1:6650] Connected to broker
> > 2023-02-28 17:11:04.857 INFO  [0x73606000] HandlerBase:72 |
> > [persistent://public/default/my-topic, my-subscription, 0] Getting
> > connection from pool
> > 2023-02-28 17:11:04.874 INFO  [0x73606000] ConsumerImpl:238 |
> > [persistent://public/default/my-topic, my-subscription, 0] Created
> > consumer on broker [127.0.0.1:49258 -> 127.0.0.1:6650]
> > Received message 'hello' id='(132,30,-1,0)'
> > Received message 'hello' id='(132,31,-1,0)'
> > Received message 'hello' id='(132,32,-1,0)'
> > Received message 'hello' id='(132,33,-1,0)'
> > Received message 'hello' id='(132,34,-1,0)'
> > Received message 'hello' id='(132,35,-1,0)'
> > Received message 'hello' id='(132,36,-1,0)'
> > Received message 'hello' id='(132,37,-1,0)'
> > Received message 'hello' id='(132,38,-1,0)'
> > Received message 'hello' id='(132,39,-1,0)'
> > ^CKeyboardInterrupt
> >
> > The above exception was the direct cause of the following exception:
> >
> > Traceback (most recent call last):
> >   File "consumer.py", line 32, in <module>
> > msg = consumer.receive()
> >   File 
> > "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pulsar/__init__.py",
> > line 1243, in receive
> > msg = self._consumer.receive()
> > SystemError
> > 2023-02-28 17:11:07.367 WARN  [0x113b19600] ConsumerImpl:126 |
> > [persistent://public/default/my-topic, my-subscription, 0] Destroyed
> > consumer which was not properly closed
> > 2023-02-28 17:11:07.367 INFO  [0x113b19600] ConsumerImpl:134 |
> > [persistent://public/default/my-topic, my-subscription, 0] Closed
> > consumer for race condition: 0
> > libc++abi: terminating with uncaught exception of type
> > std::__1::bad_weak_ptr: bad_weak_ptr
> > [1]56483 abort  python consumer.py
> > ```
> > This causes the program to crash.
> >
> > I'm using the example code here:
> > https://github.com/apache/pulsar-client-python/tree/main/examples
> >
> > Could you take a look? Thanks.
> >
> > BR,
> > Zike Yang
> >
> > On Tue, Feb 28, 2023 at 10:37 AM PengHui Li  wrote:
> > >
> > > +1 (binding)
> > >
> > > - install on macos
> > > - start a standalone (latest master)
> > > - run the example
> > > https://github.com/apache/pulsar-client-python#running-examples
> > >
> > > Regards,
> > > Penghui
> > >
> > > On Thu, Feb 23, 2023 at 11:00 PM Enrico Olivelli 
> > > wrote:
> > >
> > > > Thank you Yunze for double checking.
> > > >
> > > > I don't have time to test the release, so I am voting +0 (and not -1)
> > > >
> > > > Enrico
> > > >
> > > > On Thu, Feb 23, 2023 at 12:57, Yunze Xu wrote:
> > > > >
> > > > > Hi Enrico,
> > > > >
> > > > > I will test more operating systems and open a discussion soon. For
> > > > > now, I just tested the TLS encryption and token authentication, and I
> > > > > found this issue does not exist as expected for both Python and
> > > > > Node.js clients.
> > > > > - Windows: Only Python client works
> > > > > - Ubuntu: Both work
> > > > > - macOS: Not tested yet
> > > > >
> > > > > But the OAuth2 authentication case doesn't work. It's caused by
> > > > > https://github.com/apache/pulsar/pull/16064 and fixed by
> > > > > https://github.com/apache/pulsar-client-cpp/pull/190.
> > > > >
> > > > > When the protocol of the issuer URL is HTTPS:
> > > > > - Before #16064, OAuth2 authentication works by skipping peer
> > > > > verification, which is dangerous for security reasons
> > > > > - After #16064 and before #190, there is no way to perform OAuth2
> > > > authentica

Re: [VOTE] Pulsar Node.js Client Release 1.8.1 Candidate 2

2023-03-01 Thread Yunze Xu
+1 (binding)
- Verified checksum and signature
- Build from source on Ubuntu 20.04 WSL2
- Test produce and consume with examples in this repo
- Test TLS encryption and OAuth2 authentication on Ubuntu 20.04 and
Windows 10 with https://github.com/BewareMyPower/pulsar-tls-examples
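
For readers who want to script the checksum step, here is a minimal Python sketch using only the standard library. The function names are made up for this example, and the expected digest is assumed to be the SHA-512 value published in the vote email. Signature verification against the KEYS file is a separate step (normally done with GPG) and is not covered here.

```python
import hashlib
import hmac
from typing import BinaryIO


def sha512_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large archives need not fit in memory."""
    digest = hashlib.sha512()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the computed digest with the published one, ignoring case
    and surrounding whitespace."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha512_of_stream(stream), expected)
```

A typical call would open the downloaded archive with `open(path, "rb")` and pass the file object together with the published checksum string.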

Thanks,
Yunze

On Tue, Feb 28, 2023 at 5:08 PM Nozomi Kurihara  wrote:
>
> +1 (binding)
>
> * checked license headers
> * verified checksum and signature
> * install from npm and run producer/consumer
>
> Thanks,
> Nozomi
>
> On Sun, Feb 26, 2023 at 12:23, Baodi Shi wrote:
>
> > Hi everyone,
> >
> > This is the second release candidate for the Apache Pulsar Node.js client,
> > version 1.8.1.
> >
> > It fixes the following issues:
> >
> > https://github.com/apache/pulsar-client-node/pulls?q=is%3Apr+label%3Arelease%2Fv1.8.1+is%3Aclosed
> >
> > Please download the source files and review this release candidate:
> > - Download the source package, verify shasum and asc
> > - Follow the README.md to build and run the Pulsar Node.js client.
> >
> > The release candidate package has been published to the npm registry:
> > https://www.npmjs.com/package/pulsar-client/v/1.8.1-rc.2
> > You can install it by
> > `npm i pulsar-client@1.8.1-rc.2 --pulsar_binary_host_mirror=https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-node/`
> > and verify the package.
> >
> > You can refer to this repository to verify TLS-related features:
> >
> >- https://github.com/shibd/pulsar-client-tls-test
> >
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Source files:
> >
> > https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-node/pulsar-client-node-1.8.1-rc.2/
> >
> > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > https://dist.apache.org/repos/dist/dev/pulsar/KEYS
> >
> > SHA-512 checksum:
> >
> >
> > e596fef3eba6fbd25413ccf6eee3cf0a22c24625ff699b4f6d49676ebe2a053f4864ecdee79eb4dbde4fde143e867ec5c1fe667d0a1db07370b9d2abdb806ac3
> >  apache-pulsar-client-node-1.8.1.tar.gz
> >
> > The tag to be voted upon:
> > v1.8.1-rc.2(f0a5e0b)
> > https://github.com/apache/pulsar-client-node/releases/tag/v1.8.1-rc.2
> >
> > Please review and vote on release candidate #2 for version 1.8.1,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> >
> > Thanks,
> > Baodi Shi
> >


RE: Re: Introducer Pulsar admin api to pulsar-client-go

2023-03-01 Thread Shen Eric
Hi Zhangjian,

I am a PM at StreamNative, and we have also had some internal discussions related
to this topic. Let me share our current plan:

  *   As Penghui mentioned, we will extract the pulsar admin package from
pulsarctl into a separate open repo, which will be called pulsar-admin-go, under
StreamNative.
  *   We will iterate on the pulsar-admin-go library by adding more tests and
documentation, and may also update or fix the existing APIs.
  *   After the pulsar-admin-go library is stable, we will contribute this
project to the Apache Foundation.

On 2023/02/17 15:24:32 ZhangJian He wrote:
> Thanks to StreamNative for being willing to donate this project. This means we
> don't have to develop and maintain a set of HTTP code from scratch. My idea
> aligns with Yunze's: separating it into a standalone pulsar-admin-go
> project would be better. The **pulsarctl** repo contains BookKeeper HTTP
> calls too. Maybe we can have a bookkeeper-admin-go project? (It's going a
> little off-topic.)
>
> Thanks
> ZhangJian He
>
>
> On Fri, 17 Feb 2023 at 20:29, PengHui Li  wrote:
>
> > Hi Yunze,
> >
> > Yes, we can split it.
> > Either one repo with two modules or two repos works for me.
> >
> > pulsarctl already has the admin API and CLI,
> > so I think we don’t need to develop another one.
> >
> > Best,
> > Penghui
> >
> > > On Feb 17, 2023, at 17:44, Yunze Xu 
> > wrote:
> > >
> > > Hi PengHui,
> > >
> > > Now I changed my mind a bit. Even if the pulsarctl was contributed to
> > > the Apache Foundation, I think we should also avoid adding it as the
> > > dependency. What we need is an API layer but not the CLI, while
> > > pulsarctl couples the API and CLI.
> > >
> > > At the moment, my expectation is:
> > > 1. Use a separate repo (e.g. pulsar-admin-go) to implement the admin
> > > APIs in Golang.
> > > 2. Make pulsarctl depend on this new repo.
> > >
> > > Then we will have three Go projects:
> > > - pulsar-client-go: The Pulsar Go client APIs
> > > - pulsar-admin-go: The Pulsar Go admin APIs
> > > - pulsarctl: The admin CLI tool written in Go
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Fri, Feb 17, 2023 at 4:22 PM PengHui Li  wrote:
> > >>
> > >> I checked with Sijie today.
> > >> StreamNative can contribute the pulsarctl project to Apache Foundation.
> > >>
> > >> Regards,
> > >> Penghui
> > >>
> > >> On Fri, Feb 17, 2023 at 4:02 PM Enrico Olivelli 
> > wrote:
> > >>
> > >>> I agree to add an admin API to the go client, this would be very helpful.
> > >>>
> > >>> On Fri, Feb 17, 2023 at 08:44, Zixuan Liu wrote:
> > 
> >  Hi Zhangjian,
> > 
> >  It is a good idea to write the admin client in Go, but I don't
> >  suggest adding the admin features to pulsar-client-go; it's better to use a
> >  new repository for that, to separate dependencies.
> > 
> >  BTW, StreamNative has a pulsarctl [0] tool, which includes the admin API.
> > 
> > >> It's better to reuse existing code rather than reinventing the wheel.
> > 
> >  I agree with this point. If possible, we can integrate pulsarctl into this
> >  new project.
> > >>>
> > >>> We are talking about adding a client that calls a
> > >>> well defined and maintained REST API.
> > >>> It is better to have our implementation and not rely on third parties
> > >>> when it is possible.
> > >>> If there is a security issue in pulsarctl, how would we handle that ?
> > >>> Also the Pulsar community maintains the Pulsar API and this is the
> > >>> place where it is easier to keep the client up-to-date with the new
> > >>> APIs that we will develop,
> > >>> we can't wait for a third party project to implement our own APIs and
> > >>> wait for an upgrade (even if it is OSS, we cannot cut releases or have
> > >>> control over the release cycle)
> > >>>
> > >>>
> > >>> Enrico
> > >>>
> > >>>
> > >>>
> > 
> >  [0] - https://github.com/streamnative/pulsarctl
> > 
> >  Thanks,
> >  Zixuan
> > 
> > 
> >  On Fri, Feb 17, 2023 at 13:47, ZhangJian He wrote:
> > 
> > > Separating dependencies is better. For example, I think pulsar-admin-go can
> > > depend only on the Go standard TLS and HTTP packages.
> > > But it seems impossible to have two Go modules when publishing packages
> > > using GitHub.
> > >
> > >> Has anyone tried generating an admin client from our generated open
> > > api spec?
> > >
> > > I have attempted it, but it requires us to modify our Swagger file. Our
> > > existing Swagger file can't generate HTTP clients directly. Perhaps we can
> > > rewrite a unified and standardized Swagger file, and then gradually generate
> > > all code, including brokers, from there.
> > >
> > > Thanks
> > > ZhangJian He
> > >
> > >
> > > On Fri, 17 Feb 2023 at 12:37, Yunze Xu wrote:
> > >
> > >>> I notice that the Java Client and the Java Admin Client are
> > >