date:20221214

Re: [VOTE] Pulsar Release 2.11.0 Candidate-2

2022-12-14 Thread Nicolò Boschi

I believe this one must also be included in 2.11.0:
https://github.com/apache/pulsar/pull/18898

System topics are enabled by default starting with 2.11.0, so I think we must
include this fix in 2.11.0.
For example, transactions in namespaces that require encryption wouldn't
work at all.

Thanks,
Nicolò Boschi


On Tue, Dec 13, 2022 at 1:18 PM Yunze Xu wrote:

> I found another breaking change. Please take a look at this PR:
> https://github.com/apache/pulsar/pull/18909
>
> I've also created a discussion at dev mail list, which can be found in
> the PR description.
>
> Thanks,
> Yunze
>
> On Mon, Dec 12, 2022 at 9:22 PM guo jiwei  wrote:
> >
> > Hi
> > All the issues have been resolved and cherry-picked to branch-2.11. I
> > will trigger the RC3 release now.
> > Thank you.
> >
> > Regards
> > Tboy
> >
> >
> > On Fri, Dec 9, 2022 at 6:04 PM Zixuan Liu  wrote:
> >
> > > I submitted https://github.com/apache/pulsar/pull/18837 to fix this issue.
> > >
> > > Thanks,
> > > Zixuan
> > >
> > > On Fri, Dec 9, 2022 at 5:47 PM Zixuan Liu wrote:
> > >
> > > > Ok, let me make a new PR to fix this.
> > > >
> > > > Thanks,
> > > > Zixuan
> > > >
> > > > On Fri, Dec 9, 2022 at 5:41 PM Yunze Xu wrote:
> > > >
> > > >> > I think that when an admin doesn't have permission to create the
> > > >> > namespace, Pulsar should exit.
> > > >>
> > > >> Maybe. But it's something that requires a proposal, because this change
> > > >> breaks many standalone deployments used by other Pulsar clients.
> > > >>
> > > >> Thanks,
> > > >> Yunze
> > > >>
> > > >> On Fri, Dec 9, 2022 at 5:36 PM Zixuan Liu wrote:
> > > >> >
> > > >> > I appreciate your explanation. It's clear now.
> > > >> >
> > > >> > I think that when an admin doesn't have permission to create the
> > > >> > namespace, Pulsar should exit.
> > > >> >
> > > >> > Thanks,
> > > >> > Zixuan
> > > >> >
> > > >> >
> > > >> > On Fri, Dec 9, 2022 at 5:20 PM Yunze Xu wrote:
> > > >> >
> > > >> > > Yeah. It failed. However, the failure doesn't affect the start of the
> > > >> > > standalone because the exception will be caught and ignored. See
> > > >> > >
> > > >> > > https://github.com/apache/pulsar/blob/fc96e479a7a88298e59dc3c6b4cc98249eded781/pulsar-broker/src/main/java/org/apache/pulsar/PulsarStandalone.java#L412-L414
> > > >> > >
> > > >> > > The public/default namespace will be created later via the
> > > >> > > pulsar-admin command. See the deployment here:
> > > >> > >
> > > >> > > https://github.com/apache/pulsar-client-cpp/blob/2018a06e8afcde4de59971cfbc5f653a4f9d7897/build-support/start-test-service-inside-container.sh#L60-L63
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Yunze
> > > >> > >
> > > >> > >
> > > >> > > On Fri, Dec 9, 2022 at 5:02 PM Zixuan Liu wrote:
> > > >> > > >
> > > >> > > > > So how could you explain the failed cpp tests?
> > > >> > > >
> > > >> > > > I guess another PR broke your tests; the Pulsar 2.10 standalone also
> > > >> > > > uses the `admin` to create the namespace.
> > > >> > > >
> > > >> > > > See
> > > >> > > >
> > > >> > > > https://github.com/apache/pulsar/blob/branch-2.10/pulsar-broker/src/main/java/org/apache/pulsar/PulsarStandalone.java#L387
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Zixuan
> > > >> > > >
> > > >> > > > On Fri, Dec 9, 2022 at 4:46 PM Yunze Xu wrote:
> > > >> > > >
> > > >> > > > > > I don't think https://github.com/apache/pulsar/pull/18755 brings the
> > > >> > > > > > breaking change.
> > > >> > > > >
> > > >> > > > > So how could you explain the failed cpp tests? These tests have run
> > > >> > > > > against a configured Pulsar standalone for a long time. Is it
> > > >> > > > > reasonable to say that, from a specific version (e.g. 2.11.0), you
> > > >> > > > > must add some extra configurations to the server side, otherwise the
> > > >> > > > > server cannot start?
> > > >> > > > >
> > > >> > > > > Maybe it's true that the brokerClientAuthenticationPlugin and
> > > >> > > > > brokerClientAuthenticationParameters should be configured for the
> > > >> > > > > authentication case. But it doesn't affect the fact that this PR
> > > >> > > > > brings the breaking change.
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > Yunze
> > > >> > > > >
> > > >> > > > > On Fri, Dec 9, 2022 at 4:40 PM Zixuan Liu <node...@gmail.com> wrote:
> > > >> > > > > >
> > > >> > > > > > I don't think https://github.com/apache/pulsar/pull/18755 brings the
> > > >> > > > > > breaking change. When you enable authentication/authorization, you
> > > >> > > > > > must configure the brokerClientAuthenticationPlugin and
> > > >> > > > > > brokerClientAuthenticationParameters in the broker config file.
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Zixuan
> > > >> > > > > >
> > > >> > > > > > On Fri, Dec 9, 2022 at 4:18 PM Yunze Xu wrote:

Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

2022-12-14 Thread Xiangying Meng

Good point. It looks like the AUTO_PRODUCE schema is semantically similar to
the BYTES schema.
So can we give the BYTES schema the features of AUTO_PRODUCE?
There are several reasons to do this.
First, it does not cause compatibility issues. Currently, topics that have
messages sent with the BYTES schema can only be consumed with the BYTES schema
or the AUTO_CONSUME schema, so we can make them consumable by other schemas
without affecting user logic that already uses the schema.
Second, the BYTES schema is easier to understand than the AUTO_PRODUCE
schema.
Finally, the BYTES schema is currently an exclusive schema like any other,
yet it has some special logic in Pulsar that other schemas do not have. If we
give the BYTES schema the features of AUTO_PRODUCE, it becomes a special
schema whose special logic is justified, and we can remove AUTO_PRODUCE,
whose existence seems hard to justify.

I am not very familiar with this area. If there are any questions, feel free
to point them out.

Sincerely,
Xiangying

On Wed, Dec 14, 2022 at 3:12 PM 丛搏  wrote:

> >
> > > the user only creates one producer to send all Kafka topic data, if
> > using Pulsar schema, the user needs to create all schema producers in
> > a map
> >
> > It doesn't make sense to me. If the source topic has messages of
> > multiple schemas, why did you try to sink them into the same topic
> > with a schema? The key point of AUTO_PRODUCE schema is to download the
> > schema to validate the source messages. But if the schema of the topic
> > evolved, the left messages from the source topic could not be sent to
> > the topic.
> >
> Let me give you an example, AvroSchema will have multi-version,
> the version(0) :
> Student {
> String name;
> }
> the version(1) :
> Student {
> String name;
> int age;
> }
> How can you create two Student.class in one Java process and use
> the same namespace?
> It's not only that the schema type changes; there will also be
> multi-version schemas.
> In this case, how do you create two producers with version(0) and
> version(1)?
>
> > The most confusing part is that AUTO_PRODUCE schema will perform
> > message format validation before send. It's transparent to users and
> > intuitive. IMO, it's better to call validate explicitly like
> >
> > ```java
> > producer.newMessage().value(bytes).validate().sendAsync();
> > ```
> >
> > There are two benefits:
> > 1. It's clear that the message validation happens before sending.
> > 2. If users don't want to validate before sending, they can choose to
> > send the bytes directly and validate the message during consumption.
> Just using `schema.validate()` is enough. Data validation does not
> belong to the Pulsar message, and we can add a usage description in
> the schema doc.
> >
> > The performance problem of the AUTO_PRODUCE schema is that the
> > validation happens twice and it cannot be controlled.
>
> Our data verification is performed by the client, not by the broker.
> Therefore, we cannot effectively verify that bytes were generated by a
> specific schema. I think this is something that users should consider
> rather than something that Pulsar should guarantee, because you can't
> control whether the data sent by users was generated by this schema;
> the check is only a client-side verification. So we don't need to verify
> twice, unless we verify in the broker. That adds overhead, which we could
> control with a config, but is it really necessary?
>
> Thanks,
> Bo
>
> On Wed, Dec 14, 2022 at 12:40 PM Yunze Xu wrote:
> >
> > > the user only creates one producer to send all Kafka topic data, if
> > using Pulsar schema, the user needs to create all schema producers in
> > a map
> >
> > It doesn't make sense to me. If the source topic has messages of
> > multiple schemas, why did you try to sink them into the same topic
> > with a schema? The key point of AUTO_PRODUCE schema is to download the
> > schema to validate the source messages. But if the schema of the topic
> > evolved, the left messages from the source topic could not be sent to
> > the topic.
> >
> > The most confusing part is that AUTO_PRODUCE schema will perform
> > message format validation before send. It's transparent to users and
> > intuitive. IMO, it's better to call validate explicitly like
> >
> > ```java
> > producer.newMessage().value(bytes).validate().sendAsync();
> > ```
> >
> > There are two benefits:
> > 1. It's clear that the message validation happens before sending.
> > 2. If users don't want to validate before sending, they can choose to
> > send the bytes directly and validate the message during consumption.
> >
> > The performance problem of the AUTO_PRODUCE schema is that the
> > validation happens twice and it cannot be controlled.
> >
> > Thanks,
> > Yunze
> >
> > On Wed, Dec 14, 2022 at 12:01 PM 丛搏  wrote:
> > >
> > > Hi, Yunze:
> > >
> > > On Wed, Dec 14, 2022 at 2:26 AM Yunze Xu wrote:
> > >
> > > > First, how do you guarantee the schema can be used to encode the raw
> > > > bytes wh

Re: [VOTE] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-14 Thread PengHui Li

+1 (binding)

- Penghui

On Sun, Dec 11, 2022 at 6:36 AM Enrico Olivelli  wrote:

> +1 (binding)
>
> Enrico
>
> On Fri, Dec 9, 2022 at 10:41 AM Jiaqi Shen wrote:
>
> > +1(non-binding)
> >
> > Thanks,
> > Jiaqi Shen
> >
> >
> > On Mon, Dec 5, 2022 at 3:23 PM  wrote:
> >
> > > +1(non-binding)
> > >
> > > Best,
> > > Mattison
> > > On Dec 5, 2022, 15:09 +0800, Zike Yang , wrote:
> > > > +1(non-binding)
> > > >
> > > > Best,
> > > > Zike Yang
> > > >
> > > > > On Mon, Dec 5, 2022 at 2:41 PM Baodi Shi wrote:
> > > > >
> > > > > +1(non-binding)
> > > > >
> > > > > Thanks,
> > > > > Baodi Shi
> > > > >
> > > > > > > > On Dec 5, 2022, at 12:51, Yunze Xu wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'm starting the vote for PIP-224: Introduce TopicMessageId for
> > > > > > > consumer's MessageId related APIs:
> > > > > > > https://github.com/apache/pulsar/issues/18616
> > > > > > >
> > > > > > > Here is the discussion thread:
> > > > > > >
> > > > > > > > https://lists.apache.org/thread/jhqy65cdyxzmmxnfsjm8rv9pbk76noxy
> > > > > > >
> > > > > > > The vote will be open for at least 3 days.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Yunze
> > > > >
> > >
> >
>

Re: [DISCUSS] PIP-228: Refactor the information architecture of Pulsar client docs

2022-12-14 Thread Zike Yang

Hi Jun,

Thanks for your PIP. Overall looks good to me.

> The Client concepts topic does not introduce the basic client concepts and 
> can be enriched with content relocated from other topics. See Proposed IA for 
> more details.

The `Proposed IA` link doesn't work. Access permission needs to be granted.

Thanks,
Zike Yang

On Thu, Dec 8, 2022 at 8:47 PM Jun Ma  wrote:
>
> Hi, all,
>
> I've created PIP-228 to discuss - Refactor the Information Architecture of 
> Pulsar Client Docs.
>
>
> Motivation
>
>   *   Improve the developer experience and help them get started by offering 
> bite-sized basics in the docs.
>   *   Build a solid content structure to make it easier to increment and 
> scale.
>   *   Contribute to Pulsar adoption.
>
> For more details, please read the PIP at 
> https://github.com/apache/pulsar/issues/18822.
> I'm looking forward to hearing what you think.
>
> Best,
> Jun

Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

2022-12-14 Thread Yunze Xu

> How can you create two Student.class in one Java process and use
> the same namespace?

Could you give an example to show how `AUTO_PRODUCE` schema makes a difference?

But with AUTO_PRODUCE schema, the precondition is that we have a topic
that has messages of these two schemas.

For example, there is a `bytes-topic` without schema that has two messages:
- msg0: Serialized from `new Student("abc")` (schema v0)
- msg1: Serialized from `new Student("abc", 1)` (schema v1)

Then you can consume these bytes, and send the messages to **a topic
that has registered a schema**.
- If the schema is v0, it's okay to send msg0 and msg1 to the topic.
But the msg1 will lose some bytes because the schema v0 doesn't have
the `age` field.
- If the schema is v1, msg0 cannot be sent because msg0 doesn't have
the `age` field.

So which schema did you expect for this topic?

This example also shows AUTO_PRODUCE schema performs validation at
producer side.

However, if we just send msg0 and msg1 to a topic without a schema, then
it will be the consumer's responsibility to determine whether the received
message is valid.

```java
var bytes = consumer.receive().getValue(); // raw payload of the message
var student = Schema.AVRO(Student.class).decode(bytes);
```

- If the `Student` is v0, msg0 and msg1 can be decoded successfully.
- If the `Student` is v1, decoding msg0 will throw an exception.

Since all messages are stored in the topic, the downstream side
(consumer) can catch the exception to discard the bytes without the
expected schema.
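
The consumer-side handling described above can be sketched without any Pulsar
or Avro dependency. This is only a simplified model, not the real Avro
decoding rules: a "schema version" is assumed to be just the list of fields a
reader requires, and `decode`, `consume`, `V0_FIELDS` and `V1_FIELDS` are
hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ConsumerSideValidationSketch {
    // Hypothetical stand-in for a schema version: the fields a reader requires.
    static final List<String> V0_FIELDS = List.of("name");
    static final List<String> V1_FIELDS = List.of("name", "age");

    // Simplified "decode": fails when a required field is missing, ignores extra fields.
    static Map<String, Object> decode(Map<String, Object> raw, List<String> requiredFields) {
        if (!raw.keySet().containsAll(requiredFields)) {
            throw new IllegalArgumentException("missing one of " + requiredFields);
        }
        return raw;
    }

    // Consumer-side handling: keep what decodes, discard what does not.
    static List<Map<String, Object>> consume(List<Map<String, Object>> topic,
                                             List<String> requiredFields) {
        List<Map<String, Object>> decoded = new ArrayList<>();
        for (Map<String, Object> raw : topic) {
            try {
                decoded.add(decode(raw, requiredFields));
            } catch (IllegalArgumentException e) {
                // The message stays stored in the topic; only this consumer skips it.
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        Map<String, Object> msg0 = Map.of("name", "abc");
        Map<String, Object> msg1 = Map.of("name", "abc", "age", 1);
        List<Map<String, Object>> topic = List.of(msg0, msg1);
        System.out.println("v0 consumer decoded " + consume(topic, V0_FIELDS).size() + " message(s)");
        System.out.println("v1 consumer decoded " + consume(topic, V1_FIELDS).size() + " message(s)");
    }
}
```

In this model a v0 reader keeps both messages, while a v1 reader keeps only
msg1 and skips msg0, which nevertheless remains available to other consumers.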

But if the validation fails at the producer side, there is a chance
that msg0 is lost. In addition, let's see the producer and consumer
code in this case.

```java
producer.send(msg0); // validation happens at the producer side
```

```java
var msg = consumer.receive();
// validation happens again, though the message has already been validated
var student = msg.getValue();
```

Thanks,
Yunze

On Wed, Dec 14, 2022 at 3:11 PM 丛搏  wrote:
>
> >
> > > the user only creates one producer to send all Kafka topic data, if
> > using Pulsar schema, the user needs to create all schema producers in
> > a map
> >
> > It doesn't make sense to me. If the source topic has messages of
> > multiple schemas, why did you try to sink them into the same topic
> > with a schema? The key point of AUTO_PRODUCE schema is to download the
> > schema to validate the source messages. But if the schema of the topic
> > evolved, the left messages from the source topic could not be sent to
> > the topic.
> >
> Let me give you an example, AvroSchema will have multi-version,
> the version(0) :
> Student {
> String name;
> }
> the version(1) :
> Student {
> String name;
> int age;
> }
> How can you create two Student.class in one Java process and use
> the same namespace?
> It's not only that the schema type changes; there will also be multi-version schemas.
> In this case, how do you create two producers with version(0) and version(1)?
>
> > The most confusing part is that AUTO_PRODUCE schema will perform
> > message format validation before send. It's transparent to users and
> > intuitive. IMO, it's better to call validate explicitly like
> >
> > ```java
> > producer.newMessage().value(bytes).validate().sendAsync();
> > ```
> >
> > There are two benefits:
> > 1. It's clear that the message validation happens before sending.
> > 2. If users don't want to validate before sending, they can choose to
> > send the bytes directly and validate the message during consumption.
> Just using `schema.validate()` is enough. Data validation does not
> belong to the Pulsar message, and we can add a usage description in
> the schema doc.
> >
> > The performance problem of the AUTO_PRODUCE schema is that the
> > validation happens twice and it cannot be controlled.
>
> Our data verification is performed by the client, not by the broker.
> Therefore, we cannot effectively verify that bytes were generated by a
> specific schema. I think this is something that users should consider
> rather than something that Pulsar should guarantee, because you can't
> control whether the data sent by users was generated by this schema;
> the check is only a client-side verification. So we don't need to verify
> twice, unless we verify in the broker. That adds overhead, which we could
> control with a config, but is it really necessary?
>
> Thanks,
> Bo
>
> On Wed, Dec 14, 2022 at 12:40 PM Yunze Xu wrote:
> >
> > > the user only creates one producer to send all Kafka topic data, if
> > using Pulsar schema, the user needs to create all schema producers in
> > a map
> >
> > It doesn't make sense to me. If the source topic has messages of
> > multiple schemas, why did you try to sink them into the same topic
> > with a schema? The key point of AUTO_PRODUCE schema is to download the
> > schema to validate the source messages. But if the schema of the topic
> > evolved, the left messages from the source topic could not be sent to
> > the topic.
> >
> > The most confusing part is that AUTO_PRODUCE schema will perform
> > message format

Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

2022-12-14 Thread Yunze Xu

> It looks like that the AUTO_PRODUCE schema is similar to
> the BYTES schema in the semantic.

The only differences are:
1. AUTO_PRODUCE schema can produce messages to a topic that already
has a schema (because it downloads the schema and adds it to the
CommandProducer request)
2. AUTO_PRODUCE schema validates whether the bytes can be decoded via
the downloaded schema

The 1st difference doesn't exist if `isSchemaValidationEnforced` is
configured as false, which is the default config.
[1]

The 2nd difference is terrible IMO. The validation should be performed
at the consumer side. Validating message format before producing
doesn't make sense. It tries to make sure every message sent by bytes
producers in the topic is valid.

AUTO_PRODUCE schema just hides the details below and is hard to understand.
1. Download the latest schema info.
2. Convert the schema info to a `Schema` instance (in
`AutoConsumeSchema#getSchema`)
3. Call `Schema#validate` to validate the messages.

However, these details are hidden from users and complicated. If
the documents described them clearly, I wonder how many users
would give up the AUTO_PRODUCE schema.

Users who really want to validate the bytes before sending can just use
`PulsarAdmin` to achieve the same goal.

```java
var producer = client.newProducer().topic(topic).create();

var admin = PulsarAdmin.builder().serviceHttpUrl(HTTP_URL).build();
// Download the topic's schema info and convert it to a Schema instance
var schema = AutoConsumeSchema.getSchema(admin.schemas().getSchemaInfo(topic));
// Validate the raw bytes explicitly before sending them
schema.validate(bytes);
producer.send(bytes);
```

They can also control when to download the schema: if `validate`
fails, they can choose to download the latest schema again and
validate once more. This code makes clear what is being done, while the
AUTO_PRODUCE schema hides the details and claims to be
convenient without telling users the performance cost.
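
The "validate, and on failure download the schema again" control flow can be
sketched in plain Java. This is a minimal sketch under stated assumptions, not
Pulsar code: `schemaLoader` is a hypothetical callback standing in for the
admin call that downloads the latest schema, and the schema itself is reduced
to a `Predicate<byte[]>`.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

class ExplicitValidationSketch {
    // Hypothetical: each call "downloads" the latest schema as a byte validator.
    private final Supplier<Predicate<byte[]>> schemaLoader;
    private Predicate<byte[]> cached;
    int downloads = 0; // visible so the cost of each download can be observed

    ExplicitValidationSketch(Supplier<Predicate<byte[]>> schemaLoader) {
        this.schemaLoader = schemaLoader;
    }

    private void refresh() {
        cached = schemaLoader.get();
        downloads++;
    }

    // Validate with the cached schema; on failure, refresh once and retry.
    boolean validate(byte[] payload) {
        if (cached == null) {
            refresh();
        }
        if (cached.test(payload)) {
            return true;
        }
        refresh(); // the schema may have evolved since the last download
        return cached.test(payload);
    }

    public static void main(String[] args) {
        // Toy schema: any non-empty payload is considered valid.
        ExplicitValidationSketch validator =
                new ExplicitValidationSketch(() -> payload -> payload.length > 0);
        System.out.println("valid: " + validator.validate(new byte[] {1}));
        System.out.println("downloads so far: " + validator.downloads);
    }
}
```

The point of the sketch is that the caller sees exactly when a download
happens and can decide whether the retry is worth its cost.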

[1] https://github.com/apache/pulsar/pull/2730

Thanks,
Yunze

On Wed, Dec 14, 2022 at 4:39 PM Xiangying Meng  wrote:
>
> Good point. It looks like the AUTO_PRODUCE schema is semantically similar to
> the BYTES schema.
> So can we give the BYTES schema the features of AUTO_PRODUCE?
> There are several reasons to do this.
> First, it does not cause compatibility issues. Currently, topics that have
> messages sent with the BYTES schema can only be consumed with the BYTES schema
> or the AUTO_CONSUME schema, so we can make them consumable by other schemas
> without affecting user logic that already uses the schema.
> Second, the BYTES schema is easier to understand than the AUTO_PRODUCE
> schema.
> Finally, the BYTES schema is currently an exclusive schema like any other,
> yet it has some special logic in Pulsar that other schemas do not have. If we
> give the BYTES schema the features of AUTO_PRODUCE, it becomes a special
> schema whose special logic is justified, and we can remove AUTO_PRODUCE,
> whose existence seems hard to justify.
>
> I am not very familiar with this area. If there are any questions, feel free
> to point them out.
>
> Sincerely,
> Xiangying
>
> On Wed, Dec 14, 2022 at 3:12 PM 丛搏  wrote:
>
> > >
> > > > the user only creates one producer to send all Kafka topic data, if
> > > using Pulsar schema, the user needs to create all schema producers in
> > > a map
> > >
> > > It doesn't make sense to me. If the source topic has messages of
> > > multiple schemas, why did you try to sink them into the same topic
> > > with a schema? The key point of AUTO_PRODUCE schema is to download the
> > > schema to validate the source messages. But if the schema of the topic
> > > evolved, the left messages from the source topic could not be sent to
> > > the topic.
> > >
> > Let me give you an example, AvroSchema will have multi-version,
> > the version(0) :
> > Student {
> > String name;
> > }
> > the version(1) :
> > Student {
> > String name;
> > int age;
> > }
> > How can you create two Student.class in one Java process and use
> > the same namespace?
> > It's not only that the schema type changes; there will also be
> > multi-version schemas.
> > In this case, how do you create two producers with version(0) and
> > version(1)?
> >
> > > The most confusing part is that AUTO_PRODUCE schema will perform
> > > message format validation before send. It's transparent to users and
> > > intuitive. IMO, it's better to call validate explicitly like
> > >
> > > ```java
> > > producer.newMessage().value(bytes).validate().sendAsync();
> > > ```
> > >
> > > There are two benefits:
> > > 1. It's clear that the message validation happens before sending.
> > > 2. If users don't want to validate before sending, they can choose to
> > > send the bytes directly and validate the message during consumption.
> > Just using `schema.validate()` is enough. Data validation does not
> > belong to the Pulsar message, and we can add a usage description in
> > the schema doc.
> > >
> > > The performance problem of the AUTO_PRODUCE schema is that the
>

[GitHub] [pulsar] tisonkun added a comment to the discussion: [Bug] OOM on running pulsar-perf

2022-12-14 Thread GitBox



GitHub user tisonkun added a comment to the discussion: [Bug] OOM on running 
pulsar-perf

Cannot reproduce locally. I suspect it's a configuration issue or a
restriction of your local environment. Moved to the Q&A forum.

GitHub link: 
https://github.com/apache/pulsar/discussions/18930#discussioncomment-4400457


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org

[GitHub] [pulsar] Amiaozhou edited a discussion: OOM on running pulsar-perf

2022-12-14 Thread GitBox



GitHub user Amiaozhou edited a discussion: OOM on running pulsar-perf

### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and 
found nothing similar.


### Version

OS: Linux version 3.10.0-1160.el7.x86_64
Pulsar: apache-pulsar-2.10.1

### Minimal reproduce step

1) Benchmark test:
bin/pulsar-perf produce -r 8000 -n 1 -threads 3 -c 10 -s 1024 persistent://my_tenant/my_tomcat_mulstag/threads-topic-08


### What did you expect to see?

Messages are always sent successfully.

### What did you see instead?

org.apache.pulsar.common.allocator.PulsarByteBufAllocator - Exiting JVM process 
for OOM error: failed to allocate 4194304 byte(s) of direct memory (used: 
28630319104, max: 28631367680)
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 
byte(s) of direct memory (used: 28630319104, max: 28631367680)
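
The numbers in this log line can be checked with a little arithmetic. The
sketch below assumes the allocator rejects a request whenever used plus
requested bytes would exceed the limit; `fits` is a hypothetical helper, not a
Netty or Pulsar API.

```java
class DirectMemoryCheck {
    // Returns true when the requested allocation still fits under the limit.
    static boolean fits(long used, long max, long requested) {
        return used + requested <= max;
    }

    public static void main(String[] args) {
        long used = 28_630_319_104L;   // "used" from the log line
        long max = 28_631_367_680L;    // "max" from the log line
        long requested = 4_194_304L;   // the failed 4 MiB allocation

        System.out.println("remaining bytes: " + (max - used));
        System.out.println("request fits: " + fits(used, max, requested));
    }
}
```

Only 1048576 bytes (1 MiB) of direct memory remained, so the 4 MiB request
could not be satisfied.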


### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

GitHub link: https://github.com/apache/pulsar/discussions/18930



Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

2022-12-14 Thread 丛搏

On Wed, Dec 14, 2022 at 8:37 PM Yunze Xu wrote:
>
> > How can you create two Student.class in one Java process and use
> > the same namespace?
>
> Could you give an example to show how `AUTO_PRODUCE` schema makes a 
> difference?

// student1 is a Student serialized with version 0 (e.g. data from Kafka)
byte[] student1 = autoConsumer.receive().getData();
// student2 is a Student serialized with version 1 (e.g. data from Kafka)
byte[] student2 = autoConsumer.receive().getData();
// send the student with the version 0 schema
p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
        .withJsonDef("student with version0 json def").build())))
        .value(student1).send();

// send the student with the version 1 schema
p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
        .withJsonDef("student with version1 json def").build())))
        .value(student2).send();

>
> But with AUTO_PRODUCE schema, the precondition is that we have a topic
> that has messages of these two schemas.
>
> For example, there is a `bytes-topic` without schema that has two messages:
> - msg0: Serialized from `new Student("abc")` (schema v0)
> - msg1: Serialized from `new Student("abc", 1)` (schema v1)
>
> Then you can consume these bytes, and send the messages to **a topic
> that has registered a schema**.
> - If the schema is v0, it's okay to send msg0 and msg1 to the topic.
> But the msg1 will lose some bytes because the schema v0 doesn't have
> the `age` field.
> - If the schema is v1, msg0 cannot be sent because msg0 doesn't have
> the `age` field.
>
> So which schema did you expect for this topic?
If you use AUTO_PRODUCE_BYTES, each message will carry the correct schema version.
See the code:
https://github.com/apache/pulsar/blob/4129583c418dd68f8303dee601132e2910cdf8e6/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L718-L746

msg0 will be sent with schema v0
msg1 will be sent with schema v1
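
The idea of a single producer tagging each message with its own schema
version can be modeled as a small cache. This is only an illustrative model,
not the logic in ProducerImpl: `SchemaVersionCacheSketch` and `versionFor` are
hypothetical names, and a schema is reduced to its definition string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaVersionCacheSketch {
    // Maps a schema definition to the version assigned when it was first seen.
    private final Map<String, Integer> versionByDefinition = new LinkedHashMap<>();

    // Returns the version for a definition, registering it on first use.
    int versionFor(String schemaDefinition) {
        Integer version = versionByDefinition.get(schemaDefinition);
        if (version == null) {
            version = versionByDefinition.size(); // next unused version number
            versionByDefinition.put(schemaDefinition, version);
        }
        return version;
    }

    public static void main(String[] args) {
        SchemaVersionCacheSketch cache = new SchemaVersionCacheSketch();
        System.out.println("msg0 -> v" + cache.versionFor("Student{name}"));
        System.out.println("msg1 -> v" + cache.versionFor("Student{name,age}"));
        System.out.println("msg0 again -> v" + cache.versionFor("Student{name}"));
    }
}
```

With such a lookup, one producer is enough, and no map of per-schema
producers has to be maintained by the user.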
>
> This example also shows AUTO_PRODUCE schema performs validation at
> producer side.
>
> However, if we just send msg0 and msg1 to a topic without schema. Then
> it will be consumer's responsibility to determine whether the received
> message is valid.
>
> ```java
> var bytes = consumer.receive(); // bytes
> var student = Schema.AVRO(Student.class).decode(bytes);
> ```
>
> - If the `Student` is v0, msg0 and msg1 can be decoded successfully.
> - If the `Student` is v1, decoding msg0 will throw an exception.
>
> Since all messages are stored in the topic, the downstream side
> (consumer) can catch the exception to discard the bytes without the
> expected schema.
>
> But if the validation fails at the producer side, there is a chance
> that msg0 is lost. In addition, let's see the producer and consumer
> code in this case.
>
> ```
> producer.send(msg0); // validation happens at the producer side
> ```
>
> ```
> var msg = consumer.receive();
> var student = msg.getValue(); // validation happens again, though it
> has already been validated before
> ```
>
> Thanks,
> Yunze
>
> On Wed, Dec 14, 2022 at 3:11 PM 丛搏  wrote:
> >
> > >
> > > > the user only creates one producer to send all Kafka topic data, if
> > > using Pulsar schema, the user needs to create all schema producers in
> > > a map
> > >
> > > It doesn't make sense to me. If the source topic has messages of
> > > multiple schemas, why did you try to sink them into the same topic
> > > with a schema? The key point of AUTO_PRODUCE schema is to download the
> > > schema to validate the source messages. But if the schema of the topic
> > > evolved, the left messages from the source topic could not be sent to
> > > the topic.
> > >
> > Let me give you an example, AvroSchema will have multi-version,
> > the version(0) :
> > Student {
> > String name;
> > }
> > the version(1) :
> > Student {
> > String name;
> > int age;
> > }
> > How can you create two Student.class in one Java process and use
> > the same namespace?
> > It's not only that the schema type changes; there will also be
> > multi-version schemas.
> > In this case, how do you create two producers with version(0) and
> > version(1)?
> >
> > > The most confusing part is that AUTO_PRODUCE schema will perform
> > > message format validation before send. It's transparent to users and
> > > intuitive. IMO, it's better to call validate explicitly like
> > >
> > > ```java
> > > producer.newMessage().value(bytes).validate().sendAsync();
> > > ```
> > >
> > > There are two benefits:
> > > 1. It's clear that the message validation happens before sending.
> > > 2. If users don't want to validate before sending, they can choose to
> > > send the bytes directly and validate the message during consumption.
> > Just using `schema.validate()` is enough. Data validation does not
> > belong to the Pulsar message, and we can add a usage description in
> > the schema doc.
> > >
> > > The performance problem of the AUTO_PRODUCE schema is that the
> > > validation happens twice and it cannot be controlled.
> >
> > Our

Re: [VOTE] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-14 Thread 丛搏

+1 (non-binding)

Thanks,
Bo

On Wed, Dec 14, 2022 at 7:12 PM PengHui Li wrote:
>
> +1 (binding)
>
> - Penghui
>
> On Sun, Dec 11, 2022 at 6:36 AM Enrico Olivelli  wrote:
>
> > +1 (binding)
> >
> > Enrico
> >
> > On Fri, Dec 9, 2022 at 10:41 AM Jiaqi Shen wrote:
> >
> > > +1(non-binding)
> > >
> > > Thanks,
> > > Jiaqi Shen
> > >
> > >
> > > On Mon, Dec 5, 2022 at 3:23 PM  wrote:
> > >
> > > > +1(non-binding)
> > > >
> > > > Best,
> > > > Mattison
> > > > On Dec 5, 2022, 15:09 +0800, Zike Yang , wrote:
> > > > > +1(non-binding)
> > > > >
> > > > > Best,
> > > > > Zike Yang
> > > > >
> > > > > > On Mon, Dec 5, 2022 at 2:41 PM Baodi Shi wrote:
> > > > > >
> > > > > > +1(non-binding)
> > > > > >
> > > > > > Thanks,
> > > > > > Baodi Shi
> > > > > >
> > > > > > > > > On Dec 5, 2022, at 12:51, Yunze Xu wrote:
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I'm starting the vote for PIP-224: Introduce TopicMessageId for
> > > > > > > > consumer's MessageId related APIs:
> > > > > > > > https://github.com/apache/pulsar/issues/18616
> > > > > > > >
> > > > > > > > Here is the discussion thread:
> > > > > > > >
> > > > > > > > > https://lists.apache.org/thread/jhqy65cdyxzmmxnfsjm8rv9pbk76noxy
> > > > > > > >
> > > > > > > > The vote will be open for at least 3 days.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Yunze
> > > > > >
> > > >
> > >
> >

Re: [DISCUSS] Modify MessageIdImpl and BatchMessageIdImpl compareTo(MessageId o) method

2022-12-14 Thread 丛搏

I still think it is better to change compareTo directly.

1. Although using PulsarApiMessageId.compare() can reduce the
chance of developers misusing it, misuse cannot be completely
avoided.

2. While a direct change would alter the default behavior, I consider
the current behavior a bug, so fixing it is not a breaking change. We can
explain it in the release blog of the new version. Maybe some users rely on
compareTo() without having noticed the problem, and for them the change makes
the result correct. I don't think any user is able to use the current
compareTo() correctly, because the current implementation behaves
unexpectedly. Once users find out that this problem exists, they will stop
using this method.
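
To make the difference concrete, here is a minimal sketch of the two
orderings, based on the description quoted below. It is not the Pulsar
implementation: `Id`, `NO_BATCH` and `rank` are hypothetical names, and a
non-batched id is assumed to be marked with a batch index of -1.

```java
class MessageIdCompareSketch {
    // Assumption: -1 marks a non-batched id that refers to the whole entry.
    static final int NO_BATCH = -1;

    static final class Id {
        final long ledgerId;
        final long entryId;
        final int batchIndex;

        Id(long ledgerId, long entryId, int batchIndex) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.batchIndex = batchIndex;
        }
    }

    private static int comparePosition(Id a, Id b) {
        int c = Long.compare(a.ledgerId, b.ledgerId);
        return c != 0 ? c : Long.compare(a.entryId, b.entryId);
    }

    // Legacy ordering: the entry-level id sorts before every batched id of the entry.
    static int legacyCompare(Id a, Id b) {
        int c = comparePosition(a, b);
        return c != 0 ? c : Integer.compare(a.batchIndex, b.batchIndex);
    }

    // Changed ordering: the entry-level id sorts after every batched id of the entry.
    static int compare(Id a, Id b) {
        int c = comparePosition(a, b);
        return c != 0 ? c : Integer.compare(rank(a), rank(b));
    }

    private static int rank(Id id) {
        return id.batchIndex == NO_BATCH ? Integer.MAX_VALUE : id.batchIndex;
    }

    public static void main(String[] args) {
        Id entry = new Id(1, 2, NO_BATCH);
        Id firstInBatch = new Id(1, 2, 0);
        System.out.println("legacy:  " + legacyCompare(entry, firstInBatch));
        System.out.println("changed: " + compare(entry, firstInBatch));
    }
}
```

Under the legacy ordering the entry-level id compares as smaller than a
message inside the same entry; under the changed ordering it compares as
larger.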

Thanks,
Bo

On Thu, Dec 8, 2022 at 8:43 PM Yunze Xu wrote:
>
> Actually I'm refactoring the MessageId related code [1], whose current
> implementations are very messy from my perspective. My solution to
> this issue is adding two compare methods, one of them is the "wrong"
> implementation and used in `MessageId#compareTo` to avoid the breaking
> change. See the `legacyCompare` and `compare` methods.
>
> ```java
> // The legacy compare method, which treats the non-batched message id
> // as preceding the batched message id.
> // However, this behavior is wrong because a non-batched message id
> // represents an entry, while a batched message id represents a single
> // message in the entry, which should precede the non-batched message id.
> // Keep this implementation just for backward compatibility when users
> // compare two message ids.
> static int legacyCompare(MessageIdDataInterface lhs,
>         MessageIdDataInterface rhs) { /* ... */ }
>
> static int compare(MessageIdDataInterface lhs,
>         MessageIdDataInterface rhs) { /* ... */ }
> ```
>
> [1] https://github.com/BewareMyPower/pulsar/pull/11/files
>
> Thanks,
> Yunze
>
> On Thu, Dec 8, 2022 at 7:22 PM 丛搏  wrote:
> >
> > Hi, Yunze:
> > If we don't change this behavior, we must pay special attention when
> > coding in `pulsar-client`, because this point is easy to overlook. Its
> > impact may be more serious than the "wrong" behavior a user gets by
> > calling the current compareTo() method manually. I don't think this is
> > a breaking change. On the contrary, it is a bug that needs to be fixed,
> > because we cannot guarantee that everyone will notice the compareTo()
> > problem in time when writing code or reviewing a PR. The current
> > implementation is very counterintuitive.
> >
> > Thanks,
> > bo
> >
> > On Thu, Dec 8, 2022 at 6:02 PM Yunze Xu wrote:
> > >
> > > Actually, from the user side, this comparison would never happen.
> > > Users could never receive two MessageId objects with the same ledger
> > > id, entry id while the batch index fields are different. This
> > > comparison could only exist in the `pulsar-client` implementation.
> > >
> > > If users touch the case, the MessageId object must be created
> > > manually, which is a hack. The "wrong" behavior might be used. So my
> > > perspective is that we should not change this behavior.
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Thu, Dec 8, 2022 at 5:36 PM 丛搏  wrote:
> > > >
> > > > Hi, all:
> > > >
> > > > does anyone have any suggestions?
> > > >
> > > > Thanks,
> > > > bo
> > > >
> > > > On Mon, Nov 21, 2022 at 6:57 PM 丛搏 wrote:
> > > > >
> > > > > Hello, Pulsar community:
> > > > >
> > > > > > Currently, when a `BatchMessageIdImpl` and a `MessageIdImpl` with the
> > > > > > same `ledgerId` and `EntryId` are compared with each other, the
> > > > > > `BatchMessageIdImpl` is always greater than the `MessageIdImpl`.
> > > > > > See:
> > > > > https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/BatchMessageIdImpl.java#L71-L74
> > > > >
> > > > > https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MessageIdImpl.java#L219-L228
> > > > >
> > > > > > However, when we use it, we may expect the `MessageIdImpl` to be
> > > > > > greater than the `BatchMessageIdImpl` with the same `ledgerId` and
> > > > > > `EntryId`. This mismatch causes a lot of bugs. I think we need to
> > > > > > change this `compareTo()` method. Although it is a public API, I think
> > > > > > the change is not a breaking change but a fix for a bug.
> > > > > > For example, https://github.com/apache/pulsar/pull/18486 needs to add
> > > > > > separate logic around compareTo().
> > > > >
> > > > > Please leave your thoughts, thanks.
> > > > >
> > > > > Thanks,
> > > > > bo
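
A minimal sketch of the two orderings debated in this thread may help. It does not use the Pulsar client: `Id`, `NO_BATCH`, and the class name are hypothetical stand-ins, and the method names only echo the `legacyCompare` and `compare` pair quoted above. It assumes a non-batched id can be modeled as a batch index of -1.

```java
// Hypothetical stand-in types; this is NOT the real MessageIdImpl API.
final class IdOrderingSketch {
    // Assumption: a non-batched id (a whole entry) is modeled as batch index -1.
    static final int NO_BATCH = -1;

    static final class Id {
        final long ledgerId;
        final long entryId;
        final int batchIndex;

        Id(long ledgerId, long entryId, int batchIndex) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.batchIndex = batchIndex;
        }
    }

    // Legacy ordering: within one entry, the non-batched id sorts before
    // every batched id because -1 is smaller than any batch index.
    static int legacyCompare(Id a, Id b) {
        int c = Long.compare(a.ledgerId, b.ledgerId);
        if (c != 0) {
            return c;
        }
        c = Long.compare(a.entryId, b.entryId);
        if (c != 0) {
            return c;
        }
        return Integer.compare(a.batchIndex, b.batchIndex);
    }

    // Proposed ordering: the non-batched id stands for the whole entry,
    // so it sorts after every batched id of that entry.
    static int compare(Id a, Id b) {
        int c = Long.compare(a.ledgerId, b.ledgerId);
        if (c != 0) {
            return c;
        }
        c = Long.compare(a.entryId, b.entryId);
        if (c != 0) {
            return c;
        }
        return Integer.compare(rank(a), rank(b));
    }

    private static int rank(Id id) {
        return id.batchIndex == NO_BATCH ? Integer.MAX_VALUE : id.batchIndex;
    }

    public static void main(String[] args) {
        Id wholeEntry = new Id(1, 5, NO_BATCH);
        Id lastInBatch = new Id(1, 5, 9);
        System.out.println("legacy:   " + Integer.signum(legacyCompare(wholeEntry, lastInBatch))); // -1
        System.out.println("proposed: " + Integer.signum(compare(wholeEntry, lastInBatch)));       // 1
    }
}
```

The two methods differ only in how the non-batched id is ranked inside an entry, which is the single point the thread disagrees on.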

Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

2022-12-14 Thread Yunze Xu

Why not use the following code with a BYTES producer in your case?

```java
var schema0 = Schema.AVRO(SchemaDefinition.builder()
        .withJsonDef("student with version0 json def").build());
p.newMessage(schema0).value(schema0.decode(student1)).send();
...
```

Thanks,
Yunze

On Wed, Dec 14, 2022 at 10:37 PM 丛搏  wrote:
>
> On Wed, Dec 14, 2022 at 8:37 PM Yunze Xu wrote:
> >
> > > how do you can create two Student.class in one java process? and use
> > the same namespace?
> >
> > Could you give an example to show how `AUTO_PRODUCE` schema makes a 
> > difference?
>
> // This is a Student serialized with schema version0, possibly data from Kafka
> byte[] student1 = autoConsumer.receive().getData();
> // This is a Student serialized with schema version1, possibly data from Kafka
> byte[] student2 = autoConsumer.receive().getData();
> // Send the student data with the version0 schema
> p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
>         .withJsonDef("student with version0 json def").build())))
>         .value(student1).send();
>
> // Send the student data with the version1 schema
> p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
>         .withJsonDef("student with version1 json def").build())))
>         .value(student2).send();
>
> >
> > But with AUTO_PRODUCE schema, the precondition is that we have a topic
> > that has messages of these two schemas.
> >
> > For example, there is a `bytes-topic` without schema that has two messages:
> > - msg0: Serialized from `new Student("abc")` (schema v0)
> > - msg1: Serialized from `new Student("abc", 1)` (schema v1)
> >
> > Then you can consume these bytes, and send the messages to **a topic
> > that has registered a schema**.
> > - If the schema is v0, it's okay to send msg0 and msg1 to the topic.
> > But the msg1 will lose some bytes because the schema v0 doesn't have
> > the `age` field.
> > - If the schema is v1, msg0 cannot be sent because msg0 doesn't have
> > the `age` field.
> >
> > So which schema did you expect for this topic?
> If you use AUTO_PRODUCE_BYTES, each message will carry the correct schema
> version.
> Code link:
> https://github.com/apache/pulsar/blob/4129583c418dd68f8303dee601132e2910cdf8e6/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L718-L746
>
> msg0 will be sent with schema v0.
> msg1 will be sent with schema v1.
> >
> > This example also shows AUTO_PRODUCE schema performs validation at
> > producer side.
> >
> > However, if we just send msg0 and msg1 to a topic without schema. Then
> > it will be consumer's responsibility to determine whether the received
> > message is valid.
> >
> > ```java
> > var bytes = consumer.receive(); // bytes
> > var student = Schema.AVRO(Student.class).decode(bytes);
> > ```
> >
> > - If the `Student` is v0, msg0 and msg1 can be decoded successfully.
> > - If the `Student` is v1, decoding msg0 will throw an exception.
> >
> > Since all messages are stored in the topic, the downstream side
> > (consumer) can catch the exception to discard the bytes without the
> > expected schema.
> >
> > But if the validation fails at the producer side, there is a chance
> > that msg0 is lost. In addition, let's see the producer and consumer
> > code in this case.
> >
> > ```
> > producer.send(msg0); // validation happens at the producer side
> > ```
> >
> > ```
> > var msg = consumer.receive();
> > var student = msg.getValue(); // validation happens again, though it
> > has already been validated before
> > ```
> >
> > Thanks,
> > Yunze
> >
> > On Wed, Dec 14, 2022 at 3:11 PM 丛搏  wrote:
> > >
> > > >
> > > > > the user only creates one producer to send all Kafka topic data, if
> > > > using Pulsar schema, the user needs to create all schema producers in
> > > > a map
> > > >
> > > > It doesn't make sense to me. If the source topic has messages of
> > > > multiple schemas, why did you try to sink them into the same topic
> > > > with a schema? The key point of AUTO_PRODUCE schema is to download the
> > > > schema to validate the source messages. But if the schema of the topic
> > > > evolved, the left messages from the source topic could not be sent to
> > > > the topic.
> > > >
> > > Let me give you an example. An Avro schema can have multiple versions.
> > > Version 0:
> > > Student {
> > >     String name;
> > > }
> > > Version 1:
> > > Student {
> > >     String name;
> > >     int age;
> > > }
> > > How can you create two Student.class definitions in one Java process
> > > under the same namespace?
> > > Not only can the schema type change, a schema can also have multiple
> > > versions.
> > > In this case, how do you create two producers, one for version 0 and
> > > one for version 1?
> > >
> > > > The most confusing part is that AUTO_PRODUCE schema will perform
> > > > message format validation before send. It's transparent to users and
> > > > intuitive. IMO, it's better to call validate explicitly like
> > > >
> > > > ```java
> > > > producer.newMessage().value(bytes).validate().sendAsync();
> > > > ```
> > > >
> >
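
The disagreement above is about where validation happens. The following is a toy sketch of that difference only, under loud assumptions: a "schema" is just a set of required field names and a "message" is a field map. It is not Avro and not the Pulsar client API, every name in it is hypothetical, and it does not model the field loss mentioned for msg1 under schema v0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: NOT Avro and NOT the Pulsar client API.
final class ValidationPlacementSketch {
    static final Set<String> STUDENT_V0 = Set.of("name");
    static final Set<String> STUDENT_V1 = Set.of("name", "age");

    // A message conforms when it carries every field the schema requires.
    static boolean conforms(Set<String> schema, Map<String, Object> message) {
        return message.keySet().containsAll(schema);
    }

    // Producer-side validation: a message that does not conform to the
    // topic schema never reaches the topic.
    static List<Map<String, Object>> sendValidated(
            Set<String> topicSchema, List<Map<String, Object>> source) {
        List<Map<String, Object>> topic = new ArrayList<>();
        for (Map<String, Object> message : source) {
            if (conforms(topicSchema, message)) {
                topic.add(message);
            }
        }
        return topic;
    }

    public static void main(String[] args) {
        Map<String, Object> msg0 = Map.of("name", "abc");           // written with v0
        Map<String, Object> msg1 = Map.of("name", "abc", "age", 1); // written with v1
        List<Map<String, Object>> source = List.of(msg0, msg1);

        System.out.println("topic with v0 stores " + sendValidated(STUDENT_V0, source).size()); // 2
        System.out.println("topic with v1 stores " + sendValidated(STUDENT_V1, source).size()); // 1

        // Without producer-side validation both messages are stored, and a
        // consumer that expects v1 decides what to skip while reading.
        long readableAsV1 = source.stream().filter(m -> conforms(STUDENT_V1, m)).count();
        System.out.println("schemaless topic stores " + source.size()
                + ", v1 consumer accepts " + readableAsV1);
    }
}
```

In the toy model the same check runs in both cases; what changes is whether a rejected message is dropped before storage or skipped after it.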

[VOTE] Pulsar Client Python Release 3.0.0 Candidate 2

2022-12-14 Thread Yunze Xu

This is the second release candidate for Apache Pulsar Client Python,
version 3.0.0.

It fixes the following issues:
https://github.com/apache/pulsar-client-python/milestone/1?closed=1

*** Please download, test and vote on this release. This vote will
stay open for at least 72 hours ***

Python wheels:
https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-python-3.0.0-candidate-2/

The supported python versions are 3.7, 3.8, 3.9 and 3.10. The
supported platforms are:
- Windows (win_amd64.whl)
- Linux
   - x86_64 architecture on glibc based distro (manylinux2014_x86_64.whl)
   - arm64 architecture on glibc based distro (manylinux2014_aarch64.whl)
   - x86_64 architecture on musl based distro (musllinux_1_1_x86_64.whl)
   - arm64 architecture on musl based distro (musllinux_1_1_aarch64.whl)
- macOS (macosx_10_15_universal2.whl)

The tag to be voted upon: v3.0.0-candidate-2
(46acc487ad16fdc0aeea9dae64484030e62c1b96)
https://github.com/apache/pulsar-client-python/releases/tag/v3.0.0-candidate-2

Pulsar's KEYS file containing PGP keys you use to sign the release:
https://dist.apache.org/repos/dist/dev/pulsar/KEYS

Please download the Python wheels and follow the README to test.

Thanks,
Yunze

Re: [VOTE] Pulsar Client Python Release 3.0.0 Candidate 2

2022-12-14 Thread Matteo Merli

+1

Great work!

Checked:
 * Signatures
 * Wheel file on Mac ARM, publishing and consuming messages
 * Wheel file on Alpine Linux, publishing and consuming messages


--
Matteo Merli


On Wed, Dec 14, 2022 at 7:52 AM Yunze Xu  wrote:
>
> This is the second release candidate for Apache Pulsar Client Python,
> version 3.0.0.
>
> It fixes the following issues:
> https://github.com/apache/pulsar-client-python/milestone/1?closed=1
>
> *** Please download, test and vote on this release. This vote will
> stay open for at least 72 hours ***
>
> Python wheels:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-python-3.0.0-candidate-2/
>
> The supported python versions are 3.7, 3.8, 3.9 and 3.10. The
> supported platforms are:
> - Windows (win_amd64.whl)
> - Linux
>    - x86_64 architecture on glibc based distro (manylinux2014_x86_64.whl)
>    - arm64 architecture on glibc based distro (manylinux2014_aarch64.whl)
>    - x86_64 architecture on musl based distro (musllinux_1_1_x86_64.whl)
>    - arm64 architecture on musl based distro (musllinux_1_1_aarch64.whl)
> - macOS (macosx_10_15_universal2.whl)
>
> The tag to be voted upon: v3.0.0-candidate-2
> (46acc487ad16fdc0aeea9dae64484030e62c1b96)
> https://github.com/apache/pulsar-client-python/releases/tag/v3.0.0-candidate-2
>
> Pulsar's KEYS file containing PGP keys you use to sign the release:
> https://dist.apache.org/repos/dist/dev/pulsar/KEYS
>
> Please download the Python wheels and follow the README to test.
>
> Thanks,
> Yunze

Re: [VOTE] Pulsar Client Python Release 3.0.0 Candidate 2

2022-12-14 Thread Dave Fisher

-1 (binding)

I don't see a source release package in the dist.apache.org directory. ASF
projects release source, and everything else is a convenience.

Best,
Dave

Sent from my iPhone

> On Dec 14, 2022, at 9:08 AM, Matteo Merli  wrote:
> 
> +1
> 
> Great work!
> 
> Checked:
> * Signatures
> * Wheel file on Mac ARM, publishing and consuming messages
> * Wheel file on Alpine Linux, publishing and consuming messages
> 
> 
> --
> Matteo Merli
> 
> 
>> On Wed, Dec 14, 2022 at 7:52 AM Yunze Xu wrote:
>> 
>> This is the second release candidate for Apache Pulsar Client Python,
>> version 3.0.0.
>> 
>> It fixes the following issues:
>> https://github.com/apache/pulsar-client-python/milestone/1?closed=1
>> 
>> *** Please download, test and vote on this release. This vote will
>> stay open for at least 72 hours ***
>> 
>> Python wheels:
>> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-client-python-3.0.0-candidate-2/
>> 
>> The supported python versions are 3.7, 3.8, 3.9 and 3.10. The
>> supported platforms are:
>> - Windows (win_amd64.whl)
>> - Linux
>>   - x86_64 architecture on glibc based distro (manylinux2014_x86_64.whl)
>>   - arm64 architecture on glibc based distro (manylinux2014_aarch64.whl)
>>   - x86_64 architecture on musl based distro (musllinux_1_1_x86_64.whl)
>>   - arm64 architecture on musl based distro (musllinux_1_1_aarch64.whl)
>> - macOS (macosx_10_15_universal2.whl)
>> 
>> The tag to be voted upon: v3.0.0-candidate-2
>> (46acc487ad16fdc0aeea9dae64484030e62c1b96)
>> https://github.com/apache/pulsar-client-python/releases/tag/v3.0.0-candidate-2
>> 
>> Pulsar's KEYS file containing PGP keys you use to sign the release:
>> https://dist.apache.org/repos/dist/dev/pulsar/KEYS
>> 
>> Please download the Python wheels and follow the README to test.
>> 
>> Thanks,
>> Yunze

Re: [VOTE] Pulsar Release 2.9.4 Candidate 3

2022-12-14 Thread PengHui Li

+1 (binding)

- Checked the signature
- Start standalone
- Publish and consume messages
- Verified Function and State Function
- Verified Cassandra connector
- Build from the source package

Thanks,
Penghui

On Tue, Dec 13, 2022 at 7:49 PM 丛搏  wrote:

> This is the third release candidate for Apache Pulsar, version 2.9.4.
>
>
> This release contains 319 commits by 69 contributors.
> https://github.com/apache/pulsar/compare/v2.9.3...v2.9.4-candidate-3
>
> *** Please download, test and vote on this release. This vote will stay
> open for at least 72 hours ***
>
> Note that we are voting upon the source (tag), binaries are provided for
> convenience.
>
> Source and binary files:
> https://dist.apache.org/repos/dist/dev/pulsar/pulsar-2.9.4-candidate-3/
>
> SHA-512 checksums:
>
> 85cd920c8fedcec2551867e1ea89052c8578634e95226f92c4114d17587e7d2821f8033ef6fc70103e0b21dd3f8f9b907c68209cdc2cb74eca08f0a3ae6bd98c
>  apache-pulsar-2.9.4-bin.tar.gz
>
> da6ee53ffc66e4d9f60c74935c3ed0d85b26f5a629cb50fdfc02f535d66492297932256e4e44c8d4a08d20a85c4f490b7d7b3e169756bc246690bedfe582892b
>  apache-pulsar-2.9.4-src.tar.gz
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachepulsar-1198/
>
> The tag to be voted upon:
> v2.9.4-candidate-3 (e949f18a20c6f8f5b6f326cd4afb814d0fb3b8be)
> https://github.com/apache/pulsar/releases/tag/v2.9.4-candidate-3
>
> Pulsar's KEYS file containing PGP keys you use to sign the release:
> https://dist.apache.org/repos/dist/dev/pulsar/KEYS
>
> Docker images:
>
> https://hub.docker.com/layers/congbobo184/pulsar/2.9.4/images/sha256-72272e9b7ce5c568575bacbddf7565fd570d27b486f2f47cafaa0938ec56e1ef
>
> https://hub.docker.com/layers/congbobo184/pulsar-all/2.9.4/images/sha256-c17d42831a882028996627abe56e71e067b905fdaac91ca3bdc933d51ce5b73b
>
>
> Please download the source package, and follow the README to build
> and run the Pulsar standalone service.
>
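
For voters who want to check a downloaded archive against the SHA-512 values listed in the candidate email, here is a minimal sketch using only the JDK. The class name and the command-line arguments are assumptions for illustration. It covers the checksum only and does not verify the PGP signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the Pulsar release tooling.
final class Sha512Check {
    // Streams the input through SHA-512 and returns the lowercase hex digest.
    static String sha512Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-512");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the downloaded archive
        // args[1]: expected checksum copied from the vote email
        if (args.length != 2) {
            System.err.println("usage: Sha512Check <file> <expected-sha512>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha512Hex(in);
            System.out.println(actual.equalsIgnoreCase(args[1]) ? "OK" : "MISMATCH: " + actual);
        }
    }
}
```

Streaming the file in fixed-size chunks keeps memory use flat even for large binary archives.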

Re: [DISCUSSIONS] Should we use AUTO_PRODUCE schema?

2022-12-14 Thread 丛搏

We can also use a BYTES producer. However, with the BYTES schema, if we do not
use .newMessage(schema0), the message will not carry the schema version, and
the consumer will not be able to decode it correctly.

Also, the BYTES schema can't validate the data against a schema. If the data
is an empty byte array, it does not make sense to send it to the broker.

It is irresponsible for the producer to leave everything to the consumer. I
think AUTO_PRODUCE simplifies data validation for users.

I think what we need to do is document it clearly and distinguish it from
BYTES, rather than delete or deprecate it.

Thanks,
Bo


On Wed, Dec 14, 2022 at 11:36 PM Yunze Xu wrote:

>
> Why not use the following code with a BYTES producer in your case?
>
> ```java
> var schema0 = Schema.AVRO(SchemaDefinition.builder()
>         .withJsonDef("student with version0 json def").build());
> p.newMessage(schema0).value(schema0.decode(student1)).send();
> ...
> ```
>
> Thanks,
> Yunze
>
> On Wed, Dec 14, 2022 at 10:37 PM 丛搏  wrote:
> >
> > On Wed, Dec 14, 2022 at 8:37 PM Yunze Xu wrote:
> > >
> > > > how do you can create two Student.class in one java process? and use
> > > the same namespace?
> > >
> > > Could you give an example to show how `AUTO_PRODUCE` schema makes a 
> > > difference?
> >
> > // This is a Student serialized with schema version0, possibly data from Kafka
> > byte[] student1 = autoConsumer.receive().getData();
> > // This is a Student serialized with schema version1, possibly data from Kafka
> > byte[] student2 = autoConsumer.receive().getData();
> > // Send the student data with the version0 schema
> > p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
> >         .withJsonDef("student with version0 json def").build())))
> >         .value(student1).send();
> >
> > // Send the student data with the version1 schema
> > p.newMessage(Schema.AUTO_PRODUCE_BYTES(Schema.AVRO(SchemaDefinition.builder()
> >         .withJsonDef("student with version1 json def").build())))
> >         .value(student2).send();
> >
> > >
> > > But with AUTO_PRODUCE schema, the precondition is that we have a topic
> > > that has messages of these two schemas.
> > >
> > > For example, there is a `bytes-topic` without schema that has two 
> > > messages:
> > > - msg0: Serialized from `new Student("abc")` (schema v0)
> > > - msg1: Serialized from `new Student("abc", 1)` (schema v1)
> > >
> > > Then you can consume these bytes, and send the messages to **a topic
> > > that has registered a schema**.
> > > - If the schema is v0, it's okay to send msg0 and msg1 to the topic.
> > > But the msg1 will lose some bytes because the schema v0 doesn't have
> > > the `age` field.
> > > - If the schema is v1, msg0 cannot be sent because msg0 doesn't have
> > > the `age` field.
> > >
> > > So which schema did you expect for this topic?
> > If you use AUTO_PRODUCE_BYTES, each message will carry the correct schema
> > version.
> > Code link:
> > https://github.com/apache/pulsar/blob/4129583c418dd68f8303dee601132e2910cdf8e6/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L718-L746
> >
> > msg0 will be sent with schema v0.
> > msg1 will be sent with schema v1.
> > >
> > > This example also shows AUTO_PRODUCE schema performs validation at
> > > producer side.
> > >
> > > However, if we just send msg0 and msg1 to a topic without schema. Then
> > > it will be consumer's responsibility to determine whether the received
> > > message is valid.
> > >
> > > ```java
> > > var bytes = consumer.receive(); // bytes
> > > var student = Schema.AVRO(Student.class).decode(bytes);
> > > ```
> > >
> > > - If the `Student` is v0, msg0 and msg1 can be decoded successfully.
> > > - If the `Student` is v1, decoding msg0 will throw an exception.
> > >
> > > Since all messages are stored in the topic, the downstream side
> > > (consumer) can catch the exception to discard the bytes without the
> > > expected schema.
> > >
> > > But if the validation fails at the producer side, there is a chance
> > > that msg0 is lost. In addition, let's see the producer and consumer
> > > code in this case.
> > >
> > > ```
> > > producer.send(msg0); // validation happens at the producer side
> > > ```
> > >
> > > ```
> > > var msg = consumer.receive();
> > > var student = msg.getValue(); // validation happens again, though it
> > > has already been validated before
> > > ```
> > >
> > > Thanks,
> > > Yunze
> > >
> > > On Wed, Dec 14, 2022 at 3:11 PM 丛搏  wrote:
> > > >
> > > > >
> > > > > > the user only creates one producer to send all Kafka topic data, if
> > > > > using Pulsar schema, the user needs to create all schema producers in
> > > > > a map
> > > > >
> > > > > It doesn't make sense to me. If the source topic has messages of
> > > > > multiple schemas, why did you try to sink them into the same topic
> > > > > with a schema? The key point of AUTO_PRODUCE schema is to download the
> > > > > schema to validate the source messages. But if the schema of the topic
> > > > > evolved, the left messages from the source topic could not be sent to
> > >

Re: [VOTE] PIP-224: Introduce TopicMessageId for consumer's MessageId related APIs

2022-12-14 Thread 丛搏

-1 (non-binding)
Sorry, I have one question about the BatchMessageId compareTo()
method. The discussion thread is:
https://lists.apache.org/thread/8n3oyk2hdsskkotnj4lnlvfnndctpqbg

I hope this issue can be discussed clearly first. I will vote again
once it is resolved.


Thanks,
Bo

On Wed, Dec 14, 2022 at 10:56 PM 丛搏 wrote:
>
> +1 (non-binding)
>
> Thanks,
> Bo
>
> On Wed, Dec 14, 2022 at 7:12 PM PengHui Li wrote:
> >
> > +1 (binding)
> >
> > - Penghui
> >
> > On Sun, Dec 11, 2022 at 6:36 AM Enrico Olivelli  wrote:
> >
> > > +1 (binding)
> > >
> > > Enrico
> > >
> > > On Fri, Dec 9, 2022 at 10:41 AM Jiaqi Shen wrote:
> > >
> > > > +1(non-binding)
> > > >
> > > > Thanks,
> > > > Jiaqi Shen
> > > >
> > > >
> > > > On Mon, Dec 5, 2022 at 3:23 PM, wrote:
> > > >
> > > > > +1(non-binding)
> > > > >
> > > > > Best,
> > > > > Mattison
> > > > > On Dec 5, 2022, 15:09 +0800, Zike Yang , wrote:
> > > > > > +1(non-binding)
> > > > > >
> > > > > > Best,
> > > > > > Zike Yang
> > > > > >
> > > > > > > On Mon, Dec 5, 2022 at 2:41 PM Baodi Shi wrote:
> > > > > > >
> > > > > > > +1(non-binding)
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Baodi Shi
> > > > > > >
> > > > > > > > > > On Dec 5, 2022, at 12:51 PM, Yunze Xu wrote:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > > I'm starting the vote for PIP-224: Introduce TopicMessageId for
> > > > > > > > > > consumer's MessageId related APIs:
> > > > > > > > > https://github.com/apache/pulsar/issues/18616
> > > > > > > > >
> > > > > > > > > Here is the discussion thread:
> > > > > > > > >
> > > > > > > > > > https://lists.apache.org/thread/jhqy65cdyxzmmxnfsjm8rv9pbk76noxy
> > > > > > > > >
> > > > > > > > > The vote will be open for at least 3 days.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Yunze
> > > > > > >
> > > > >
> > > >
> > >

[DISCUSS] Release Pulsar 2.10.3 cherry-pick done

2022-12-14 Thread Xiangying Meng

Hello, Pulsar community:

The cherry-picking for 2.10.3 is basically complete.
It contains 126 PRs:
https://github.com/apache/pulsar/pulls?q=is%3Amerged+is%3Apr+label%3Arelease%2F2.10.3+

If you still have a PR that must be released in Pulsar 2.10.3, please reply
to me or ping me on GitHub.

Thanks,
Xiangying

[ANNOUNCE] Apache Pulsar Client C++ 3.1.0 released

2022-12-14 Thread Zike Yang

The Apache Pulsar team is proud to announce Apache Pulsar Client C++
version 3.1.0.

Pulsar is a highly scalable, low latency messaging platform running on
commodity hardware. It provides simple pub-sub semantics over topics,
guaranteed at-least-once delivery of messages, automatic cursor management for
subscribers, and cross-datacenter replication.

For Pulsar Client C++ release details and downloads, visit:
https://archive.apache.org/dist/pulsar/pulsar-client-cpp-3.1.0/

Release Notes are at:
https://pulsar.apache.org/release-notes/versioned/client-cpp-3.1.0/

We would like to thank the contributors that made the release possible.

Regards,
The Pulsar Team

