Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

SiNan Liu Sun, 04 Jun 2023 02:07:12 -0700

Hi, all.

Are there any other additions and modifications to this PIP?


Thanks,
sinan


SiNan Liu <liusinan1...@gmail.com> 于2023年3月26日周日 14:25写道：

> *Hi, Asaf! This week I in the research about `SchemaRegistryServiceImpl`
> optimization, sorry to reply later.*
>
> 1.
>
>> Well, this is exactly the argument we had in this thread. This is why you
>> make 2 implementation classes ( I called them v1, v2 if you remember). The
>> user can choose between them. This is exactly while I liked the _v1 _v2
>> naming, so they will know they are not backward compatible.
>> Also, you're talking about improving right? Checking the message name is
>> outright wrong, so you're actually fixing a big mistake someone made back
>> then.
>
>
> This is just a very simple check the root message name of the two proto,
> does it really matter?
> This PIP is a more stringent check, so the previous implementation rules
> for checking the root message are also needed.
> This is a very small problem, and I don't think it's necessary to go into
> such a discussion.
>
>
>
> 2.
>
>> Your link to Stackoverflow will fail *my* validation: Since `int foo = 1`
>> --> `int foo = 2;` : same name, same type, different field number -->
>> fail
>> validation.
>> So it doesn't prove anything, or help this argument.
>
>
> If I have in readSchema
>> int32 justAnID = 1
>> int32 customerId = 2
>> and in writeSchema I have
>> int32 justAnID = 1
>> string customerId = 3
>> This is valid.
>> You will fail this validation check, since customerId has different field
>> numbers (2 and 3), but I decided to remove field number 2 and add a new
>> field with the same name, different type, with field number 3. When I
>> read,
>> field number 2 will get default value of 0, and ignore field number 3.
>> Valid.
>> If you compare by field type, then `int32 customerId = 2` to `int32
>> customerId = 3`, would fail, since that looks super strange right? Why
>> remove and add the same field exactly?
>
>
> If there exists a field named customerId with number 2 in readSchema and
> there also exists a field named customerId with number 3 in writeSchema,
> then the two schemas are incompatible.
> Because when reading data encoded with writeSchema, the reader will try to
> get the customerId with number 2, but it's actually number 3, which will
> cause the read to fail.
>
> > > (4) The writtenSchema can not change the field name of any field in
>> > > > readSchema (the field number is the same, but the field name is
>> > > > different).
>> > > This is incorrect.
>> > > Fields names are encoded into the wire. I don't see this in any best
>> > > practice.
>
>
> If the writer and reader use the same field number, Protobuf will
> correctly serialize and deserialize the data, even if the field names are
> different, because serialization and deserialization use the field number
> to determine the field.
> *So you're right. We can rename it.*
>
> *Name renaming of the same number is compatible. *
> *However, the numbering of the fields is different, which will result in a
> mismatch between the field number when the message is written and the field
> number when the message is read, *
> *resulting in the data not being parsed correctly when the message is
> read.*
>
>
> 3.
>
>> > > > - The writtenSchema cannot add required fields, but optional or
>> > > > duplicate fields can be added (The field number must be new).
>> > > >
>> > > > That's not true.
>> > > You can have a required field in writeSchema, and not have that
>> field in
>> > > readSchema (based on tag number).
>> >
>> >
>> > The required field must exist. In your case, where does readSchema go
>> to
>> > read its required fields? It's not in writtenSchema.
>> > The second and third sentences in the official website say, add and
>> delete
>> > do not operate required field!
>
>
> Just like on the official website about why not add the required field.
> You can check it out.
>
> https://protobuf.dev/programming-guides/dos-donts/#dont-add-a-required-field
>
> Required field to be so harmful that it's being removed.
>
>
> 4.
>
>> > > So if my write schema is
>> > > message SearchRequest { required string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > and my read schema is
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > You can see I removed a field from writeSchema that does not have a
>> > default
>> > > value.
>> > > First, I can read it without any problems: Field 2 might exists, no
>> > > problem. Field 3 might exists, no problem. I ignore field 1.
>> > > So the validation you wrote is no correct at all, without any regard
>> to
>> > > optional value.
>> >
>> >
>> > *It looks like you've added a required field here, which is not
>> allowed (in
>> > 3 i am said that).*
>> > *It also uses the number 1 of the deleted field.*
>> >
>> > I removed the required field "required string query = 1" from
>> writeSchema
>> compared with readSchema.
>> Of course you can say I added "required string query = 1" to writeSchema
>> compared with readSchema.
>> The main question: can I use readSchema to read message written by
>> writeSchema?
>> Yes I can.
>> I wrote why above.
>> I didn't use the number of 1 of deleted field. The read schema is using
>> number 2 and 3.
>
>
> This is a problem with 3 and does not allow the required field to be added.
>
>
> 5.
>
>> > What if I do the other way around?
>> > > My write schema is:
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > My read schema is:
>> > > message SearchRequest { required string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > Here I will fail, since it is required I will provide value for
>> field 1
>> > but
>> > > alas it will never be there.
>> >
>> >
>> > Yes, the required field has been removed, which is incompatible.
>> > The check for changes to the required field starts at the beginning, at
>> > which point an incompatible exception has been thrown and the following
>> > compatibility check is not performed.
>> >
>> You can't say "The required field " - it's plain wrong.
>> Look at your rule:
>> The writtenSchema cannot remove required fields in the readSchema.
>> You can say: the writeSchema has removed a required field compared with
>> readSchema.
>> This rule will fail the validation.
>> *not* the default rule we're discussing.
>
>
> Here we are talking about changing the default value. But in this case the
> writtenSchema removes the required field so it is not compatible. In
> implementation, the rules check are in order. Changes to the required field
> are checked at the beginning. So it is not necessary to discuss changes to
> the required field until the default value is changed.
>
>
> 6.
>
>> > *It looks like sixth item in PIP should be removed.*
>> > *And Rule 7 in PIP should be removed:*
>> >
>> > Delete rule 6.
>> I would keep rule 7. See:
>>
>> https://protobuf.dev/programming-guides/dos-donts/#dont-change-the-default-value-of-a-field
>> It sounds like a good best practice. We can omit it if we don't want to be
>> overly strict.
>
>
> I think we should really keep rule 7(In PIP issue, it is `(5)` now).
> Follow the instructions on the website.
> ```
> (7) There can not be a field which exists both in `readSchema` and
> `writtenSchema`, with same field number, having different default values.
> **This rule applies to proto2. But in proto3, the default value cannot be
> set by yourself! If you use proto3, it will not check for this rule**
> ```
> By the way, changes to the required field are not allowed in `3` and
> should be followed as well.
>
>
> 7.
>
>> > > Changing a default value is generally OK, as long as you remember
>> that
>> > > default values are never sent over the wire.
>> > > Thus, if a program receives a message in which a particular field
>> isn’t
>> > > set, the program will see the default value as it was defined in that
>> > > program’s version of the protocol.
>> > > It will NOT see the default value that was defined in the sender’s
>> code.
>> >
>> > This is in the official document 11:
>> > https://protobuf.dev/programming-guides/proto2/#updating
>> >
>> >
>> >
>> So what do you wish to state with this quote?
>
>
> Oh, rule 7 has been retained. Let's ignore this.
>
>
>
> *I also took into account that proto2 was a really problematic version.
> And the pulsar website seems to say that only proto3 is used.*
> https://pulsar.apache.org/docs/next/schema-understand/#struct-schema
> *```*
> *ProtobufNativeSchema is based on protobuf native descriptor.*
> *This allows Pulsar to:*
> *- use native protobuf-v3 to serialize or deserialize data.*
> *- use AutoConsume to deserialize data.*
> *```*
> *In my opinion, there are some differences between proto2 and proto3 check
> rules, and this PIP also supports proto2. So for this PIP, we may need to
> change the description of "ProtoBufNative" on the official website as well.*
>
>
>
> Thanks,
> sinan
>
>
>
>
>
> Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月20日周一 17:46写道：
>
>> On Sun, Mar 19, 2023 at 4:47 PM SiNan Liu <liusinan1...@gmail.com> wrote:
>>
>> > 1.
>> >
>> > > message SearchReq { string query = 1; int32 page_number = 2; int32
>> > > result_per_page = 3;}
>> > > Then second version I use:
>> > > message SearchRequest { string query = 1; int32 page_number = 2;
>> > > int32 result_per_page = 3;}
>> >
>> >
>> > The rule in PIP improve the previous implementation, so the previous
>> > implementation needs to be added.
>> > If the user switches from the previous implementation to the current
>> > implementation of PIP, and the current implementation doesn't check for
>> > root message name changes, isn't that changing the old behavior? This
>> PIP
>> > is to make compatibility checking more stringent.
>> >
>> >
>> Well, this is exactly the argument we had in this thread. This is why you
>> make 2 implementation classes ( I called them v1, v2 if you remember). The
>> user can choose between them. This is exactly while I liked the _v1 _v2
>> naming, so they will know they are not backward compatible.
>>
>> Also, you're talking about improving right? Checking the message name is
>> outright wrong, so you're actually fixing a big mistake someone made back
>> then.
>>
>>
>> >
>> > 2.
>> >
>> > > This is a guideline for a human making the change, not for software
>> > > validation right?
>> > > When you write down the code for doing the comparison to know if the
>> > field
>> > > number has changed, you have to take into account both the field name
>> AND
>> > > field type. If both are equal but the field number is different then
>> fail
>> > > it.
>> >
>> >
>> > I use the field name here to match, the name is the same, but the
>> number is
>> > not the same will be incompatible.
>> >
>> >
>> https://github.com/apache/pulsar/pull/19566/files#diff-14b840259375cc8fcae564586fc1c2a188cb71822ab5995ed6807f7dab30c210R129-R132
>> > PIP description here is the rule, but how to implement it is a matter in
>> > PR. I have considered the example you gave here, which is also
>> incompatible
>> > in this case.
>> >
>> > Your changes are also described here:
>> >
>> >
>> https://stackoverflow.com/questions/65230623/safeness-of-changing-proto-field-number
>> >
>> >
>> This sentence "the PIP description here is the rule, but how to
>> implemented
>> it is a matter in PR" is a completely invalid argument.
>> For compatibility rules you specify the exact validation.
>> We argue here about the rule.
>> You can say in the PIP you will match by name and then in PR match by name
>> and type. You have to be exact in the PIP.
>>
>> Regarding "I have considered the example you gave here, which is also
>> incompatible" - please explain? I find it compatible - meaning you can
>> read
>> using the readSchema message written using writeSchema.
>>
>> Your link to Stackoverflow will fail *my* validation: Since `int foo = 1`
>> --> `int foo = 2;` : same name, same type, different field number --> fail
>> validation.
>> So it doesn't prove anything, or help this argument.
>>
>>
>>
>>
>>
>> >
>> > 3.
>> >
>> > > If you'll read sentence 2 from the guide, you'll see they write
>> > > "This means that any messages serialized by code using your “old”
>> message
>> > > format can be parsed by your new generated code, as they won’t be
>> missing
>> > > any required elements."
>> > > This is for the *other side*: when you add a required to readSchema
>> > > compare
>> > > to write schema. This of course is not allowed and won't work.
>> >
>> >
>> > Note that in proto2: **Any new fields that you add should be optional or
>> > repeated.**
>> > https://protobuf.dev/programming-guides/proto2/#updating
>> >
>> > I also stated in PIP that proto3 will not check this rule for required
>> > field changes.
>> > This rule applies to proto2. However, proto3 removes required. If you
>> use
>> > proto3, it will not check for changes to the required field
>> >
>> > I'm talking about proto2.
>>
>> Listen.
>>
>> This is what you wrote in the PIP, ok?
>>
>>
>> >    - The writtenSchema cannot add required fields, but optional or
>> >    duplicate fields can be added (The field number must be new).
>> >
>> > I answered:
>>
>> That's not true.
>> > You can have a required field in writeSchema, and not have that field in
>> > readSchema (based on tag number).
>>
>> I also gave you an example showing what you wrote is not true (see
>> previous
>> mails), in which I add a required field to writeSchema compared with
>> readSchema and I can still read it with readSchema.
>>
>> You mention a link to a guideline saying you can only add optional or
>> repeated without reading the rest of the paragraph and without
>> understanding the protocol buffer encoding and decoding.
>>
>> I'm not sure how to move forward here.
>> This validation rule is invalid.
>>
>>
>>
>>
>> >
>> > 4.
>> >
>> > > My write schema is
>> > > message SearchRequest { string query = 1; int32 page_number = 2;
>> > > int32 result_per_page = 3;}
>> > > my read schema is
>> > > message SearchRequest { string query = 1; int32 page_number = 2;
>> > > int32 resultPerPage = 3;}
>> > > - Non-required fields can be removed, as long as the field number is
>> not
>> > > used again in your updated message type. You may want to rename the
>> field
>> > > instead, perhaps adding the prefix “OBSOLETE_”, or make the field
>> number
>> > > reserved <https://protobuf.dev/programming-guides/proto2/#reserved>,
>> so
>> > > that future users of your .proto can’t accidentally reuse the number.
>> > > First, they are talking about removal of fields. They suggest you
>> rename
>> > > the field to OBSOLETE_myField instead of actually deleting it, to
>> make it
>> > > easier on compatability. They don't talk about rules that forbid you
>> to
>> > > rename a field. It's unrelated.
>> >
>> >
>> > Maybe you are right that changing the field name is compatible.
>> >
>> >
>> https://stackoverflow.com/questions/45431685/protocol-buffer-does-changing-field-name-break-the-message/45431953#comment84548234_45431953
>> >
>> > But it is interesting I tested the use `avro-proto` implementation
>> > `ProtobufSchemaCompatibilityCheck` compatibility check.（
>> >
>> `org.apache.pulsar.broker.service.schema.BaseAvroSchemaCompatibilityTest`）
>> >
>> > private static final String schemaJson1 =
>> >
>> >
>> "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
>> > +
>> > ".AvroSchemaCompatibilityCheckTest\",\"fields\":[
>> > {\"name\":\"field1\",\"type\":\"string\"}]}";
>> >
>> > private static final String schemaJson4 =
>> >
>> >
>> "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
>> > +
>> >
>> >
>> ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1_v2\",\"type\":\"string\","
>> > +
>> > "\"aliases\":[\"field1\"]}]}";
>> >
>> > schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
>> > SchemaCompatibilityStrategy.BACKWARD);
>> > This is compatible.
>> >
>> > But that's not compatible
>> > schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
>> > SchemaCompatibilityStrategy.FORWARD);
>> >
>> > So, what do you think of AVRO implementation
>> > `ProtobufSchemaCompatibilityCheck`, rules about renaming are wrong is
>> it?
>> > It looks something like this.
>> >
>> >
>> You're asking why Avro doesn't allow when read schema is schema 4, and
>> write schema is 1 and the difference between them is a single field
>> rename?
>> I don't know - need to debug to understand.
>> I do know that Avro encoding is completely different. They don't have a
>> field identifier like protobuf does. They completely rely on the schema of
>> the writer and the reader.
>> But it is besides the point, since they are supposed to allow field rename
>> from my knowledge.
>> You can debug and find out.
>>
>>
>>
>> >
>> > 5.
>> >
>> > > I think my previous explanation in this email reply should be good
>> enough
>> > > to explain why the name is irrelevant.
>> > > If not, please ping me on this.
>> > > Regarding your example, can you quote it? I don't understand the read
>> and
>> > > write schema here.
>> > > Also, regarding message names and why they are irrelevant. Same thing:
>> > Read
>> > > https://protobuf.dev/programming-guides/encoding/
>> > > In there you will see the names are not encoded thus can be changed
>> > freely
>> > > and still successfully decode binary data.
>> >
>> >
>> > Reader.proto
>> > ```protobuf
>> > syntax = "proto2";
>> > package proto.reader;
>> > import "ExternalReader.proto";
>> > option java_package = "org.apache.pulsar.client.schema.proto.reader";
>> > option java_outer_classname = "Reader";
>> >
>> > enum WeekEnum {
>> > ......
>> > }
>> >
>> > message ProtobufMessage {
>> > required string protobufFoo = 1;
>> > required double protobufBar = 2;
>> > }
>> >
>> > message ProtobufSchema {
>> > ......
>> > optional ProtobufMessage messageField = 12;
>> > ......
>> > optional WeekEnum enumField = 15;
>> > }
>> > ```
>> >
>> > WriterWithTypeNameChange.proto:
>> > ```protobuf
>> > syntax = "proto2";
>> > package proto.writerWithTypeNameChange;
>> > import "ExternalReader.proto";
>> > option java_package =
>> > "org.apache.pulsar.client.schema.proto.writerWithTypeNameChange";
>> > option java_outer_classname = "WriterWithTypeNameChange";
>> >
>> > enum WeekEnum {
>> > ......
>> > }
>> >
>> > message ProtobufMessage_V2 {
>> > required string protobufFoo = 1;
>> > required double protobufBar = 2;
>> > }
>> >
>> > message ProtobufSchema {
>> > ......
>> > optional ProtobufMessage_V2 messageField = 12;
>> > ......
>> > optional WeekEnum enumField = 15;
>> > ......
>> > }
>> > ```
>> >
>> > *Doesn't that change the field type? Should that be compatible?*
>> >
>> >
>> It doesn't change the field type since you haven't changed the field types
>> of the fields inside ProtobufMessage - they are still field 1 and field 2
>> double.
>> It is compatible since the message type name is not encoded in the binary.
>>
>>
>> >
>> > 6.
>> >
>> > > So if my write schema is
>> > > message SearchRequest { required string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > and my read schema is
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > You can see I removed a field from writeSchema that does not have a
>> > default
>> > > value.
>> > > First, I can read it without any problems: Field 2 might exists, no
>> > > problem. Field 3 might exists, no problem. I ignore field 1.
>> > > So the validation you wrote is no correct at all, without any regard
>> to
>> > > optional value.
>> >
>> >
>> > *It looks like you've added a required field here, which is not allowed
>> (in
>> > 3 i am said that).*
>> > *It also uses the number 1 of the deleted field.*
>> >
>> > I removed the required field "required string query = 1" from
>> writeSchema
>> compared with readSchema.
>> Of course you can say I added "required string query = 1" to writeSchema
>> compared with readSchema.
>> The main question: can I use readSchema to read message written by
>> writeSchema?
>> Yes I can.
>> I wrote why above.
>> I didn't use the number of 1 of deleted field. The read schema is using
>> number 2 and 3.
>>
>>
>>
>>
>>
>> > What if I do the other way around?
>> > > My write schema is:
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > My read schema is:
>> > > message SearchRequest { required string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > Here I will fail, since it is required I will provide value for field
>> 1
>> > but
>> > > alas it will never be there.
>> >
>> >
>> > Yes, the required field has been removed, which is incompatible.
>> > The check for changes to the required field starts at the beginning, at
>> > which point an incompatible exception has been thrown and the following
>> > compatibility check is not performed.
>> >
>>
>> You can't say "The required field " - it's plain wrong.
>>
>> Look at your rule:
>>
>> The writtenSchema cannot remove required fields in the readSchema.
>>
>>
>> You can say: the writeSchema has removed a required field compared with
>> readSchema.
>> This rule will fail the validation.
>> *not* the default rule we're discussing.
>>
>>
>>
>>
>> >
>> > Now let's focus on the optional field, ok?
>> > > Write schema
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > Read schema:
>> > > message SearchRequest { optional string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > When I read, I won't find field number 1 of course, so I will use the
>> > > default value dictated by the type, as I quoted above, which is for
>> > string
>> > > is the empty byte string.
>> >
>> >
>> > *It looks like sixth item in PIP should be removed.*
>> > *And Rule 7 in PIP should be removed:*
>> >
>> > Delete rule 6.
>> I would keep rule 7. See:
>>
>> https://protobuf.dev/programming-guides/dos-donts/#dont-change-the-default-value-of-a-field
>> It sounds like a good best practice. We can omit it if we don't want to be
>> overly strict.
>>
>>
>>
>> > > Changing a default value is generally OK, as long as you remember that
>> > > default values are never sent over the wire.
>> > > Thus, if a program receives a message in which a particular field
>> isn’t
>> > > set, the program will see the default value as it was defined in that
>> > > program’s version of the protocol.
>> > > It will NOT see the default value that was defined in the sender’s
>> code.
>> >
>> > This is in the official document 11:
>> > https://protobuf.dev/programming-guides/proto2/#updating
>> >
>> >
>> >
>> So what do you wish to state with this quote?
>>
>>
>> > 7.
>> >
>> > > Read this and then please explain why rule number 3 should not be
>> added:
>> > >
>> > >
>> >
>> https://protobuf.dev/programming-guides/dos-donts/#dont-go-from-repeated-to-scalar
>> >
>> >
>> > It looks like this rule needs to be added.
>> >
>> > Thanks
>>
>>
>> >
>> >
>> > Thanks,
>> > sinan
>> >
>> >
>> >
>> > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月19日周日 19:11写道：
>> >
>> > > Also:
>> > >
>> > > (2) PROTOBUF_NATIVE was designed so that not use avro-protobuf for
>> > protobuf
>> > > > schema compatibility checking.
>> > >
>> > > The root message name is the class name we pass in when we create the
>> > > > producer or consumer. ProtoBuf has many nested messages or
>> > dependencies.
>> > > > The current implementation only checks if the passed class name is
>> the
>> > > > same. It does not check if the fields in the file change in a way
>> that
>> > is
>> > > > compatible with older versions of the schema.
>> > >
>> > >
>> > > You missed the most important thing there: PROTOBUF_NATIVE uses
>> Protobuf
>> > > Descriptor when persisting the schema. It's not using Avro Schema
>> > > definition. As I wrote you in previous email:
>> > >
>> > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
>> > persisting
>> > > > the Protobuf Descriptor and using Protobuf for encoding.
>> > > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <liusinan1...@gmail.com>
>> > wrote:
>> > >
>> > > > 1.
>> > > >
>> > > > > > Why? The root message name is not written over the wire to the
>> best
>> > > of
>> > > > my
>> > > > > > knowledge. I haven't found it written in the official doc.
>> > > >
>> > > >
>> > > > The name of the root message check is the rules in the previous
>> > > > `ProtobufNativeSchemaCompatibilityCheck`. Because if the root
>> message
>> > > has a
>> > > > different name, there is no need to check its contents. "Same"
>> schema,
>> > > > their names must be the same.
>> > > >
>> > > >
>> > > > 2.
>> > > >
>> > > > > >The writtenSchema can not change the field number of any field in
>> > > > > readSchema (the > field name is the same, but the field number is
>> > > > > different).
>> > > > > >You have to take into account field type as well when comparing.
>> > > >
>> > > >
>> > > > The first sentence on the website says that the number of fields
>> cannot
>> > > be
>> > > > changed.
>> > > >
>> > > > > Don’t change the field numbers for any existing fields.
>> > > >
>> > > >
>> > > >
>> > > > 3.
>> > > >
>> > > > > > - The writtenSchema cannot add required fields, but optional or
>> > > > > > duplicate fields can be added (The field number must be new).
>> > > > > >
>> > > > > > That's not true.
>> > > > > You can have a required field in writeSchema, and not have that
>> field
>> > > in
>> > > > > readSchema (based on tag number).
>> > > >
>> > > >
>> > > > The required field must exist. In your case, where does readSchema
>> go
>> > to
>> > > > read its required fields? It's not in writtenSchema.
>> > > > The second and third sentences in the official website say, add and
>> > > delete
>> > > > do not operate required field!
>> > > >
>> > > >
>> > > > 4.
>> > > >
>> > > > > (4) The writtenSchema can not change the field name of any field
>> in
>> > > > > > readSchema (the field number is the same, but the field name is
>> > > > > > different).
>> > > > > This is incorrect.
>> > > > > Fields names are encoded into the wire. I don't see this in any
>> best
>> > > > > practice.
>> > > >
>> > > >
>> > > > The third sentence on the website:
>> > > >
>> > > > > You may want to rename the field instead, perhaps adding the
>> prefix
>> > > > > “OBSOLETE_”, or make the field number reserved, so that future
>> users
>> > of
>> > > > > your .proto can’t accidentally reuse the number.
>> > > >
>> > > > If you want to rename a field, or add a new field. To delete with
>> the
>> > new
>> > > > number!
>> > > >
>> > > >
>> > > > 5.
>> > > >
>> > > > > The writtenSchema does not change the field name and number, but
>> it
>> > > does
>> > > > > change the field type.
>> > > > > > Small correction: for the same field number you are not allowed
>> to
>> > > > change
>> > > > > types. Name is irrelevant.
>> > > >
>> > > >
>> > > > Why doesn't the name irrelevant?
>> > > > Here is the change in type, which is the rule stated in
>> Alternatives.
>> > > There
>> > > > is no check here, just a warning to the user.
>> > > > Another change is that the name of enum is changed, or the name of
>> > > MESSAGE
>> > > > is changed, which is the same as the root message name check in 1,
>> is
>> > the
>> > > > change still the same? This is not allowed to change!
>> > > >
>> > > > *Here is example:*
>> > > > readSchema(
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4
>> > > > )
>> > > > writtenSchema(
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f
>> > > > )
>> > > >
>> > > >
>> > > > 6.
>> > > >
>> > > > > (6) The writtenSchema removes fields that do not have default
>> values
>> > in
>> > > > > > readSchema. Then the schema is incompatible.
>> > > > > Protobuf gives you its own default if you don't supply one. This
>> is
>> > > > > incorrect.
>> > > >
>> > > >
>> > > > (1) This rule only applies if proto2 does not set the default
>> value. If
>> > > > proto3 does not check, the default value will always be there.
>> > > >
>> > > > (2) In PIP issue:
>> > > >
>> > > > > Proto3 canceled the required field, so there is no need to check
>> the
>> > > > > required field. We get the syntax(proto2 or proto3) of proto in
>> the
>> > > code,
>> > > > > and skip the check of the required field if it is proto3. All
>> other
>> > > > > checking rules also apply to proto3.
>> > > >
>> > > >
>> > > > *I made a mistake here. This default value check is not need in
>> > proto3. I
>> > > > will modify the rules later according to your suggestion.*
>> > > >
>> > > > > I would remove the proto2/proto3 sections, since they only differ
>> in
>> > 1
>> > > > > rule, and just mention that distinction inside that rule (less
>> work
>> > for
>> > > > the
>> > > > > reade).
>> > > >
>> > > >
>> > > > (3) And add rules that look like they should be checked:
>> > > >
>> > > > > Rules that you don't have in the doc, but should IMO*
>> > > > > .......
>> > > >
>> > > > There can't be a field in writerSchema, that exists in readerSchema
>> > (tag
>> > > > > number based), which in writerSchema its type is scalar, but in
>> > > > readSchema
>> > > > > its type is scalar, it's repeated but with packed=true.
>> > > >
>> > > >
>> > > > But I don't think rule number three needs to be added.
>> > > >
>> > > >
>> > > > Thanks,
>> > > > sinan
>> > > >
>> > > >
>> > > >
>> > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月14日周二 22:33写道：
>> > > >
>> > > > > Hi Sinan,
>> > > > >
>> > > > > The doc looks much better!
>> > > > >
>> > > > > I have a few additional comments:
>> > > > >
>> > > > > Pasting comment from previous emails:
>> > > > >
>> > > > > Can you convert the code block which is actually a quote in the
>> > > > > beginning of the PIP to something which doesn't require to scroll
>> > > > > horizontally so much?
>> > > > > Use
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>> > > > >
>> > > > > *Validation Rules*
>> > > > >
>> > > > > (1) If the root message names of writtenSchema and readSchema are
>> > > > > > different, then incompatible.
>> > > > >
>> > > > > Why? The root message name is not written over the wire to the
>> best
>> > of
>> > > my
>> > > > > knowledge. I haven't found it written in the official doc.
>> > > > >
>> > > > >
>> > > > > >    - The writtenSchema cannot add required fields, but optional
>> or
>> > > > > >    duplicate fields can be added (The field number must be new).
>> > > > > >
>> > > > > > That's not true.
>> > > > > You can have a required field in writeSchema, and not have that
>> field
>> > > in
>> > > > > readSchema (based on tag number).
>> > > > >
>> > > > > The writtenSchema can not change the field number of any field in
>> > > > > readSchema (the
>> > > > > > field name is the same, but the field number is different).
>> > > > >
>> > > > > You have to take into account field type as well when comparing.
>> > > > >
>> > > > > If I have in readSchema
>> > > > > int32 justAnID = 1
>> > > > > int32 customerId = 2
>> > > > >
>> > > > > and in writeSchema I have
>> > > > > int32 justAnID = 1
>> > > > > string customerId = 3
>> > > > >
>> > > > > This is valid.
>> > > > >
>> > > > > (4) The writtenSchema can not change the field name of any field
>> in
>> > > > > > readSchema (the field number is the same, but the field name is
>> > > > > > different).
>> > > > >
>> > > > > This is incorrect.
>> > > > > Fields names are encoded into the wire. I don't see this in any
>> best
>> > > > > practice.
>> > > > >
>> > > > > ) The writtenSchema does not change the field name and number,
>> but it
>> > > > does
>> > > > > > change the field type.
>> > > > > >
>> > > > > >    - If the field type is ENUM or MESSAGE, the schema is not
>> > > compatible
>> > > > > >    when the type name is changed
>> > > > > >    - If the type of the field is another type. The schemas under
>> > this
>> > > > > >    rule are not incompatible, but warn the user.(There is
>> another
>> > way
>> > > > of
>> > > > > >    testing in PIP issue Alternatives)
>> > > > > >
>> > > > > > Small correction: for the same field number you are not allowed
>> to
>> > > > change
>> > > > > types. Name is irrelevant.
>> > > > >
>> > > > > (6) The writtenSchema removes fields that do not have default
>> values
>> > in
>> > > > > > readSchema. Then the schema is incompatible.
>> > > > >
>> > > > > Protobuf gives you its own default if you don't supply one. This
>> is
>> > > > > incorrect.
>> > > > >
>> > > > >
>> > > > > *Rules that you don't have in the doc, but should IMO*
>> > > > > * There can not be a field which exists both in readSchema and
>> > > > writeSchema,
>> > > > > with same tag number, having different default values
>> > > > > * There can't be a field in writerSchema, that exists in
>> readerSchema
>> > > > (tag
>> > > > > number based), which in writerSchema is repeated and its type is
>> > > Scalar (
>> > > > > https://protobuf.dev/programming-guides/proto/#scalar) but in
>> > > readSchema
>> > > > > it
>> > > > > is not repeated anymore.
>> > > > > * There can't be a field in writerSchema, that exists in
>> readerSchema
>> > > > (tag
>> > > > > number based), which in writerSchema its type is scalar, but in
>> > > > readSchema
>> > > > > its type is scalar, it's repeated but with packed=true.
>> > > > >
>> > > > > *Rules you have , but I would phrase a bit differently*
>> > > > >
>> > > > > I would remove the proto2/proto3 sections, since they only differ
>> in
>> > 1
>> > > > > rule, and just mention that distinction inside that rule (less
>> work
>> > for
>> > > > the
>> > > > > reade).
>> > > > >
>> > > > > * readSchema has a field which doesn't exist in writerSchema
>> (based
>> > on
>> > > > tag
>> > > > > number).
>> > > > >    * Proto v2:
>> > > > >       * That field must be `optional` or `repeated` (must not be
>> > > > > `required`)
>> > > > >    * Proto v3:
>> > > > >       * No problem.
>> > > > > * There can not be a field which exists both in readSchema and
>> > > > writeSchema,
>> > > > > with the same tag number, but having different types.
>> > > > >
>> > > > > *Motivation*
>> > > > >
>> > > > > Basically in the motivation section you want people to understand
>> the
>> > > > > following:
>> > > > >
>> > > > > Pulsar has built-in support for typed messages. It allows
>> specifying
>> > an
>> > > > > encoding scheme and its matching schema.
>> > > > > For example, it supports Avro. You specify a schema for a given
>> > topic,
>> > > > > using Avro Schema Definition (i.e. a JSON describing the schema).
>> > > > Everytime
>> > > > > you produce a message, you first declare the schema definition you
>> > wish
>> > > > to
>> > > > > use for your messages. The message data should be an avro-encoded
>> > > binary
>> > > > > data (which the client in some SDKs helps encode a given
>> > > > > data-structure/object).
>> > > > > The same applies when you consume a message. You first specify the
>> > > schema
>> > > > > you use to read the messages, and the client in some SDKs helps by
>> > > > decoding
>> > > > > the message binary data into an object/data-structure.
>> > > > >
>> > > > > Each time you specify a schema to be used, either by a producer
>> or a
>> > > > > consumer, the schema is persisted in Pulsar and given an
>> increasing
>> > > > version
>> > > > > number. If the schema was the same as the previous version, it is
>> not
>> > > > > saved. When the message is persisted, the version number is
>> encoded
>> > in
>> > > > the
>> > > > > message headers.
>> > > > >
>> > > > > Pulsar provides a very useful feature named Schema Evolution
>> > > > > <
>> > > >
>> > >
>> >
>> https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution
>> > > > > >.
>> > > > > It allows us to check if a new schema version is compatible with
>> > > previous
>> > > > > versions or versions. When you configure the schema for the topic
>> you
>> > > > > decide the strategy to use for doing the validation check. The
>> > > strategies
>> > > > > validate the following:
>> > > > >
>> > > > >    - BACKWARD strategy
>> > > > >       - A consumer with newSchema can read a message written using
>> > > > >       existingSchema
>> > > > >    - BACKWARD_TRANSITIVE strategy
>> > > > >       - A consumer with newSchema can read messages written using
>> all
>> > > > >       existingSchema
>> > > > >    - FORWARD
>> > > > >       - A consumer with existingSchema can read messages written
>> > using
>> > > > >       newSchema
>> > > > >    - FORWARD_TRANSITIVE
>> > > > >       - A consumer defined with any of the existingSchema can read
>> > > > messages
>> > > > >       written using newSchema
>> > > > >    - FULL
>> > > > >       - A consumer defined with newSchema can read messages
>> written
>> > > using
>> > > > >       existingSchema
>> > > > >       - A consumer defined with existingSchema can read messages
>> > > written
>> > > > >       using newSchema
>> > > > >    - FULL_TRANSITIVE
>> > > > >       - A consumer defined with newSchema can read messages
>> written
>> > > using
>> > > > >       any of the existingSchema
>> > > > >       - A consumer defined with any of the existingSchema can read
>> > > > messages
>> > > > >       written using newSchema
>> > > > >
>> > > > >
>> > > > > Aside from Avro, Pulsar also has two additional supported
>> encodings:
>> > > > > PROTOBUF and PROTOBUF_NATIVE.
>> > > > >
>> > > > > PROTOBUF is a bit strange. It encodes the messages using Protobuf
>> > > > encoding,
>> > > > > but the schema that is persisted to Pulsar is *not* Protobuf
>> > Descriptor
>> > > > as
>> > > > > you would have expected. The saved schema is a translation of the
>> > > > Protobuf
>> > > > > Descriptor to an Avro Schema, so in fact an Avro schema
>> definition is
>> > > > saved
>> > > > > as the schema.
>> > > > >
>> > > > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
>> > > > persisting
>> > > > > the Protobuf Descriptor and using Protobuf for encoding.
>> > > > > The problem is that the authors of PROTOBUF_NATIVE haven't
>> completed
>> > it
>> > > > > fully, and the backward compatibility validation code almost does
>> not
>> > > > > exist: It only checks if the root message name is the same between
>> > > > > versions.
>> > > > >
>> > > > > GOALS
>> > > > > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
>> > > > > functional validation for any of the defined Schema Compatibility
>> > > > > Strategies.
>> > > > > A secondary goal is to allow the user to choose between different
>> > > > > implementations: The new fully functional validation or the
>> existing
>> > > > > barebones validation.
>> > > > >
>> > > > > -------- END
>> > > > >
>> > > > > I'm ok with having links in the Motivation , as *further reading*.
>> > > > > I'm against stacking up work for the reader to go read 5-6
>> different
>> > > > links
>> > > > > just to understand the motivation and background knowledge
>> required
>> > to
>> > > > > understand the feature.
>> > > > >
>> > > > > I'm against putting code in the Motivation. Especially if it is
>> > > supposed
>> > > > to
>> > > > > replace description in plain English making it easy to understand
>> the
>> > > > > design.
>> > > > > Leave the code to the motivation.
>> > > > > Paste code only if you absolutely can't use plain old
>> descriptions to
>> > > > > explain.
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <
>> liusinan1...@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > > > *I guess that's right, too! *
>> > > > > >
>> > > > > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
>> > > > better,
>> > > > > > because we don't know whether the future will have V2, V3. The
>> > > official
>> > > > > > solution can be called
>> > > > `ProtobufNativeOfficialSchemaCompatibilityCheck`,
>> > > > > or
>> > > > > > is a good `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`
>> > > third-party
>> > > > > > solution.
>> > > > > >
>> > > > > > I've updated my design in PIP issue.
>> > > > > > 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather
>> > than a
>> > > > > > ProtobufNativeSchemaCompatibilityCheck different validator
>> > > > > implementation.
>> > > > > > 2. Remove the 'builder'
>> > > > > > 3. Clarify the relationship between newSchema, existingSchema,
>> and
>> > > > > > writtenSchema in canRead.
>> > > > > >
>> > > > > > Help to see if the description is comprehensive and what changes
>> > and
>> > > > > > improvements need to be made.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > sinan
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月9日周四 17:35写道：
>> > > > > >
>> > > > > > > I like Bo's suggestion - I haven't realized each schema type
>> > > > > > > compatibility check is actually a plugin.
>> > > > > > > It makes sense for any schema type checks to evolve, sometimes
>> > in a
>> > > > > > > non-backward compatible way hence having two plugins like
>> > > > > > > protobufNativeSchemaCompatabilityCheckV1 and then
>> > > > > > > protobufNativeSchemaCompatabilityCheckV2 and then
>> > > > > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bog...@apache.org> wrote:
>> > > > > > >
>> > > > > > > >  Hi siNan:
>> > > > > > > >
>> > > > > > > > From my point of view, it is just a plug-in. I don't think
>> it
>> > is
>> > > > > > > > necessary to add configuration for the plugin.
>> > > > > > > > This is meaningless, and it will increase the difficulty of
>> use
>> > > for
>> > > > > > > users.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > SiNan Liu <liusinan1...@gmail.com> 于2023年3月8日周三 15:54写道：
>> > > > > > > > >
>> > > > > > > > > Hi, bo.
>> > > > > > > > >
>> > > > > > > > > 1. I understand what you say, to develop a new
>> > > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather
>> than
>> > > > > > changing
>> > > > > > > > > existing `ProtobufNativeSchemaCompatibilityCheck`. But I
>> > found
>> > > a
>> > > > > few
>> > > > > > > > small
>> > > > > > > > > problems:
>> > > > > > > > >
>> > > > > > > > > （1）ProtobufNativeAdvancedSchemaCompatibilityCheck and
>> > > > > > > > > ProtobufNativeSchemaCompatibilityCheck schemaType is
>> > > > > PROTOBUF_NATIVE.
>> > > > > > > It
>> > > > > > > > > looks like both checkers are PROTOBUF not using
>> > AVRO-PROTOBUF's
>> > > > > > > "native"
>> > > > > > > > > implementation, which leads to some problems or
>> > "unreasonable"
>> > > > and
>> > > > > > > gives
>> > > > > > > > me
>> > > > > > > > > some extended thinking and questions.
>> > > > > > > > >
>> > > > > > > > `CompatibilityCheck ` its only a plugin.
>> > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later
>> > > leave
>> > > > > the
>> > > > > > > > stage, when
>> `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
>> > > > > > > > stable, we can make it the default Checker.
>> > > > > > > >
>> > > > > > > > It is just a plug-in, users can change it at will and ensure
>> > that
>> > > > it
>> > > > > > > > is used correctly
>> > > > > > > > > （2）In broker.conf
>> > > > > > > > >
>> > > > > > > > > `schemaRegistryCompatibilityCheckers`. If
>> > > > > > > > > ProtobufNativeSchemaCompatibilityCheck and
>> > > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck all set.
>> This
>> > is
>> > > > > going
>> > > > > > > to
>> > > > > > > > > overwrite each other. Because this is a map:
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
>> > > > > > > > >
>> > > > > > > > > ```java
>> > > > > > > > >
>> > > > > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new
>> > > > > HashMap<>();
>> > > > > > > > >
>> > > > > > > > > for (String className : checkerClasses) {
>> > > > > > > > >
>> > > > > > > > > SchemaCompatibilityCheck schemaCompatibilityCheck =
>> > > > > > > > > Reflections.createInstance(className,
>> > > > > > > > >
>> > > > > > > > > SchemaCompatibilityCheck.class,
>> > > > > > > > > Thread.currentThread().getContextClassLoader());
>> > > > > > > > >
>> > > > > > > > > checkers.put(schemaCompatibilityCheck.getSchemaType(),
>> > > > > > > > > schemaCompatibilityCheck);
>> > > > > > > > >
>> > > > > > > > > ```
>> > > > > > > > >
>> > > > > > > > > Is this a big problem or a small one? Is it possible or
>> > > > > unnecessary?
>> > > > > > > > Maybe
>> > > > > > > > > we can write in the documentation that protobufNative
>> > checkers
>> > > > can
>> > > > > > only
>> > > > > > > > > choose one of the two? Why are there two Checkers for
>> > different
>> > > > > > > > > implementations of the same schemaType? Why not the
>> checker
>> > to
>> > > > > create
>> > > > > > > > > different validator, so we don not have to change
>> > > > > > > > > schemaRegistryCompatibilityCheckers.
>> > > > > > > >
>> > > > > > > > users can only use one, not two, which will bring
>> complexity to
>> > > > users
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > > （3）And after the update to
>> > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck.
>> > > > > > > > > Existing topics previously only checked the name of the
>> root
>> > > > > message,
>> > > > > > > not
>> > > > > > > > > the content of protobuf.
>> > > > > > > > >
>> > > > > > > > > What if the user wants both Checkers?
>> > > > > > > > >
>> > > > > > > > > Set to ProtobufNativeAdvancedSchemaCompatibilityCheck,
>> affect
>> > > the
>> > > > > > topic
>> > > > > > > > of
>> > > > > > > > > the existing schema?
>> > > > > > > > >
>> > > > > > > > > Older topics still use the old checker, and newer topics
>> or
>> > > > certain
>> > > > > > > older
>> > > > > > > > > topics use the new advancedchecker.
>> > > > > > > > >
>> > > > > > > > when `ProtobufNativeAdvancedSchemaCompatibilityCheck`
>> stable,
>> > > > > > > > users will not choose
>> `ProtobufNativeSchemaCompatibilityCheck`.
>> > > > > > > > because it not a complete checker.
>> > > > > > > > > （4）So should we have one schemaType for a checker?
>> > > > > > > protobufNativeChecker
>> > > > > > > > > can have as many different implementation classes as
>> > possible.
>> > > > This
>> > > > > > > > > classname configuration in PIP, let's see if it can be
>> set at
>> > > the
>> > > > > > topic
>> > > > > > > > > level. In the current PIP design I just load this
>> parameter
>> > > into
>> > > > > the
>> > > > > > > > > checker when the broker is started and the checkers map is
>> > set
>> > > > up.
>> > > > > > Can
>> > > > > > > I
>> > > > > > > > do
>> > > > > > > > > this in the new normal pr if I want to support topic
>> level?
>> > Or
>> > > > > > perfect
>> > > > > > > it
>> > > > > > > > > here?
>> > > > > > > > >
>> > > > > > > > > Add a call PROTOBUF_NATIVE_ADVANCE schemaType
>> corresponding
>> > > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (Seems to
>> be
>> > > more
>> > > > > > > > trouble).
>> > > > > > > > >
>> > > > > > > > > Sorry I can not use the computer and network in the
>> company,
>> > I
>> > > > use
>> > > > > my
>> > > > > > > > > mobile phone to reply to the email, the format may be a
>> bit
>> > > > messy.
>> > > > > > > Please
>> > > > > > > > > understand.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > sinan
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > 丛搏 <bog...@apache.org> 于 2023年3月7日周二 下午11:39写道：
>> > > > > > > > >
>> > > > > > > > > > SiNan Liu <liusinan1...@gmail.com> 于2023年3月7日周二
>> 13:22写道：
>> > > > > > > > > > >
>> > > > > > > > > > > Great to see your comment, bo!
>> > > > > > > > > > >
>> > > > > > > > > > > 1. The first way. The protobuf website has a
>> description
>> > of
>> > > > the
>> > > > > > > > rules,
>> > > > > > > > > > but
>> > > > > > > > > > > no plans to implement them.
>> > > > > > > > > > >
>> https://protobuf.dev/programming-guides/proto/#updating
>> > > > > > > > > >
>> > > > > > > > > > https://groups.google.com/g/protobuf
>> > > > > > > > > > maybe ask here
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > 2. I think this PIP can be divided into two parts.
>> > > > > > > > > > > (1) Add a flag(`ValidatorClassName`), load it into
>> > > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the
>> broker
>> > > > > starts.
>> > > > > > > > > > > ValidatorClassName is empty by default, and the
>> > > > implementation
>> > > > > > > > continues
>> > > > > > > > > > as
>> > > > > > > > > > > before, with no change for the user.
>> > > > > > > > > >
>> > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in
>> > > > > > `broker.conf`
>> > > > > > > > > > ```
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
>> > > > > > > > > > ```
>> > > > > > > > > > I do not recommend that we directly modify this plugin
>> and
>> > > > > continue
>> > > > > > > to
>> > > > > > > > > > add configuration items, which will cause trouble for
>> > users.
>> > > > > > > > > > We have a lot of configs and it's getting very unwieldy.
>> > > > > > > > > > in my opinion, we don't change
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`,
>> > > > > > > > > > it is a simple implementation, it doesn't go wrong very
>> > > often,
>> > > > > most
>> > > > > > > > > > users will use it. we can add another
>> ProtobufNativeCheck
>> > > named
>> > > > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck ` or
>> other.
>> > > in
>> > > > > this
>> > > > > > > > > > way, we don't need to add this flag. There is no need to
>> > > > consider
>> > > > > > > > > > compatibility, it is just a plug-in and will not affect
>> > > current
>> > > > > > > logic.
>> > > > > > > > > > If the user needs it, just change the plugin to the new
>> > > > > > > implementation
>> > > > > > > > > >
>> > > > > > > > > > > ```java
>> > > > > > > > > > >     ProtobufNativeSchemaValidator DEFAULT =
>> > > (fromDescriptors,
>> > > > > > > > > > toDescriptor)
>> > > > > > > > > > > -> {
>> > > > > > > > > > >         for (Descriptors.Descriptor fromDescriptor :
>> > > > > > > > fromDescriptors) {
>> > > > > > > > > > >             // The default implementation only checks
>> if
>> > > the
>> > > > > root
>> > > > > > > > message
>> > > > > > > > > > > has changed.
>> > > > > > > > > > >             if
>> > > > > > > > > > >
>> > > > > >
>> (!fromDescriptor.getFullName().equals(toDescriptor.getFullName()))
>> > > > > > > {
>> > > > > > > > > > >                 throw new
>> > > > > ProtoBufCanReadCheckException("Protobuf
>> > > > > > > > root
>> > > > > > > > > > > message isn't allow change!");
>> > > > > > > > > > >             }
>> > > > > > > > > > >         }
>> > > > > > > > > > >     };
>> > > > > > > > > > > ```
>> > > > > > > > > > > `ValidatorClassName` value also can be set to the
>> current
>> > > > > > > > implementation
>> > > > > > > > > > of
>> > > > > > > > > > > PIP add
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
>> > > > > > > > > > >
>> > > > > > > > > > > (2) Recoding the
>> > `ProtobufNativeSchemaCompatibilityCheck`.
>> > > > > > Through
>> > > > > > > > the
>> > > > > > > > > > flag
>> > > > > > > > > > > (`ValidatorClassName`) to build different
>> > > > > > > > > > `ProtobufNativeSchemaValidator`.
>> > > > > > > > > > > Isn't it just a plug-in? The user can develop and
>> choose
>> > a
>> > > > > > > different
>> > > > > > > > > > > `ProtobufNativeSchemaValidator`. I think it didn't
>> change
>> > > the
>> > > > > > > logic,
>> > > > > > > > it
>> > > > > > > > > > > just allowed him to expand it.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > I think this PIP should be an enhancement and
>> supplement
>> > to
>> > > > the
>> > > > > > > > function,
>> > > > > > > > > > > and there is no such thing as unnecessary and
>> > meaningless.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > > sinan
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > 丛搏 <bog...@apache.org> 于2023年3月7日周二 11:53写道：
>> > > > > > > > > > >
>> > > > > > > > > > > > I think we have two ways to do that.
>> > > > > > > > > > > >
>> > > > > > > > > > > > First way: We need to advance the improvement of
>> java
>> > in
>> > > > > > > protobuf.
>> > > > > > > > Ask
>> > > > > > > > > > > > if they have plans to improve.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Second way: the new PROTOBUF_NATIVE
>> > > > > `SchemaCompatibilityCheck`
>> > > > > > > > should
>> > > > > > > > > > > > be implemented as a plugin, don't change any
>> existing
>> > > > plugin
>> > > > > > > logic
>> > > > > > > > > > > > (it's simple and already used). I don't recommend
>> > adding
>> > > > > flags
>> > > > > > > for
>> > > > > > > > > > > > rollback, it adds configuration and makes little
>> sense.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > Bo
>> > > > > > > > > > > >
>> > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月6日周一
>> > > 23:00写道：
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Can you convert the code block which is actually a
>> > > quote
>> > > > in
>> > > > > > the
>> > > > > > > > > > > > > beginning of the PIP to something which doesn't
>> > require
>> > > > to
>> > > > > > > scroll
>> > > > > > > > > > > > > horizontally so much?
>> > > > > > > > > > > > > Use
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Let's improve the clarity of what you wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > "the PROTOBUF uses avro struct to store."
>> > > > > > > > > > > > > -->
>> > > > > > > > > > > > > When Schema type PROTOBUF is used, Pulsar Client
>> > > assumes
>> > > > > the
>> > > > > > > > object
>> > > > > > > > > > given
>> > > > > > > > > > > > > to it as message data is an auto-generated POJO
>> > > > containing
>> > > > > > the
>> > > > > > > > > > > > annotations
>> > > > > > > > > > > > > encoding the schema. The client is using a
>> converter,
>> > > > which
>> > > > > > > > converts
>> > > > > > > > > > a
>> > > > > > > > > > > > > Protobuf schema descriptor into an Avro schema and
>> > > sends
>> > > > > that
>> > > > > > > as
>> > > > > > > > the
>> > > > > > > > > > > > Schema
>> > > > > > > > > > > > > of the producer/consumer.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > "On the broker side, protobuf and avro both use
>> > > > SchemaData
>> > > > > > > > converted
>> > > > > > > > > > to
>> > > > > > > > > > > > > org.apache.avro.Schema."
>> > > > > > > > > > > > > -->
>> > > > > > > > > > > > > Since the schema is an Avro schema, the
>> > implementation
>> > > of
>> > > > > > > > > > compatibility
>> > > > > > > > > > > > > check on the broker side is to simply re-use the
>> > > > > > compatibility
>> > > > > > > > check
>> > > > > > > > > > of
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > AVRO schema type.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > "ProtobufSchema is different from
>> > ProtobufNativeSchema
>> > > in
>> > > > > > > schema
>> > > > > > > > > > > > > compatibility check it uses avro-protobuf.
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
>> > > > > > > > > > > > > But the current implementation of ProtobufNative
>> > schema
>> > > > > > > > compatibility
>> > > > > > > > > > > > > check only
>> > > > > > > > > > > > > checked if the root message name is changed."
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > -->
>> > > > > > > > > > > > > PROTOBUF_NATIVE schema type is different.
>> > > > > > > > > > > > > The client is actually using Protobuf Descriptor
>> as
>> > the
>> > > > > > schema,
>> > > > > > > > as
>> > > > > > > > > > > > opposed
>> > > > > > > > > > > > > to Avro schema of PROTOBUF schema type. In the
>> > broker,
>> > > > the
>> > > > > > > > > > > > PROTOBUF_NATIVE
>> > > > > > > > > > > > > compatibility check actually hasn't implemented
>> any
>> > > rule,
>> > > > > > > besides
>> > > > > > > > > > one:
>> > > > > > > > > > > > > checking if the root message name has changed.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >    1. For now, there is no official or
>> third-party
>> > > > > solution
>> > > > > > > for
>> > > > > > > > > > > > ProtoBuf
>> > > > > > > > > > > > > >    compatibility. If in the future have better
>> > > > solutions
>> > > > > > of a
>> > > > > > > > third
>> > > > > > > > > > > > party or
>> > > > > > > > > > > > > >    the official, we develop new
>> > > > > > ProtobufNativeSchemaValidator
>> > > > > > > > and
>> > > > > > > > > > use,
>> > > > > > > > > > > > so
>> > > > > > > > > > > > > >    add a flag.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Who do you need to make that configurable? Once
>> you
>> > > > > found a
>> > > > > > > > third
>> > > > > > > > > > > > party,
>> > > > > > > > > > > > > just switch to it? Who knows, maybe you never
>> will.
>> > > > > Introduce
>> > > > > > > it
>> > > > > > > > > > when you
>> > > > > > > > > > > > > find it, not now.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > We improve in
>> ProtobufNativeSchemaCompatibilityCheck
>> > > > > > BACKWARD,
>> > > > > > > > > > FORWARD
>> > > > > > > > > > > > > > these strategies. As with the AVRO
>> implementation,
>> > > > > protobuf
>> > > > > > > > > > > > compatibility
>> > > > > > > > > > > > > > checking need implementing the canRead method.
>> > *This
>> > > > will
>> > > > > > > check
>> > > > > > > > > > that
>> > > > > > > > > > > > > > the writtenschema can be read by readSchema.*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I completely disagree.
>> > > > > > > > > > > > > Avro implementation is confusing for our use case.
>> > > Don't
>> > > > > copy
>> > > > > > > > that.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > You have
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > public void checkCompatible(SchemaData from,
>> > SchemaData
>> > > > to,
>> > > > > > > > > > > > > SchemaCompatibilityStrategy strategy)
>> > > > > > > > > > > > >         throws IncompatibleSchemaException {
>> > > > > > > > > > > > >     Descriptor fromDescriptor =
>> > > > > > > > > > > > >
>> > ProtobufNativeSchemaUtils.deserialize(from.getData());
>> > > > > > > > > > > > >     Descriptor toDescriptor =
>> > > > > > > > > > > > >
>> ProtobufNativeSchemaUtils.deserialize(to.getData());
>> > > > > > > > > > > > >     switch (strategy) {
>> > > > > > > > > > > > >         case BACKWARD_TRANSITIVE:
>> > > > > > > > > > > > >         case BACKWARD:
>> > > > > > > > > > > > >         case FORWARD_TRANSITIVE:
>> > > > > > > > > > > > >         case FORWARD:
>> > > > > > > > > > > > >         case FULL_TRANSITIVE:
>> > > > > > > > > > > > >         case FULL:
>> > > > > > > > > > > > >             checkRootMessageChange(fromDescriptor,
>> > > > > > > toDescriptor,
>> > > > > > > > > > > > strategy);
>> > > > > > > > > > > > >             return;
>> > > > > > > > > > > > >         case ALWAYS_COMPATIBLE:
>> > > > > > > > > > > > >             return;
>> > > > > > > > > > > > >         default:
>> > > > > > > > > > > > >             throw new
>> > > > IncompatibleSchemaException("Unknown
>> > > > > > > > > > > > > SchemaCompatibilityStrategy.");
>> > > > > > > > > > > > >     }
>> > > > > > > > > > > > > }
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I would rename :
>> > > > > > > > > > > > > from --> currentSchema
>> > > > > > > > > > > > > to --> newSchema
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Use that switch case and have a method for each
>> like:
>> > > > > > > > > > > > > validateBackwardsCompatibility(currentSchema,
>> > > newSchema)
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I dislike canRead and usage of writtenSchema,
>> since
>> > you
>> > > > > have
>> > > > > > > two
>> > > > > > > > > > > > completely
>> > > > > > > > > > > > > different use cases: from the producing side and
>> the
>> > > > > consumer
>> > > > > > > > side.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > schemaValidatorBuilder
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > I dislike this proposal. IMO Avro
>> implementation is
>> > > way
>> > > > > too
>> > > > > > > > > > > > complicated.
>> > > > > > > > > > > > > Why not have a simple function for validation for
>> > each
>> > > > > switch
>> > > > > > > > case
>> > > > > > > > > > above?
>> > > > > > > > > > > > > Why do we need strategy and builder, and all this
>> > > > > complexity?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > *Here are the basic compatibility rules we've
>> > defined:*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > IMO it's impossible to read the validation rules
>> as
>> > you
>> > > > > > > described
>> > > > > > > > > > them.
>> > > > > > > > > > > > > I wrote how they should be structured numerous
>> times
>> > > > above.
>> > > > > > > > > > > > > I can't validate them.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > IMO, the current design is very hard to read.
>> > > > > > > > > > > > > Please try to avoid jumping into code sections.
>> > > > > > > > > > > > > Write a high level design section, in which you
>> > > describe
>> > > > in
>> > > > > > > words
>> > > > > > > > > > what
>> > > > > > > > > > > > you
>> > > > > > > > > > > > > plan to do.
>> > > > > > > > > > > > > Write the validation rules in the structure that
>> is
>> > > easy
>> > > > to
>> > > > > > > > > > understand:
>> > > > > > > > > > > > > rules per each compatibility check, and use proper
>> > > words
>> > > > > > > (current
>> > > > > > > > > > schema,
>> > > > > > > > > > > > > new schema), since new schema can be once used for
>> > read
>> > > > and
>> > > > > > > once
>> > > > > > > > > > used for
>> > > > > > > > > > > > > write.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > In its current form it takes too much time to
>> > > understand
>> > > > > the
>> > > > > > > > design,
>> > > > > > > > > > and
>> > > > > > > > > > > > it
>> > > > > > > > > > > > > shouldn't be the case.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Asaf
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <
>> > > > > > > liusinan1...@gmail.com
>> > > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi! I updated the explanation of some things in
>> the
>> > > PIP
>> > > > > > > issue.
>> > > > > > > > And
>> > > > > > > > > > also
>> > > > > > > > > > > > > > added a new “flag” in the conf is used as the
>> > > different
>> > > > > > > > > > > > > > ProtobufNativeSchemaValidator implementation,
>> also
>> > > set
>> > > > > > > > > > > > > > ProtobufNativeSchemaValidator default only check
>> > > > whether
>> > > > > > the
>> > > > > > > > name
>> > > > > > > > > > of
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > root message is the same.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > sinan
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com>
>> 于2023年3月5日周日
>> > > > > 20:21写道：
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <
>> > > > > > > > liusinan1...@gmail.com
>> > > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Can you please explain how a Protobuf
>> Schema
>> > > > > > descriptor
>> > > > > > > > can
>> > > > > > > > > > be
>> > > > > > > > > > > > > > > validated
>> > > > > > > > > > > > > > > > > for backward compatibility check using
>> Avro
>> > > based
>> > > > > > > > > > compatibility
>> > > > > > > > > > > > > > rules?
>> > > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro,
>> but
>> > it
>> > > > is
>> > > > > > > > actually a
>> > > > > > > > > > > > > > Protobuf
>> > > > > > > > > > > > > > > > > descriptor?
>> > > > > > > > > > > > > > > > > Is there some translation happening?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 1. *You can take a quick look at the
>> previous
>> > > > design,
>> > > > > > the
>> > > > > > > > > > PROTOBUF
>> > > > > > > > > > > > uses
>> > > > > > > > > > > > > > > > avro struct to store.*
>> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Ok. So to summarize your code (easier to
>> write it
>> > > > than
>> > > > > > send
>> > > > > > > > > > links):
>> > > > > > > > > > > > > > > * Pulsar Client, when used with Protobuf
>> Schema,
>> > > > > actually
>> > > > > > > > > > converts
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > Protobuf descriptor into an Avro Schema (using
>> > code
>> > > > > found
>> > > > > > > > inside
>> > > > > > > > > > Avro
>> > > > > > > > > > > > > > > library) and saves that Avro schema as the
>> > schema.
>> > > > It's
>> > > > > > not
>> > > > > > > > > > saving
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > protobuf descriptor at all. Very confusing I
>> have
>> > > to
>> > > > > add
>> > > > > > -
>> > > > > > > > never
>> > > > > > > > > > > > expected
>> > > > > > > > > > > > > > > that.
>> > > > > > > > > > > > > > > This explains why In the
>> > > > > ProtobufSchemaCompatibilityCheck
>> > > > > > > > they
>> > > > > > > > > > just
>> > > > > > > > > > > > > > extend
>> > > > > > > > > > > > > > > the Avro without doing any translation.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Thanks for that.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Now thatI finally understand this, I can say
>> > that:
>> > > > you
>> > > > > > > *must*
>> > > > > > > > > > explain
>> > > > > > > > > > > > > > that
>> > > > > > > > > > > > > > > in the motivation part in your PIP.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 2. *On the broker side, protobuf and avro
>> both
>> > > use
>> > > > > > > > `SchemaData`
>> > > > > > > > > > > > > > converted
>> > > > > > > > > > > > > > > > to `org.apache.avro.Schema`.*
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Actually those links don't really help.
>> > > > > > > > > > > > > > > The main link that helps is:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > I'm sorry - I don't understand.
>> > > > > > > > > > > > > > > > > I understand the different compatibility
>> > check
>> > > > > > > > strategies.
>> > > > > > > > > > > > > > > > > If you just spell them out here, then as
>> you
>> > > say,
>> > > > > > just
>> > > > > > > > > > translate
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema
>> and
>> > run
>> > > > the
>> > > > > > > Avro
>> > > > > > > > > > > > > > > > > compatibility validation, no?
>> > > > > > > > > > > > > > > > > I believe the answer is no, since you may
>> > want
>> > > to
>> > > > > > > verify
>> > > > > > > > > > > > different
>> > > > > > > > > > > > > > > things
>> > > > > > > > > > > > > > > > > when it comes to Protobuf, which are
>> > different
>> > > > then
>> > > > > > > Avro.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 1.
>> > > > > > > > > > > > > > > > *ProtobufSchema is different from
>> > > > > ProtobufNativeSchema
>> > > > > > in
>> > > > > > > > that
>> > > > > > > > > > it
>> > > > > > > > > > > > uses
>> > > > > > > > > > > > > > > > avro-protobuf.*
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
>> > > > > > > > > > > > > > > > *ProtobufNativeSchema needs a native
>> > > compatibility
>> > > > > > check,
>> > > > > > > > but
>> > > > > > > > > > > > there is
>> > > > > > > > > > > > > > no
>> > > > > > > > > > > > > > > > official or third party implementation. So
>> this
>> > > PIP
>> > > > > > does
>> > > > > > > > not
>> > > > > > > > > > use
>> > > > > > > > > > > > > > > > avro-protobuf for protobuf compatibility
>> > > checking.*
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 2. *By the way, this is implemented in much
>> the
>> > > > same
>> > > > > > way
>> > > > > > > > that
>> > > > > > > > > > > > Apache
>> > > > > > > > > > > > > > avro
>> > > > > > > > > > > > > > > > does compatibility checking.*
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
>> > > > > > > > > > > > > > > >
>> > > > > > > `canReadStrategy`，`canBeReadStrategy`，`mutualReadStrategy`
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
>> > > > > > > > > > > > > > > > *In `ValidateMutualRead.java`, the
>> arguments of
>> > > > > > > > `canRead()` are
>> > > > > > > > > > > > > > > > writtenSchema and readSchema. We only need
>> to
>> > > > change
>> > > > > > the
>> > > > > > > > order
>> > > > > > > > > > of
>> > > > > > > > > > > > > > > arguments
>> > > > > > > > > > > > > > > > we pass to `canRead()`.*
>> > > > > > > > > > > > > > > > ```java
>> > > > > > > > > > > > > > > > private void
>> > > > > > validateWithStrategy(Descriptors.Descriptor
>> > > > > > > > > > > > toValidate,
>> > > > > > > > > > > > > > > > Descriptors.Descriptor fromDescriptor)
>> throws
>> > > > > > > > > > > > > > > ProtoBufCanReadCheckException
>> > > > > > > > > > > > > > > > {
>> > > > > > > > > > > > > > > > switch (strategy) {
>> > > > > > > > > > > > > > > > case CanReadExistingStrategy ->
>> > > > > canRead(fromDescriptor,
>> > > > > > > > > > > > toValidate);
>> > > > > > > > > > > > > > > > case CanBeReadByExistingStrategy ->
>> > > > > canRead(toValidate,
>> > > > > > > > > > > > > > fromDescriptor);
>> > > > > > > > > > > > > > > > case CanBeReadMutualStrategy -> {
>> > > > > > > > > > > > > > > > canRead(toValidate, fromDescriptor);
>> > > > > > > > > > > > > > > > canRead(fromDescriptor, toValidate);
>> > > > > > > > > > > > > > > > }
>> > > > > > > > > > > > > > > > }
>> > > > > > > > > > > > > > > > }
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > private void canRead(Descriptors.Descriptor
>> > > > > > > writtenSchema,
>> > > > > > > > > > > > > > > > Descriptors.Descriptor readSchema) throws
>> > > > > > > > > > > > > > ProtoBufCanReadCheckException {
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
>> > > > > > > > > > > > > > > > readSchema);
>> > > > > > > > > > > > > > > > }
>> > > > > > > > > > > > > > > > ```
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I get that you want to take inspiration from
>> the
>> > > > > existing
>> > > > > > > > Avro
>> > > > > > > > > > Schema
>> > > > > > > > > > > > > > > compatibility check, to do your code design.
>> > > > > > > > > > > > > > > I also understand you *won't* use any existing
>> > avro
>> > > > > code
>> > > > > > > for
>> > > > > > > > > > that.
>> > > > > > > > > > > > > > > I also understand, you have to write the
>> > validation
>> > > > > check
>> > > > > > > on
>> > > > > > > > your
>> > > > > > > > > > > > own,
>> > > > > > > > > > > > > > > since there is no 3rd party to explain that.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > The only thing I can't understand are the
>> actual
>> > > > rules
>> > > > > > as I
>> > > > > > > > wrote
>> > > > > > > > > > > > before,
>> > > > > > > > > > > > > > > since they are written confusingly.
>> > > > > > > > > > > > > > > So, I repeat what I asked before:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I think you should structure the validation
>> rules
>> > > > > > > > differently:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > * Backward checks
>> > > > > > > > > > > > > > > ** List down rules, where use newSchema (the
>> > schema
>> > > > > used
>> > > > > > by
>> > > > > > > > > > producer
>> > > > > > > > > > > > or
>> > > > > > > > > > > > > > > consumer) and existingSchema (last schema
>> used)
>> > > > > > > > > > > > > > > * Forward
>> > > > > > > > > > > > > > > ** List down rules, where use newSchema (the
>> > schema
>> > > > > used
>> > > > > > by
>> > > > > > > > > > producer
>> > > > > > > > > > > > or
>> > > > > > > > > > > > > > > consumer) and existingSchema (last schema
>> used)
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Once that's accomplished I will be able to
>> > > understand
>> > > > > the
>> > > > > > > > > > different
>> > > > > > > > > > > > > > > validation rules for each compatibility check.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > sinan
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com>
>> > 于2023年3月1日周三
>> > > > > > > 21:19写道：
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu
>> <
>> > > > > > > > > > > > liusinan1...@gmail.com>
>> > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I read it and they look identical.
>> What's
>> > > the
>> > > > > > > > difference
>> > > > > > > > > > > > between
>> > > > > > > > > > > > > > > > them?
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Current avro,json, and protobuf schemas
>> are
>> > > all
>> > > > > > > > implemented
>> > > > > > > > > > > > based
>> > > > > > > > > > > > > > on
>> > > > > > > > > > > > > > > > AVRO.
>> > > > > > > > > > > > > > > > > > > What do you mean, they are all
>> > implemented
>> > > > > based
>> > > > > > on
>> > > > > > > > > > Avro? You
>> > > > > > > > > > > > > > mean
>> > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > protobuf schema is converted into an
>> Avro
>> > > > > Schema,
>> > > > > > > and
>> > > > > > > > > > then
>> > > > > > > > > > > > you
>> > > > > > > > > > > > > > use
>> > > > > > > > > > > > > > > > Avro
>> > > > > > > > > > > > > > > > > > > compatibility validation?
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > >
>> > > > >
>> > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > >
>> > > > >
>> > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
>> > > > > > > > > > > > > > > > > > They all extends
>> > > > > > `AvroSchemaBasedCompatibilityCheck`,
>> > > > > > > > the
>> > > > > > > > > > > > > > > > > > `checkCompatible()` is the same
>> > > implementation
>> > > > > with
>> > > > > > > > `AVRO`.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Can you please explain how a Protobuf
>> Schema
>> > > > > > descriptor
>> > > > > > > > can
>> > > > > > > > > > be
>> > > > > > > > > > > > > > > validated
>> > > > > > > > > > > > > > > > > for backward compatibility check using
>> Avro
>> > > based
>> > > > > > > > > > compatibility
>> > > > > > > > > > > > > > rules?
>> > > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro,
>> but
>> > it
>> > > > is
>> > > > > > > > actually a
>> > > > > > > > > > > > > > Protobuf
>> > > > > > > > > > > > > > > > > descriptor?
>> > > > > > > > > > > > > > > > > Is there some translation happening?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > I think you should structure the
>> validation
>> > > > rules
>> > > > > > > > > > differently:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > The Compatibility check strategy is
>> > described
>> > > > on
>> > > > > > the
>> > > > > > > > > > website
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
>> > > > > > > > > > > > > > > > > > 1. BACKWARD(CanReadExistingStrategy):
>> > > Consumers
>> > > > > > using
>> > > > > > > > > > schema
>> > > > > > > > > > > > V3 can
>> > > > > > > > > > > > > > > > process
>> > > > > > > > > > > > > > > > > > data written by producers using the last
>> > > schema
>> > > > > > > version
>> > > > > > > > > > V2. So
>> > > > > > > > > > > > V2
>> > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > > > > "writtenSchema" and V3 is "readSchema".
>> > > > > > > > > > > > > > > > > > 2. FORWARD(CanBeReadByExistingStrategy):
>> > > > > Consumers
>> > > > > > > > using
>> > > > > > > > > > the
>> > > > > > > > > > > > last
>> > > > > > > > > > > > > > > > schema
>> > > > > > > > > > > > > > > > > > version V2 can process data written by
>> > > > producers
>> > > > > > > using
>> > > > > > > > a
>> > > > > > > > > > new
>> > > > > > > > > > > > schema
>> > > > > > > > > > > > > > > V3,
>> > > > > > > > > > > > > > > > > > even though they may not be able to use
>> the
>> > > > full
>> > > > > > > > > > capabilities
>> > > > > > > > > > > > of
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > new
>> > > > > > > > > > > > > > > > > > schema. So V3 is "writtenSchema" and V2
>> is
>> > > > > > > > "readSchema".
>> > > > > > > > > > > > > > > > > > 3. FULL(CanBeReadMutualStrategy):
>> Schemas
>> > are
>> > > > > both
>> > > > > > > > > > backward and
>> > > > > > > > > > > > > > > forward
>> > > > > > > > > > > > > > > > > > compatible.
>> > > > > > > > > > > > > > > > > > Schema can evolve. The old version
>> schema
>> > and
>> > > > the
>> > > > > > new
>> > > > > > > > > > version
>> > > > > > > > > > > > > > schema
>> > > > > > > > > > > > > > > > should
>> > > > > > > > > > > > > > > > > > be well understood.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I'm sorry - I don't understand.
>> > > > > > > > > > > > > > > > > I understand the different compatibility
>> > check
>> > > > > > > > strategies.
>> > > > > > > > > > > > > > > > > If you just spell them out here, then as
>> you
>> > > say,
>> > > > > > just
>> > > > > > > > > > translate
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema
>> and
>> > run
>> > > > the
>> > > > > > > Avro
>> > > > > > > > > > > > > > > > > compatibility validation, no?
>> > > > > > > > > > > > > > > > > I believe the answer is no, since you may
>> > want
>> > > to
>> > > > > > > verify
>> > > > > > > > > > > > different
>> > > > > > > > > > > > > > > things
>> > > > > > > > > > > > > > > > > when it comes to Protobuf, which are
>> > different
>> > > > then
>> > > > > > > Avro.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > At the current state, I can't understand
>> your
>> > > > > design
>> > > > > > at
>> > > > > > > > all.
>> > > > > > > > > > > > Please
>> > > > > > > > > > > > > > > help
>> > > > > > > > > > > > > > > > > clarify that.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > So each strategy should have its own
>> > section.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > The arguments of `canRead()` are
>> > > writtenSchema
>> > > > > and
>> > > > > > > > > > readSchema.
>> > > > > > > > > > > > As
>> > > > > > > > > > > > > > > we've
>> > > > > > > > > > > > > > > > > > just described, we only need to change
>> the
>> > > > order
>> > > > > of
>> > > > > > > > > > arguments
>> > > > > > > > > > > > we
>> > > > > > > > > > > > > > pass
>> > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > `canRead()`.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > sinan
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com>
>> > > > > 于2023年2月27日周一
>> > > > > > > > 20:49写道：
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > And you can see the difference
>> between
>> > > > > ProtoBuf
>> > > > > > > and
>> > > > > > > > > > > > > > > ProtoBufNative:
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > >
>> > > > >
>> > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > >
>> > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >  I read it and they look identical.
>> > What's
>> > > > the
>> > > > > > > > difference
>> > > > > > > > > > > > between
>> > > > > > > > > > > > > > > > them?
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Current avro,json, and protobuf
>> schemas
>> > are
>> > > > all
>> > > > > > > > > > implemented
>> > > > > > > > > > > > based
>> > > > > > > > > > > > > > > on
>> > > > > > > > > > > > > > > > > > AVRO.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > What do you mean, they are all
>> > implemented
>> > > > > based
>> > > > > > on
>> > > > > > > > > > Avro? You
>> > > > > > > > > > > > > > mean
>> > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > protobuf schema is converted into an
>> Avro
>> > > > > Schema,
>> > > > > > > and
>> > > > > > > > > > then
>> > > > > > > > > > > > you
>> > > > > > > > > > > > > > use
>> > > > > > > > > > > > > > > > Avro
>> > > > > > > > > > > > > > > > > > > compatibility validation?
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > *Here are the basic compatibility
>> rules
>> > > > we've
>> > > > > > > > defined:*
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I think you should structure the
>> > validation
>> > > > > rules
>> > > > > > > > > > > > differently:
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > * Backward checks
>> > > > > > > > > > > > > > > > > > > ** List down rules, where use
>> newSchema
>> > > (the
>> > > > > > schema
>> > > > > > > > used
>> > > > > > > > > > by
>> > > > > > > > > > > > > > > producer
>> > > > > > > > > > > > > > > > or
>> > > > > > > > > > > > > > > > > > > consumer) and existingSchema (last
>> schema
>> > > > used)
>> > > > > > > > > > > > > > > > > > > * Forward
>> > > > > > > > > > > > > > > > > > > ** List down rules, where use
>> newSchema
>> > > (the
>> > > > > > schema
>> > > > > > > > used
>> > > > > > > > > > by
>> > > > > > > > > > > > > > > producer
>> > > > > > > > > > > > > > > > or
>> > > > > > > > > > > > > > > > > > > consumer) and existingSchema (last
>> schema
>> > > > used)
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > So each strategy should have its own
>> > > section.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I'm saying this since you used
>> > > > "writttenSchema"
>> > > > > > > word
>> > > > > > > > but
>> > > > > > > > > > it
>> > > > > > > > > > > > > > > > represents
>> > > > > > > > > > > > > > > > > > > something completely different if it's
>> > > > backward
>> > > > > > or
>> > > > > > > > > > forward
>> > > > > > > > > > > > check.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Once you'll have that structure like
>> > that,
>> > > I
>> > > > > > > > personally
>> > > > > > > > > > will
>> > > > > > > > > > > > be
>> > > > > > > > > > > > > > > able
>> > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > > read and understand it.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > The motivation and problem statement
>> are
>> > > now
>> > > > > > good -
>> > > > > > > > > > thanks
>> > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > improving
>> > > > > > > > > > > > > > > > > > > it.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan
>> > Liu <
>> > > > > > > > > > > > > > liusinan1...@gmail.com>
>> > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again.
>> This
>> > > > time
>> > > > > > I've
>> > > > > > > > added
>> > > > > > > > > > > > some
>> > > > > > > > > > > > > > > > background
>> > > > > > > > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > > > > > > some explanations.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > The compatibility check rules are
>> > already
>> > > > > > written
>> > > > > > > > in
>> > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > Implementation.
>> > > > > > > > > > > > > > > > > > > > ProtoBufNative implements the same
>> > > canRead
>> > > > > > method
>> > > > > > > > as
>> > > > > > > > > > Apache
>> > > > > > > > > > > > > > Avro.
>> > > > > > > > > > > > > > > > > > > > It does this by checking whether the
>> > > schema
>> > > > > for
>> > > > > > > > > > writing and
>> > > > > > > > > > > > > > > reading
>> > > > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > > > > > > compatible. I also indicate whether
>> the
>> > > > > > > > writtenSchema
>> > > > > > > > > > and
>> > > > > > > > > > > > > > > > readSchema of
>> > > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > Backward, Forward, and Full
>> strategies
>> > > are
>> > > > > the
>> > > > > > > old
>> > > > > > > > or
>> > > > > > > > > > the
>> > > > > > > > > > > > new
>> > > > > > > > > > > > > > > > version
>> > > > > > > > > > > > > > > > > > of
>> > > > > > > > > > > > > > > > > > > > the schema.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > > > sinan
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com>
>> > > > > > > 于2023年2月26日周日
>> > > > > > > > > > 23:24写道：
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a
>> lot
>> > of
>> > > > > > > background
>> > > > > > > > > > > > knowledge,
>> > > > > > > > > > > > > > so
>> > > > > > > > > > > > > > > > you
>> > > > > > > > > > > > > > > > > > > need
>> > > > > > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > > > > add IMO for people to understand
>> it.
>> > > You
>> > > > > > don't
>> > > > > > > > need
>> > > > > > > > > > to
>> > > > > > > > > > > > > > explain
>> > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > entire
>> > > > > > > > > > > > > > > > > > > > > pulsar in this PIP, but at the
>> very
>> > > > least a
>> > > > > > few
>> > > > > > > > > > > > paragraphs
>> > > > > > > > > > > > > > > > detailing
>> > > > > > > > > > > > > > > > > > > all
>> > > > > > > > > > > > > > > > > > > > > you need to know, to put you in
>> > > context:
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >    - Start by saying Pulsar as a
>> > > built-in
>> > > > > > > schema
>> > > > > > > > > > registry
>> > > > > > > > > > > > > > > inside
>> > > > > > > > > > > > > > > > > > Pulsar
>> > > > > > > > > > > > > > > > > > > > >    broker.
>> > > > > > > > > > > > > > > > > > > > >       - Every time the client
>> updates
>> > > the
>> > > > > > > > schema, it
>> > > > > > > > > > > > uploads
>> > > > > > > > > > > > > > it
>> > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > >       broker. When that happens,
>> it
>> > > has a
>> > > > > > > feature
>> > > > > > > > > > which
>> > > > > > > > > > > > > > > validates
>> > > > > > > > > > > > > > > > if
>> > > > > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > > new
>> > > > > > > > > > > > > > > > > > > > >       schema version is compatible
>> > with
>> > > > the
>> > > > > > > > previous
>> > > > > > > > > > > > > > versions.
>> > > > > > > > > > > > > > > > There
>> > > > > > > > > > > > > > > > > > > > > are 4 types
>> > > > > > > > > > > > > > > > > > > > >       of compatibility: Full, ...
>> > > > (complete
>> > > > > > and
>> > > > > > > > > > explain
>> > > > > > > > > > > > each
>> > > > > > > > > > > > > > > one
>> > > > > > > > > > > > > > > > > > > briefly)
>> > > > > > > > > > > > > > > > > > > > >    - Also explain Pulsar Schema
>> > > registry
>> > > > > > > supports
>> > > > > > > > > > various
>> > > > > > > > > > > > > > > schema
>> > > > > > > > > > > > > > > > > > > > >    protocols:  Avro, protobuf
>> native,
>> > > ...
>> > > > > > > > (complete
>> > > > > > > > > > the
>> > > > > > > > > > > > > > rest),
>> > > > > > > > > > > > > > > > each
>> > > > > > > > > > > > > > > > > > > > > protocol
>> > > > > > > > > > > > > > > > > > > > >    has a schema which dictates
>> how to
>> > > > > > serialize
>> > > > > > > > and
>> > > > > > > > > > > > > > deserialize
>> > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > > message
>> > > > > > > > > > > > > > > > > > > > >    content into typed object.
>> > > > > > > > > > > > > > > > > > > > >    - Explain in short what is
>> > protobuf
>> > > > > native
>> > > > > > > > > > (compare
>> > > > > > > > > > > > > > protobuf
>> > > > > > > > > > > > > > > > > > > > non-native)
>> > > > > > > > > > > > > > > > > > > > >    - Please don't paste code
>> instead
>> > of
>> > > > > > > > explaining.
>> > > > > > > > > > > > > > > > > > > > >       - Explain that protobuf
>> native
>> > > > > current
>> > > > > > > > > > validation
>> > > > > > > > > > > > check
>> > > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > > only
>> > > > > > > > > > > > > > > > > > > > >       composed of checking the
>> root
>> > > > message
>> > > > > > > name
>> > > > > > > > is
>> > > > > > > > > > the
>> > > > > > > > > > > > same
>> > > > > > > > > > > > > > > > between
>> > > > > > > > > > > > > > > > > > > > > the current
>> > > > > > > > > > > > > > > > > > > > >       schema version and the new
>> > > version.
>> > > > > > > > > > > > > > > > > > > > >          - Explain briefly what
>> is a
>> > > root
>> > > > > > > message
>> > > > > > > > > > and its
>> > > > > > > > > > > > > > name.
>> > > > > > > > > > > > > > > > > > > > >       - Explain the problem (list
>> > > > > scenarios)
>> > > > > > > > that we
>> > > > > > > > > > have
>> > > > > > > > > > > > > > > because
>> > > > > > > > > > > > > > > > > > > > protobuf
>> > > > > > > > > > > > > > > > > > > > >       native schema only supports
>> > FULL
>> > > > > > > > compatibility
>> > > > > > > > > > > > > > > validation.
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Regarding high level design - as
>> in
>> > > what
>> > > > > you
>> > > > > > > > plan to
>> > > > > > > > > > do.
>> > > > > > > > > > > > > > > > > > > > > I suggest you add "High Level
>> Design"
>> > > and
>> > > > > in
>> > > > > > it
>> > > > > > > > > > detail
>> > > > > > > > > > > > how
>> > > > > > > > > > > > > > you
>> > > > > > > > > > > > > > > > plan
>> > > > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > > > > validate, per protobuf version,
>> per
>> > > > > > > compatibility
>> > > > > > > > > > check
>> > > > > > > > > > > > > > > > (backward,
>> > > > > > > > > > > > > > > > > > > > forward,
>> > > > > > > > > > > > > > > > > > > > > full,...).
>> > > > > > > > > > > > > > > > > > > > > I tried reading the
>> implementation -
>> > > for
>> > > > > me ,
>> > > > > > > > it's
>> > > > > > > > > > all
>> > > > > > > > > > > > over
>> > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > place.
>> > > > > > > > > > > > > > > > > > > > Can
>> > > > > > > > > > > > > > > > > > > > > you please list in order what I
>> wrote
>> > > > > above,
>> > > > > > > and
>> > > > > > > > > > list the
>> > > > > > > > > > > > > > > > validation
>> > > > > > > > > > > > > > > > > > > > rules
>> > > > > > > > > > > > > > > > > > > > > with a good explanation why you
>> > > validate
>> > > > it
>> > > > > > > like
>> > > > > > > > > > that?
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Lastly, one you have all the
>> > validation
>> > > > > rules
>> > > > > > > > clearly
>> > > > > > > > > > > > stated,
>> > > > > > > > > > > > > > > you
>> > > > > > > > > > > > > > > > can
>> > > > > > > > > > > > > > > > > > > use
>> > > > > > > > > > > > > > > > > > > > > it to document it properly so
>> users
>> > can
>> > > > > know
>> > > > > > > what
>> > > > > > > > > > > > validation
>> > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > expect.
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Asaf
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM
>> SiNan
>> > > > Liu <
>> > > > > > > > > > > > > > > > liusinan1...@gmail.com>
>> > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the
>> > code
>> > > > and
>> > > > > > > > > > described the
>> > > > > > > > > > > > > > > design
>> > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > > > > > > improve
>> > > > > > > > > > > > > > > > > > > > > > the PROTOBUF_NATIVE schema
>> > > > compatibility
>> > > > > > > > checks.
>> > > > > > > > > > You
>> > > > > > > > > > > > can
>> > > > > > > > > > > > > > > have a
>> > > > > > > > > > > > > > > > > > look.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <
>> asaf.mes...@gmail.com
>> > >
>> > > > > > > > 于2023年2月22日周三
>> > > > > > > > > > > > 21:16写道：
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > I read it but you're almost
>> > > directly
>> > > > > > diving
>> > > > > > > > into
>> > > > > > > > > > the
>> > > > > > > > > > > > > > code -
>> > > > > > > > > > > > > > > > it
>> > > > > > > > > > > > > > > > > > will
>> > > > > > > > > > > > > > > > > > > > > take
>> > > > > > > > > > > > > > > > > > > > > > me
>> > > > > > > > > > > > > > > > > > > > > > > hours just to reverse engineer
>> > your
>> > > > > > design.
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > Can you please include a "High
>> > > Level
>> > > > > > > Design"
>> > > > > > > > > > section
>> > > > > > > > > > > > in
>> > > > > > > > > > > > > > > which
>> > > > > > > > > > > > > > > > you
>> > > > > > > > > > > > > > > > > > > > > explain
>> > > > > > > > > > > > > > > > > > > > > > > how you plan to tackle any
>> issue?
>> > > > > > > > > > > > > > > > > > > > > > > If I can read that section and
>> > > > explain
>> > > > > to
>> > > > > > > > someone
>> > > > > > > > > > > > else
>> > > > > > > > > > > > > > how
>> > > > > > > > > > > > > > > > this
>> > > > > > > > > > > > > > > > > > > will
>> > > > > > > > > > > > > > > > > > > > > > work,
>> > > > > > > > > > > > > > > > > > > > > > > it means the section is
>> complete.
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > Let's leave the code to the
>> PRs.
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59
>> PM
>> > > SiNan
>> > > > > > Liu <
>> > > > > > > > > > > > > > > > > > liusinan1...@gmail.com>
>> > > > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > Hi all,
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss:
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > https://github.com/apache/pulsar/issues/19565
>> > > > > > > > > > > > > > > > > > > > > > .
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > We can talk about the
>> current
>> > > > design
>> > > > > > > here.
>> > > > > > > > > > > > Especially
>> > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > the
>> > > > > > > > > > > > > > > > > > > field
>> > > > > > > > > > > > > > > > > > > > > > type
>> > > > > > > > > > > > > > > > > > > > > > > > change check rules, please
>> give
>> > > > your
>> > > > > > > > valuable
>> > > > > > > > > > > > advice.
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > > > > > > > Sinan
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Reply via email to