Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Asaf Mesika Mon, 20 Mar 2023 02:47:08 -0700

On Sun, Mar 19, 2023 at 4:47 PM SiNan Liu <[email protected]> wrote:


> 1.
>
> > message SearchReq { string query = 1; int32 page_number = 2; int32
> > result_per_page = 3;}
> > Then second version I use:
> > message SearchRequest { string query = 1; int32 page_number = 2;
> > int32 result_per_page = 3;}
>
>
> The rule in PIP improve the previous implementation, so the previous
> implementation needs to be added.
> If the user switches from the previous implementation to the current
> implementation of PIP, and the current implementation doesn't check for
> root message name changes, isn't that changing the old behavior? This PIP
> is to make compatibility checking more stringent.
>
>
Well, this is exactly the argument we had in this thread. This is why you
make 2 implementation classes ( I called them v1, v2 if you remember). The
user can choose between them. This is exactly while I liked the _v1 _v2
naming, so they will know they are not backward compatible.

Also, you're talking about improving right? Checking the message name is
outright wrong, so you're actually fixing a big mistake someone made back
then.


>
> 2.
>
> > This is a guideline for a human making the change, not for software
> > validation right?
> > When you write down the code for doing the comparison to know if the
> field
> > number has changed, you have to take into account both the field name AND
> > field type. If both are equal but the field number is different then fail
> > it.
>
>
> I use the field name here to match, the name is the same, but the number is
> not the same will be incompatible.
>
> https://github.com/apache/pulsar/pull/19566/files#diff-14b840259375cc8fcae564586fc1c2a188cb71822ab5995ed6807f7dab30c210R129-R132
> PIP description here is the rule, but how to implement it is a matter in
> PR. I have considered the example you gave here, which is also incompatible
> in this case.
>
> Your changes are also described here:
>
> https://stackoverflow.com/questions/65230623/safeness-of-changing-proto-field-number
>
>
This sentence "the PIP description here is the rule, but how to implemented
it is a matter in PR" is a completely invalid argument.
For compatibility rules you specify the exact validation.
We argue here about the rule.
You can say in the PIP you will match by name and then in PR match by name
and type. You have to be exact in the PIP.

Regarding "I have considered the example you gave here, which is also
incompatible" - please explain? I find it compatible - meaning you can read
using the readSchema message written using writeSchema.

Your link to Stackoverflow will fail *my* validation: Since `int foo = 1`
--> `int foo = 2;` : same name, same type, different field number --> fail
validation.
So it doesn't prove anything, or help this argument.





>
> 3.
>
> > If you'll read sentence 2 from the guide, you'll see they write
> > "This means that any messages serialized by code using your “old” message
> > format can be parsed by your new generated code, as they won’t be missing
> > any required elements."
> > This is for the *other side*: when you add a required to readSchema
> > compare
> > to write schema. This of course is not allowed and won't work.
>
>
> Note that in proto2: **Any new fields that you add should be optional or
> repeated.**
> https://protobuf.dev/programming-guides/proto2/#updating
>
> I also stated in PIP that proto3 will not check this rule for required
> field changes.
> This rule applies to proto2. However, proto3 removes required. If you use
> proto3, it will not check for changes to the required field
>
> I'm talking about proto2.

Listen.

This is what you wrote in the PIP, ok?


>    - The writtenSchema cannot add required fields, but optional or
>    duplicate fields can be added (The field number must be new).
>
> I answered:

That's not true.
> You can have a required field in writeSchema, and not have that field in
> readSchema (based on tag number).

I also gave you an example showing what you wrote is not true (see previous
mails), in which I add a required field to writeSchema compared with
readSchema and I can still read it with readSchema.

You mention a link to a guideline saying you can only add optional or
repeated without reading the rest of the paragraph and without
understanding the protocol buffer encoding and decoding.

I'm not sure how to move forward here.
This validation rule is invalid.




>
> 4.
>
> > My write schema is
> > message SearchRequest { string query = 1; int32 page_number = 2;
> > int32 result_per_page = 3;}
> > my read schema is
> > message SearchRequest { string query = 1; int32 page_number = 2;
> > int32 resultPerPage = 3;}
> > - Non-required fields can be removed, as long as the field number is not
> > used again in your updated message type. You may want to rename the field
> > instead, perhaps adding the prefix “OBSOLETE_”, or make the field number
> > reserved <https://protobuf.dev/programming-guides/proto2/#reserved>, so
> > that future users of your .proto can’t accidentally reuse the number.
> > First, they are talking about removal of fields. They suggest you rename
> > the field to OBSOLETE_myField instead of actually deleting it, to make it
> > easier on compatability. They don't talk about rules that forbid you to
> > rename a field. It's unrelated.
>
>
> Maybe you are right that changing the field name is compatible.
>
> https://stackoverflow.com/questions/45431685/protocol-buffer-does-changing-field-name-break-the-message/45431953#comment84548234_45431953
>
> But it is interesting I tested the use `avro-proto` implementation
> `ProtobufSchemaCompatibilityCheck` compatibility check.（
> `org.apache.pulsar.broker.service.schema.BaseAvroSchemaCompatibilityTest`）
>
> private static final String schemaJson1 =
>
> "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
> +
> ".AvroSchemaCompatibilityCheckTest\",\"fields\":[
> {\"name\":\"field1\",\"type\":\"string\"}]}";
>
> private static final String schemaJson4 =
>
> "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
> +
>
> ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1_v2\",\"type\":\"string\","
> +
> "\"aliases\":[\"field1\"]}]}";
>
> schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
> SchemaCompatibilityStrategy.BACKWARD);
> This is compatible.
>
> But that's not compatible
> schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
> SchemaCompatibilityStrategy.FORWARD);
>
> So, what do you think of AVRO implementation
> `ProtobufSchemaCompatibilityCheck`, rules about renaming are wrong is it?
> It looks something like this.
>
>
You're asking why Avro doesn't allow when read schema is schema 4, and
write schema is 1 and the difference between them is a single field rename?
I don't know - need to debug to understand.
I do know that Avro encoding is completely different. They don't have a
field identifier like protobuf does. They completely rely on the schema of
the writer and the reader.
But it is besides the point, since they are supposed to allow field rename
from my knowledge.
You can debug and find out.



>
> 5.
>
> > I think my previous explanation in this email reply should be good enough
> > to explain why the name is irrelevant.
> > If not, please ping me on this.
> > Regarding your example, can you quote it? I don't understand the read and
> > write schema here.
> > Also, regarding message names and why they are irrelevant. Same thing:
> Read
> > https://protobuf.dev/programming-guides/encoding/
> > In there you will see the names are not encoded thus can be changed
> freely
> > and still successfully decode binary data.
>
>
> Reader.proto
> ```protobuf
> syntax = "proto2";
> package proto.reader;
> import "ExternalReader.proto";
> option java_package = "org.apache.pulsar.client.schema.proto.reader";
> option java_outer_classname = "Reader";
>
> enum WeekEnum {
> ......
> }
>
> message ProtobufMessage {
> required string protobufFoo = 1;
> required double protobufBar = 2;
> }
>
> message ProtobufSchema {
> ......
> optional ProtobufMessage messageField = 12;
> ......
> optional WeekEnum enumField = 15;
> }
> ```
>
> WriterWithTypeNameChange.proto:
> ```protobuf
> syntax = "proto2";
> package proto.writerWithTypeNameChange;
> import "ExternalReader.proto";
> option java_package =
> "org.apache.pulsar.client.schema.proto.writerWithTypeNameChange";
> option java_outer_classname = "WriterWithTypeNameChange";
>
> enum WeekEnum {
> ......
> }
>
> message ProtobufMessage_V2 {
> required string protobufFoo = 1;
> required double protobufBar = 2;
> }
>
> message ProtobufSchema {
> ......
> optional ProtobufMessage_V2 messageField = 12;
> ......
> optional WeekEnum enumField = 15;
> ......
> }
> ```
>
> *Doesn't that change the field type? Should that be compatible?*
>
>
It doesn't change the field type since you haven't changed the field types
of the fields inside ProtobufMessage - they are still field 1 and field 2
double.
It is compatible since the message type name is not encoded in the binary.


>
> 6.
>
> > So if my write schema is
> > message SearchRequest { required string query = 1; optional int32
> > page_number = 2; optional int32 result_per_page = 3;}
> > and my read schema is
> > message SearchRequest { optional int32 page_number = 2; optional
> > int32 result_per_page = 3;}
> > You can see I removed a field from writeSchema that does not have a
> default
> > value.
> > First, I can read it without any problems: Field 2 might exists, no
> > problem. Field 3 might exists, no problem. I ignore field 1.
> > So the validation you wrote is no correct at all, without any regard to
> > optional value.
>
>
> *It looks like you've added a required field here, which is not allowed (in
> 3 i am said that).*
> *It also uses the number 1 of the deleted field.*
>
> I removed the required field "required string query = 1" from writeSchema
compared with readSchema.
Of course you can say I added "required string query = 1" to writeSchema
compared with readSchema.
The main question: can I use readSchema to read message written by
writeSchema?
Yes I can.
I wrote why above.
I didn't use the number of 1 of deleted field. The read schema is using
number 2 and 3.





> What if I do the other way around?
> > My write schema is:
> > message SearchRequest { optional int32 page_number = 2; optional
> > int32 result_per_page = 3;}
> > My read schema is:
> > message SearchRequest { required string query = 1; optional int32
> > page_number = 2; optional int32 result_per_page = 3;}
> > Here I will fail, since it is required I will provide value for field 1
> but
> > alas it will never be there.
>
>
> Yes, the required field has been removed, which is incompatible.
> The check for changes to the required field starts at the beginning, at
> which point an incompatible exception has been thrown and the following
> compatibility check is not performed.
>

You can't say "The required field " - it's plain wrong.

Look at your rule:

The writtenSchema cannot remove required fields in the readSchema.


You can say: the writeSchema has removed a required field compared with
readSchema.
This rule will fail the validation.
*not* the default rule we're discussing.




>
> Now let's focus on the optional field, ok?
> > Write schema
> > message SearchRequest { optional int32 page_number = 2; optional
> > int32 result_per_page = 3;}
> > Read schema:
> > message SearchRequest { optional string query = 1; optional int32
> > page_number = 2; optional int32 result_per_page = 3;}
> > When I read, I won't find field number 1 of course, so I will use the
> > default value dictated by the type, as I quoted above, which is for
> string
> > is the empty byte string.
>
>
> *It looks like sixth item in PIP should be removed.*
> *And Rule 7 in PIP should be removed:*
>
> Delete rule 6.
I would keep rule 7. See:
https://protobuf.dev/programming-guides/dos-donts/#dont-change-the-default-value-of-a-field
It sounds like a good best practice. We can omit it if we don't want to be
overly strict.



> > Changing a default value is generally OK, as long as you remember that
> > default values are never sent over the wire.
> > Thus, if a program receives a message in which a particular field isn’t
> > set, the program will see the default value as it was defined in that
> > program’s version of the protocol.
> > It will NOT see the default value that was defined in the sender’s code.
>
> This is in the official document 11:
> https://protobuf.dev/programming-guides/proto2/#updating
>
>
>
So what do you wish to state with this quote?


> 7.
>
> > Read this and then please explain why rule number 3 should not be added:
> >
> >
> https://protobuf.dev/programming-guides/dos-donts/#dont-go-from-repeated-to-scalar
>
>
> It looks like this rule needs to be added.
>
> Thanks


>
>
> Thanks,
> sinan
>
>
>
> Asaf Mesika <[email protected]> 于2023年3月19日周日 19:11写道：
>
> > Also:
> >
> > (2) PROTOBUF_NATIVE was designed so that not use avro-protobuf for
> protobuf
> > > schema compatibility checking.
> >
> > The root message name is the class name we pass in when we create the
> > > producer or consumer. ProtoBuf has many nested messages or
> dependencies.
> > > The current implementation only checks if the passed class name is the
> > > same. It does not check if the fields in the file change in a way that
> is
> > > compatible with older versions of the schema.
> >
> >
> > You missed the most important thing there: PROTOBUF_NATIVE uses Protobuf
> > Descriptor when persisting the schema. It's not using Avro Schema
> > definition. As I wrote you in previous email:
> >
> > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
> persisting
> > > the Protobuf Descriptor and using Protobuf for encoding.
> > >
> >
> >
> >
> >
> >
> > On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <[email protected]>
> wrote:
> >
> > > 1.
> > >
> > > > > Why? The root message name is not written over the wire to the best
> > of
> > > my
> > > > > knowledge. I haven't found it written in the official doc.
> > >
> > >
> > > The name of the root message check is the rules in the previous
> > > `ProtobufNativeSchemaCompatibilityCheck`. Because if the root message
> > has a
> > > different name, there is no need to check its contents. "Same" schema,
> > > their names must be the same.
> > >
> > >
> > > 2.
> > >
> > > > >The writtenSchema can not change the field number of any field in
> > > > readSchema (the > field name is the same, but the field number is
> > > > different).
> > > > >You have to take into account field type as well when comparing.
> > >
> > >
> > > The first sentence on the website says that the number of fields cannot
> > be
> > > changed.
> > >
> > > > Don’t change the field numbers for any existing fields.
> > >
> > >
> > >
> > > 3.
> > >
> > > > > - The writtenSchema cannot add required fields, but optional or
> > > > > duplicate fields can be added (The field number must be new).
> > > > >
> > > > > That's not true.
> > > > You can have a required field in writeSchema, and not have that field
> > in
> > > > readSchema (based on tag number).
> > >
> > >
> > > The required field must exist. In your case, where does readSchema go
> to
> > > read its required fields? It's not in writtenSchema.
> > > The second and third sentences in the official website say, add and
> > delete
> > > do not operate required field!
> > >
> > >
> > > 4.
> > >
> > > > (4) The writtenSchema can not change the field name of any field in
> > > > > readSchema (the field number is the same, but the field name is
> > > > > different).
> > > > This is incorrect.
> > > > Fields names are encoded into the wire. I don't see this in any best
> > > > practice.
> > >
> > >
> > > The third sentence on the website:
> > >
> > > > You may want to rename the field instead, perhaps adding the prefix
> > > > “OBSOLETE_”, or make the field number reserved, so that future users
> of
> > > > your .proto can’t accidentally reuse the number.
> > >
> > > If you want to rename a field, or add a new field. To delete with the
> new
> > > number!
> > >
> > >
> > > 5.
> > >
> > > > The writtenSchema does not change the field name and number, but it
> > does
> > > > change the field type.
> > > > > Small correction: for the same field number you are not allowed to
> > > change
> > > > types. Name is irrelevant.
> > >
> > >
> > > Why doesn't the name irrelevant?
> > > Here is the change in type, which is the rule stated in Alternatives.
> > There
> > > is no check here, just a warning to the user.
> > > Another change is that the name of enum is changed, or the name of
> > MESSAGE
> > > is changed, which is the same as the root message name check in 1, is
> the
> > > change still the same? This is not allowed to change!
> > >
> > > *Here is example:*
> > > readSchema(
> > >
> > >
> >
> https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4
> > > )
> > > writtenSchema(
> > >
> > >
> >
> https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f
> > > )
> > >
> > >
> > > 6.
> > >
> > > > (6) The writtenSchema removes fields that do not have default values
> in
> > > > > readSchema. Then the schema is incompatible.
> > > > Protobuf gives you its own default if you don't supply one. This is
> > > > incorrect.
> > >
> > >
> > > (1) This rule only applies if proto2 does not set the default value. If
> > > proto3 does not check, the default value will always be there.
> > >
> > > (2) In PIP issue:
> > >
> > > > Proto3 canceled the required field, so there is no need to check the
> > > > required field. We get the syntax(proto2 or proto3) of proto in the
> > code,
> > > > and skip the check of the required field if it is proto3. All other
> > > > checking rules also apply to proto3.
> > >
> > >
> > > *I made a mistake here. This default value check is not need in
> proto3. I
> > > will modify the rules later according to your suggestion.*
> > >
> > > > I would remove the proto2/proto3 sections, since they only differ in
> 1
> > > > rule, and just mention that distinction inside that rule (less work
> for
> > > the
> > > > reade).
> > >
> > >
> > > (3) And add rules that look like they should be checked:
> > >
> > > > Rules that you don't have in the doc, but should IMO*
> > > > .......
> > >
> > > There can't be a field in writerSchema, that exists in readerSchema
> (tag
> > > > number based), which in writerSchema its type is scalar, but in
> > > readSchema
> > > > its type is scalar, it's repeated but with packed=true.
> > >
> > >
> > > But I don't think rule number three needs to be added.
> > >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > >
> > > Asaf Mesika <[email protected]> 于2023年3月14日周二 22:33写道：
> > >
> > > > Hi Sinan,
> > > >
> > > > The doc looks much better!
> > > >
> > > > I have a few additional comments:
> > > >
> > > > Pasting comment from previous emails:
> > > >
> > > > Can you convert the code block which is actually a quote in the
> > > > beginning of the PIP to something which doesn't require to scroll
> > > > horizontally so much?
> > > > Use
> > > >
> > > >
> > >
> >
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > >
> > > > *Validation Rules*
> > > >
> > > > (1) If the root message names of writtenSchema and readSchema are
> > > > > different, then incompatible.
> > > >
> > > > Why? The root message name is not written over the wire to the best
> of
> > my
> > > > knowledge. I haven't found it written in the official doc.
> > > >
> > > >
> > > > >    - The writtenSchema cannot add required fields, but optional or
> > > > >    duplicate fields can be added (The field number must be new).
> > > > >
> > > > > That's not true.
> > > > You can have a required field in writeSchema, and not have that field
> > in
> > > > readSchema (based on tag number).
> > > >
> > > > The writtenSchema can not change the field number of any field in
> > > > readSchema (the
> > > > > field name is the same, but the field number is different).
> > > >
> > > > You have to take into account field type as well when comparing.
> > > >
> > > > If I have in readSchema
> > > > int32 justAnID = 1
> > > > int32 customerId = 2
> > > >
> > > > and in writeSchema I have
> > > > int32 justAnID = 1
> > > > string customerId = 3
> > > >
> > > > This is valid.
> > > >
> > > > (4) The writtenSchema can not change the field name of any field in
> > > > > readSchema (the field number is the same, but the field name is
> > > > > different).
> > > >
> > > > This is incorrect.
> > > > Fields names are encoded into the wire. I don't see this in any best
> > > > practice.
> > > >
> > > > ) The writtenSchema does not change the field name and number, but it
> > > does
> > > > > change the field type.
> > > > >
> > > > >    - If the field type is ENUM or MESSAGE, the schema is not
> > compatible
> > > > >    when the type name is changed
> > > > >    - If the type of the field is another type. The schemas under
> this
> > > > >    rule are not incompatible, but warn the user.(There is another
> way
> > > of
> > > > >    testing in PIP issue Alternatives)
> > > > >
> > > > > Small correction: for the same field number you are not allowed to
> > > change
> > > > types. Name is irrelevant.
> > > >
> > > > (6) The writtenSchema removes fields that do not have default values
> in
> > > > > readSchema. Then the schema is incompatible.
> > > >
> > > > Protobuf gives you its own default if you don't supply one. This is
> > > > incorrect.
> > > >
> > > >
> > > > *Rules that you don't have in the doc, but should IMO*
> > > > * There can not be a field which exists both in readSchema and
> > > writeSchema,
> > > > with same tag number, having different default values
> > > > * There can't be a field in writerSchema, that exists in readerSchema
> > > (tag
> > > > number based), which in writerSchema is repeated and its type is
> > Scalar (
> > > > https://protobuf.dev/programming-guides/proto/#scalar) but in
> > readSchema
> > > > it
> > > > is not repeated anymore.
> > > > * There can't be a field in writerSchema, that exists in readerSchema
> > > (tag
> > > > number based), which in writerSchema its type is scalar, but in
> > > readSchema
> > > > its type is scalar, it's repeated but with packed=true.
> > > >
> > > > *Rules you have , but I would phrase a bit differently*
> > > >
> > > > I would remove the proto2/proto3 sections, since they only differ in
> 1
> > > > rule, and just mention that distinction inside that rule (less work
> for
> > > the
> > > > reade).
> > > >
> > > > * readSchema has a field which doesn't exist in writerSchema (based
> on
> > > tag
> > > > number).
> > > >    * Proto v2:
> > > >       * That field must be `optional` or `repeated` (must not be
> > > > `required`)
> > > >    * Proto v3:
> > > >       * No problem.
> > > > * There can not be a field which exists both in readSchema and
> > > writeSchema,
> > > > with the same tag number, but having different types.
> > > >
> > > > *Motivation*
> > > >
> > > > Basically in the motivation section you want people to understand the
> > > > following:
> > > >
> > > > Pulsar has built-in support for typed messages. It allows specifying
> an
> > > > encoding scheme and its matching schema.
> > > > For example, it supports Avro. You specify a schema for a given
> topic,
> > > > using Avro Schema Definition (i.e. a JSON describing the schema).
> > > Everytime
> > > > you produce a message, you first declare the schema definition you
> wish
> > > to
> > > > use for your messages. The message data should be an avro-encoded
> > binary
> > > > data (which the client in some SDKs helps encode a given
> > > > data-structure/object).
> > > > The same applies when you consume a message. You first specify the
> > schema
> > > > you use to read the messages, and the client in some SDKs helps by
> > > decoding
> > > > the message binary data into an object/data-structure.
> > > >
> > > > Each time you specify a schema to be used, either by a producer or a
> > > > consumer, the schema is persisted in Pulsar and given an increasing
> > > version
> > > > number. If the schema was the same as the previous version, it is not
> > > > saved. When the message is persisted, the version number is encoded
> in
> > > the
> > > > message headers.
> > > >
> > > > Pulsar provides a very useful feature named Schema Evolution
> > > > <
> > >
> >
> https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution
> > > > >.
> > > > It allows us to check if a new schema version is compatible with
> > previous
> > > > versions or versions. When you configure the schema for the topic you
> > > > decide the strategy to use for doing the validation check. The
> > strategies
> > > > validate the following:
> > > >
> > > >    - BACKWARD strategy
> > > >       - A consumer with newSchema can read a message written using
> > > >       existingSchema
> > > >    - BACKWARD_TRANSITIVE strategy
> > > >       - A consumer with newSchema can read messages written using all
> > > >       existingSchema
> > > >    - FORWARD
> > > >       - A consumer with existingSchema can read messages written
> using
> > > >       newSchema
> > > >    - FORWARD_TRANSITIVE
> > > >       - A consumer defined with any of the existingSchema can read
> > > messages
> > > >       written using newSchema
> > > >    - FULL
> > > >       - A consumer defined with newSchema can read messages written
> > using
> > > >       existingSchema
> > > >       - A consumer defined with existingSchema can read messages
> > written
> > > >       using newSchema
> > > >    - FULL_TRANSITIVE
> > > >       - A consumer defined with newSchema can read messages written
> > using
> > > >       any of the existingSchema
> > > >       - A consumer defined with any of the existingSchema can read
> > > messages
> > > >       written using newSchema
> > > >
> > > >
> > > > Aside from Avro, Pulsar also has two additional supported encodings:
> > > > PROTOBUF and PROTOBUF_NATIVE.
> > > >
> > > > PROTOBUF is a bit strange. It encodes the messages using Protobuf
> > > encoding,
> > > > but the schema that is persisted to Pulsar is *not* Protobuf
> Descriptor
> > > as
> > > > you would have expected. The saved schema is a translation of the
> > > Protobuf
> > > > Descriptor to an Avro Schema, so in fact an Avro schema definition is
> > > saved
> > > > as the schema.
> > > >
> > > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
> > > persisting
> > > > the Protobuf Descriptor and using Protobuf for encoding.
> > > > The problem is that the authors of PROTOBUF_NATIVE haven't completed
> it
> > > > fully, and the backward compatibility validation code almost does not
> > > > exist: It only checks if the root message name is the same between
> > > > versions.
> > > >
> > > > GOALS
> > > > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
> > > > functional validation for any of the defined Schema Compatibility
> > > > Strategies.
> > > > A secondary goal is to allow the user to choose between different
> > > > implementations: The new fully functional validation or the existing
> > > > barebones validation.
> > > >
> > > > -------- END
> > > >
> > > > I'm ok with having links in the Motivation , as *further reading*.
> > > > I'm against stacking up work for the reader to go read 5-6 different
> > > links
> > > > just to understand the motivation and background knowledge required
> to
> > > > understand the feature.
> > > >
> > > > I'm against putting code in the Motivation. Especially if it is
> > supposed
> > > to
> > > > replace description in plain English making it easy to understand the
> > > > design.
> > > > Leave the code to the motivation.
> > > > Paste code only if you absolutely can't use plain old descriptions to
> > > > explain.
> > > >
> > > >
> > > >
> > > > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <[email protected]>
> > > wrote:
> > > >
> > > > > *I guess that's right, too! *
> > > > >
> > > > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > better,
> > > > > because we don't know whether the future will have V2, V3. The
> > official
> > > > > solution can be called
> > > `ProtobufNativeOfficialSchemaCompatibilityCheck`,
> > > > or
> > > > > is a good `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`
> > third-party
> > > > > solution.
> > > > >
> > > > > I've updated my design in PIP issue.
> > > > > 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather
> than a
> > > > > ProtobufNativeSchemaCompatibilityCheck different validator
> > > > implementation.
> > > > > 2. Remove the 'builder'
> > > > > 3. Clarify the relationship between newSchema, existingSchema, and
> > > > > writtenSchema in canRead.
> > > > >
> > > > > Help to see if the description is comprehensive and what changes
> and
> > > > > improvements need to be made.
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > >
> > > > >
> > > > > Asaf Mesika <[email protected]> 于2023年3月9日周四 17:35写道：
> > > > >
> > > > > > I like Bo's suggestion - I haven't realized each schema type
> > > > > > compatibility check is actually a plugin.
> > > > > > It makes sense for any schema type checks to evolve, sometimes
> in a
> > > > > > non-backward compatible way hence having two plugins like
> > > > > > protobufNativeSchemaCompatabilityCheckV1 and then
> > > > > > protobufNativeSchemaCompatabilityCheckV2 and then
> > > > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <[email protected]> wrote:
> > > > > >
> > > > > > >  Hi siNan:
> > > > > > >
> > > > > > > From my point of view, it is just a plug-in. I don't think it
> is
> > > > > > > necessary to add configuration for the plugin.
> > > > > > > This is meaningless, and it will increase the difficulty of use
> > for
> > > > > > users.
> > > > > > >
> > > > > > >
> > > > > > > SiNan Liu <[email protected]> 于2023年3月8日周三 15:54写道：
> > > > > > > >
> > > > > > > > Hi, bo.
> > > > > > > >
> > > > > > > > 1. I understand what you say, to develop a new
> > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than
> > > > > changing
> > > > > > > > existing `ProtobufNativeSchemaCompatibilityCheck`. But I
> found
> > a
> > > > few
> > > > > > > small
> > > > > > > > problems:
> > > > > > > >
> > > > > > > > （1）ProtobufNativeAdvancedSchemaCompatibilityCheck and
> > > > > > > > ProtobufNativeSchemaCompatibilityCheck schemaType is
> > > > PROTOBUF_NATIVE.
> > > > > > It
> > > > > > > > looks like both checkers are PROTOBUF not using
> AVRO-PROTOBUF's
> > > > > > "native"
> > > > > > > > implementation, which leads to some problems or
> "unreasonable"
> > > and
> > > > > > gives
> > > > > > > me
> > > > > > > > some extended thinking and questions.
> > > > > > > >
> > > > > > > `CompatibilityCheck ` its only a plugin.
> > > > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later
> > leave
> > > > the
> > > > > > > stage, when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > > > > > stable, we can make it the default Checker.
> > > > > > >
> > > > > > > It is just a plug-in, users can change it at will and ensure
> that
> > > it
> > > > > > > is used correctly
> > > > > > > > （2）In broker.conf
> > > > > > > >
> > > > > > > > `schemaRegistryCompatibilityCheckers`. If
> > > > > > > > ProtobufNativeSchemaCompatibilityCheck and
> > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck all set. This
> is
> > > > going
> > > > > > to
> > > > > > > > overwrite each other. Because this is a map:
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > > > > > >
> > > > > > > > ```java
> > > > > > > >
> > > > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new
> > > > HashMap<>();
> > > > > > > >
> > > > > > > > for (String className : checkerClasses) {
> > > > > > > >
> > > > > > > > SchemaCompatibilityCheck schemaCompatibilityCheck =
> > > > > > > > Reflections.createInstance(className,
> > > > > > > >
> > > > > > > > SchemaCompatibilityCheck.class,
> > > > > > > > Thread.currentThread().getContextClassLoader());
> > > > > > > >
> > > > > > > > checkers.put(schemaCompatibilityCheck.getSchemaType(),
> > > > > > > > schemaCompatibilityCheck);
> > > > > > > >
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Is this a big problem or a small one? Is it possible or
> > > > unnecessary?
> > > > > > > Maybe
> > > > > > > > we can write in the documentation that protobufNative
> checkers
> > > can
> > > > > only
> > > > > > > > choose one of the two? Why are there two Checkers for
> different
> > > > > > > > implementations of the same schemaType? Why not the checker
> to
> > > > create
> > > > > > > > different validator, so we don not have to change
> > > > > > > > schemaRegistryCompatibilityCheckers.
> > > > > > >
> > > > > > > users can only use one, not two, which will bring complexity to
> > > users
> > > > > > >
> > > > > > > >
> > > > > > > > （3）And after the update to
> > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck.
> > > > > > > > Existing topics previously only checked the name of the root
> > > > message,
> > > > > > not
> > > > > > > > the content of protobuf.
> > > > > > > >
> > > > > > > > What if the user wants both Checkers?
> > > > > > > >
> > > > > > > > Set to ProtobufNativeAdvancedSchemaCompatibilityCheck, affect
> > the
> > > > > topic
> > > > > > > of
> > > > > > > > the existing schema?
> > > > > > > >
> > > > > > > > Older topics still use the old checker, and newer topics or
> > > certain
> > > > > > older
> > > > > > > > topics use the new advancedchecker.
> > > > > > > >
> > > > > > > when `ProtobufNativeAdvancedSchemaCompatibilityCheck` stable,
> > > > > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`.
> > > > > > > because it not a complete checker.
> > > > > > > > （4）So should we have one schemaType for a checker?
> > > > > > protobufNativeChecker
> > > > > > > > can have as many different implementation classes as
> possible.
> > > This
> > > > > > > > classname configuration in PIP, let's see if it can be set at
> > the
> > > > > topic
> > > > > > > > level. In the current PIP design I just load this parameter
> > into
> > > > the
> > > > > > > > checker when the broker is started and the checkers map is
> set
> > > up.
> > > > > Can
> > > > > > I
> > > > > > > do
> > > > > > > > this in the new normal pr if I want to support topic level?
> Or
> > > > > perfect
> > > > > > it
> > > > > > > > here?
> > > > > > > >
> > > > > > > > Add a call PROTOBUF_NATIVE_ADVANCE schemaType corresponding
> > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (Seems to be
> > more
> > > > > > > trouble).
> > > > > > > >
> > > > > > > > Sorry I can not use the computer and network in the company,
> I
> > > use
> > > > my
> > > > > > > > mobile phone to reply to the email, the format may be a bit
> > > messy.
> > > > > > Please
> > > > > > > > understand.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > > 丛搏 <[email protected]> 于 2023年3月7日周二 下午11:39写道：
> > > > > > > >
> > > > > > > > > SiNan Liu <[email protected]> 于2023年3月7日周二 13:22写道：
> > > > > > > > > >
> > > > > > > > > > Great to see your comment, bo!
> > > > > > > > > >
> > > > > > > > > > 1. The first way. The protobuf website has a description
> of
> > > the
> > > > > > > rules,
> > > > > > > > > but
> > > > > > > > > > no plans to implement them.
> > > > > > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > > > > > >
> > > > > > > > > https://groups.google.com/g/protobuf
> > > > > > > > > maybe ask here
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > > > > > (1) Add a flag(`ValidatorClassName`), load it into
> > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker
> > > > starts.
> > > > > > > > > > ValidatorClassName is empty by default, and the
> > > implementation
> > > > > > > continues
> > > > > > > > > as
> > > > > > > > > > before, with no change for the user.
> > > > > > > > >
> > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in
> > > > > `broker.conf`
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > > > > > ```
> > > > > > > > > I do not recommend that we directly modify this plugin and
> > > > continue
> > > > > > to
> > > > > > > > > add configuration items, which will cause trouble for
> users.
> > > > > > > > > We have a lot of configs and it's getting very unwieldy.
> > > > > > > > > in my opinion, we don't change
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`,
> > > > > > > > > it is a simple implementation, it doesn't go wrong very
> > often,
> > > > most
> > > > > > > > > users will use it. we can add another ProtobufNativeCheck
> > named
> > > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck ` or other.
> > in
> > > > this
> > > > > > > > > way, we don't need to add this flag. There is no need to
> > > consider
> > > > > > > > > compatibility, it is just a plug-in and will not affect
> > current
> > > > > > logic.
> > > > > > > > > If the user needs it, just change the plugin to the new
> > > > > > implementation
> > > > > > > > >
> > > > > > > > > > ```java
> > > > > > > > > >     ProtobufNativeSchemaValidator DEFAULT =
> > (fromDescriptors,
> > > > > > > > > toDescriptor)
> > > > > > > > > > -> {
> > > > > > > > > >         for (Descriptors.Descriptor fromDescriptor :
> > > > > > > fromDescriptors) {
> > > > > > > > > >             // The default implementation only checks if
> > the
> > > > root
> > > > > > > message
> > > > > > > > > > has changed.
> > > > > > > > > >             if
> > > > > > > > > >
> > > > > (!fromDescriptor.getFullName().equals(toDescriptor.getFullName()))
> > > > > > {
> > > > > > > > > >                 throw new
> > > > ProtoBufCanReadCheckException("Protobuf
> > > > > > > root
> > > > > > > > > > message isn't allow change!");
> > > > > > > > > >             }
> > > > > > > > > >         }
> > > > > > > > > >     };
> > > > > > > > > > ```
> > > > > > > > > > `ValidatorClassName` value also can be set to the current
> > > > > > > implementation
> > > > > > > > > of
> > > > > > > > > > PIP add
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > > > > > >
> > > > > > > > > > (2) Recoding the
> `ProtobufNativeSchemaCompatibilityCheck`.
> > > > > Through
> > > > > > > the
> > > > > > > > > flag
> > > > > > > > > > (`ValidatorClassName`) to build different
> > > > > > > > > `ProtobufNativeSchemaValidator`.
> > > > > > > > > > Isn't it just a plug-in? The user can develop and choose
> a
> > > > > > different
> > > > > > > > > > `ProtobufNativeSchemaValidator`. I think it didn't change
> > the
> > > > > > logic,
> > > > > > > it
> > > > > > > > > > just allowed him to expand it.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think this PIP should be an enhancement and supplement
> to
> > > the
> > > > > > > function,
> > > > > > > > > > and there is no such thing as unnecessary and
> meaningless.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > sinan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 丛搏 <[email protected]> 于2023年3月7日周二 11:53写道：
> > > > > > > > > >
> > > > > > > > > > > I think we have two ways to do that.
> > > > > > > > > > >
> > > > > > > > > > > First way: We need to advance the improvement of java
> in
> > > > > > protobuf.
> > > > > > > Ask
> > > > > > > > > > > if they have plans to improve.
> > > > > > > > > > >
> > > > > > > > > > > Second way: the new PROTOBUF_NATIVE
> > > > `SchemaCompatibilityCheck`
> > > > > > > should
> > > > > > > > > > > be implemented as a plugin, don't change any existing
> > > plugin
> > > > > > logic
> > > > > > > > > > > (it's simple and already used). I don't recommend
> adding
> > > > flags
> > > > > > for
> > > > > > > > > > > rollback, it adds configuration and makes little sense.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Bo
> > > > > > > > > > >
> > > > > > > > > > > Asaf Mesika <[email protected]> 于2023年3月6日周一
> > 23:00写道：
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Can you convert the code block which is actually a
> > quote
> > > in
> > > > > the
> > > > > > > > > > > > beginning of the PIP to something which doesn't
> require
> > > to
> > > > > > scroll
> > > > > > > > > > > > horizontally so much?
> > > > > > > > > > > > Use
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > > > > > >
> > > > > > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > > > > > -->
> > > > > > > > > > > > When Schema type PROTOBUF is used, Pulsar Client
> > assumes
> > > > the
> > > > > > > object
> > > > > > > > > given
> > > > > > > > > > > > to it as message data is an auto-generated POJO
> > > containing
> > > > > the
> > > > > > > > > > > annotations
> > > > > > > > > > > > encoding the schema. The client is using a converter,
> > > which
> > > > > > > converts
> > > > > > > > > a
> > > > > > > > > > > > Protobuf schema descriptor into an Avro schema and
> > sends
> > > > that
> > > > > > as
> > > > > > > the
> > > > > > > > > > > Schema
> > > > > > > > > > > > of the producer/consumer.
> > > > > > > > > > > >
> > > > > > > > > > > > "On the broker side, protobuf and avro both use
> > > SchemaData
> > > > > > > converted
> > > > > > > > > to
> > > > > > > > > > > > org.apache.avro.Schema."
> > > > > > > > > > > > -->
> > > > > > > > > > > > Since the schema is an Avro schema, the
> implementation
> > of
> > > > > > > > > compatibility
> > > > > > > > > > > > check on the broker side is to simply re-use the
> > > > > compatibility
> > > > > > > check
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > AVRO schema type.
> > > > > > > > > > > >
> > > > > > > > > > > > "ProtobufSchema is different from
> ProtobufNativeSchema
> > in
> > > > > > schema
> > > > > > > > > > > > compatibility check it uses avro-protobuf.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > But the current implementation of ProtobufNative
> schema
> > > > > > > compatibility
> > > > > > > > > > > > check only
> > > > > > > > > > > > checked if the root message name is changed."
> > > > > > > > > > > >
> > > > > > > > > > > > -->
> > > > > > > > > > > > PROTOBUF_NATIVE schema type is different.
> > > > > > > > > > > > The client is actually using Protobuf Descriptor as
> the
> > > > > schema,
> > > > > > > as
> > > > > > > > > > > opposed
> > > > > > > > > > > > to Avro schema of PROTOBUF schema type. In the
> broker,
> > > the
> > > > > > > > > > > PROTOBUF_NATIVE
> > > > > > > > > > > > compatibility check actually hasn't implemented any
> > rule,
> > > > > > besides
> > > > > > > > > one:
> > > > > > > > > > > > checking if the root message name has changed.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >    1. For now, there is no official or third-party
> > > > solution
> > > > > > for
> > > > > > > > > > > ProtoBuf
> > > > > > > > > > > > >    compatibility. If in the future have better
> > > solutions
> > > > > of a
> > > > > > > third
> > > > > > > > > > > party or
> > > > > > > > > > > > >    the official, we develop new
> > > > > ProtobufNativeSchemaValidator
> > > > > > > and
> > > > > > > > > use,
> > > > > > > > > > > so
> > > > > > > > > > > > >    add a flag.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Who do you need to make that configurable? Once you
> > > > found a
> > > > > > > third
> > > > > > > > > > > party,
> > > > > > > > > > > > just switch to it? Who knows, maybe you never will.
> > > > Introduce
> > > > > > it
> > > > > > > > > when you
> > > > > > > > > > > > find it, not now.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > We improve in ProtobufNativeSchemaCompatibilityCheck
> > > > > BACKWARD,
> > > > > > > > > FORWARD
> > > > > > > > > > > > > these strategies. As with the AVRO implementation,
> > > > protobuf
> > > > > > > > > > > compatibility
> > > > > > > > > > > > > checking need implementing the canRead method.
> *This
> > > will
> > > > > > check
> > > > > > > > > that
> > > > > > > > > > > > > the writtenschema can be read by readSchema.*
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I completely disagree.
> > > > > > > > > > > > Avro implementation is confusing for our use case.
> > Don't
> > > > copy
> > > > > > > that.
> > > > > > > > > > > >
> > > > > > > > > > > > You have
> > > > > > > > > > > >
> > > > > > > > > > > > public void checkCompatible(SchemaData from,
> SchemaData
> > > to,
> > > > > > > > > > > > SchemaCompatibilityStrategy strategy)
> > > > > > > > > > > >         throws IncompatibleSchemaException {
> > > > > > > > > > > >     Descriptor fromDescriptor =
> > > > > > > > > > > >
> ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > > > > > >     Descriptor toDescriptor =
> > > > > > > > > > > > ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > > > > > >         case BACKWARD:
> > > > > > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > > > > > >         case FORWARD:
> > > > > > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > > > > > >         case FULL:
> > > > > > > > > > > >             checkRootMessageChange(fromDescriptor,
> > > > > > toDescriptor,
> > > > > > > > > > > strategy);
> > > > > > > > > > > >             return;
> > > > > > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > > > > > >             return;
> > > > > > > > > > > >         default:
> > > > > > > > > > > >             throw new
> > > IncompatibleSchemaException("Unknown
> > > > > > > > > > > > SchemaCompatibilityStrategy.");
> > > > > > > > > > > >     }
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > I would rename :
> > > > > > > > > > > > from --> currentSchema
> > > > > > > > > > > > to --> newSchema
> > > > > > > > > > > >
> > > > > > > > > > > > Use that switch case and have a method for each like:
> > > > > > > > > > > > validateBackwardsCompatibility(currentSchema,
> > newSchema)
> > > > > > > > > > > >
> > > > > > > > > > > > I dislike canRead and usage of writtenSchema, since
> you
> > > > have
> > > > > > two
> > > > > > > > > > > completely
> > > > > > > > > > > > different use cases: from the producing side and the
> > > > consumer
> > > > > > > side.
> > > > > > > > > > > >
> > > > > > > > > > > > schemaValidatorBuilder
> > > > > > > > > > > > >
> > > > > > > > > > > > > I dislike this proposal. IMO Avro implementation is
> > way
> > > > too
> > > > > > > > > > > complicated.
> > > > > > > > > > > > Why not have a simple function for validation for
> each
> > > > switch
> > > > > > > case
> > > > > > > > > above?
> > > > > > > > > > > > Why do we need strategy and builder, and all this
> > > > complexity?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > *Here are the basic compatibility rules we've
> defined:*
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > IMO it's impossible to read the validation rules as
> you
> > > > > > described
> > > > > > > > > them.
> > > > > > > > > > > > I wrote how they should be structured numerous times
> > > above.
> > > > > > > > > > > > I can't validate them.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > > > > > Write a high level design section, in which you
> > describe
> > > in
> > > > > > words
> > > > > > > > > what
> > > > > > > > > > > you
> > > > > > > > > > > > plan to do.
> > > > > > > > > > > > Write the validation rules in the structure that is
> > easy
> > > to
> > > > > > > > > understand:
> > > > > > > > > > > > rules per each compatibility check, and use proper
> > words
> > > > > > (current
> > > > > > > > > schema,
> > > > > > > > > > > > new schema), since new schema can be once used for
> read
> > > and
> > > > > > once
> > > > > > > > > used for
> > > > > > > > > > > > write.
> > > > > > > > > > > >
> > > > > > > > > > > > In its current form it takes too much time to
> > understand
> > > > the
> > > > > > > design,
> > > > > > > > > and
> > > > > > > > > > > it
> > > > > > > > > > > > shouldn't be the case.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <
> > > > > > [email protected]
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi! I updated the explanation of some things in the
> > PIP
> > > > > > issue.
> > > > > > > And
> > > > > > > > > also
> > > > > > > > > > > > > added a new “flag” in the conf is used as the
> > different
> > > > > > > > > > > > > ProtobufNativeSchemaValidator implementation, also
> > set
> > > > > > > > > > > > > ProtobufNativeSchemaValidator default only check
> > > whether
> > > > > the
> > > > > > > name
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > > root message is the same.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > sinan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf Mesika <[email protected]> 于2023年3月5日周日
> > > > 20:21写道：
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <
> > > > > > > [email protected]
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema
> > > > > descriptor
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > > > > > validated
> > > > > > > > > > > > > > > > for backward compatibility check using Avro
> > based
> > > > > > > > > compatibility
> > > > > > > > > > > > > rules?
> > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but
> it
> > > is
> > > > > > > actually a
> > > > > > > > > > > > > Protobuf
> > > > > > > > > > > > > > > > descriptor?
> > > > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. *You can take a quick look at the previous
> > > design,
> > > > > the
> > > > > > > > > PROTOBUF
> > > > > > > > > > > uses
> > > > > > > > > > > > > > > avro struct to store.*
> > > > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ok. So to summarize your code (easier to write it
> > > than
> > > > > send
> > > > > > > > > links):
> > > > > > > > > > > > > > * Pulsar Client, when used with Protobuf Schema,
> > > > actually
> > > > > > > > > converts
> > > > > > > > > > > the
> > > > > > > > > > > > > > Protobuf descriptor into an Avro Schema (using
> code
> > > > found
> > > > > > > inside
> > > > > > > > > Avro
> > > > > > > > > > > > > > library) and saves that Avro schema as the
> schema.
> > > It's
> > > > > not
> > > > > > > > > saving
> > > > > > > > > > > the
> > > > > > > > > > > > > > protobuf descriptor at all. Very confusing I have
> > to
> > > > add
> > > > > -
> > > > > > > never
> > > > > > > > > > > expected
> > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > This explains why In the
> > > > ProtobufSchemaCompatibilityCheck
> > > > > > > they
> > > > > > > > > just
> > > > > > > > > > > > > extend
> > > > > > > > > > > > > > the Avro without doing any translation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Now thatI finally understand this, I can say
> that:
> > > you
> > > > > > *must*
> > > > > > > > > explain
> > > > > > > > > > > > > that
> > > > > > > > > > > > > > in the motivation part in your PIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both
> > use
> > > > > > > `SchemaData`
> > > > > > > > > > > > > converted
> > > > > > > > > > > > > > > to `org.apache.avro.Schema`.*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually those links don't really help.
> > > > > > > > > > > > > > The main link that helps is:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > > > I understand the different compatibility
> check
> > > > > > > strategies.
> > > > > > > > > > > > > > > > If you just spell them out here, then as you
> > say,
> > > > > just
> > > > > > > > > translate
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and
> run
> > > the
> > > > > > Avro
> > > > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > > > I believe the answer is no, since you may
> want
> > to
> > > > > > verify
> > > > > > > > > > > different
> > > > > > > > > > > > > > things
> > > > > > > > > > > > > > > > when it comes to Protobuf, which are
> different
> > > then
> > > > > > Avro.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1.
> > > > > > > > > > > > > > > *ProtobufSchema is different from
> > > > ProtobufNativeSchema
> > > > > in
> > > > > > > that
> > > > > > > > > it
> > > > > > > > > > > uses
> > > > > > > > > > > > > > > avro-protobuf.*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > > > > *ProtobufNativeSchema needs a native
> > compatibility
> > > > > check,
> > > > > > > but
> > > > > > > > > > > there is
> > > > > > > > > > > > > no
> > > > > > > > > > > > > > > official or third party implementation. So this
> > PIP
> > > > > does
> > > > > > > not
> > > > > > > > > use
> > > > > > > > > > > > > > > avro-protobuf for protobuf compatibility
> > checking.*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. *By the way, this is implemented in much the
> > > same
> > > > > way
> > > > > > > that
> > > > > > > > > > > Apache
> > > > > > > > > > > > > avro
> > > > > > > > > > > > > > > does compatibility checking.*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > > > > >
> > > > > > `canReadStrategy`，`canBeReadStrategy`，`mutualReadStrategy`
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of
> > > > > > > `canRead()` are
> > > > > > > > > > > > > > > writtenSchema and readSchema. We only need to
> > > change
> > > > > the
> > > > > > > order
> > > > > > > > > of
> > > > > > > > > > > > > > arguments
> > > > > > > > > > > > > > > we pass to `canRead()`.*
> > > > > > > > > > > > > > > ```java
> > > > > > > > > > > > > > > private void
> > > > > validateWithStrategy(Descriptors.Descriptor
> > > > > > > > > > > toValidate,
> > > > > > > > > > > > > > > Descriptors.Descriptor fromDescriptor) throws
> > > > > > > > > > > > > > ProtoBufCanReadCheckException
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > > switch (strategy) {
> > > > > > > > > > > > > > > case CanReadExistingStrategy ->
> > > > canRead(fromDescriptor,
> > > > > > > > > > > toValidate);
> > > > > > > > > > > > > > > case CanBeReadByExistingStrategy ->
> > > > canRead(toValidate,
> > > > > > > > > > > > > fromDescriptor);
> > > > > > > > > > > > > > > case CanBeReadMutualStrategy -> {
> > > > > > > > > > > > > > > canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > > > > canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > private void canRead(Descriptors.Descriptor
> > > > > > writtenSchema,
> > > > > > > > > > > > > > > Descriptors.Descriptor readSchema) throws
> > > > > > > > > > > > > ProtoBufCanReadCheckException {
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
> > > > > > > > > > > > > > > readSchema);
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I get that you want to take inspiration from the
> > > > existing
> > > > > > > Avro
> > > > > > > > > Schema
> > > > > > > > > > > > > > compatibility check, to do your code design.
> > > > > > > > > > > > > > I also understand you *won't* use any existing
> avro
> > > > code
> > > > > > for
> > > > > > > > > that.
> > > > > > > > > > > > > > I also understand, you have to write the
> validation
> > > > check
> > > > > > on
> > > > > > > your
> > > > > > > > > > > own,
> > > > > > > > > > > > > > since there is no 3rd party to explain that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The only thing I can't understand are the actual
> > > rules
> > > > > as I
> > > > > > > wrote
> > > > > > > > > > > before,
> > > > > > > > > > > > > > since they are written confusingly.
> > > > > > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think you should structure the validation rules
> > > > > > > differently:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > ** List down rules, where use newSchema (the
> schema
> > > > used
> > > > > by
> > > > > > > > > producer
> > > > > > > > > > > or
> > > > > > > > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > > > > > > > > * Forward
> > > > > > > > > > > > > > ** List down rules, where use newSchema (the
> schema
> > > > used
> > > > > by
> > > > > > > > > producer
> > > > > > > > > > > or
> > > > > > > > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Once that's accomplished I will be able to
> > understand
> > > > the
> > > > > > > > > different
> > > > > > > > > > > > > > validation rules for each compatibility check.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Asaf Mesika <[email protected]>
> 于2023年3月1日周三
> > > > > > 21:19写道：
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <
> > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I read it and they look identical. What's
> > the
> > > > > > > difference
> > > > > > > > > > > between
> > > > > > > > > > > > > > > them?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Current avro,json, and protobuf schemas are
> > all
> > > > > > > implemented
> > > > > > > > > > > based
> > > > > > > > > > > > > on
> > > > > > > > > > > > > > > AVRO.
> > > > > > > > > > > > > > > > > > What do you mean, they are all
> implemented
> > > > based
> > > > > on
> > > > > > > > > Avro? You
> > > > > > > > > > > > > mean
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro
> > > > Schema,
> > > > > > and
> > > > > > > > > then
> > > > > > > > > > > you
> > > > > > > > > > > > > use
> > > > > > > > > > > > > > > Avro
> > > > > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > >
> > > >
> `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > >
> > > >
> `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > > They all extends
> > > > > `AvroSchemaBasedCompatibilityCheck`,
> > > > > > > the
> > > > > > > > > > > > > > > > > `checkCompatible()` is the same
> > implementation
> > > > with
> > > > > > > `AVRO`.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema
> > > > > descriptor
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > > > > > validated
> > > > > > > > > > > > > > > > for backward compatibility check using Avro
> > based
> > > > > > > > > compatibility
> > > > > > > > > > > > > rules?
> > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but
> it
> > > is
> > > > > > > actually a
> > > > > > > > > > > > > Protobuf
> > > > > > > > > > > > > > > > descriptor?
> > > > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think you should structure the validation
> > > rules
> > > > > > > > > differently:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The Compatibility check strategy is
> described
> > > on
> > > > > the
> > > > > > > > > website
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > > > > > 1. BACKWARD(CanReadExistingStrategy):
> > Consumers
> > > > > using
> > > > > > > > > schema
> > > > > > > > > > > V3 can
> > > > > > > > > > > > > > > process
> > > > > > > > > > > > > > > > > data written by producers using the last
> > schema
> > > > > > version
> > > > > > > > > V2. So
> > > > > > > > > > > V2
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > > > > > > > > 2. FORWARD(CanBeReadByExistingStrategy):
> > > > Consumers
> > > > > > > using
> > > > > > > > > the
> > > > > > > > > > > last
> > > > > > > > > > > > > > > schema
> > > > > > > > > > > > > > > > > version V2 can process data written by
> > > producers
> > > > > > using
> > > > > > > a
> > > > > > > > > new
> > > > > > > > > > > schema
> > > > > > > > > > > > > > V3,
> > > > > > > > > > > > > > > > > even though they may not be able to use the
> > > full
> > > > > > > > > capabilities
> > > > > > > > > > > of
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > schema. So V3 is "writtenSchema" and V2 is
> > > > > > > "readSchema".
> > > > > > > > > > > > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas
> are
> > > > both
> > > > > > > > > backward and
> > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > > compatible.
> > > > > > > > > > > > > > > > > Schema can evolve. The old version schema
> and
> > > the
> > > > > new
> > > > > > > > > version
> > > > > > > > > > > > > schema
> > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > be well understood.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > > > I understand the different compatibility
> check
> > > > > > > strategies.
> > > > > > > > > > > > > > > > If you just spell them out here, then as you
> > say,
> > > > > just
> > > > > > > > > translate
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and
> run
> > > the
> > > > > > Avro
> > > > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > > > I believe the answer is no, since you may
> want
> > to
> > > > > > verify
> > > > > > > > > > > different
> > > > > > > > > > > > > > things
> > > > > > > > > > > > > > > > when it comes to Protobuf, which are
> different
> > > then
> > > > > > Avro.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > At the current state, I can't understand your
> > > > design
> > > > > at
> > > > > > > all.
> > > > > > > > > > > Please
> > > > > > > > > > > > > > help
> > > > > > > > > > > > > > > > clarify that.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > So each strategy should have its own
> section.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The arguments of `canRead()` are
> > writtenSchema
> > > > and
> > > > > > > > > readSchema.
> > > > > > > > > > > As
> > > > > > > > > > > > > > we've
> > > > > > > > > > > > > > > > > just described, we only need to change the
> > > order
> > > > of
> > > > > > > > > arguments
> > > > > > > > > > > we
> > > > > > > > > > > > > pass
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > `canRead()`.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Asaf Mesika <[email protected]>
> > > > 于2023年2月27日周一
> > > > > > > 20:49写道：
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > And you can see the difference between
> > > > ProtoBuf
> > > > > > and
> > > > > > > > > > > > > > ProtoBufNative:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > >
> > > >
> https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > >
> https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >  I read it and they look identical.
> What's
> > > the
> > > > > > > difference
> > > > > > > > > > > between
> > > > > > > > > > > > > > > them?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Current avro,json, and protobuf schemas
> are
> > > all
> > > > > > > > > implemented
> > > > > > > > > > > based
> > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > AVRO.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > What do you mean, they are all
> implemented
> > > > based
> > > > > on
> > > > > > > > > Avro? You
> > > > > > > > > > > > > mean
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro
> > > > Schema,
> > > > > > and
> > > > > > > > > then
> > > > > > > > > > > you
> > > > > > > > > > > > > use
> > > > > > > > > > > > > > > Avro
> > > > > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > *Here are the basic compatibility rules
> > > we've
> > > > > > > defined:*
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think you should structure the
> validation
> > > > rules
> > > > > > > > > > > differently:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > > > > ** List down rules, where use newSchema
> > (the
> > > > > schema
> > > > > > > used
> > > > > > > > > by
> > > > > > > > > > > > > > producer
> > > > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > > > consumer) and existingSchema (last schema
> > > used)
> > > > > > > > > > > > > > > > > > * Forward
> > > > > > > > > > > > > > > > > > ** List down rules, where use newSchema
> > (the
> > > > > schema
> > > > > > > used
> > > > > > > > > by
> > > > > > > > > > > > > > producer
> > > > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > > > consumer) and existingSchema (last schema
> > > used)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > So each strategy should have its own
> > section.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'm saying this since you used
> > > "writttenSchema"
> > > > > > word
> > > > > > > but
> > > > > > > > > it
> > > > > > > > > > > > > > > represents
> > > > > > > > > > > > > > > > > > something completely different if it's
> > > backward
> > > > > or
> > > > > > > > > forward
> > > > > > > > > > > check.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Once you'll have that structure like
> that,
> > I
> > > > > > > personally
> > > > > > > > > will
> > > > > > > > > > > be
> > > > > > > > > > > > > > able
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > read and understand it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The motivation and problem statement are
> > now
> > > > > good -
> > > > > > > > > thanks
> > > > > > > > > > > for
> > > > > > > > > > > > > > > improving
> > > > > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan
> Liu <
> > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This
> > > time
> > > > > I've
> > > > > > > added
> > > > > > > > > > > some
> > > > > > > > > > > > > > > background
> > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > some explanations.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The compatibility check rules are
> already
> > > > > written
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > > > > > > Implementation.
> > > > > > > > > > > > > > > > > > > ProtoBufNative implements the same
> > canRead
> > > > > method
> > > > > > > as
> > > > > > > > > Apache
> > > > > > > > > > > > > Avro.
> > > > > > > > > > > > > > > > > > > It does this by checking whether the
> > schema
> > > > for
> > > > > > > > > writing and
> > > > > > > > > > > > > > reading
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > compatible. I also indicate whether the
> > > > > > > writtenSchema
> > > > > > > > > and
> > > > > > > > > > > > > > > readSchema of
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > Backward, Forward, and Full strategies
> > are
> > > > the
> > > > > > old
> > > > > > > or
> > > > > > > > > the
> > > > > > > > > > > new
> > > > > > > > > > > > > > > version
> > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > the schema.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Asaf Mesika <[email protected]>
> > > > > > 于2023年2月26日周日
> > > > > > > > > 23:24写道：
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot
> of
> > > > > > background
> > > > > > > > > > > knowledge,
> > > > > > > > > > > > > so
> > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > add IMO for people to understand it.
> > You
> > > > > don't
> > > > > > > need
> > > > > > > > > to
> > > > > > > > > > > > > explain
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > entire
> > > > > > > > > > > > > > > > > > > > pulsar in this PIP, but at the very
> > > least a
> > > > > few
> > > > > > > > > > > paragraphs
> > > > > > > > > > > > > > > detailing
> > > > > > > > > > > > > > > > > > all
> > > > > > > > > > > > > > > > > > > > you need to know, to put you in
> > context:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >    - Start by saying Pulsar as a
> > built-in
> > > > > > schema
> > > > > > > > > registry
> > > > > > > > > > > > > > inside
> > > > > > > > > > > > > > > > > Pulsar
> > > > > > > > > > > > > > > > > > > >    broker.
> > > > > > > > > > > > > > > > > > > >       - Every time the client updates
> > the
> > > > > > > schema, it
> > > > > > > > > > > uploads
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > >       broker. When that happens, it
> > has a
> > > > > > feature
> > > > > > > > > which
> > > > > > > > > > > > > > validates
> > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > > > >       schema version is compatible
> with
> > > the
> > > > > > > previous
> > > > > > > > > > > > > versions.
> > > > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > > > > > are 4 types
> > > > > > > > > > > > > > > > > > > >       of compatibility: Full, ...
> > > (complete
> > > > > and
> > > > > > > > > explain
> > > > > > > > > > > each
> > > > > > > > > > > > > > one
> > > > > > > > > > > > > > > > > > briefly)
> > > > > > > > > > > > > > > > > > > >    - Also explain Pulsar Schema
> > registry
> > > > > > supports
> > > > > > > > > various
> > > > > > > > > > > > > > schema
> > > > > > > > > > > > > > > > > > > >    protocols:  Avro, protobuf native,
> > ...
> > > > > > > (complete
> > > > > > > > > the
> > > > > > > > > > > > > rest),
> > > > > > > > > > > > > > > each
> > > > > > > > > > > > > > > > > > > > protocol
> > > > > > > > > > > > > > > > > > > >    has a schema which dictates how to
> > > > > serialize
> > > > > > > and
> > > > > > > > > > > > > deserialize
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > > > > > >    content into typed object.
> > > > > > > > > > > > > > > > > > > >    - Explain in short what is
> protobuf
> > > > native
> > > > > > > > > (compare
> > > > > > > > > > > > > protobuf
> > > > > > > > > > > > > > > > > > > non-native)
> > > > > > > > > > > > > > > > > > > >    - Please don't paste code instead
> of
> > > > > > > explaining.
> > > > > > > > > > > > > > > > > > > >       - Explain that protobuf native
> > > > current
> > > > > > > > > validation
> > > > > > > > > > > check
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > only
> > > > > > > > > > > > > > > > > > > >       composed of checking the root
> > > message
> > > > > > name
> > > > > > > is
> > > > > > > > > the
> > > > > > > > > > > same
> > > > > > > > > > > > > > > between
> > > > > > > > > > > > > > > > > > > > the current
> > > > > > > > > > > > > > > > > > > >       schema version and the new
> > version.
> > > > > > > > > > > > > > > > > > > >          - Explain briefly what is a
> > root
> > > > > > message
> > > > > > > > > and its
> > > > > > > > > > > > > name.
> > > > > > > > > > > > > > > > > > > >       - Explain the problem (list
> > > > scenarios)
> > > > > > > that we
> > > > > > > > > have
> > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > > > protobuf
> > > > > > > > > > > > > > > > > > > >       native schema only supports
> FULL
> > > > > > > compatibility
> > > > > > > > > > > > > > validation.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Regarding high level design - as in
> > what
> > > > you
> > > > > > > plan to
> > > > > > > > > do.
> > > > > > > > > > > > > > > > > > > > I suggest you add "High Level Design"
> > and
> > > > in
> > > > > it
> > > > > > > > > detail
> > > > > > > > > > > how
> > > > > > > > > > > > > you
> > > > > > > > > > > > > > > plan
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > validate, per protobuf version, per
> > > > > > compatibility
> > > > > > > > > check
> > > > > > > > > > > > > > > (backward,
> > > > > > > > > > > > > > > > > > > forward,
> > > > > > > > > > > > > > > > > > > > full,...).
> > > > > > > > > > > > > > > > > > > > I tried reading the implementation -
> > for
> > > > me ,
> > > > > > > it's
> > > > > > > > > all
> > > > > > > > > > > over
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > place.
> > > > > > > > > > > > > > > > > > > Can
> > > > > > > > > > > > > > > > > > > > you please list in order what I wrote
> > > > above,
> > > > > > and
> > > > > > > > > list the
> > > > > > > > > > > > > > > validation
> > > > > > > > > > > > > > > > > > > rules
> > > > > > > > > > > > > > > > > > > > with a good explanation why you
> > validate
> > > it
> > > > > > like
> > > > > > > > > that?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Lastly, one you have all the
> validation
> > > > rules
> > > > > > > clearly
> > > > > > > > > > > stated,
> > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > > > it to document it properly so users
> can
> > > > know
> > > > > > what
> > > > > > > > > > > validation
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > expect.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan
> > > Liu <
> > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the
> code
> > > and
> > > > > > > > > described the
> > > > > > > > > > > > > > design
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > improve
> > > > > > > > > > > > > > > > > > > > > the PROTOBUF_NATIVE schema
> > > compatibility
> > > > > > > checks.
> > > > > > > > > You
> > > > > > > > > > > can
> > > > > > > > > > > > > > have a
> > > > > > > > > > > > > > > > > look.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Asaf Mesika <[email protected]
> >
> > > > > > > 于2023年2月22日周三
> > > > > > > > > > > 21:16写道：
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I read it but you're almost
> > directly
> > > > > diving
> > > > > > > into
> > > > > > > > > the
> > > > > > > > > > > > > code -
> > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > take
> > > > > > > > > > > > > > > > > > > > > me
> > > > > > > > > > > > > > > > > > > > > > hours just to reverse engineer
> your
> > > > > design.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Can you please include a "High
> > Level
> > > > > > Design"
> > > > > > > > > section
> > > > > > > > > > > in
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > > explain
> > > > > > > > > > > > > > > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > > > > > > > > > > > > > > If I can read that section and
> > > explain
> > > > to
> > > > > > > someone
> > > > > > > > > > > else
> > > > > > > > > > > > > how
> > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > > work,
> > > > > > > > > > > > > > > > > > > > > > it means the section is complete.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM
> > SiNan
> > > > > Liu <
> > > > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss:
> > > > > > > > > > > > > > > > > > > >
> > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > We can talk about the current
> > > design
> > > > > > here.
> > > > > > > > > > > Especially
> > > > > > > > > > > > > for
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > field
> > > > > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > > > > change check rules, please give
> > > your
> > > > > > > valuable
> > > > > > > > > > > advice.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Reply via email to