I think we have two ways to do that. First way: We need to advance the improvement of java in protobuf. Ask if they have plans to improve.
Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should be implemented as a plugin, don't change any existing plugin logic (it's simple and already used). I don't recommend adding flags for rollback, it adds configuration and makes little sense. Thanks, Bo Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月6日周一 23:00写道: > > Can you convert the code block which is actually a quote in the > beginning of the PIP to something which doesn't require to scroll > horizontally so much? > Use > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text > > Let's improve the clarity of what you wrote: > > "the PROTOBUF uses avro struct to store." > --> > When Schema type PROTOBUF is used, Pulsar Client assumes the object given > to it as message data is an auto-generated POJO containing the annotations > encoding the schema. The client is using a converter, which converts a > Protobuf schema descriptor into an Avro schema and sends that as the Schema > of the producer/consumer. > > "On the broker side, protobuf and avro both use SchemaData converted to > org.apache.avro.Schema." > --> > Since the schema is an Avro schema, the implementation of compatibility > check on the broker side is to simply re-use the compatibility check of the > AVRO schema type. > > "ProtobufSchema is different from ProtobufNativeSchema in schema > compatibility check it uses avro-protobuf. > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview > But the current implementation of ProtobufNative schema compatibility > check only > checked if the root message name is changed." > > --> > PROTOBUF_NATIVE schema type is different. > The client is actually using Protobuf Descriptor as the schema, as opposed > to Avro schema of PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE > compatibility check actually hasn't implemented any rule, besides one: > checking if the root message name has changed. > > > > > 1. For now, there is no official or third-party solution for ProtoBuf > > compatibility. If in the future have better solutions of a third party or > > the official, we develop new ProtobufNativeSchemaValidator and use, so > > add a flag. > > > > Who do you need to make that configurable? Once you found a third party, > just switch to it? Who knows, maybe you never will. Introduce it when you > find it, not now. > > > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD > > these strategies. As with the AVRO implementation, protobuf compatibility > > checking need implementing the canRead method. *This will check that > > the writtenschema can be read by readSchema.* > > > I completely disagree. > Avro implementation is confusing for our use case. Don't copy that. > > You have > > public void checkCompatible(SchemaData from, SchemaData to, > SchemaCompatibilityStrategy strategy) > throws IncompatibleSchemaException { > Descriptor fromDescriptor = > ProtobufNativeSchemaUtils.deserialize(from.getData()); > Descriptor toDescriptor = > ProtobufNativeSchemaUtils.deserialize(to.getData()); > switch (strategy) { > case BACKWARD_TRANSITIVE: > case BACKWARD: > case FORWARD_TRANSITIVE: > case FORWARD: > case FULL_TRANSITIVE: > case FULL: > checkRootMessageChange(fromDescriptor, toDescriptor, strategy); > return; > case ALWAYS_COMPATIBLE: > return; > default: > throw new IncompatibleSchemaException("Unknown > SchemaCompatibilityStrategy."); > } > } > > I would rename : > from --> currentSchema > to --> newSchema > > Use that switch case and have a method for each like: > validateBackwardsCompatibility(currentSchema, newSchema) > > I dislike canRead and usage of writtenSchema, since you have two completely > different use cases: from the producing side and the consumer side. > > schemaValidatorBuilder > > > > I dislike this proposal. IMO Avro implementation is way too complicated. > Why not have a simple function for validation for each switch case above? > Why do we need strategy and builder, and all this complexity? > > > *Here are the basic compatibility rules we've defined:* > > > IMO it's impossible to read the validation rules as you described them. > I wrote how they should be structured numerous times above. > I can't validate them. > > > IMO, the current design is very hard to read. > Please try to avoid jumping into code sections. > Write a high level design section, in which you describe in words what you > plan to do. > Write the validation rules in the structure that is easy to understand: > rules per each compatibility check, and use proper words (current schema, > new schema), since new schema can be once used for read and once used for > write. > > In its current form it takes too much time to understand the design, and it > shouldn't be the case. > > Thanks, > > Asaf > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1...@gmail.com> wrote: > > > Hi! I updated the explanation of some things in the PIP issue. And also > > added a new “flag” in the conf is used as the different > > ProtobufNativeSchemaValidator implementation, also set > > ProtobufNativeSchemaValidator default only check whether the name of the > > root message is the same. > > > > > > Thanks, > > sinan > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月5日周日 20:21写道: > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1...@gmail.com> wrote: > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be > > > validated > > > > > for backward compatibility check using Avro based compatibility > > rules? > > > > > Doesn't it expect the schema to be Avro, but it is actually a > > Protobuf > > > > > descriptor? > > > > > Is there some translation happening? > > > > > > > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses > > > > avro struct to store.* > > > > https://github.com/apache/pulsar/pull/1954 > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61 > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115 > > > > > > > > > Ok. So to summarize your code (easier to write it than send links): > > > * Pulsar Client, when used with Protobuf Schema, actually converts the > > > Protobuf descriptor into an Avro Schema (using code found inside Avro > > > library) and saves that Avro schema as the schema. It's not saving the > > > protobuf descriptor at all. Very confusing I have to add - never expected > > > that. > > > This explains why In the ProtobufSchemaCompatibilityCheck they just > > extend > > > the Avro without doing any translation. > > > > > > Thanks for that. > > > > > > Now thatI finally understand this, I can say that: you *must* explain > > that > > > in the motivation part in your PIP. > > > > > > > > > > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` > > converted > > > > to `org.apache.avro.Schema`.* > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293 > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31 > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70 > > > > > > > > > Actually those links don't really help. > > > The main link that helps is: > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122 > > > > > > > > > > > > > > > > > > > > > > > > > > I'm sorry - I don't understand. > > > > > I understand the different compatibility check strategies. > > > > > If you just spell them out here, then as you say, just translate the > > > > > Protobuf Descriptor into an Avro schema and run the Avro > > > > > compatibility validation, no? > > > > > I believe the answer is no, since you may want to verify different > > > things > > > > > when it comes to Protobuf, which are different then Avro. > > > > > > > > > > > > 1. > > > > *ProtobufSchema is different from ProtobufNativeSchema in that it uses > > > > avro-protobuf.* > > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview > > > > *ProtobufNativeSchema needs a native compatibility check, but there is > > no > > > > official or third party implementation. So this PIP does not use > > > > avro-protobuf for protobuf compatibility checking.* > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache > > avro > > > > does compatibility checking.* > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java > > > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy` > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are > > > > writtenSchema and readSchema. We only need to change the order of > > > arguments > > > > we pass to `canRead()`.* > > > > ```java > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate, > > > > Descriptors.Descriptor fromDescriptor) throws > > > ProtoBufCanReadCheckException > > > > { > > > > switch (strategy) { > > > > case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate); > > > > case CanBeReadByExistingStrategy -> canRead(toValidate, > > fromDescriptor); > > > > case CanBeReadMutualStrategy -> { > > > > canRead(toValidate, fromDescriptor); > > > > canRead(fromDescriptor, toValidate); > > > > } > > > > } > > > > } > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema, > > > > Descriptors.Descriptor readSchema) throws > > ProtoBufCanReadCheckException { > > > > > > > > > ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, > > > > readSchema); > > > > } > > > > ``` > > > > > > > > > > > I get that you want to take inspiration from the existing Avro Schema > > > compatibility check, to do your code design. > > > I also understand you *won't* use any existing avro code for that. > > > I also understand, you have to write the validation check on your own, > > > since there is no 3rd party to explain that. > > > > > > The only thing I can't understand are the actual rules as I wrote before, > > > since they are written confusingly. > > > So, I repeat what I asked before: > > > > > > I think you should structure the validation rules differently: > > > > > > * Backward checks > > > ** List down rules, where use newSchema (the schema used by producer or > > > consumer) and existingSchema (last schema used) > > > * Forward > > > ** List down rules, where use newSchema (the schema used by producer or > > > consumer) and existingSchema (last schema used) > > > > > > Once that's accomplished I will be able to understand the different > > > validation rules for each compatibility check. > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > sinan > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三 21:19写道: > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1...@gmail.com> > > > > wrote: > > > > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between > > > > them? > > > > > > > > > > > > Current avro,json, and protobuf schemas are all implemented based > > on > > > > AVRO. > > > > > > > What do you mean, they are all implemented based on Avro? You > > mean > > > > the > > > > > > > protobuf schema is converted into an Avro Schema, and then you > > use > > > > Avro > > > > > > > compatibility validation? > > > > > > > > > > > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck` > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck` > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck` > > > > > > They all extends `AvroSchemaBasedCompatibilityCheck`, the > > > > > > `checkCompatible()` is the same implementation with `AVRO`. > > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be > > > validated > > > > > for backward compatibility check using Avro based compatibility > > rules? > > > > > Doesn't it expect the schema to be Avro, but it is actually a > > Protobuf > > > > > descriptor? > > > > > Is there some translation happening? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think you should structure the validation rules differently: > > > > > > > > > > > > > > > > > > The Compatibility check strategy is described on the website > > > > > > > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy > > > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can > > > > process > > > > > > data written by producers using the last schema version V2. So V2 > > is > > > > > > "writtenSchema" and V3 is "readSchema". > > > > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last > > > > schema > > > > > > version V2 can process data written by producers using a new schema > > > V3, > > > > > > even though they may not be able to use the full capabilities of > > the > > > > new > > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema". > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and > > > forward > > > > > > compatible. > > > > > > Schema can evolve. The old version schema and the new version > > schema > > > > should > > > > > > be well understood. > > > > > > > > > > > > > > > > > I'm sorry - I don't understand. > > > > > I understand the different compatibility check strategies. > > > > > If you just spell them out here, then as you say, just translate the > > > > > Protobuf Descriptor into an Avro schema and run the Avro > > > > > compatibility validation, no? > > > > > I believe the answer is no, since you may want to verify different > > > things > > > > > when it comes to Protobuf, which are different then Avro. > > > > > > > > > > At the current state, I can't understand your design at all. Please > > > help > > > > > clarify that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So each strategy should have its own section. > > > > > > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As > > > we've > > > > > > just described, we only need to change the order of arguments we > > pass > > > > to > > > > > > `canRead()`. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > sinan > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月27日周一 20:49写道: > > > > > > > > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and > > > ProtoBufNative: > > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between > > > > them? > > > > > > > > > > > > > > Current avro,json, and protobuf schemas are all implemented based > > > on > > > > > > AVRO. > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You > > mean > > > > the > > > > > > > protobuf schema is converted into an Avro Schema, and then you > > use > > > > Avro > > > > > > > compatibility validation? > > > > > > > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:* > > > > > > > > > > > > > > > > > > > > > I think you should structure the validation rules differently: > > > > > > > > > > > > > > * Backward checks > > > > > > > ** List down rules, where use newSchema (the schema used by > > > producer > > > > or > > > > > > > consumer) and existingSchema (last schema used) > > > > > > > * Forward > > > > > > > ** List down rules, where use newSchema (the schema used by > > > producer > > > > or > > > > > > > consumer) and existingSchema (last schema used) > > > > > > > > > > > > > > So each strategy should have its own section. > > > > > > > > > > > > > > I'm saying this since you used "writttenSchema" word but it > > > > represents > > > > > > > something completely different if it's backward or forward check. > > > > > > > > > > > > > > Once you'll have that structure like that, I personally will be > > > able > > > > to > > > > > > > read and understand it. > > > > > > > > > > > > > > > > > > > > > The motivation and problem statement are now good - thanks for > > > > improving > > > > > > > it. > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu < > > liusinan1...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some > > > > background > > > > > > > and > > > > > > > > some explanations. > > > > > > > > > > > > > > > > The compatibility check rules are already written in the > > > > > > Implementation. > > > > > > > > ProtoBufNative implements the same canRead method as Apache > > Avro. > > > > > > > > It does this by checking whether the schema for writing and > > > reading > > > > is > > > > > > > > compatible. I also indicate whether the writtenSchema and > > > > readSchema of > > > > > > > the > > > > > > > > Backward, Forward, and Full strategies are the old or the new > > > > version > > > > > > of > > > > > > > > the schema. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > sinan > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月26日周日 23:24写道: > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, > > so > > > > you > > > > > > > need > > > > > > > > to > > > > > > > > > add IMO for people to understand it. You don't need to > > explain > > > > the > > > > > > > entire > > > > > > > > > pulsar in this PIP, but at the very least a few paragraphs > > > > detailing > > > > > > > all > > > > > > > > > you need to know, to put you in context: > > > > > > > > > > > > > > > > > > > > > > > > > > > - Start by saying Pulsar as a built-in schema registry > > > inside > > > > > > Pulsar > > > > > > > > > broker. > > > > > > > > > - Every time the client updates the schema, it uploads > > it > > > > to > > > > > > the > > > > > > > > > broker. When that happens, it has a feature which > > > validates > > > > if > > > > > > > the > > > > > > > > > new > > > > > > > > > schema version is compatible with the previous > > versions. > > > > There > > > > > > > > > are 4 types > > > > > > > > > of compatibility: Full, ... (complete and explain each > > > one > > > > > > > briefly) > > > > > > > > > - Also explain Pulsar Schema registry supports various > > > schema > > > > > > > > > protocols: Avro, protobuf native, ... (complete the > > rest), > > > > each > > > > > > > > > protocol > > > > > > > > > has a schema which dictates how to serialize and > > deserialize > > > > the > > > > > > > > message > > > > > > > > > content into typed object. > > > > > > > > > - Explain in short what is protobuf native (compare > > protobuf > > > > > > > > non-native) > > > > > > > > > - Please don't paste code instead of explaining. > > > > > > > > > - Explain that protobuf native current validation check > > > is > > > > only > > > > > > > > > composed of checking the root message name is the same > > > > between > > > > > > > > > the current > > > > > > > > > schema version and the new version. > > > > > > > > > - Explain briefly what is a root message and its > > name. > > > > > > > > > - Explain the problem (list scenarios) that we have > > > because > > > > > > > > protobuf > > > > > > > > > native schema only supports FULL compatibility > > > validation. > > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding high level design - as in what you plan to do. > > > > > > > > > I suggest you add "High Level Design" and in it detail how > > you > > > > plan > > > > > > to > > > > > > > > > validate, per protobuf version, per compatibility check > > > > (backward, > > > > > > > > forward, > > > > > > > > > full,...). > > > > > > > > > I tried reading the implementation - for me , it's all over > > the > > > > > > place. > > > > > > > > Can > > > > > > > > > you please list in order what I wrote above, and list the > > > > validation > > > > > > > > rules > > > > > > > > > with a good explanation why you validate it like that? > > > > > > > > > > > > > > > > > > Lastly, one you have all the validation rules clearly stated, > > > you > > > > can > > > > > > > use > > > > > > > > > it to document it properly so users can know what validation > > to > > > > > > expect. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Asaf > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu < > > > > liusinan1...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the > > > design > > > > to > > > > > > > > improve > > > > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can > > > have a > > > > > > look. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月22日周三 21:16写道: > > > > > > > > > > > > > > > > > > > > > I read it but you're almost directly diving into the > > code - > > > > it > > > > > > will > > > > > > > > > take > > > > > > > > > > me > > > > > > > > > > > hours just to reverse engineer your design. > > > > > > > > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in > > > which > > > > you > > > > > > > > > explain > > > > > > > > > > > how you plan to tackle any issue? > > > > > > > > > > > If I can read that section and explain to someone else > > how > > > > this > > > > > > > will > > > > > > > > > > work, > > > > > > > > > > > it means the section is complete. > > > > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu < > > > > > > liusinan1...@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss: > > > > > > > > > https://github.com/apache/pulsar/issues/19565 > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > We can talk about the current design here. Especially > > for > > > > the > > > > > > > field > > > > > > > > > > type > > > > > > > > > > > > change check rules, please give your valuable advice. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >