Hello Enrico. Thanks for your suggestion, according to my understanding of what you said "flag". How about we add a configuration in the next release:
protoBufNativeSchemaValidatorClassName=org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl Use the previous implementation if the configuration is empty (check only the name of the root message). If there is a better third-party or official solution in the future, develop a new " ProtobufNativeSchemaBreakValidatorImpl " to give users a choice. What do you think of this design? If there is a better third party or official solution in the future, do you think the current pr implementation should be retained or deleted? Thanks, sinan Enrico Olivelli <eolive...@gmail.com> 于 2023年3月2日周四 上午12:47写道: > (I apologise for top posting) > > Would it be possible to add a flag to fallback to the previous behaviour ? > I know that adding such flags is a burden but if the upgrade breaks > some workflows then users won't be able to upgrade. > We can add the flag in the next release and drop it in the next major > release > > Enrico > > Il giorno mer 1 mar 2023 alle ore 15:33 SiNan Liu > <liusinan1...@gmail.com> ha scritto: > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be > validated > > > for backward compatibility check using Avro based compatibility rules? > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf > > > descriptor? > > > Is there some translation happening? > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses > > avro struct to store.* > > https://github.com/apache/pulsar/pull/1954 > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61 > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115 > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted > > to `org.apache.avro.Schema`.* > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293 > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31 > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70 > > > > > > > > I'm sorry - I don't understand. > > > I understand the different compatibility check strategies. > > > If you just spell them out here, then as you say, just translate the > > > Protobuf Descriptor into an Avro schema and run the Avro > > > compatibility validation, no? > > > I believe the answer is no, since you may want to verify different > things > > > when it comes to Protobuf, which are different then Avro. > > > > > > 1. > > *ProtobufSchema is different from ProtobufNativeSchema in that it uses > > avro-protobuf.* > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview > > *ProtobufNativeSchema needs a native compatibility check, but there is no > > official or third party implementation. So this PIP does not use > > avro-protobuf for protobuf compatibility checking.* > > > > 2. *By the way, this is implemented in much the same way that Apache avro > > does compatibility checking.* > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy` > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are > > writtenSchema and readSchema. We only need to change the order of > arguments > > we pass to `canRead()`.* > > ```java > > private void validateWithStrategy(Descriptors.Descriptor toValidate, > > Descriptors.Descriptor fromDescriptor) throws > ProtoBufCanReadCheckException > > { > > switch (strategy) { > > case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate); > > case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor); > > case CanBeReadMutualStrategy -> { > > canRead(toValidate, fromDescriptor); > > canRead(fromDescriptor, toValidate); > > } > > } > > } > > > > private void canRead(Descriptors.Descriptor writtenSchema, > > Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException { > > > ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, > > readSchema); > > } > > ``` > > > > > > Thanks, > > sinan > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三 21:19写道: > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1...@gmail.com> > wrote: > > > > > > > > > > > > > I read it and they look identical. What's the difference between > them? > > > > > > > > Current avro,json, and protobuf schemas are all implemented based on > > AVRO. > > > > > What do you mean, they are all implemented based on Avro? You mean > the > > > > > protobuf schema is converted into an Avro Schema, and then you use > > Avro > > > > > compatibility validation? > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck` > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck` > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck` > > > > They all extends `AvroSchemaBasedCompatibilityCheck`, the > > > > `checkCompatible()` is the same implementation with `AVRO`. > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be > validated > > > for backward compatibility check using Avro based compatibility rules? > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf > > > descriptor? > > > Is there some translation happening? > > > > > > > > > > > > > > > > > > > > > I think you should structure the validation rules differently: > > > > > > > > > > > > The Compatibility check strategy is described on the website > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can > > process > > > > data written by producers using the last schema version V2. So V2 is > > > > "writtenSchema" and V3 is "readSchema". > > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last > schema > > > > version V2 can process data written by producers using a new schema > V3, > > > > even though they may not be able to use the full capabilities of the > new > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema". > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and > forward > > > > compatible. > > > > Schema can evolve. The old version schema and the new version schema > > should > > > > be well understood. > > > > > > > > > > > I'm sorry - I don't understand. > > > I understand the different compatibility check strategies. > > > If you just spell them out here, then as you say, just translate the > > > Protobuf Descriptor into an Avro schema and run the Avro > > > compatibility validation, no? > > > I believe the answer is no, since you may want to verify different > things > > > when it comes to Protobuf, which are different then Avro. > > > > > > At the current state, I can't understand your design at all. Please > help > > > clarify that. > > > > > > > > > > > > > > > > > > > > > > > So each strategy should have its own section. > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As > we've > > > > just described, we only need to change the order of arguments we > pass to > > > > `canRead()`. > > > > > > > > > > > > > > > > Thanks, > > > > sinan > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月27日周一 20:49写道: > > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and > ProtoBufNative: > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf > > > > > > > > > > > I read it and they look identical. What's the difference between > > them? > > > > > > > > > > Current avro,json, and protobuf schemas are all implemented based > on > > > > AVRO. > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean > the > > > > > protobuf schema is converted into an Avro Schema, and then you use > > Avro > > > > > compatibility validation? > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:* > > > > > > > > > > > > > > > I think you should structure the validation rules differently: > > > > > > > > > > * Backward checks > > > > > ** List down rules, where use newSchema (the schema used by > producer > > or > > > > > consumer) and existingSchema (last schema used) > > > > > * Forward > > > > > ** List down rules, where use newSchema (the schema used by > producer > > or > > > > > consumer) and existingSchema (last schema used) > > > > > > > > > > So each strategy should have its own section. > > > > > > > > > > I'm saying this since you used "writttenSchema" word but it > represents > > > > > something completely different if it's backward or forward check. > > > > > > > > > > Once you'll have that structure like that, I personally will be > able > > to > > > > > read and understand it. > > > > > > > > > > > > > > > The motivation and problem statement are now good - thanks for > > improving > > > > > it. > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1...@gmail.com> > > > > wrote: > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some > > background > > > > > and > > > > > > some explanations. > > > > > > > > > > > > The compatibility check rules are already written in the > > > > Implementation. > > > > > > ProtoBufNative implements the same canRead method as Apache Avro. > > > > > > It does this by checking whether the schema for writing and > reading > > is > > > > > > compatible. I also indicate whether the writtenSchema and > > readSchema of > > > > > the > > > > > > Backward, Forward, and Full strategies are the old or the new > > version > > > > of > > > > > > the schema. > > > > > > > > > > > > Thanks, > > > > > > sinan > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月26日周日 23:24写道: > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so > > you > > > > > need > > > > > > to > > > > > > > add IMO for people to understand it. You don't need to explain > the > > > > > entire > > > > > > > pulsar in this PIP, but at the very least a few paragraphs > > detailing > > > > > all > > > > > > > you need to know, to put you in context: > > > > > > > > > > > > > > > > > > > > > - Start by saying Pulsar as a built-in schema registry > inside > > > > Pulsar > > > > > > > broker. > > > > > > > - Every time the client updates the schema, it uploads > it to > > > > the > > > > > > > broker. When that happens, it has a feature which > validates > > if > > > > > the > > > > > > > new > > > > > > > schema version is compatible with the previous versions. > > There > > > > > > > are 4 types > > > > > > > of compatibility: Full, ... (complete and explain each > one > > > > > briefly) > > > > > > > - Also explain Pulsar Schema registry supports various > schema > > > > > > > protocols: Avro, protobuf native, ... (complete the rest), > > each > > > > > > > protocol > > > > > > > has a schema which dictates how to serialize and deserialize > > the > > > > > > message > > > > > > > content into typed object. > > > > > > > - Explain in short what is protobuf native (compare protobuf > > > > > > non-native) > > > > > > > - Please don't paste code instead of explaining. > > > > > > > - Explain that protobuf native current validation check > is > > only > > > > > > > composed of checking the root message name is the same > > between > > > > > > > the current > > > > > > > schema version and the new version. > > > > > > > - Explain briefly what is a root message and its name. > > > > > > > - Explain the problem (list scenarios) that we have > because > > > > > > protobuf > > > > > > > native schema only supports FULL compatibility > validation. > > > > > > > > > > > > > > > > > > > > > Regarding high level design - as in what you plan to do. > > > > > > > I suggest you add "High Level Design" and in it detail how you > > plan > > > > to > > > > > > > validate, per protobuf version, per compatibility check > (backward, > > > > > > forward, > > > > > > > full,...). > > > > > > > I tried reading the implementation - for me , it's all over the > > > > place. > > > > > > Can > > > > > > > you please list in order what I wrote above, and list the > > validation > > > > > > rules > > > > > > > with a good explanation why you validate it like that? > > > > > > > > > > > > > > Lastly, one you have all the validation rules clearly stated, > you > > can > > > > > use > > > > > > > it to document it properly so users can know what validation to > > > > expect. > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Asaf > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu < > liusinan1...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the > design > > to > > > > > > improve > > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can > have a > > > > look. > > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年2月22日周三 21:16写道: > > > > > > > > > > > > > > > > > I read it but you're almost directly diving into the code > - it > > > > will > > > > > > > take > > > > > > > > me > > > > > > > > > hours just to reverse engineer your design. > > > > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in > which > > you > > > > > > > explain > > > > > > > > > how you plan to tackle any issue? > > > > > > > > > If I can read that section and explain to someone else how > > this > > > > > will > > > > > > > > work, > > > > > > > > > it means the section is complete. > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs. > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu < > > > > liusinan1...@gmail.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss: > > > > > > > https://github.com/apache/pulsar/issues/19565 > > > > > > > > . > > > > > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for > > the > > > > > field > > > > > > > > type > > > > > > > > > > change check rules, please give your valuable advice. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >