Also: (2) PROTOBUF_NATIVE was designed so that not use avro-protobuf for protobuf > schema compatibility checking.
The root message name is the class name we pass in when we create the > producer or consumer. ProtoBuf has many nested messages or dependencies. > The current implementation only checks if the passed class name is the > same. It does not check if the fields in the file change in a way that is > compatible with older versions of the schema. You missed the most important thing there: PROTOBUF_NATIVE uses Protobuf Descriptor when persisting the schema. It's not using Avro Schema definition. As I wrote you in previous email: PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting > the Protobuf Descriptor and using Protobuf for encoding. > On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <liusinan1...@gmail.com> wrote: > 1. > > > > Why? The root message name is not written over the wire to the best of > my > > > knowledge. I haven't found it written in the official doc. > > > The name of the root message check is the rules in the previous > `ProtobufNativeSchemaCompatibilityCheck`. Because if the root message has a > different name, there is no need to check its contents. "Same" schema, > their names must be the same. > > > 2. > > > >The writtenSchema can not change the field number of any field in > > readSchema (the > field name is the same, but the field number is > > different). > > >You have to take into account field type as well when comparing. > > > The first sentence on the website says that the number of fields cannot be > changed. > > > Don’t change the field numbers for any existing fields. > > > > 3. > > > > - The writtenSchema cannot add required fields, but optional or > > > duplicate fields can be added (The field number must be new). > > > > > > That's not true. > > You can have a required field in writeSchema, and not have that field in > > readSchema (based on tag number). > > > The required field must exist. In your case, where does readSchema go to > read its required fields? It's not in writtenSchema. > The second and third sentences in the official website say, add and delete > do not operate required field! > > > 4. > > > (4) The writtenSchema can not change the field name of any field in > > > readSchema (the field number is the same, but the field name is > > > different). > > This is incorrect. > > Fields names are encoded into the wire. I don't see this in any best > > practice. > > > The third sentence on the website: > > > You may want to rename the field instead, perhaps adding the prefix > > “OBSOLETE_”, or make the field number reserved, so that future users of > > your .proto can’t accidentally reuse the number. > > If you want to rename a field, or add a new field. To delete with the new > number! > > > 5. > > > The writtenSchema does not change the field name and number, but it does > > change the field type. > > > Small correction: for the same field number you are not allowed to > change > > types. Name is irrelevant. > > > Why doesn't the name irrelevant? > Here is the change in type, which is the rule stated in Alternatives. There > is no check here, just a warning to the user. > Another change is that the name of enum is changed, or the name of MESSAGE > is changed, which is the same as the root message name check in 1, is the > change still the same? This is not allowed to change! > > *Here is example:* > readSchema( > > https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4 > ) > writtenSchema( > > https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f > ) > > > 6. > > > (6) The writtenSchema removes fields that do not have default values in > > > readSchema. Then the schema is incompatible. > > Protobuf gives you its own default if you don't supply one. This is > > incorrect. > > > (1) This rule only applies if proto2 does not set the default value. If > proto3 does not check, the default value will always be there. > > (2) In PIP issue: > > > Proto3 canceled the required field, so there is no need to check the > > required field. We get the syntax(proto2 or proto3) of proto in the code, > > and skip the check of the required field if it is proto3. All other > > checking rules also apply to proto3. > > > *I made a mistake here. This default value check is not need in proto3. I > will modify the rules later according to your suggestion.* > > > I would remove the proto2/proto3 sections, since they only differ in 1 > > rule, and just mention that distinction inside that rule (less work for > the > > reade). > > > (3) And add rules that look like they should be checked: > > > Rules that you don't have in the doc, but should IMO* > > ....... > > There can't be a field in writerSchema, that exists in readerSchema (tag > > number based), which in writerSchema its type is scalar, but in > readSchema > > its type is scalar, it's repeated but with packed=true. > > > But I don't think rule number three needs to be added. > > > Thanks, > sinan > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月14日周二 22:33写道: > > > Hi Sinan, > > > > The doc looks much better! > > > > I have a few additional comments: > > > > Pasting comment from previous emails: > > > > Can you convert the code block which is actually a quote in the > > beginning of the PIP to something which doesn't require to scroll > > horizontally so much? > > Use > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text > > > > *Validation Rules* > > > > (1) If the root message names of writtenSchema and readSchema are > > > different, then incompatible. > > > > Why? The root message name is not written over the wire to the best of my > > knowledge. I haven't found it written in the official doc. > > > > > > > - The writtenSchema cannot add required fields, but optional or > > > duplicate fields can be added (The field number must be new). > > > > > > That's not true. > > You can have a required field in writeSchema, and not have that field in > > readSchema (based on tag number). > > > > The writtenSchema can not change the field number of any field in > > readSchema (the > > > field name is the same, but the field number is different). > > > > You have to take into account field type as well when comparing. > > > > If I have in readSchema > > int32 justAnID = 1 > > int32 customerId = 2 > > > > and in writeSchema I have > > int32 justAnID = 1 > > string customerId = 3 > > > > This is valid. > > > > (4) The writtenSchema can not change the field name of any field in > > > readSchema (the field number is the same, but the field name is > > > different). > > > > This is incorrect. > > Fields names are encoded into the wire. I don't see this in any best > > practice. > > > > ) The writtenSchema does not change the field name and number, but it > does > > > change the field type. > > > > > > - If the field type is ENUM or MESSAGE, the schema is not compatible > > > when the type name is changed > > > - If the type of the field is another type. The schemas under this > > > rule are not incompatible, but warn the user.(There is another way > of > > > testing in PIP issue Alternatives) > > > > > > Small correction: for the same field number you are not allowed to > change > > types. Name is irrelevant. > > > > (6) The writtenSchema removes fields that do not have default values in > > > readSchema. Then the schema is incompatible. > > > > Protobuf gives you its own default if you don't supply one. This is > > incorrect. > > > > > > *Rules that you don't have in the doc, but should IMO* > > * There can not be a field which exists both in readSchema and > writeSchema, > > with same tag number, having different default values > > * There can't be a field in writerSchema, that exists in readerSchema > (tag > > number based), which in writerSchema is repeated and its type is Scalar ( > > https://protobuf.dev/programming-guides/proto/#scalar) but in readSchema > > it > > is not repeated anymore. > > * There can't be a field in writerSchema, that exists in readerSchema > (tag > > number based), which in writerSchema its type is scalar, but in > readSchema > > its type is scalar, it's repeated but with packed=true. > > > > *Rules you have , but I would phrase a bit differently* > > > > I would remove the proto2/proto3 sections, since they only differ in 1 > > rule, and just mention that distinction inside that rule (less work for > the > > reade). > > > > * readSchema has a field which doesn't exist in writerSchema (based on > tag > > number). > > * Proto v2: > > * That field must be `optional` or `repeated` (must not be > > `required`) > > * Proto v3: > > * No problem. > > * There can not be a field which exists both in readSchema and > writeSchema, > > with the same tag number, but having different types. > > > > *Motivation* > > > > Basically in the motivation section you want people to understand the > > following: > > > > Pulsar has built-in support for typed messages. It allows specifying an > > encoding scheme and its matching schema. > > For example, it supports Avro. You specify a schema for a given topic, > > using Avro Schema Definition (i.e. a JSON describing the schema). > Everytime > > you produce a message, you first declare the schema definition you wish > to > > use for your messages. The message data should be an avro-encoded binary > > data (which the client in some SDKs helps encode a given > > data-structure/object). > > The same applies when you consume a message. You first specify the schema > > you use to read the messages, and the client in some SDKs helps by > decoding > > the message binary data into an object/data-structure. > > > > Each time you specify a schema to be used, either by a producer or a > > consumer, the schema is persisted in Pulsar and given an increasing > version > > number. If the schema was the same as the previous version, it is not > > saved. When the message is persisted, the version number is encoded in > the > > message headers. > > > > Pulsar provides a very useful feature named Schema Evolution > > < > https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution > > >. > > It allows us to check if a new schema version is compatible with previous > > versions or versions. When you configure the schema for the topic you > > decide the strategy to use for doing the validation check. The strategies > > validate the following: > > > > - BACKWARD strategy > > - A consumer with newSchema can read a message written using > > existingSchema > > - BACKWARD_TRANSITIVE strategy > > - A consumer with newSchema can read messages written using all > > existingSchema > > - FORWARD > > - A consumer with existingSchema can read messages written using > > newSchema > > - FORWARD_TRANSITIVE > > - A consumer defined with any of the existingSchema can read > messages > > written using newSchema > > - FULL > > - A consumer defined with newSchema can read messages written using > > existingSchema > > - A consumer defined with existingSchema can read messages written > > using newSchema > > - FULL_TRANSITIVE > > - A consumer defined with newSchema can read messages written using > > any of the existingSchema > > - A consumer defined with any of the existingSchema can read > messages > > written using newSchema > > > > > > Aside from Avro, Pulsar also has two additional supported encodings: > > PROTOBUF and PROTOBUF_NATIVE. > > > > PROTOBUF is a bit strange. It encodes the messages using Protobuf > encoding, > > but the schema that is persisted to Pulsar is *not* Protobuf Descriptor > as > > you would have expected. The saved schema is a translation of the > Protobuf > > Descriptor to an Avro Schema, so in fact an Avro schema definition is > saved > > as the schema. > > > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually > persisting > > the Protobuf Descriptor and using Protobuf for encoding. > > The problem is that the authors of PROTOBUF_NATIVE haven't completed it > > fully, and the backward compatibility validation code almost does not > > exist: It only checks if the root message name is the same between > > versions. > > > > GOALS > > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully > > functional validation for any of the defined Schema Compatibility > > Strategies. > > A secondary goal is to allow the user to choose between different > > implementations: The new fully functional validation or the existing > > barebones validation. > > > > -------- END > > > > I'm ok with having links in the Motivation , as *further reading*. > > I'm against stacking up work for the reader to go read 5-6 different > links > > just to understand the motivation and background knowledge required to > > understand the feature. > > > > I'm against putting code in the Motivation. Especially if it is supposed > to > > replace description in plain English making it easy to understand the > > design. > > Leave the code to the motivation. > > Paste code only if you absolutely can't use plain old descriptions to > > explain. > > > > > > > > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <liusinan1...@gmail.com> > wrote: > > > > > *I guess that's right, too! * > > > > > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is > better, > > > because we don't know whether the future will have V2, V3. The official > > > solution can be called > `ProtobufNativeOfficialSchemaCompatibilityCheck`, > > or > > > is a good `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck` third-party > > > solution. > > > > > > I've updated my design in PIP issue. > > > 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a > > > ProtobufNativeSchemaCompatibilityCheck different validator > > implementation. > > > 2. Remove the 'builder' > > > 3. Clarify the relationship between newSchema, existingSchema, and > > > writtenSchema in canRead. > > > > > > Help to see if the description is comprehensive and what changes and > > > improvements need to be made. > > > > > > Thanks, > > > sinan > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月9日周四 17:35写道: > > > > > > > I like Bo's suggestion - I haven't realized each schema type > > > > compatibility check is actually a plugin. > > > > It makes sense for any schema type checks to evolve, sometimes in a > > > > non-backward compatible way hence having two plugins like > > > > protobufNativeSchemaCompatabilityCheckV1 and then > > > > protobufNativeSchemaCompatabilityCheckV2 and then > > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me. > > > > > > > > > > > > > > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bog...@apache.org> wrote: > > > > > > > > > Hi siNan: > > > > > > > > > > From my point of view, it is just a plug-in. I don't think it is > > > > > necessary to add configuration for the plugin. > > > > > This is meaningless, and it will increase the difficulty of use for > > > > users. > > > > > > > > > > > > > > > SiNan Liu <liusinan1...@gmail.com> 于2023年3月8日周三 15:54写道: > > > > > > > > > > > > Hi, bo. > > > > > > > > > > > > 1. I understand what you say, to develop a new > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than > > > changing > > > > > > existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a > > few > > > > > small > > > > > > problems: > > > > > > > > > > > > (1)ProtobufNativeAdvancedSchemaCompatibilityCheck and > > > > > > ProtobufNativeSchemaCompatibilityCheck schemaType is > > PROTOBUF_NATIVE. > > > > It > > > > > > looks like both checkers are PROTOBUF not using AVRO-PROTOBUF's > > > > "native" > > > > > > implementation, which leads to some problems or "unreasonable" > and > > > > gives > > > > > me > > > > > > some extended thinking and questions. > > > > > > > > > > > `CompatibilityCheck ` its only a plugin. > > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave > > the > > > > > stage, when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is > > > > > stable, we can make it the default Checker. > > > > > > > > > > It is just a plug-in, users can change it at will and ensure that > it > > > > > is used correctly > > > > > > (2)In broker.conf > > > > > > > > > > > > `schemaRegistryCompatibilityCheckers`. If > > > > > > ProtobufNativeSchemaCompatibilityCheck and > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck all set. This is > > going > > > > to > > > > > > overwrite each other. Because this is a map: > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44 > > > > > > > > > > > > ```java > > > > > > > > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new > > HashMap<>(); > > > > > > > > > > > > for (String className : checkerClasses) { > > > > > > > > > > > > SchemaCompatibilityCheck schemaCompatibilityCheck = > > > > > > Reflections.createInstance(className, > > > > > > > > > > > > SchemaCompatibilityCheck.class, > > > > > > Thread.currentThread().getContextClassLoader()); > > > > > > > > > > > > checkers.put(schemaCompatibilityCheck.getSchemaType(), > > > > > > schemaCompatibilityCheck); > > > > > > > > > > > > ``` > > > > > > > > > > > > Is this a big problem or a small one? Is it possible or > > unnecessary? > > > > > Maybe > > > > > > we can write in the documentation that protobufNative checkers > can > > > only > > > > > > choose one of the two? Why are there two Checkers for different > > > > > > implementations of the same schemaType? Why not the checker to > > create > > > > > > different validator, so we don not have to change > > > > > > schemaRegistryCompatibilityCheckers. > > > > > > > > > > users can only use one, not two, which will bring complexity to > users > > > > > > > > > > > > > > > > > (3)And after the update to > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck. > > > > > > Existing topics previously only checked the name of the root > > message, > > > > not > > > > > > the content of protobuf. > > > > > > > > > > > > What if the user wants both Checkers? > > > > > > > > > > > > Set to ProtobufNativeAdvancedSchemaCompatibilityCheck, affect the > > > topic > > > > > of > > > > > > the existing schema? > > > > > > > > > > > > Older topics still use the old checker, and newer topics or > certain > > > > older > > > > > > topics use the new advancedchecker. > > > > > > > > > > > when `ProtobufNativeAdvancedSchemaCompatibilityCheck` stable, > > > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`. > > > > > because it not a complete checker. > > > > > > (4)So should we have one schemaType for a checker? > > > > protobufNativeChecker > > > > > > can have as many different implementation classes as possible. > This > > > > > > classname configuration in PIP, let's see if it can be set at the > > > topic > > > > > > level. In the current PIP design I just load this parameter into > > the > > > > > > checker when the broker is started and the checkers map is set > up. > > > Can > > > > I > > > > > do > > > > > > this in the new normal pr if I want to support topic level? Or > > > perfect > > > > it > > > > > > here? > > > > > > > > > > > > Add a call PROTOBUF_NATIVE_ADVANCE schemaType corresponding > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (Seems to be more > > > > > trouble). > > > > > > > > > > > > Sorry I can not use the computer and network in the company, I > use > > my > > > > > > mobile phone to reply to the email, the format may be a bit > messy. > > > > Please > > > > > > understand. > > > > > > > > > > > > Thanks, > > > > > > > > > > > > sinan > > > > > > > > > > > > > > > > > > 丛搏 <bog...@apache.org> 于 2023年3月7日周二 下午11:39写道: > > > > > > > > > > > > > SiNan Liu <liusinan1...@gmail.com> 于2023年3月7日周二 13:22写道: > > > > > > > > > > > > > > > > Great to see your comment, bo! > > > > > > > > > > > > > > > > 1. The first way. The protobuf website has a description of > the > > > > > rules, > > > > > > > but > > > > > > > > no plans to implement them. > > > > > > > > https://protobuf.dev/programming-guides/proto/#updating > > > > > > > > > > > > > > https://groups.google.com/g/protobuf > > > > > > > maybe ask here > > > > > > > > > > > > > > > > > > > > > > > 2. I think this PIP can be divided into two parts. > > > > > > > > (1) Add a flag(`ValidatorClassName`), load it into > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker > > starts. > > > > > > > > ValidatorClassName is empty by default, and the > implementation > > > > > continues > > > > > > > as > > > > > > > > before, with no change for the user. > > > > > > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in > > > `broker.conf` > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck > > > > > > > ``` > > > > > > > I do not recommend that we directly modify this plugin and > > continue > > > > to > > > > > > > add configuration items, which will cause trouble for users. > > > > > > > We have a lot of configs and it's getting very unwieldy. > > > > > > > in my opinion, we don't change > > > > > > > > > > > > > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`, > > > > > > > it is a simple implementation, it doesn't go wrong very often, > > most > > > > > > > users will use it. we can add another ProtobufNativeCheck named > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck ` or other. in > > this > > > > > > > way, we don't need to add this flag. There is no need to > consider > > > > > > > compatibility, it is just a plug-in and will not affect current > > > > logic. > > > > > > > If the user needs it, just change the plugin to the new > > > > implementation > > > > > > > > > > > > > > > ```java > > > > > > > > ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, > > > > > > > toDescriptor) > > > > > > > > -> { > > > > > > > > for (Descriptors.Descriptor fromDescriptor : > > > > > fromDescriptors) { > > > > > > > > // The default implementation only checks if the > > root > > > > > message > > > > > > > > has changed. > > > > > > > > if > > > > > > > > > > > (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) > > > > { > > > > > > > > throw new > > ProtoBufCanReadCheckException("Protobuf > > > > > root > > > > > > > > message isn't allow change!"); > > > > > > > > } > > > > > > > > } > > > > > > > > }; > > > > > > > > ``` > > > > > > > > `ValidatorClassName` value also can be set to the current > > > > > implementation > > > > > > > of > > > > > > > > PIP add > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`. > > > > > > > > > > > > > > > > (2) Recoding the `ProtobufNativeSchemaCompatibilityCheck`. > > > Through > > > > > the > > > > > > > flag > > > > > > > > (`ValidatorClassName`) to build different > > > > > > > `ProtobufNativeSchemaValidator`. > > > > > > > > Isn't it just a plug-in? The user can develop and choose a > > > > different > > > > > > > > `ProtobufNativeSchemaValidator`. I think it didn't change the > > > > logic, > > > > > it > > > > > > > > just allowed him to expand it. > > > > > > > > > > > > > > > > > > > > > > > > I think this PIP should be an enhancement and supplement to > the > > > > > function, > > > > > > > > and there is no such thing as unnecessary and meaningless. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 丛搏 <bog...@apache.org> 于2023年3月7日周二 11:53写道: > > > > > > > > > > > > > > > > > I think we have two ways to do that. > > > > > > > > > > > > > > > > > > First way: We need to advance the improvement of java in > > > > protobuf. > > > > > Ask > > > > > > > > > if they have plans to improve. > > > > > > > > > > > > > > > > > > Second way: the new PROTOBUF_NATIVE > > `SchemaCompatibilityCheck` > > > > > should > > > > > > > > > be implemented as a plugin, don't change any existing > plugin > > > > logic > > > > > > > > > (it's simple and already used). I don't recommend adding > > flags > > > > for > > > > > > > > > rollback, it adds configuration and makes little sense. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Bo > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月6日周一 23:00写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can you convert the code block which is actually a quote > in > > > the > > > > > > > > > > beginning of the PIP to something which doesn't require > to > > > > scroll > > > > > > > > > > horizontally so much? > > > > > > > > > > Use > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text > > > > > > > > > > > > > > > > > > > > Let's improve the clarity of what you wrote: > > > > > > > > > > > > > > > > > > > > "the PROTOBUF uses avro struct to store." > > > > > > > > > > --> > > > > > > > > > > When Schema type PROTOBUF is used, Pulsar Client assumes > > the > > > > > object > > > > > > > given > > > > > > > > > > to it as message data is an auto-generated POJO > containing > > > the > > > > > > > > > annotations > > > > > > > > > > encoding the schema. The client is using a converter, > which > > > > > converts > > > > > > > a > > > > > > > > > > Protobuf schema descriptor into an Avro schema and sends > > that > > > > as > > > > > the > > > > > > > > > Schema > > > > > > > > > > of the producer/consumer. > > > > > > > > > > > > > > > > > > > > "On the broker side, protobuf and avro both use > SchemaData > > > > > converted > > > > > > > to > > > > > > > > > > org.apache.avro.Schema." > > > > > > > > > > --> > > > > > > > > > > Since the schema is an Avro schema, the implementation of > > > > > > > compatibility > > > > > > > > > > check on the broker side is to simply re-use the > > > compatibility > > > > > check > > > > > > > of > > > > > > > > > the > > > > > > > > > > AVRO schema type. > > > > > > > > > > > > > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in > > > > schema > > > > > > > > > > compatibility check it uses avro-protobuf. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview > > > > > > > > > > But the current implementation of ProtobufNative schema > > > > > compatibility > > > > > > > > > > check only > > > > > > > > > > checked if the root message name is changed." > > > > > > > > > > > > > > > > > > > > --> > > > > > > > > > > PROTOBUF_NATIVE schema type is different. > > > > > > > > > > The client is actually using Protobuf Descriptor as the > > > schema, > > > > > as > > > > > > > > > opposed > > > > > > > > > > to Avro schema of PROTOBUF schema type. In the broker, > the > > > > > > > > > PROTOBUF_NATIVE > > > > > > > > > > compatibility check actually hasn't implemented any rule, > > > > besides > > > > > > > one: > > > > > > > > > > checking if the root message name has changed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. For now, there is no official or third-party > > solution > > > > for > > > > > > > > > ProtoBuf > > > > > > > > > > > compatibility. If in the future have better > solutions > > > of a > > > > > third > > > > > > > > > party or > > > > > > > > > > > the official, we develop new > > > ProtobufNativeSchemaValidator > > > > > and > > > > > > > use, > > > > > > > > > so > > > > > > > > > > > add a flag. > > > > > > > > > > > > > > > > > > > > > > Who do you need to make that configurable? Once you > > found a > > > > > third > > > > > > > > > party, > > > > > > > > > > just switch to it? Who knows, maybe you never will. > > Introduce > > > > it > > > > > > > when you > > > > > > > > > > find it, not now. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We improve in ProtobufNativeSchemaCompatibilityCheck > > > BACKWARD, > > > > > > > FORWARD > > > > > > > > > > > these strategies. As with the AVRO implementation, > > protobuf > > > > > > > > > compatibility > > > > > > > > > > > checking need implementing the canRead method. *This > will > > > > check > > > > > > > that > > > > > > > > > > > the writtenschema can be read by readSchema.* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I completely disagree. > > > > > > > > > > Avro implementation is confusing for our use case. Don't > > copy > > > > > that. > > > > > > > > > > > > > > > > > > > > You have > > > > > > > > > > > > > > > > > > > > public void checkCompatible(SchemaData from, SchemaData > to, > > > > > > > > > > SchemaCompatibilityStrategy strategy) > > > > > > > > > > throws IncompatibleSchemaException { > > > > > > > > > > Descriptor fromDescriptor = > > > > > > > > > > ProtobufNativeSchemaUtils.deserialize(from.getData()); > > > > > > > > > > Descriptor toDescriptor = > > > > > > > > > > ProtobufNativeSchemaUtils.deserialize(to.getData()); > > > > > > > > > > switch (strategy) { > > > > > > > > > > case BACKWARD_TRANSITIVE: > > > > > > > > > > case BACKWARD: > > > > > > > > > > case FORWARD_TRANSITIVE: > > > > > > > > > > case FORWARD: > > > > > > > > > > case FULL_TRANSITIVE: > > > > > > > > > > case FULL: > > > > > > > > > > checkRootMessageChange(fromDescriptor, > > > > toDescriptor, > > > > > > > > > strategy); > > > > > > > > > > return; > > > > > > > > > > case ALWAYS_COMPATIBLE: > > > > > > > > > > return; > > > > > > > > > > default: > > > > > > > > > > throw new > IncompatibleSchemaException("Unknown > > > > > > > > > > SchemaCompatibilityStrategy."); > > > > > > > > > > } > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > I would rename : > > > > > > > > > > from --> currentSchema > > > > > > > > > > to --> newSchema > > > > > > > > > > > > > > > > > > > > Use that switch case and have a method for each like: > > > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema) > > > > > > > > > > > > > > > > > > > > I dislike canRead and usage of writtenSchema, since you > > have > > > > two > > > > > > > > > completely > > > > > > > > > > different use cases: from the producing side and the > > consumer > > > > > side. > > > > > > > > > > > > > > > > > > > > schemaValidatorBuilder > > > > > > > > > > > > > > > > > > > > > > I dislike this proposal. IMO Avro implementation is way > > too > > > > > > > > > complicated. > > > > > > > > > > Why not have a simple function for validation for each > > switch > > > > > case > > > > > > > above? > > > > > > > > > > Why do we need strategy and builder, and all this > > complexity? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > IMO it's impossible to read the validation rules as you > > > > described > > > > > > > them. > > > > > > > > > > I wrote how they should be structured numerous times > above. > > > > > > > > > > I can't validate them. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > IMO, the current design is very hard to read. > > > > > > > > > > Please try to avoid jumping into code sections. > > > > > > > > > > Write a high level design section, in which you describe > in > > > > words > > > > > > > what > > > > > > > > > you > > > > > > > > > > plan to do. > > > > > > > > > > Write the validation rules in the structure that is easy > to > > > > > > > understand: > > > > > > > > > > rules per each compatibility check, and use proper words > > > > (current > > > > > > > schema, > > > > > > > > > > new schema), since new schema can be once used for read > and > > > > once > > > > > > > used for > > > > > > > > > > write. > > > > > > > > > > > > > > > > > > > > In its current form it takes too much time to understand > > the > > > > > design, > > > > > > > and > > > > > > > > > it > > > > > > > > > > shouldn't be the case. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Asaf > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu < > > > > liusinan1...@gmail.com > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi! I updated the explanation of some things in the PIP > > > > issue. > > > > > And > > > > > > > also > > > > > > > > > > > added a new “flag” in the conf is used as the different > > > > > > > > > > > ProtobufNativeSchemaValidator implementation, also set > > > > > > > > > > > ProtobufNativeSchemaValidator default only check > whether > > > the > > > > > name > > > > > > > of > > > > > > > > > the > > > > > > > > > > > root message is the same. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月5日周日 > > 20:21写道: > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu < > > > > > liusinan1...@gmail.com > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema > > > descriptor > > > > > can > > > > > > > be > > > > > > > > > > > > validated > > > > > > > > > > > > > > for backward compatibility check using Avro based > > > > > > > compatibility > > > > > > > > > > > rules? > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it > is > > > > > actually a > > > > > > > > > > > Protobuf > > > > > > > > > > > > > > descriptor? > > > > > > > > > > > > > > Is there some translation happening? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. *You can take a quick look at the previous > design, > > > the > > > > > > > PROTOBUF > > > > > > > > > uses > > > > > > > > > > > > > avro struct to store.* > > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ok. So to summarize your code (easier to write it > than > > > send > > > > > > > links): > > > > > > > > > > > > * Pulsar Client, when used with Protobuf Schema, > > actually > > > > > > > converts > > > > > > > > > the > > > > > > > > > > > > Protobuf descriptor into an Avro Schema (using code > > found > > > > > inside > > > > > > > Avro > > > > > > > > > > > > library) and saves that Avro schema as the schema. > It's > > > not > > > > > > > saving > > > > > > > > > the > > > > > > > > > > > > protobuf descriptor at all. Very confusing I have to > > add > > > - > > > > > never > > > > > > > > > expected > > > > > > > > > > > > that. > > > > > > > > > > > > This explains why In the > > ProtobufSchemaCompatibilityCheck > > > > > they > > > > > > > just > > > > > > > > > > > extend > > > > > > > > > > > > the Avro without doing any translation. > > > > > > > > > > > > > > > > > > > > > > > > Thanks for that. > > > > > > > > > > > > > > > > > > > > > > > > Now thatI finally understand this, I can say that: > you > > > > *must* > > > > > > > explain > > > > > > > > > > > that > > > > > > > > > > > > in the motivation part in your PIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use > > > > > `SchemaData` > > > > > > > > > > > converted > > > > > > > > > > > > > to `org.apache.avro.Schema`.* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Actually those links don't really help. > > > > > > > > > > > > The main link that helps is: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm sorry - I don't understand. > > > > > > > > > > > > > > I understand the different compatibility check > > > > > strategies. > > > > > > > > > > > > > > If you just spell them out here, then as you say, > > > just > > > > > > > translate > > > > > > > > > the > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run > the > > > > Avro > > > > > > > > > > > > > > compatibility validation, no? > > > > > > > > > > > > > > I believe the answer is no, since you may want to > > > > verify > > > > > > > > > different > > > > > > > > > > > > things > > > > > > > > > > > > > > when it comes to Protobuf, which are different > then > > > > Avro. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. > > > > > > > > > > > > > *ProtobufSchema is different from > > ProtobufNativeSchema > > > in > > > > > that > > > > > > > it > > > > > > > > > uses > > > > > > > > > > > > > avro-protobuf.* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview > > > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility > > > check, > > > > > but > > > > > > > > > there is > > > > > > > > > > > no > > > > > > > > > > > > > official or third party implementation. So this PIP > > > does > > > > > not > > > > > > > use > > > > > > > > > > > > > avro-protobuf for protobuf compatibility checking.* > > > > > > > > > > > > > > > > > > > > > > > > > > 2. *By the way, this is implemented in much the > same > > > way > > > > > that > > > > > > > > > Apache > > > > > > > > > > > avro > > > > > > > > > > > > > does compatibility checking.* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java > > > > > > > > > > > > > > > > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java > > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of > > > > > `canRead()` are > > > > > > > > > > > > > writtenSchema and readSchema. We only need to > change > > > the > > > > > order > > > > > > > of > > > > > > > > > > > > arguments > > > > > > > > > > > > > we pass to `canRead()`.* > > > > > > > > > > > > > ```java > > > > > > > > > > > > > private void > > > validateWithStrategy(Descriptors.Descriptor > > > > > > > > > toValidate, > > > > > > > > > > > > > Descriptors.Descriptor fromDescriptor) throws > > > > > > > > > > > > ProtoBufCanReadCheckException > > > > > > > > > > > > > { > > > > > > > > > > > > > switch (strategy) { > > > > > > > > > > > > > case CanReadExistingStrategy -> > > canRead(fromDescriptor, > > > > > > > > > toValidate); > > > > > > > > > > > > > case CanBeReadByExistingStrategy -> > > canRead(toValidate, > > > > > > > > > > > fromDescriptor); > > > > > > > > > > > > > case CanBeReadMutualStrategy -> { > > > > > > > > > > > > > canRead(toValidate, fromDescriptor); > > > > > > > > > > > > > canRead(fromDescriptor, toValidate); > > > > > > > > > > > > > } > > > > > > > > > > > > > } > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > private void canRead(Descriptors.Descriptor > > > > writtenSchema, > > > > > > > > > > > > > Descriptors.Descriptor readSchema) throws > > > > > > > > > > > ProtoBufCanReadCheckException { > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, > > > > > > > > > > > > > readSchema); > > > > > > > > > > > > > } > > > > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I get that you want to take inspiration from the > > existing > > > > > Avro > > > > > > > Schema > > > > > > > > > > > > compatibility check, to do your code design. > > > > > > > > > > > > I also understand you *won't* use any existing avro > > code > > > > for > > > > > > > that. > > > > > > > > > > > > I also understand, you have to write the validation > > check > > > > on > > > > > your > > > > > > > > > own, > > > > > > > > > > > > since there is no 3rd party to explain that. > > > > > > > > > > > > > > > > > > > > > > > > The only thing I can't understand are the actual > rules > > > as I > > > > > wrote > > > > > > > > > before, > > > > > > > > > > > > since they are written confusingly. > > > > > > > > > > > > So, I repeat what I asked before: > > > > > > > > > > > > > > > > > > > > > > > > I think you should structure the validation rules > > > > > differently: > > > > > > > > > > > > > > > > > > > > > > > > * Backward checks > > > > > > > > > > > > ** List down rules, where use newSchema (the schema > > used > > > by > > > > > > > producer > > > > > > > > > or > > > > > > > > > > > > consumer) and existingSchema (last schema used) > > > > > > > > > > > > * Forward > > > > > > > > > > > > ** List down rules, where use newSchema (the schema > > used > > > by > > > > > > > producer > > > > > > > > > or > > > > > > > > > > > > consumer) and existingSchema (last schema used) > > > > > > > > > > > > > > > > > > > > > > > > Once that's accomplished I will be able to understand > > the > > > > > > > different > > > > > > > > > > > > validation rules for each compatibility check. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> 于2023年3月1日周三 > > > > 21:19写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu < > > > > > > > > > liusinan1...@gmail.com> > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I read it and they look identical. What's the > > > > > difference > > > > > > > > > between > > > > > > > > > > > > > them? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Current avro,json, and protobuf schemas are all > > > > > implemented > > > > > > > > > based > > > > > > > > > > > on > > > > > > > > > > > > > AVRO. > > > > > > > > > > > > > > > > What do you mean, they are all implemented > > based > > > on > > > > > > > Avro? You > > > > > > > > > > > mean > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro > > Schema, > > > > and > > > > > > > then > > > > > > > > > you > > > > > > > > > > > use > > > > > > > > > > > > > Avro > > > > > > > > > > > > > > > > compatibility validation? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck` > > > > > > > > > > > > > > > They all extends > > > `AvroSchemaBasedCompatibilityCheck`, > > > > > the > > > > > > > > > > > > > > > `checkCompatible()` is the same implementation > > with > > > > > `AVRO`. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema > > > descriptor > > > > > can > > > > > > > be > > > > > > > > > > > > validated > > > > > > > > > > > > > > for backward compatibility check using Avro based > > > > > > > compatibility > > > > > > > > > > > rules? > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it > is > > > > > actually a > > > > > > > > > > > Protobuf > > > > > > > > > > > > > > descriptor? > > > > > > > > > > > > > > Is there some translation happening? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think you should structure the validation > rules > > > > > > > differently: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The Compatibility check strategy is described > on > > > the > > > > > > > website > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy > > > > > > > > > > > > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers > > > using > > > > > > > schema > > > > > > > > > V3 can > > > > > > > > > > > > > process > > > > > > > > > > > > > > > data written by producers using the last schema > > > > version > > > > > > > V2. So > > > > > > > > > V2 > > > > > > > > > > > is > > > > > > > > > > > > > > > "writtenSchema" and V3 is "readSchema". > > > > > > > > > > > > > > > 2. FORWARD(CanBeReadByExistingStrategy): > > Consumers > > > > > using > > > > > > > the > > > > > > > > > last > > > > > > > > > > > > > schema > > > > > > > > > > > > > > > version V2 can process data written by > producers > > > > using > > > > > a > > > > > > > new > > > > > > > > > schema > > > > > > > > > > > > V3, > > > > > > > > > > > > > > > even though they may not be able to use the > full > > > > > > > capabilities > > > > > > > > > of > > > > > > > > > > > the > > > > > > > > > > > > > new > > > > > > > > > > > > > > > schema. So V3 is "writtenSchema" and V2 is > > > > > "readSchema". > > > > > > > > > > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are > > both > > > > > > > backward and > > > > > > > > > > > > forward > > > > > > > > > > > > > > > compatible. > > > > > > > > > > > > > > > Schema can evolve. The old version schema and > the > > > new > > > > > > > version > > > > > > > > > > > schema > > > > > > > > > > > > > should > > > > > > > > > > > > > > > be well understood. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm sorry - I don't understand. > > > > > > > > > > > > > > I understand the different compatibility check > > > > > strategies. > > > > > > > > > > > > > > If you just spell them out here, then as you say, > > > just > > > > > > > translate > > > > > > > > > the > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run > the > > > > Avro > > > > > > > > > > > > > > compatibility validation, no? > > > > > > > > > > > > > > I believe the answer is no, since you may want to > > > > verify > > > > > > > > > different > > > > > > > > > > > > things > > > > > > > > > > > > > > when it comes to Protobuf, which are different > then > > > > Avro. > > > > > > > > > > > > > > > > > > > > > > > > > > > > At the current state, I can't understand your > > design > > > at > > > > > all. > > > > > > > > > Please > > > > > > > > > > > > help > > > > > > > > > > > > > > clarify that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So each strategy should have its own section. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema > > and > > > > > > > readSchema. > > > > > > > > > As > > > > > > > > > > > > we've > > > > > > > > > > > > > > > just described, we only need to change the > order > > of > > > > > > > arguments > > > > > > > > > we > > > > > > > > > > > pass > > > > > > > > > > > > > to > > > > > > > > > > > > > > > `canRead()`. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> > > 于2023年2月27日周一 > > > > > 20:49写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > And you can see the difference between > > ProtoBuf > > > > and > > > > > > > > > > > > ProtoBufNative: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I read it and they look identical. What's > the > > > > > difference > > > > > > > > > between > > > > > > > > > > > > > them? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Current avro,json, and protobuf schemas are > all > > > > > > > implemented > > > > > > > > > based > > > > > > > > > > > > on > > > > > > > > > > > > > > > AVRO. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What do you mean, they are all implemented > > based > > > on > > > > > > > Avro? You > > > > > > > > > > > mean > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro > > Schema, > > > > and > > > > > > > then > > > > > > > > > you > > > > > > > > > > > use > > > > > > > > > > > > > Avro > > > > > > > > > > > > > > > > compatibility validation? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > *Here are the basic compatibility rules > we've > > > > > defined:* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think you should structure the validation > > rules > > > > > > > > > differently: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > * Backward checks > > > > > > > > > > > > > > > > ** List down rules, where use newSchema (the > > > schema > > > > > used > > > > > > > by > > > > > > > > > > > > producer > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > consumer) and existingSchema (last schema > used) > > > > > > > > > > > > > > > > * Forward > > > > > > > > > > > > > > > > ** List down rules, where use newSchema (the > > > schema > > > > > used > > > > > > > by > > > > > > > > > > > > producer > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > consumer) and existingSchema (last schema > used) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So each strategy should have its own section. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm saying this since you used > "writttenSchema" > > > > word > > > > > but > > > > > > > it > > > > > > > > > > > > > represents > > > > > > > > > > > > > > > > something completely different if it's > backward > > > or > > > > > > > forward > > > > > > > > > check. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Once you'll have that structure like that, I > > > > > personally > > > > > > > will > > > > > > > > > be > > > > > > > > > > > > able > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > read and understand it. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The motivation and problem statement are now > > > good - > > > > > > > thanks > > > > > > > > > for > > > > > > > > > > > > > improving > > > > > > > > > > > > > > > > it. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu < > > > > > > > > > > > liusinan1...@gmail.com> > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This > time > > > I've > > > > > added > > > > > > > > > some > > > > > > > > > > > > > background > > > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > some explanations. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The compatibility check rules are already > > > written > > > > > in > > > > > > > the > > > > > > > > > > > > > > > Implementation. > > > > > > > > > > > > > > > > > ProtoBufNative implements the same canRead > > > method > > > > > as > > > > > > > Apache > > > > > > > > > > > Avro. > > > > > > > > > > > > > > > > > It does this by checking whether the schema > > for > > > > > > > writing and > > > > > > > > > > > > reading > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > compatible. I also indicate whether the > > > > > writtenSchema > > > > > > > and > > > > > > > > > > > > > readSchema of > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > Backward, Forward, and Full strategies are > > the > > > > old > > > > > or > > > > > > > the > > > > > > > > > new > > > > > > > > > > > > > version > > > > > > > > > > > > > > > of > > > > > > > > > > > > > > > > > the schema. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> > > > > 于2023年2月26日周日 > > > > > > > 23:24写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of > > > > background > > > > > > > > > knowledge, > > > > > > > > > > > so > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > need > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > add IMO for people to understand it. You > > > don't > > > > > need > > > > > > > to > > > > > > > > > > > explain > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > entire > > > > > > > > > > > > > > > > > > pulsar in this PIP, but at the very > least a > > > few > > > > > > > > > paragraphs > > > > > > > > > > > > > detailing > > > > > > > > > > > > > > > > all > > > > > > > > > > > > > > > > > > you need to know, to put you in context: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Start by saying Pulsar as a built-in > > > > schema > > > > > > > registry > > > > > > > > > > > > inside > > > > > > > > > > > > > > > Pulsar > > > > > > > > > > > > > > > > > > broker. > > > > > > > > > > > > > > > > > > - Every time the client updates the > > > > > schema, it > > > > > > > > > uploads > > > > > > > > > > > it > > > > > > > > > > > > > to > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > broker. When that happens, it has a > > > > feature > > > > > > > which > > > > > > > > > > > > validates > > > > > > > > > > > > > if > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > new > > > > > > > > > > > > > > > > > > schema version is compatible with > the > > > > > previous > > > > > > > > > > > versions. > > > > > > > > > > > > > There > > > > > > > > > > > > > > > > > > are 4 types > > > > > > > > > > > > > > > > > > of compatibility: Full, ... > (complete > > > and > > > > > > > explain > > > > > > > > > each > > > > > > > > > > > > one > > > > > > > > > > > > > > > > briefly) > > > > > > > > > > > > > > > > > > - Also explain Pulsar Schema registry > > > > supports > > > > > > > various > > > > > > > > > > > > schema > > > > > > > > > > > > > > > > > > protocols: Avro, protobuf native, ... > > > > > (complete > > > > > > > the > > > > > > > > > > > rest), > > > > > > > > > > > > > each > > > > > > > > > > > > > > > > > > protocol > > > > > > > > > > > > > > > > > > has a schema which dictates how to > > > serialize > > > > > and > > > > > > > > > > > deserialize > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > message > > > > > > > > > > > > > > > > > > content into typed object. > > > > > > > > > > > > > > > > > > - Explain in short what is protobuf > > native > > > > > > > (compare > > > > > > > > > > > protobuf > > > > > > > > > > > > > > > > > non-native) > > > > > > > > > > > > > > > > > > - Please don't paste code instead of > > > > > explaining. > > > > > > > > > > > > > > > > > > - Explain that protobuf native > > current > > > > > > > validation > > > > > > > > > check > > > > > > > > > > > > is > > > > > > > > > > > > > only > > > > > > > > > > > > > > > > > > composed of checking the root > message > > > > name > > > > > is > > > > > > > the > > > > > > > > > same > > > > > > > > > > > > > between > > > > > > > > > > > > > > > > > > the current > > > > > > > > > > > > > > > > > > schema version and the new version. > > > > > > > > > > > > > > > > > > - Explain briefly what is a root > > > > message > > > > > > > and its > > > > > > > > > > > name. > > > > > > > > > > > > > > > > > > - Explain the problem (list > > scenarios) > > > > > that we > > > > > > > have > > > > > > > > > > > > because > > > > > > > > > > > > > > > > > protobuf > > > > > > > > > > > > > > > > > > native schema only supports FULL > > > > > compatibility > > > > > > > > > > > > validation. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding high level design - as in what > > you > > > > > plan to > > > > > > > do. > > > > > > > > > > > > > > > > > > I suggest you add "High Level Design" and > > in > > > it > > > > > > > detail > > > > > > > > > how > > > > > > > > > > > you > > > > > > > > > > > > > plan > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > validate, per protobuf version, per > > > > compatibility > > > > > > > check > > > > > > > > > > > > > (backward, > > > > > > > > > > > > > > > > > forward, > > > > > > > > > > > > > > > > > > full,...). > > > > > > > > > > > > > > > > > > I tried reading the implementation - for > > me , > > > > > it's > > > > > > > all > > > > > > > > > over > > > > > > > > > > > the > > > > > > > > > > > > > > > place. > > > > > > > > > > > > > > > > > Can > > > > > > > > > > > > > > > > > > you please list in order what I wrote > > above, > > > > and > > > > > > > list the > > > > > > > > > > > > > validation > > > > > > > > > > > > > > > > > rules > > > > > > > > > > > > > > > > > > with a good explanation why you validate > it > > > > like > > > > > > > that? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Lastly, one you have all the validation > > rules > > > > > clearly > > > > > > > > > stated, > > > > > > > > > > > > you > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > use > > > > > > > > > > > > > > > > > > it to document it properly so users can > > know > > > > what > > > > > > > > > validation > > > > > > > > > > > to > > > > > > > > > > > > > > > expect. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Asaf > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan > Liu < > > > > > > > > > > > > > liusinan1...@gmail.com> > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code > and > > > > > > > described the > > > > > > > > > > > > design > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > improve > > > > > > > > > > > > > > > > > > > the PROTOBUF_NATIVE schema > compatibility > > > > > checks. > > > > > > > You > > > > > > > > > can > > > > > > > > > > > > have a > > > > > > > > > > > > > > > look. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mes...@gmail.com> > > > > > 于2023年2月22日周三 > > > > > > > > > 21:16写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I read it but you're almost directly > > > diving > > > > > into > > > > > > > the > > > > > > > > > > > code - > > > > > > > > > > > > > it > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > take > > > > > > > > > > > > > > > > > > > me > > > > > > > > > > > > > > > > > > > > hours just to reverse engineer your > > > design. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can you please include a "High Level > > > > Design" > > > > > > > section > > > > > > > > > in > > > > > > > > > > > > which > > > > > > > > > > > > > you > > > > > > > > > > > > > > > > > > explain > > > > > > > > > > > > > > > > > > > > how you plan to tackle any issue? > > > > > > > > > > > > > > > > > > > > If I can read that section and > explain > > to > > > > > someone > > > > > > > > > else > > > > > > > > > > > how > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > > work, > > > > > > > > > > > > > > > > > > > > it means the section is complete. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan > > > Liu < > > > > > > > > > > > > > > > liusinan1...@gmail.com> > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss: > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/issues/19565 > > > > > > > > > > > > > > > > > > > . > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We can talk about the current > design > > > > here. > > > > > > > > > Especially > > > > > > > > > > > for > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > field > > > > > > > > > > > > > > > > > > > type > > > > > > > > > > > > > > > > > > > > > change check rules, please give > your > > > > > valuable > > > > > > > > > advice. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > Sinan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >