Hi Enrico,

Great questions. The primary objective of the CloudEvents specification was to improve interoperability across technologies. The specification was created to enable technologies to create bindings or adapters to support interchange, and it seems to be gaining momentum. Azure and GCP have invested heavily in it, and AWS and Red Hat have also made investments. A number of CNCF technologies are adding support as well. If Pulsar supports CloudEvents, then as new technologies adopt CloudEvents, we get interoperability with them for free. The book *Nail It Then Scale It* mentions that third-party integrations are key to getting adoption, and CloudEvents could help Pulsar accomplish that. (I should mention that Kafka already has support for CloudEvents.)
Another notable benefit of adopting the CloudEvents specification is support for the JSON Schema specification. Pulsar standardized internally on Avro to simplify the architecture for schemas. Since that decision was made, the JSON Schema specification has matured considerably. (Adoption of JSON Schema by OpenAPI 3.1 and AsyncAPI is evidence of that maturity.)

I've come across companies whose data quality needs require validating each message. For example, companies using Pulsar for financial processing often need to verify that each message conforms to the consumers' data contracts, or they risk financial impact to customers as producers evolve. The change-management burden without built-in message validation can be substantial. I documented a number of business cases in this comment: https://github.com/cloudevents/spec/issues/1052#issuecomment-1249260590. (For context, that thread was about adding validation to the CloudEvents header, but validation of the message body is already supported in CloudEvents, and the scenarios I listed are still applicable here.)

I also learned that adoption of JSON Schema hasn't been more widespread because they haven't created a "release" version of the spec (see this comment <https://github.com/json-schema-org/json-schema-spec/pull/1277#issuecomment-1223038171>); however, I learned this situation is more of a technicality of their relationship with the IETF, and they're progressing <https://github.com/json-schema-org/json-schema-spec/pull/1277#issuecomment-1261365741> toward a resolution.

Jerry Peng raised the point that Avro has utilities for schema compatibility checks whereas JSON Schema does not. I spoke with the maintainers of JSON Schema, and they said a workaround is to validate a message against a prior version of the schema.
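As a rough illustration of that workaround, the sketch below validates one message against both a prior and a current schema version. Everything in it is an assumption made for illustration: the schemas, the field names, and the tiny `validate` helper, which only understands `required` and primitive `type` checks on flat objects. It is not a JSON Schema implementation and not Pulsar code; a real deployment would use a full JSON Schema validator library instead.

```python
import json

# Hypothetical toy validator: handles only "required" and primitive "type"
# checks on flat objects. It stands in for a real JSON Schema validator.
_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def _matches(value, type_name):
    # bool is a subclass of int in Python, so reject it for numeric types.
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False
    expected = _TYPES.get(type_name)
    return expected is None or isinstance(value, expected)


def validate(message, schema):
    """Return a list of error strings; an empty list means the message passed."""
    if not isinstance(message, dict):
        return ["message is not an object"]
    errors = []
    for field in schema.get("required", []):
        if field not in message:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field in message and not _matches(message[field], rules.get("type")):
            errors.append(f"wrong type for field: {field}")
    return errors


# Two illustrative schema versions (made up for this example).
SCHEMA_V1 = {
    "type": "object",
    "required": ["account_id", "amount"],
    "properties": {"account_id": {"type": "string"}, "amount": {"type": "number"}},
}
SCHEMA_V2 = {
    "type": "object",
    "required": ["account_id", "amount", "currency"],
    "properties": {
        "account_id": {"type": "string"},
        "amount": {"type": "number"},
        "currency": {"type": "string"},
    },
}
SCHEMAS = {"v1": SCHEMA_V1, "v2": SCHEMA_V2}


def check_against_versions(raw, schemas):
    """Validate one JSON-encoded message against every schema version given."""
    message = json.loads(raw)
    return {version: validate(message, schema) for version, schema in schemas.items()}


new_message = '{"account_id": "a-1", "amount": 12.5, "currency": "USD"}'
old_message = '{"account_id": "a-1", "amount": 12.5}'
new_result = check_against_versions(new_message, SCHEMAS)
old_result = check_against_versions(old_message, SCHEMAS)
```

In this made-up case, a message written for v2 still passes v1 (consumers on the prior version keep working), while a message written for v1 fails v2 because `currency` is missing. That is the kind of signal a compatibility check is meant to surface.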
With that said, the current implementation of Avro in Pulsar doesn't actually validate the messages, so there's currently no way to guarantee in Pulsar that a message is compatible with a new schema version anyway. So, I'm not sure how much benefit we're getting from the compatibility checking in Avro. (I can see how it could be useful when mapping a Pulsar schema to database tables, but it doesn't guarantee that the messages themselves are compatible with the table definition, which seems to be the bigger issue.)

Content-sensitive apps would benefit from a Pulsar feature that allows invalid messages to be sent to an "invalid message" topic for alerting, inspection, and re-processing. Jerry also raised the performance impact of validating every message; however, not every topic needs to validate *every* message. Some topics might benefit from a statistical validation where only a percentage (let's say 1%) of messages are validated. Anyway, these are implementation details that could be worked out. I think the business cases I linked above will help explain the need.

I hope this helps.

Devin G. Bost

On Wed, Sep 7, 2022 at 4:58 AM Enrico Olivelli <eolive...@gmail.com> wrote:

> Devin,
> thanks for bringing up this discussion.
>
> I have one high level question: what is the goal that we want to achieve?
> something like:
> 1) Use CloudEvents format natively in Pulsar Schema registry, so that
> Pulsar clients can register their schema using that format
> 2) Publish on some HTTP endpoint the Schemas saved in the Pulsar
> Schema Registry in a way that non-Pulsar clients (like WebServices)
> can consume Pulsar messages
> 3) other
>
> I agree that supporting CloudEvents would be great in Pulsar and we
> should do something.
>
> If you have a real-world use case to share, we can start from that use
> case; that will help a lot.
>
> Enrico
>
>
> On Mon, Sep 5, 2022 at 6:07 PM Devin Bost
> <devin.b...@gmail.com> wrote:
> >
> > Maybe this is something we could discuss as part of Pulsar 3.0?
> > Seems like there's a pretty big difference between SchemaInfo and
> > CloudEvents in terms of the fields.
> >
> > CloudEvents requires:
> > id: String
> > source: URI-reference
> > specversion: String
> > type: String
> >
> > and optionally:
> > datacontenttype: String
> > dataschema: URI (compliant with JSON Schema specification 07)
> > subject: String
> > time: Timestamp
> >
> > For JSON, CloudEvents uses the JSON Schema spec for validation.
> >
> > In contrast, Pulsar's SchemaInfo has:
> > name: String
> > schema: byte[]
> > type: SchemaType
> > properties: Map<String, String>
> > propertiesSet: bool
> > timestamp: long
> >
> >
> > --
> > Devin Bost
> > Sent from mobile
> > Cell: 801-400-4602
> >
> > On Fri, Sep 2, 2022, 5:33 PM Devin Bost <devin.b...@gmail.com> wrote:
> >
> > > I recently discovered the discussion around creating a CloudEvent binding
> > > for Pulsar. https://github.com/cloudevents/spec/pull/237
> > >
> > > It appears that Pulsar doesn't meet their minimum requirements due to lack
> > > of a standard protocol. See https://github.com/cloudevents/spec/pull/254
> > >
> > > Their comment says:
> > >
> > > "For a protocol or encoding to qualify for a core CloudEvents event format
> > > or protocol binding, it must belong to either one of the following
> > > categories:
> > > - The protocol has a formal status as a standard with a widely-recognized
> > > multi-vendor protocol standardization body (e.g. W3C, IETF, OASIS, ISO)
> > > - The protocol has a "de-facto standard" status for its ecosystem category
> > > which means it is used so widely that it is considered a standard for a
> > > given application.
> > > Practically, we would like to see at least one open
> > > source implementation and at least a dozen independent vendors using it in
> > > their products/services."
> > >
> > > As CloudEvents is gaining momentum within CNCF, this may become a problem.
> > >
> > > Has there been any discussion around standardization and how we might meet
> > > this requirement?
> > >
> > > --
> > > Devin Bost
> > > Sent from mobile
> > > Cell: 801-400-4602
> > >
> >