Re: Data quality problem

Dave Fisher Fri, 11 Nov 2022 10:32:33 -0800

> On Nov 11, 2022, at 6:56 AM, Elliot West 
> <elliot.w...@streamnative.io.INVALID> wrote:
> 
> Hey Devin,
> 
> *"Kafka conforms to the JSON Schema specification"*
> Only when using Confluent's Schema Registry.

If that is true, then Apache Kafka itself does NOT conform, while Confluent
does. Can you point to some documentation?

> 
> *"if a producer makes a change or omission, such as in a value used for
> tracking, it might not surface until way down the line"*
> So let me understand this: although the producer has a schema, it does not
> use it for validation of JSON (as would implicitly occur for Avro)? Is this
> correct?
> 
> I agree that robust support for schema, certainly at the edges, is a
> cornerstone for a data system. I also agree that it would be better to
> adopt existing standards rather than implement them in a bespoke manner.
> 
> I'd be interested to hear your thoughts on concrete improvements that you
> believe would be necessary - for example:
> 
> * Producer validation of JSON occurs using "JSON Schema"
> * Evolutions of JSON Schema conform to ...
> * Users can declare topic schema using a JSON Schema document
> * Users can query topic schema and have a JSON schema document returned to
> them
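
For illustration only, the first bullet (producer-side validation against a
JSON Schema document) could look roughly like the sketch below. Everything in
it is an assumption rather than an existing Pulsar or Kafka API: the
TRACKING_SCHEMA document, the validate() and send_validated() helpers, and the
`send` callable standing in for a real producer are all hypothetical. The
validator is hand-rolled and covers only the "type", "required" and
"properties" keywords so the sketch has no dependencies; a real implementation
would rely on a complete JSON Schema validator.

```python
import json

# Hypothetical schema for a tracking event; not taken from the thread.
TRACKING_SCHEMA = {
    "type": "object",
    "required": ["event_id", "tracking_id"],
    "properties": {
        "event_id": {"type": "string"},
        "tracking_id": {"type": "string"},
        "amount": {"type": "number"},
    },
}

# Mapping from JSON Schema type names to Python types (subset only).
_JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def validate(value, schema, path="$"):
    """Return a list of error strings.

    Supports only the "type", "required" and "properties" keywords.
    """
    errors = []
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool = isinstance(value, bool)
        ok = isinstance(value, _JSON_TYPES[expected]) and (
            expected == "boolean" or not is_bool
        )
        if not ok:
            errors.append(f"{path}: expected {expected}")
            return errors
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing required property '{name}'")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    return errors


def send_validated(send, payload, schema=TRACKING_SCHEMA):
    """Validate payload, then pass the encoded bytes to `send`.

    `send` is a hypothetical stand-in for a producer's send method.
    """
    errors = validate(payload, schema)
    if errors:
        raise ValueError("; ".join(errors))
    send(json.dumps(payload).encode("utf-8"))
```

With this in place, a message that omits tracking_id is rejected at the
producer with a ValueError naming the missing property, instead of surfacing
far downstream.
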
> 
> Thanks,
> 
> Elliot.
> 
> On Thu, 10 Nov 2022 at 16:51, Devin Bost <devin.b...@gmail.com> wrote:
> 
>> One of the areas where Kafka has an advantage over Pulsar is around data
>> quality. Kafka conforms to the JSON Schema specification, which enables
>> integration with any technology that conforms to the standard, such as for
>> data validation, discoverability, lineage, versioning, etc.
>> Pulsar's implementation is non-compliant with the standard, and producers
>> and consumers have no built-in way in Pulsar to validate that values in
>> their messages match expectations. As a consequence, if a producer makes a
>> change or omission, such as in a value used for tracking, it might not
>> surface until way down the line, and then it can be very difficult to track
>> down the source of the problem. That kills the agility of teams
>> responsible for maintaining apps using Pulsar. It's also bad PR because
>> incidents then get associated with Pulsar, and the business might not
>> understand that the data problem wasn't necessarily caused by Pulsar.
>> 
>> What's the right way for us to address this problem?
>> 
>> --
>> Devin Bost
>> Sent from mobile
>> Cell: 801-400-4602
>> 
> 
> 
> -- 
> 
> Elliot West
> 
> Senior Platform Engineer
> 
> elliot.w...@streamnative.io
> 
> streamnative.io
> 
> <https://github.com/streamnative>
> <https://www.linkedin.com/company/streamnative>
> <https://twitter.com/streamnativeio>