Kafka does not have a schema registry to begin with; Confluent does.
On Fri, Nov 11, 2022 at 10:32 AM Dave Fisher <w...@apache.org> wrote:
>
> > On Nov 11, 2022, at 6:56 AM, Elliot West <elliot.w...@streamnative.io.INVALID> wrote:
> >
> > Hey Devin,
> >
> > *"Kafka conforms to the JSON Schema specification"*
> > Only when using Confluent's Schema Registry.
>
> If that is true, then Apache Kafka does NOT conform while Confluent does.
> Can you point to some documentation?
>
> > *"if a producer makes a change or omission, such as in a value used for
> > tracking, it might not surface until way down the line"*
> > So let me understand this: although the producer has a schema, it does not
> > use it to validate the JSON (as would implicitly occur for Avro)? Is this
> > correct?
> >
> > I agree that robust support for schema, certainly at the edges, is a
> > cornerstone of a data system. I also agree that it would be better to
> > adopt existing standards rather than implement them in a bespoke manner.
> >
> > I'd be interested to hear your thoughts on the concrete improvements that
> > you believe would be necessary - for example:
> >
> > * Producer validation of JSON occurs using "JSON Schema"
> > * Evolutions of JSON Schema conform to ...
> > * Users can declare topic schema using a JSON Schema document
> > * Users can query topic schema and have a JSON Schema document returned
> > to them
> >
> > Thanks,
> >
> > Elliot.
> >
> > On Thu, 10 Nov 2022 at 16:51, Devin Bost <devin.b...@gmail.com> wrote:
> >
> >> One of the areas where Kafka has an advantage over Pulsar is around data
> >> quality. Kafka conforms to the JSON Schema specification, which enables
> >> integration with any technology that conforms to the standard, such as
> >> for data validation, discoverability, lineage, versioning, etc.
> >> Pulsar's implementation is non-compliant with the standard, and producers
> >> and consumers have no built-in way in Pulsar to validate that values in
> >> their messages match expectations. As a consequence, if a producer makes
> >> a change or omission, such as in a value used for tracking, it might not
> >> surface until way down the line, and then it can be very difficult to
> >> track down the source of the problem, which kills the agility of teams
> >> responsible for maintaining apps using Pulsar. It's also bad PR, because
> >> incidents are then associated with Pulsar, even though the business might
> >> not understand that the data problem wasn't necessarily caused by Pulsar.
> >>
> >> What's the right way for us to address this problem?
> >>
> >> --
> >> Devin Bost
> >> Sent from mobile
> >> Cell: 801-400-4602
> >
> > --
> > Elliot West
> > Senior Platform Engineer
> > elliot.w...@streamnative.io
> > streamnative.io
> > <https://github.com/streamnative>
> > <https://www.linkedin.com/company/streamnative>
> > <https://twitter.com/streamnativeio>

--
Matteo Merli
<matteo.me...@gmail.com>
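
For concreteness, below is a minimal sketch of the first item on Elliot's list (producer validation of JSON using a JSON Schema). As Devin notes, Pulsar has no built-in way to do this today, so the check happens at the application level before the message is sent. The everit-org json-schema library, the broker URL, the topic name, and the example schema are all assumptions for illustration, not Pulsar APIs or any agreed design.

```java
// Sketch: application-level producer validation of JSON against a JSON Schema.
// Assumptions (not part of Pulsar): the everit-org json-schema library for
// draft-07 validation, a broker at pulsar://localhost:6650, and a
// hypothetical "orders" topic and schema.
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.everit.json.schema.Schema;
import org.everit.json.schema.ValidationException;
import org.everit.json.schema.loader.SchemaLoader;
import org.json.JSONObject;
import org.json.JSONTokener;

import java.nio.charset.StandardCharsets;

public class ValidatingJsonProducer {

    public static void main(String[] args) throws Exception {
        // A plain JSON Schema document; in the proposal above, this is what a
        // user would declare on (and later query back from) the topic.
        String schemaDoc = """
                {
                  "$schema": "http://json-schema.org/draft-07/schema#",
                  "type": "object",
                  "required": ["orderId", "trackingId"],
                  "properties": {
                    "orderId":    { "type": "string" },
                    "trackingId": { "type": "string" },
                    "amount":     { "type": "number", "minimum": 0 }
                  }
                }""";
        Schema schema = SchemaLoader.load(new JSONObject(new JSONTokener(schemaDoc)));

        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build()) {
            Producer<byte[]> producer = client.newProducer()
                    .topic("persistent://public/default/orders")
                    .create();

            String payload =
                    "{\"orderId\":\"o-123\",\"trackingId\":\"t-456\",\"amount\":19.99}";

            // Reject the message before it reaches the broker if it does not
            // conform to the schema (e.g. a missing "trackingId" field).
            try {
                schema.validate(new JSONObject(payload));
                producer.send(payload.getBytes(StandardCharsets.UTF_8));
            } catch (ValidationException e) {
                // The problem surfaces at the producer instead of "way down the line".
                System.err.println("Rejected invalid message: " + e.getMessage());
            }

            producer.close();
        }
    }
}
```

The same check could just as well live in a producer interceptor or a thin wrapper shared across teams; the point is only to show where producer-side JSON Schema validation would sit relative to the client, not to prescribe an implementation.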