Data quality problem

Devin Bost Thu, 10 Nov 2022 08:51:21 -0800

One area where Kafka has an advantage over Pulsar is data quality. Kafka
conforms to the JSON Schema specification, which enables integration with
any technology that conforms to the standard, for purposes such as data
validation, discoverability, lineage, and versioning.
Pulsar's implementation is non-compliant with the standard, and producers
and consumers have no built-in way in Pulsar to validate that values in
their messages match expectations. As a consequence, if a producer makes a
change or omission, such as in a value used for tracking, it might not
surface until far downstream. By then it can be very difficult to track
down the source of the problem, which kills the agility of the teams
responsible for maintaining apps that use Pulsar. It's also bad PR, because
the incidents get associated with Pulsar even when Pulsar didn't cause the
data problem, and the business might not understand that distinction.
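
To make the gap concrete, here is a rough sketch of the kind of check a
producer has to hand-roll today before sending. It is not a Pulsar API and
not a JSON Schema implementation. The field names, REQUIRED_FIELDS,
validation_errors, and send_validated are all made up for illustration, and
`send` stands in for whatever send method the producer exposes.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical expectations for a tracking event; names are illustrative only.
REQUIRED_FIELDS: Dict[str, type] = {
    "event_id": str,
    "tracking_id": str,
    "amount": float,
}


def validation_errors(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload passed."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            errors.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
        elif expected_type is str and not payload[name].strip():
            errors.append(f"empty value for {name}")
    return errors


def send_validated(send: Callable[[bytes], Any], payload: Dict[str, Any]) -> None:
    """Validate first, then pass the JSON-encoded payload to `send`."""
    errors = validation_errors(payload)
    if errors:
        # Fail at the producer, where the source of the problem is obvious.
        raise ValueError("; ".join(errors))
    send(json.dumps(payload).encode("utf-8"))
```

With something like this, a dropped or blank tracking_id fails at the
producer instead of surfacing downstream, but every team has to write and
maintain its own version of it.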


What's the right way for us to address this problem?

--
Devin Bost
Sent from mobile
Cell: 801-400-4602
