Hi all,

I'd like to propose deprecating the legacy Jackson JsonSchema format
support for SchemaType.JSON and enforcing strict Avro schema validation by
default.

Background

In Pulsar 2.0, JSONSchema originally used Jackson's JsonSchemaGenerator to
produce schema definitions in the JSON Schema Draft standard (e.g.,
{"type":"object","properties":{...}}). In Pulsar 2.1 (commit 1893323bc2, PR
#2071), we standardized on Avro schema format for all
structured schemas, including SchemaType.JSON. The schema definition
stored in SchemaInfo.schema was changed to Avro format (e.g.,
{"type":"record","fields":[...]}), while the message payload remains plain
JSON.
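
To make the difference concrete, here is a rough sketch of the two definition
styles for the same type. The "User" type, its fields, and the
definition_style helper are made up for illustration; the helper only looks
at the top-level "type" field and is not a real parser.

```python
import json

# Legacy 2.0-era style: JSON Schema Draft shape, as a Jackson-based
# generator might emit it (hypothetical example type).
legacy_jackson_definition = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
}

# 2.1+ style: an Avro record schema, which is what SchemaInfo.schema holds.
avro_definition = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}

# The message payload is plain JSON under either definition.
payload = json.dumps({"name": "alice", "age": 30}).encode("utf-8")


def definition_style(schema_data: bytes) -> str:
    """Rough classification by the top-level "type" field only."""
    doc = json.loads(schema_data)
    top = doc.get("type") if isinstance(doc, dict) else None
    if top == "record":
        return "avro"
    if top == "object":
        return "json-schema-draft"
    return "unknown"
```

Only the schema definition differs between the two styles; the bytes on the
wire are the same JSON document in both cases.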

To maintain backward compatibility with schemas created during the 2.0 era,
fallback logic was added in several places to accept the old Jackson format:

- StructSchemaDataValidator — falls back to Jackson JsonSchema parsing when
Avro parsing fails
- JsonSchemaCompatibilityCheck — silently allows mixed old/new format
combinations
- ProducerImpl — sends old format to brokers below protocol v13

The Problem

This fallback is too lenient. It accepts any valid JSON as a schema
definition for SchemaType.JSON, not just the legacy Jackson format. This
has caused real issues for non-Java clients (e.g., the Rust client) where
users accidentally register a JSON Schema Draft 2020-12 definition instead
of an Avro schema. When that happens:

1. The broker's StructSchemaDataValidator accepts it (Avro parse fails →
Jackson fallback succeeds because it accepts any JSON)
2. The broker's compatibility check allows it (the Avro→JsonSchema and
JsonSchema→JsonSchema paths are empty blocks, so nothing is checked)
3. But when a Java consumer uses AutoConsumeSchema or GenericJsonSchema, it
fails with SchemaParseException: Type not supported: object because
AvroBaseStructSchema strictly requires Avro format — no fallback

The result is that the broker stores a schema that no Java consumer can
read.

Proposal

1. Add a broker configuration (e.g., schemaJsonAllowLegacyJacksonFormat,
default false) to control whether the old Jackson JsonSchema format is
accepted for SchemaType.JSON.
2. When disabled (default), both StructSchemaDataValidator and
JsonSchemaCompatibilityCheck will strictly require valid Avro schema format
for SchemaType.JSON, consistent with what the consumer side already
requires.
3. When enabled, the current backward-compatible behavior is preserved
for users who still have topics with legacy 2.0-era schemas.
4. Document clearly that schema_data for SchemaType.JSON must be an Avro
schema definition, which is important for non-Java client implementations
that construct schema definitions manually.
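
As a minimal sketch of the intended behavior (not the actual broker code,
which is Java), the validation rule could look roughly like the following.
All names here are hypothetical, parses_as_avro_record is a simplified
stand-in for a real Avro parser, and is_valid_json models the current
fallback as described above (it accepts any valid JSON).

```python
import json


class InvalidSchemaData(Exception):
    """Raised when schema_data is rejected at registration time."""


def parses_as_avro_record(schema_data: bytes) -> bool:
    # Simplified stand-in for Avro parsing: only checks the top-level shape.
    try:
        doc = json.loads(schema_data)
    except ValueError:
        return False
    return (
        isinstance(doc, dict)
        and doc.get("type") == "record"
        and isinstance(doc.get("name"), str)
        and isinstance(doc.get("fields"), list)
    )


def is_valid_json(schema_data: bytes) -> bool:
    # Models the lenient legacy fallback: any valid JSON passes.
    try:
        json.loads(schema_data)
        return True
    except ValueError:
        return False


def validate_json_schema_data(
    schema_data: bytes, allow_legacy_jackson_format: bool = False
) -> None:
    # Strict path: Avro format is always accepted.
    if parses_as_avro_record(schema_data):
        return
    # Legacy path: only reachable when the operator opts in.
    if allow_legacy_jackson_format and is_valid_json(schema_data):
        return
    raise InvalidSchemaData(
        "schema_data for SchemaType.JSON must be an Avro schema definition"
    )


AVRO_EXAMPLE = (
    b'{"type":"record","name":"User",'
    b'"fields":[{"name":"name","type":"string"}]}'
)
DRAFT_2020_12_EXAMPLE = (
    b'{"$schema":"https://json-schema.org/draft/2020-12/schema",'
    b'"type":"object","properties":{"name":{"type":"string"}}}'
)
```

With the flag left at its default, validate_json_schema_data(AVRO_EXAMPLE)
returns normally and validate_json_schema_data(DRAFT_2020_12_EXAMPLE) raises,
so the mistake surfaces when the producer registers the schema. Passing
allow_legacy_jackson_format=True reproduces today's lenient behavior.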

Impact

- The legacy Jackson format has been superseded since Pulsar 2.1 (2018).
Any active topics with old-format schemas have likely been migrated or
recreated by now.
- The Java client's JSONSchema.of() has been generating Avro format since
2.1, so Java producers are unaffected.
- Non-Java clients will get a clear error at producer registration time
instead of a confusing consumer-side failure.
- Users who genuinely need the old format can opt in via the configuration
flag.

Looking forward to your thoughts.

Thanks,
Penghui
