+1

-Lari

On Wed, 11 Mar 2026 at 01:22, PengHui Li <[email protected]> wrote:
>
> Hi all,
>
> I'd like to propose deprecating the legacy Jackson JsonSchema format
> support for SchemaType.JSON and enforcing strict Avro schema validation by
> default.
>
> Background
>
> In Pulsar 2.0, JSONSchema originally used Jackson's JsonSchemaGenerator to
> produce schema definitions in the JSON Schema Draft standard (e.g.,
> {"type":"object","properties":{...}}). In Pulsar 2.1 (commit 1893323bc2, PR
> #2071), we standardized on Avro schema format for all
> structured schemas, including SchemaType.JSON. The schema definition
> stored in SchemaInfo.schema was changed to Avro format (e.g.,
> {"type":"record","fields":[...]}), while the message payload remains plain
> JSON.
>
> To maintain backward compatibility with schemas created during the 2.0 era,
> fallback logic was added in several places to accept the old Jackson format:
>
> - StructSchemaDataValidator — falls back to Jackson JsonSchema parsing when
> Avro parsing fails
> - JsonSchemaCompatibilityCheck — silently allows mixed old/new format
> combinations
> - ProducerImpl — sends old format to brokers below protocol v13
>
> The Problem
>
> This fallback is too lenient. It accepts any valid JSON as a schema
> definition for SchemaType.JSON, not just the legacy Jackson format. This
> has caused real issues for non-Java clients (e.g., the Rust client) where
> users accidentally register a JSON Schema Draft 2020-12 definition as the
> schema:
>
> 1. The broker's StructSchemaDataValidator accepts it (Avro parse fails →
> Jackson fallback succeeds because it accepts any JSON)
> 2. The broker's compatibility check allows it (empty block for
> Avro→JsonSchema or JsonSchema→JsonSchema path)
> 3. But when a Java consumer uses AutoConsumeSchema or GenericJsonSchema, it
> fails with "SchemaParseException: Type not supported: object" because
> AvroBaseStructSchema strictly requires Avro format — no fallback
>
> The result is that the broker stores a schema that no Java consumer can
> read.
>
> Proposal
>
>   1. Add a broker configuration (e.g., schemaJsonAllowLegacyJacksonFormat,
> default false) to control whether the old Jackson JsonSchema format is
> accepted for SchemaType.JSON.
>   2. When disabled (default), both StructSchemaDataValidator and
> JsonSchemaCompatibilityCheck will strictly require valid Avro schema format
> for SchemaType.JSON, consistent with what the consumer side already
> requires.
>   3. When enabled, the current backward-compatible behavior is preserved
> for users who still have topics with legacy 2.0-era schemas.
>   4. Document clearly that schema_data for SchemaType.JSON must be an Avro
> schema definition, which is important for non-Java client implementations
> that construct schema definitions manually.
>
> Impact
>
> - The legacy Jackson format has been superseded since Pulsar 2.1 (2018).
> Any active topics with old-format schemas have likely been migrated or
> recreated by now.
> - The Java client's JSONSchema.of() has been generating Avro format since
> 2.1, so Java producers are unaffected.
> - Non-Java clients will get a clear error at producer registration time
> instead of a confusing consumer-side failure.
> - Users who genuinely need the old format can opt in via the configuration
> flag.
>
> Looking forward to your thoughts.
>
> Thanks,
> Penghui
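
For non-Java client authors, the difference between the two formats and the
effect of the proposed flag can be sketched roughly as below. This is only an
illustration, not Pulsar code: the function names and the allow_legacy
parameter are hypothetical (allow_legacy stands in for the proposed
schemaJsonAllowLegacyJacksonFormat setting), and the check is a key-based
heuristic, whereas the broker would parse the definition with a real Avro
parser.

```python
import json


def classify_schema_definition(schema_data: str) -> str:
    """Roughly classify a SchemaType.JSON schema definition.

    Returns "avro-record", "json-schema-draft", or "unknown".
    Heuristic only: a real check would use an Avro schema parser.
    """
    try:
        doc = json.loads(schema_data)
    except json.JSONDecodeError:
        return "unknown"
    if not isinstance(doc, dict):
        return "unknown"
    # Avro record schemas look like {"type":"record","fields":[...]}.
    if doc.get("type") == "record" and isinstance(doc.get("fields"), list):
        return "avro-record"
    # JSON Schema Draft documents look like {"type":"object","properties":{...}}.
    if doc.get("type") == "object" and "properties" in doc:
        return "json-schema-draft"
    return "unknown"


def validate_schema_definition(schema_data: str, allow_legacy: bool = False) -> None:
    """Raise ValueError unless the definition is acceptable.

    allow_legacy is a hypothetical stand-in for the proposed broker flag.
    Simplification: with the flag on, this sketch accepts only documents
    shaped like JSON Schema Draft, while the existing fallback described
    in the proposal accepts any valid JSON.
    """
    kind = classify_schema_definition(schema_data)
    if kind == "avro-record":
        return
    if allow_legacy and kind == "json-schema-draft":
        return
    raise ValueError(
        f"schema definition for SchemaType.JSON must be an Avro schema, got: {kind}"
    )


AVRO_EXAMPLE = (
    '{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}'
)
DRAFT_EXAMPLE = '{"type":"object","properties":{"id":{"type":"integer"}}}'
```

With the default (strict) setting, validate_schema_definition(DRAFT_EXAMPLE)
raises at registration time, which is the early failure the proposal is
aiming for; passing allow_legacy=True lets the same definition through.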
