+1 -Lari
On Wed, 11 Mar 2026 at 01:22, PengHui Li <[email protected]> wrote:
>
> Hi all,
>
> I'd like to propose deprecating the legacy Jackson JsonSchema format
> support for SchemaType.JSON and enforcing strict Avro schema validation by
> default.
>
> Background
>
> In Pulsar 2.0, JSONSchema originally used Jackson's JsonSchemaGenerator to
> produce schema definitions in the JSON Schema Draft standard (e.g.,
> {"type":"object","properties":{...}}). In Pulsar 2.1 (commit 1893323bc2, PR
> #2071), we standardized on Avro schema format for all
> structured schemas, including SchemaType.JSON. The schema definition
> stored in SchemaInfo.schema was changed to Avro format (e.g.,
> {"type":"record","fields":[...]}), while the message payload remains plain
> JSON.
>
> To maintain backward compatibility with schemas created during the 2.0 era,
> fallback logic was added in several places to accept the old Jackson format:
>
> - StructSchemaDataValidator — falls back to Jackson JsonSchema parsing when
> Avro parsing fails
> - JsonSchemaCompatibilityCheck — silently allows mixed old/new format
> combinations
> - ProducerImpl — sends old format to brokers below protocol v13
>
> The Problem
>
> This fallback is too lenient. It accepts any valid JSON as a schema
> definition for SchemaType.JSON, not just the legacy Jackson format. This
> has caused real issues for non-Java clients (e.g., the Rust client) where
> users accidentally register a JSON Schema Draft 2020-12
>
> 1. The broker's StructSchemaDataValidator accepts it (Avro parse fails →
> Jackson fallback succeeds because it accepts any JSON)
> 2. The broker's compatibility check allows it (empty block for
> Avro→JsonSchema or JsonSchema→JsonSchema path)
> 3. But when a Java consumer uses AutoConsumeSchema or GenericJsonSchema, it
> fails with SchemaParseException: Type not supported: object because
> AvroBaseStructSchema strictly requires Avro format — no fallback
>
> The result is that the broker stores a schema that no Java consumer can
> read.
>
> Proposal
>
> 1. Add a broker configuration (e.g., schemaJsonAllowLegacyJacksonFormat,
> default false) to control whether the old Jackson JsonSchema format is
> accepted for SchemaType.JSON.
> 2. When disabled (default), both StructSchemaDataValidator and
> JsonSchemaCompatibilityCheck will strictly require valid Avro schema format
> for SchemaType.JSON, consistent with what the consumer side already
> requires.
> 3. When enabled, the current backward-compatible behavior is preserved
> for users who still have topics with legacy 2.0-era schemas.
> 4. Document clearly that schema_data for SchemaType.JSON must be an Avro
> schema definition, which is important for non-Java client implementations
> that construct schema definitions manually.
>
> Impact
>
> - The legacy Jackson format has been superseded since Pulsar 2.1 (2018).
> Any active topics with old-format schemas have likely been migrated or
> recreated by now.
> - The Java client's JSONSchema.of() has been generating Avro format since
> 2.1, so Java producers are unaffected.
> - Non-Java clients will get a clear error at producer registration time
> instead of a confusing consumer-side failure.
> - Users who genuinely need the old format can opt in via the configuration
> flag.
>
> Looking forward to your thoughts.
>
> Thanks,
> Penghui
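
For anyone following the thread from a non-Java client, here is a rough sketch of the difference between the two schema shapes and of the strict-by-default behavior being proposed. It is not Pulsar code: the function names, the `allow_legacy` parameter (standing in for the proposed broker flag), the error text, and the two sample schemas are all made up for illustration, and the shape check is only a crude stand-in for a real Avro parser.

```python
import json

# Illustrative schema definitions (hypothetical, not taken from a real topic).
AVRO_EXAMPLE = '{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}'
DRAFT_EXAMPLE = '{"type":"object","properties":{"id":{"type":"integer"}}}'


def looks_like_avro_record(schema_json: str) -> bool:
    """Crude shape check: a JSON object with type "record", a name, and a fields list.

    A real validator would hand the definition to an Avro schema parser instead.
    """
    try:
        doc = json.loads(schema_json)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(doc, dict)
        and doc.get("type") == "record"
        and isinstance(doc.get("name"), str)
        and isinstance(doc.get("fields"), list)
    )


def validate_json_schema_type(schema_json: str, allow_legacy: bool = False) -> None:
    """Hypothetical stand-in for the proposed check on SchemaType.JSON definitions."""
    if looks_like_avro_record(schema_json):
        return
    if allow_legacy:
        # Mimics the lenient fallback described above: any parseable JSON passes.
        json.loads(schema_json)
        return
    raise ValueError("SchemaType.JSON requires an Avro record schema definition")
```

With the default `allow_legacy=False`, `validate_json_schema_type(DRAFT_EXAMPLE)` raises at registration time, while `AVRO_EXAMPLE` passes. With `allow_legacy=True`, the JSON Schema Draft style definition is accepted, which is the situation that today only fails later on the consumer side.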
