Boyang Jerry Peng created SPARK-56043:
-----------------------------------------
Summary: Wrap NullPointerException from Avro 1.12.x ParseContext.resolve() in SchemaParseException
Key: SPARK-56043
URL: https://issues.apache.org/jira/browse/SPARK-56043
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.1.0, 4.1.1, 4.1.2, 4.2.0
Reporter: Boyang Jerry Peng

After the Avro 1.11.3 → 1.12.0 upgrade (SPARK-49014),
{{Schema.Parser().parse()}} can throw a {{NullPointerException}} from
{{ParseContext.resolve()}} for certain invalid user-provided schemas. In Avro
1.11.x, the same schemas threw {{SchemaParseException}}, which was caught by
the existing {{NonFatal(e)}} handler in {{AvroDataToCatalyst.nullSafeEval}}
and reported as {{MALFORMED_AVRO_MESSAGE}}. The NPE does not bypass that
handler (it is still {{NonFatal}}), but it surfaces as an unclassified internal
error rather than a schema parse error, because schema parsing was never
expected to throw it.

*Root cause:* AVRO-3666 replaced the {{Names}} lookup (which threw
{{SchemaParseException("Undefined name: ...")}}) with
{{ParseContext.resolve()}}, which calls
{{Objects.requireNonNull(oldSchemas.get(fullName))}} and therefore throws a raw
{{NullPointerException}} when a named type cannot be resolved.
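
To make the failure mode concrete, here is a minimal, self-contained Java sketch that does not depend on Avro. The class {{ResolveNpeSketch}}, its {{define}}/{{resolve}}/{{resolveOrParseError}} methods, and the nested {{SchemaParseException}} are hypothetical stand-ins, not Avro's or Spark's actual code. {{resolve}} reproduces only the {{Objects.requireNonNull(map.get(name))}} pattern described above, and {{resolveOrParseError}} sketches one way the wrapping proposed in the summary could look: catch the bare NPE and rethrow it as a parse error that names the unresolved type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class ResolveNpeSketch {

    // Hypothetical stand-in for org.apache.avro.SchemaParseException (NOT the real class).
    static class SchemaParseException extends RuntimeException {
        SchemaParseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical registry of already-defined named types, keyed by full name.
    private final Map<String, String> knownSchemas = new HashMap<>();

    void define(String fullName, String schemaJson) {
        knownSchemas.put(fullName, schemaJson);
    }

    // Mirrors the lookup pattern attributed to ParseContext.resolve():
    // an unknown name makes get() return null, and requireNonNull
    // then throws a NullPointerException with no descriptive message.
    String resolve(String fullName) {
        return Objects.requireNonNull(knownSchemas.get(fullName));
    }

    // Sketch of the proposed fix: translate the bare NPE into a parse
    // error and keep the NPE as the cause so its stack trace survives.
    String resolveOrParseError(String fullName) {
        try {
            return resolve(fullName);
        } catch (NullPointerException e) {
            throw new SchemaParseException("Undefined name: " + fullName, e);
        }
    }

    public static void main(String[] args) {
        ResolveNpeSketch ctx = new ResolveNpeSketch();
        ctx.define("com.test.Known", "{\"type\":\"fixed\",\"name\":\"Known\",\"size\":4}");

        try {
            ctx.resolve("com.test.Missing");
        } catch (NullPointerException e) {
            System.out.println("raw: " + e.getClass().getSimpleName());
        }

        try {
            ctx.resolveOrParseError("com.test.Missing");
        } catch (SchemaParseException e) {
            System.out.println("wrapped: " + e.getMessage());
        }
    }
}
```

Keeping the NPE as the cause preserves the original stack trace for debugging, while callers that already handle parse errors can classify the failure the same way they did under Avro 1.11.x.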

*Verified behavioral difference* (standalone test against both jars):
||Schema pattern||Avro 1.11.3||Avro 1.12.1||
|Undefined named type in record field|{{SchemaParseException}}|{{AvroTypeException}}|
|Undefined type in union|{{SchemaParseException}}|{{AvroTypeException}}|
|Bare string reference ({{"com.test.Missing"}})|{{SchemaParseException}}|*{{NullPointerException}}*|