gaurav7261 opened a new pull request, #3415:
URL: https://github.com/apache/parquet-java/pull/3415
### Rationale for this change
Every consumer of `parquet-variant` currently has to independently
implement JSON-to-Variant parsing. Apache Spark has one in its `common/variant`
module
([source](https://github.com/apache/spark/tree/master/common/variant)),
our Kafka Connect S3 sink connector had to write one, and any other project
(Flink, Trino, DuckDB-Java, etc.) would need to do the
same. Since `VariantBuilder` already provides all the low-level
`append*()` primitives, `parseJson()` is the natural completion of that API — a
canonical, reusable entry point for the most common use
case: converting a JSON string into a Variant.
### What changes are included in this PR?
- **`parquet-variant/pom.xml`**: Added `jackson-core` (compile) and
`parquet-jackson` (runtime) dependencies, following the same pattern as
`parquet-hadoop`.
- **`VariantBuilder.java`**: Added two public static methods:
  - `parseJson(String json)`: convenience method that creates a Jackson
    streaming parser internally.
  - `parseJson(JsonParser parser)`: for callers who already have a
    positioned parser (e.g., reading from a stream).
  - Internal helpers ported from Apache Spark's production
    `VariantBuilder.buildJson`:
    - `buildJson()`: recursive single-pass streaming parser handling
      OBJECT, ARRAY, STRING, NUMBER_INT, NUMBER_FLOAT, TRUE, FALSE, NULL.
    - `appendSmallestLong()`: selects the smallest integer type
      (BYTE/SHORT/INT/LONG) based on value range.
    - `tryAppendDecimal()`: decimal-first encoding for floating-point
      numbers; falls back to double only for scientific notation or values
      exceeding DECIMAL16 precision (38 digits).
- **`TestVariantParseJson.java`**: 32 new tests covering all primitive
types, objects (empty, simple, nested, null values, sorted keys, duplicate
keys), arrays (empty, simple, nested, mixed types), and
edge cases (unicode, escaped strings, deeply nested documents, scientific
notation, integer overflow to decimal, malformed JSON).
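
The number-handling rules described above (smallest integer width, decimal-first for non-integers, integer overflow falling through to decimal) can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: the class name, the `Encoding` enum, and the method names are hypothetical, and the input is assumed to be an already-validated JSON number token.

```java
import java.math.BigDecimal;

// Standalone sketch of the number-encoding decisions; all names here are
// hypothetical and do not correspond to the actual VariantBuilder API.
class NumberEncodingSketch {
    enum Encoding { BYTE, SHORT, INT, LONG, DECIMAL, DOUBLE }

    // Maximum precision of the widest decimal type, as stated in the PR text.
    static final int MAX_DECIMAL_PRECISION = 38;

    // Pick the narrowest integer width that represents the value exactly.
    static Encoding smallestIntegerEncoding(long value) {
        if (value == (byte) value) return Encoding.BYTE;
        if (value == (short) value) return Encoding.SHORT;
        if (value == (int) value) return Encoding.INT;
        return Encoding.LONG;
    }

    // Decimal-first: use a decimal unless the token has an exponent or
    // needs more than 38 digits of precision or scale.
    static Encoding nonIntegerEncoding(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean plain = c == '-' || c == '.' || (c >= '0' && c <= '9');
            if (!plain) return Encoding.DOUBLE; // e.g. 'e' or 'E' in 1e10
        }
        BigDecimal d = new BigDecimal(token);
        if (d.precision() <= MAX_DECIMAL_PRECISION && d.scale() <= MAX_DECIMAL_PRECISION) {
            return Encoding.DECIMAL;
        }
        return Encoding.DOUBLE;
    }

    // Integer tokens that do not fit in a long fall through to the decimal path.
    static Encoding integerEncoding(String token) {
        try {
            return smallestIntegerEncoding(Long.parseLong(token));
        } catch (NumberFormatException overflow) {
            return nonIntegerEncoding(token);
        }
    }

    public static void main(String[] args) {
        System.out.println(integerEncoding("42"));
        System.out.println(integerEncoding("9223372036854775808"));
        System.out.println(nonIntegerEncoding("1.5"));
        System.out.println(nonIntegerEncoding("1e10"));
    }
}
```

Preferring decimal over double keeps values such as `1.5` exact, at the cost of a fallback path for tokens that a 38-digit decimal cannot hold.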
### Are these changes tested?
Yes. 32 new tests in `TestVariantParseJson`.
### Are there any user-facing changes?
Yes. Two new public static methods on `VariantBuilder`:
- `VariantBuilder.parseJson(String json)` — returns a `Variant`
- `VariantBuilder.parseJson(JsonParser parser)` — returns a `Variant`
Both methods are purely additive; no existing APIs change.
Closes #3414
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]