gaurav7261 opened a new pull request, #3415:
URL: https://github.com/apache/parquet-java/pull/3415

     ### Rationale for this change
     Every consumer of `parquet-variant` currently has to implement JSON-to-Variant parsing independently. Apache Spark has one in its `common/variant` module ([source](https://github.com/apache/spark/tree/master/common/variant)), our Kafka Connect S3 sink connector had to write one, and any other project (Flink, Trino, DuckDB-Java, etc.) would need to do the same. Since `VariantBuilder` already provides all the low-level `append*()` primitives, `parseJson()` is the natural completion of that API: a canonical, reusable entry point for the most common use case, converting a JSON string into a Variant.
     ### What changes are included in this PR?
     - **`parquet-variant/pom.xml`**: Added `jackson-core` (compile) and `parquet-jackson` (runtime) dependencies, following the same pattern as `parquet-hadoop`.
     - **`VariantBuilder.java`**: Added two public static methods, plus internal helpers:
       - `parseJson(String json)`: convenience method that creates a Jackson streaming parser internally.
       - `parseJson(JsonParser parser)`: for callers who already have a positioned parser (e.g., reading from a stream).
       - Internal helpers ported from Apache Spark's production `VariantBuilder.buildJson`:
         - `buildJson()`: recursive single-pass streaming parser handling OBJECT, ARRAY, STRING, NUMBER_INT, NUMBER_FLOAT, TRUE, FALSE, NULL.
         - `appendSmallestLong()`: selects the smallest integer type (BYTE/SHORT/INT/LONG) that can hold the value.
         - `tryAppendDecimal()`: decimal-first encoding for floating-point numbers; falls back to double only for scientific notation or values exceeding DECIMAL16 precision (38 digits).
     - **`TestVariantParseJson.java`**: 32 new tests covering all primitive types, objects (empty, simple, nested, null values, sorted keys, duplicate keys), arrays (empty, simple, nested, mixed types), and edge cases (unicode, escaped strings, deeply nested documents, scientific notation, integer overflow to decimal, malformed JSON).
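
For reviewers who want the number-encoding rules in one place, here is a minimal standalone sketch of the decisions described for `appendSmallestLong()` and `tryAppendDecimal()`. It is not the code in this PR: the class, enum, and method names are hypothetical, it uses only the JDK, and it returns a label instead of appending to a builder. The retry of overflowing integer tokens as decimals is an assumption inferred from the "integer overflow to decimal" test case.

```java
import java.math.BigDecimal;

// Illustrative sketch only; every name here is hypothetical and is not an
// identifier from the PR or from parquet-variant.
final class NumberEncodingSketch {
    enum Encoding { BYTE, SHORT, INT, LONG, DECIMAL, DOUBLE }

    // Maximum decimal precision quoted in the PR description for DECIMAL16.
    static final int MAX_DECIMAL16_PRECISION = 38;

    // Picks the narrowest integer width that holds the value without loss.
    static Encoding smallestIntegerEncoding(long value) {
        if (value == (byte) value) return Encoding.BYTE;
        if (value == (short) value) return Encoding.SHORT;
        if (value == (int) value) return Encoding.INT;
        return Encoding.LONG;
    }

    // Decimal-first rule for a well-formed JSON number token: use DECIMAL
    // unless the token uses scientific notation or needs more than 38 digits.
    static Encoding fractionalTokenEncoding(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean plain = c == '-' || c == '.' || (c >= '0' && c <= '9');
            if (!plain) return Encoding.DOUBLE; // e.g. 'e' or 'E'
        }
        BigDecimal d = new BigDecimal(token);
        if (d.precision() <= MAX_DECIMAL16_PRECISION
                && d.scale() <= MAX_DECIMAL16_PRECISION) {
            return Encoding.DECIMAL;
        }
        return Encoding.DOUBLE;
    }

    // Assumed behavior: an integer token too large for a long is retried
    // under the decimal-first rule instead of failing.
    static Encoding integerTokenEncoding(String token) {
        try {
            return smallestIntegerEncoding(Long.parseLong(token));
        } catch (NumberFormatException overflow) {
            return fractionalTokenEncoding(token);
        }
    }
}
```

Under these assumptions, `integerTokenEncoding("128")` yields `SHORT`, `fractionalTokenEncoding("1.5")` yields `DECIMAL`, and `fractionalTokenEncoding("1e10")` yields `DOUBLE`.
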
     ### Are these changes tested?
     Yes. 32 new tests in `TestVariantParseJson`.
     ### Are there any user-facing changes?
     Yes. Two new public static methods on `VariantBuilder`:
     - `VariantBuilder.parseJson(String json)` — returns a `Variant`
     - `VariantBuilder.parseJson(JsonParser parser)` — returns a `Variant`
     These changes are purely additive; no existing APIs are modified or removed.
     Closes #3414


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

