[I] Use FixedSizeBinary instead of Binary for int96 conversion when convertInt96ToArrowTimestamp is false [parquet-java]

2024-11-28 Thread via GitHub
doki23 opened a new issue, #3088: URL: https://github.com/apache/parquet-java/issues/3088 ### Describe the enhancement requested ```java public TypeMapping convertINT96(PrimitiveTypeName primitiveTypeName) throws RuntimeException { if (convertInt96ToArrowTimestamp) { re

[I] Required field 'num_values' was not found in serialized data! [parquet-java]

2024-11-28 Thread via GitHub
wardlican opened a new issue, #3084: URL: https://github.com/apache/parquet-java/issues/3084 ### Describe the bug, including details regarding any error messages, version, and platform. When using iceberg, we encountered a situation where a parquet file we wrote could not be read. Wh

[PR] GH-3083: Make DELTA_LENGTH_BYTE_ARRAY default encoding for binary [parquet-java]

2024-11-28 Thread via GitHub
raunaqmorarka opened a new pull request, #3085: URL: https://github.com/apache/parquet-java/pull/3085 ### Rationale for this change The current default for V1 pages is PLAIN encoding. This encoding mixes string length with string data. This is inefficient for skipping N va

Re: [PR] GH-3086: Allow for empty beans [parquet-java]

2024-11-28 Thread via GitHub
Fokko merged PR #3087: URL: https://github.com/apache/parquet-java/pull/3087 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@parquet.

[I] `ParquetMetadata` JSON serialization is failing [parquet-java]

2024-11-28 Thread via GitHub
Fokko opened a new issue, #3086: URL: https://github.com/apache/parquet-java/issues/3086 ### Describe the bug, including details regarding any error messages, version, and platform. Discovered by plugging in RC1 into Spark: https://github.com/apache/spark/pull/48970 Failing te

Re: [PR] GH-3083: Make DELTA_LENGTH_BYTE_ARRAY default encoding for binary [parquet-java]

2024-11-28 Thread via GitHub
raunaqmorarka commented on PR #3085: URL: https://github.com/apache/parquet-java/pull/3085#issuecomment-2506285935 > Hey @raunaqmorarka thanks for raising this. I think we want to [discuss on the devlist](https://lists.apache.org/list.html?d...@parquet.apache.org) first if we want to change

Re: [PR] GH-3078: Use Hadoop FileSystem.openFile() to open files [parquet-java]

2024-11-28 Thread via GitHub
gszadovszky commented on code in PR #3079: URL: https://github.com/apache/parquet-java/pull/3079#discussion_r1861685918 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/wrapped/io/FutureIO.java: ## @@ -70,6 +70,29 @@ public static <T> T awaitFuture(final Future<T> future

Re: [PR] GH-3083: Make DELTA_LENGTH_BYTE_ARRAY default encoding for binary [parquet-java]

2024-11-28 Thread via GitHub
Fokko commented on PR #3085: URL: https://github.com/apache/parquet-java/pull/3085#issuecomment-2506204690 Hey @raunaqmorarka thanks for raising this. I think we want to [discuss on the devlist](https://lists.apache.org/list.html?d...@parquet.apache.org) first if we want to change behavior.

[PR] GH-3086: Allow for empty beans [parquet-java]

2024-11-28 Thread via GitHub
Fokko opened a new pull request, #3087: URL: https://github.com/apache/parquet-java/pull/3087 ### Rationale for this change Please check the issue: https://github.com/apache/parquet-java/issues/3086 ### What changes are included in this PR? ### Are these changes teste