[I] Use FixedSizeBinary instead of Binary for int96 conversion when convertInt96ToArrowTimestamp is false [parquet-java]

2024-11-28 Thread via GitHub
doki23 opened a new issue, #3088: URL: https://github.com/apache/parquet-java/issues/3088 ### Describe the enhancement requested ```java public TypeMapping convertINT96(PrimitiveTypeName primitiveTypeName) throws RuntimeException { if (convertInt96ToArrowTimestamp) { re

Re: [PR] GH-3086: Allow for empty beans [parquet-java]

2024-11-28 Thread via GitHub
Fokko merged PR #3087: URL: https://github.com/apache/parquet-java/pull/3087

Re: [PR] GH-3083: Make DELTA_LENGTH_BYTE_ARRAY default encoding for binary [parquet-java]

2024-11-28 Thread via GitHub
raunaqmorarka commented on PR #3085: URL: https://github.com/apache/parquet-java/pull/3085#issuecomment-2506285935 > Hey @raunaqmorarka thanks for raising this. I think we want to [discuss on the devlist](https://lists.apache.org/list.html?d...@parquet.apache.org) first if we want to change

[PR] GH-3086: Allow for empty beans [parquet-java]

2024-11-28 Thread via GitHub
Fokko opened a new pull request, #3087: URL: https://github.com/apache/parquet-java/pull/3087 ### Rationale for this change Please check the issue: https://github.com/apache/parquet-java/issues/3086 ### What changes are included in this PR? ### Are these changes teste

Re: [PR] GH-3083: Make DELTA_LENGTH_BYTE_ARRAY default encoding for binary [parquet-java]

2024-11-28 Thread via GitHub
Fokko commented on PR #3085: URL: https://github.com/apache/parquet-java/pull/3085#issuecomment-2506204690 Hey @raunaqmorarka thanks for raising this. I think we want to [discuss on the devlist](https://lists.apache.org/list.html?d...@parquet.apache.org) first if we want to change behavior.

[I] `ParquetMetadata` JSON serialization is failing [parquet-java]

2024-11-28 Thread via GitHub
Fokko opened a new issue, #3086: URL: https://github.com/apache/parquet-java/issues/3086 ### Describe the bug, including details regarding any error messages, version, and platform. Discovered by plugging in RC1 into Spark: https://github.com/apache/spark/pull/48970 Failing te

[PR] GH-3083: Make DELTA_LENGTH_BYTE_ARRAY default encoding for binary [parquet-java]

2024-11-28 Thread via GitHub
raunaqmorarka opened a new pull request, #3085: URL: https://github.com/apache/parquet-java/pull/3085 ### Rationale for this change The current default for V1 pages is PLAIN encoding. This encoding mixes string length with string data. This is inefficient for skipping N va

[I] Required field 'num_values' was not found in serialized data! [parquet-java]

2024-11-28 Thread via GitHub
wardlican opened a new issue, #3084: URL: https://github.com/apache/parquet-java/issues/3084 ### Describe the bug, including details regarding any error messages, version, and platform. When using iceberg, we encountered a situation where a parquet file we wrote could not be read. Wh

Re: [PR] GH-3078: Use Hadoop FileSystem.openFile() to open files [parquet-java]

2024-11-28 Thread via GitHub
gszadovszky commented on code in PR #3079: URL: https://github.com/apache/parquet-java/pull/3079#discussion_r1861685918 ## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/wrapped/io/FutureIO.java: ## @@ -70,6 +70,29 @@ public static <T> T awaitFuture(final Future<T> future