Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-12-04 Thread via GitHub
wgtmac commented on PR #466: URL: https://github.com/apache/parquet-format/pull/466#issuecomment-2519277990 Will merge this by the end of this week if no objection. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-12-04 Thread via GitHub
pitrou commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1868965095 ## LogicalTypes.md: ## @@ -684,44 +703,61 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868977523 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868866777 ## VariantShredding.md: ## @@ -25,290 +25,318 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [I] `ParquetMetadata` JSON serialization is failing [parquet-java]

2024-12-04 Thread via GitHub
Fokko closed issue #3086: `ParquetMetadata` JSON serialization is failing URL: https://github.com/apache/parquet-java/issues/3086

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1869037182 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-12-04 Thread via GitHub
pitrou commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1869082194 ## LogicalTypes.md: ## @@ -684,44 +689,58 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-12-04 Thread via GitHub
wgtmac commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1869110188 ## LogicalTypes.md: ## @@ -684,44 +689,58 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] GH-472: Add shredding version [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on PR #474: URL: https://github.com/apache/parquet-format/pull/474#issuecomment-2516303681 CC @rdblue @gene-db

Re: [PR] GH-463: Add more types - time, nano timestamps, UUID to Variant spec [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on PR #464: URL: https://github.com/apache/parquet-format/pull/464#issuecomment-2516307414 This LGTM, @RussellSpitzer any more comments. Also, CC @gene-db @rdblue in case there are any concerns.

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-12-04 Thread via GitHub
pitrou commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1869146424 ## LogicalTypes.md: ## @@ -684,44 +689,58 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] [ignore] HADOOP-19087. Release Hadoop 3.4.1: test branch [parquet-java]

2024-12-04 Thread via GitHub
steveloughran closed pull request #2996: [ignore] HADOOP-19087. Release Hadoop 3.4.1: test branch URL: https://github.com/apache/parquet-java/pull/2996

Re: [PR] [ignore] HADOOP-19087. Release Hadoop 3.4.1: test branch [parquet-java]

2024-12-04 Thread via GitHub
steveloughran commented on PR #2996: URL: https://github.com/apache/parquet-java/pull/2996#issuecomment-2516976339 closing; all good now

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-12-04 Thread via GitHub
wgtmac commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1869077618 ## LogicalTypes.md: ## @@ -684,44 +689,58 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] MINOR: Bump version to 1.16.0-SNAPSHOT [parquet-java]

2024-12-04 Thread via GitHub
wgtmac commented on PR #3097: URL: https://github.com/apache/parquet-java/pull/3097#issuecomment-2517510829 We can bump it to 1.16.0-SNAPSHOT for now. A major version bump is something serious to discuss.

Re: [PR] MINOR: Bump version to 1.16.0-SNAPSHOT [parquet-java]

2024-12-04 Thread via GitHub
wgtmac merged PR #3097: URL: https://github.com/apache/parquet-java/pull/3097

Re: [PR] GH-3078: Use Hadoop FileSystem.openFile() to open files [parquet-java]

2024-12-04 Thread via GitHub
steveloughran commented on PR #3079: URL: https://github.com/apache/parquet-java/pull/3079#issuecomment-2517815867 shaves a HEAD request! for s3a it tells things to seek properly rather than having to guess afterwards. FWIW there's a "whole-file" read policy, we use this in had

[PR] MINOR: Clarify offsets etc are unsigned integers [parquet-format]

2024-12-04 Thread via GitHub
emkornfield opened a new pull request, #475: URL: https://github.com/apache/parquet-format/pull/475 ### Rationale for this change We should clarify whether metadata integers are signed or unsigned. ### What changes are included in this PR? Clarify signedness for Varia

Re: [PR] MINOR: Clarify offsets etc are unsigned integers [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on PR #475: URL: https://github.com/apache/parquet-format/pull/475#issuecomment-2518262268 @gene-db is unsigned correct or should these be signed? CC @rdblue

Re: [PR] MINOR: Clarify offsets etc are unsigned integers [parquet-format]

2024-12-04 Thread via GitHub
gene-db commented on code in PR #475: URL: https://github.com/apache/parquet-format/pull/475#discussion_r1870179241 ## VariantEncoding.md: ## @@ -88,9 +88,9 @@ metadata |header | +---+ ``` -The metadata is encoded first with the

Re: [PR] MINOR: Clarify offsets etc are unsigned integers [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on PR #475: URL: https://github.com/apache/parquet-format/pull/475#issuecomment-2518422458 Thanks for the quick review @gene-db I'll merge this end of week unless there are more comments. @aihuaxu

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868938364 ## VariantShredding.md: ## @@ -25,290 +25,318 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-04 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868952437 ## VariantEncoding.md: ## @@ -416,14 +444,36 @@ Field names are case-sensitive. Field names are required to be unique for each object. It is an error for an