Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1859233788 ## LogicalTypes.md: ## @@ -609,9 +609,20 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1859240321 ## LogicalTypes.md: ## @@ -684,44 +702,67 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1859237857 ## LogicalTypes.md: ## @@ -609,9 +609,20 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted

Re: [PR] GH-3070: Add Variant logical type annotation to parquet-java [parquet-java]

2024-11-26 Thread via GitHub
wgtmac commented on PR #3072: URL: https://github.com/apache/parquet-java/pull/3072#issuecomment-2501022328 Usually we need two reference implementations for spec changes like this. I'm not sure if there is any chance to have another implementation ready in a timely manner. IMO, at least pa

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859075924 ## VariantEncoding.md: ## @@ -39,13 +39,41 @@ Another motivation for the representation is that (aside from metadata) each nes For example, in a Variant containin

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859077883 ## VariantShredding.md: ## @@ -25,276 +25,302 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859080002 ## VariantEncoding.md: ## @@ -39,13 +39,41 @@ Another motivation for the representation is that (aside from metadata) each nes For example, in a Variant containin

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859083339 ## VariantEncoding.md: ## @@ -39,13 +39,41 @@ Another motivation for the representation is that (aside from metadata) each nes For example, in a Variant containin

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859084567 ## VariantEncoding.md: ## @@ -39,13 +39,41 @@ Another motivation for the representation is that (aside from metadata) each nes For example, in a Variant containin

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859086423 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859087543 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859093933 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859095957 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859099929 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859092517 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] MINOR: Use `exec-maven-plugin.version` property [parquet-java]

2024-11-26 Thread via GitHub
Fokko merged PR #3047: URL: https://github.com/apache/parquet-java/pull/3047 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@parquet.

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859141628 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859148222 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

[PR] MINOR: Revert `buildnumber-maven-plugin` to 3.2.0 [parquet-java]

2024-11-26 Thread via GitHub
Fokko opened a new pull request, #3082: URL: https://github.com/apache/parquet-java/pull/3082 ### Rationale for this change During verification of the 1.15.0 release, @gszadovszky noticed that this specific version caused issues, therefore it is better to revert it for now. ###

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859127325 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859130304 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859147187 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859151894 ## VariantEncoding.md: ## @@ -416,14 +444,36 @@ Field names are case-sensitive. Field names are required to be unique for each object. It is an error for an objec

[I] HadoopStreams to support ByteBufferPositionedReadable input streams [parquet-java]

2024-11-26 Thread via GitHub
steveloughran opened a new issue, #3080: URL: https://github.com/apache/parquet-java/issues/3080 ### Describe the enhancement requested If a stream declares in its StreamCapabilities that it supports ByteBufferPositionedReadable, then use it for `readFully(ByteBuffer)` All st

Re: [I] HadoopStreams to support ByteBufferPositionedReadable input streams [parquet-java]

2024-11-26 Thread via GitHub
steveloughran commented on issue #3080: URL: https://github.com/apache/parquet-java/issues/3080#issuecomment-2501825209 I'm implementing this, with tests.

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859139239 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859143649 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

[PR] MINOR: Add shading for JDK22 specific classes [parquet-java]

2024-11-26 Thread via GitHub
Fokko opened a new pull request, #3081: URL: https://github.com/apache/parquet-java/pull/3081 ### Rationale for this change JDK 22 specific classes were added in Jackson, but we forgot to shade them explicitly as pointed out in: https://github.com/apache/parquet-java/blob/8fa7

Re: [PR] GH-3070: Add Variant logical type annotation to parquet-java [parquet-java]

2024-11-26 Thread via GitHub
aihuaxu commented on PR #3072: URL: https://github.com/apache/parquet-java/pull/3072#issuecomment-2501372540 I see. Per the guidelines, we need an implementation in parquet-java and then another one. Do we usually include the implementation with this annotation change or should be separ

[PR] GH-3078: Use Hadoop FileSystem.openFile() to open files [parquet-java]

2024-11-26 Thread via GitHub
steveloughran opened a new pull request, #3079: URL: https://github.com/apache/parquet-java/pull/3079 ### Rationale for this change ### What changes are included in this PR? * Open files with FileSystem.openFile(), passing in file status * And read policy of "parq

Re: [PR] GH-2943: Remove hadoop-2 support [parquet-java]

2024-11-26 Thread via GitHub
Fokko merged PR #3061: URL: https://github.com/apache/parquet-java/pull/3061

Re: [I] Remove support for Hadoop <3.3 [parquet-java]

2024-11-26 Thread via GitHub
Fokko closed issue #2943: Remove support for Hadoop <3.3 URL: https://github.com/apache/parquet-java/issues/2943

Re: [PR] HadoopInputFile to pass down FileStatus when opening file [parquet-java]

2024-11-26 Thread via GitHub
steveloughran closed pull request #2955: HadoopInputFile to pass down FileStatus when opening file URL: https://github.com/apache/parquet-java/pull/2955

Re: [PR] HadoopInputFile to pass down FileStatus when opening file [parquet-java]

2024-11-26 Thread via GitHub
steveloughran commented on PR #2955: URL: https://github.com/apache/parquet-java/pull/2955#issuecomment-2501251041 Superseded by #3079 now that reflection is not needed.

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859059592 ## VariantShredding.md: ## @@ -25,276 +25,302 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859061998 ## VariantEncoding.md: ## @@ -416,14 +444,36 @@ Field names are case-sensitive. Field names are required to be unique for each object. It is an error for an objec

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859071155 ## VariantShredding.md: ## @@ -25,276 +25,302 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859108674 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-11-26 Thread via GitHub
rdblue commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1859117065 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous value

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
wgtmac commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1859989177 ## LogicalTypes.md: ## @@ -684,44 +702,67 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
mapleFU commented on PR #466: URL: https://github.com/apache/parquet-format/pull/466#issuecomment-2502968117 > The rules part is looking good, but I think that spending time documenting what people did incorrectly years ago makes the doc more confusing and increases chances that people will

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
wgtmac commented on PR #466: URL: https://github.com/apache/parquet-format/pull/466#issuecomment-2502982189 @rdblue Thanks for your review! I have removed all unnecessary changes. Please take a look again.

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
wgtmac commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1859998523 ## LogicalTypes.md: ## @@ -684,44 +689,58 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
wgtmac commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1859997738 ## LogicalTypes.md: ## @@ -684,44 +689,58 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-c

Re: [PR] GH-3070: Add Variant logical type annotation to parquet-java [parquet-java]

2024-11-26 Thread via GitHub
wgtmac commented on PR #3072: URL: https://github.com/apache/parquet-java/pull/3072#issuecomment-2502503713 I think it should be in one change. The parquet-format cannot be released without concrete PoC implementation in parquet-java. Without that release, separate changes may break CI and

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-11-26 Thread via GitHub
wgtmac commented on code in PR #466: URL: https://github.com/apache/parquet-format/pull/466#discussion_r1859970898 ## LogicalTypes.md: ## @@ -609,9 +609,20 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted

Re: [PR] MINOR: Add `doap.rdf` file for release tracking [parquet-java]

2024-11-26 Thread via GitHub
Fokko closed pull request #3001: MINOR: Add `doap.rdf` file for release tracking URL: https://github.com/apache/parquet-java/pull/3001

Re: [PR] GH-3070: Add Variant logical type annotation to parquet-java [parquet-java]

2024-11-26 Thread via GitHub
Fokko commented on PR #3072: URL: https://github.com/apache/parquet-java/pull/3072#issuecomment-2500124168 @aihuaxu I agree with @emkornfield that the `iceberg-java` implementation should be able to read and write the variant type. It would also be great to drop some example parquet f