Re: [PR] PARQUET-3031: Support to transfer input stream when building ParquetFileReader [parquet-java]

2024-12-03 Thread via GitHub
dongjoon-hyun commented on PR #3030: URL: https://github.com/apache/parquet-java/pull/3030#issuecomment-2515015359 For the record, I merged @Fokko's Parquet 1.15.0 PR into the Apache Spark repository. @turboFei and @wangyum, if you want, you can make a PR to use this new technique in

[I] Add shredding version to Variant logical annotation [parquet-format]

2024-12-03 Thread via GitHub
emkornfield opened a new issue, #473: URL: https://github.com/apache/parquet-format/issues/473 ### Describe the enhancement requested Shredding is a complex topic, and we will likely want the flexibility to evolve it in a forward-compatible way; adding a version is an easy way to guarantee this

Re: [I] Add new optional type parameters Offset to TIMESTAMP [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on issue #458: URL: https://github.com/apache/parquet-format/issues/458#issuecomment-2516319351 @ryancasburn-KAI thank you for the feature. As @wgtmac stated, it seems like Delta encoding should be sufficient for this use case, with the exception of the nanoseconds issue

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868845406 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868866777 ## VariantShredding.md: ## @@ -25,290 +25,318 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868824200 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [I] Column/field description [parquet-format]

2024-12-03 Thread via GitHub
simonaubertbd commented on issue #447: URL: https://github.com/apache/parquet-format/issues/447#issuecomment-2516369956 Hello @emkornfield and thanks for the answer. 1/ I may disagree about the utility: Parquet has become, in the last month, a standard for data transfer just like CSV (but

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868820400 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868836731 ## VariantShredding.md: ## @@ -25,290 +25,318 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868840473 ## VariantShredding.md: ## @@ -25,276 +25,302 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] GH-465: Clarify backward-compatibility rules on LIST type [parquet-format]

2024-12-03 Thread via GitHub
wgtmac commented on PR #466: URL: https://github.com/apache/parquet-format/pull/466#issuecomment-2516027768 @pitrou @gszadovszky @rdblue Do you have any concern with the latest change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[PR] GH-472: Add shredding version [parquet-format]

2024-12-03 Thread via GitHub
emkornfield opened a new pull request, #474: URL: https://github.com/apache/parquet-format/pull/474 ### Rationale for this change Shredding of variants has a lot of potential evolutions; having a version helps track any future version. Note the binary format is already versioned beca

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868856218 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [I] Column/field description [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on issue #447: URL: https://github.com/apache/parquet-format/issues/447#issuecomment-2516324150 @simonaubertbd I don't think this exists in Parquet. I'm not against adding it, but with the rise in popularity of table formats (which do include description) the utility m

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868818298 ## VariantEncoding.md: ## @@ -39,13 +39,42 @@ Another motivation for the representation is that (aside from metadata) each nes For example, in a Variant cont

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868833706 ## VariantShredding.md: ## @@ -25,276 +25,302 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [I] Use Hadoop FileSystem.openFile() to open files [parquet-java]

2024-12-03 Thread via GitHub
gszadovszky closed issue #3078: Use Hadoop FileSystem.openFile() to open files URL: https://github.com/apache/parquet-java/issues/3078

Re: [PR] GH-3078: Use Hadoop FileSystem.openFile() to open files [parquet-java]

2024-12-03 Thread via GitHub
gszadovszky merged PR #3079: URL: https://github.com/apache/parquet-java/pull/3079

Re: [PR] GH-463: Add more types - time, nano timestamps, UUID to Variant spec [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on PR #464: URL: https://github.com/apache/parquet-format/pull/464#issuecomment-2516309427 Will merge end of week if there aren't more comments.

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868819244 ## VariantEncoding.md: ## @@ -39,13 +39,41 @@ Another motivation for the representation is that (aside from metadata) each nes For example, in a Variant cont

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868818645 ## VariantEncoding.md: ## @@ -39,13 +39,41 @@ Another motivation for the representation is that (aside from metadata) each nes For example, in a Variant cont

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-03 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1868839388 ## VariantShredding.md: ## @@ -25,290 +25,316 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

[I] parquet-cli reports nested columns as null [parquet-java]

2024-12-03 Thread via GitHub
acdha opened a new issue, #3095: URL: https://github.com/apache/parquet-java/issues/3095 ### Describe the bug, including details regarding any error messages, version, and platform. Using Parquet CLI 1.15.0 via Mac Homebrew, I noticed some surprising behaviour with the `parquet-cli`

[PR] GH-3080: HadoopStreams to support ByteBufferPositionedReadable [parquet-java]

2024-12-03 Thread via GitHub
steveloughran opened a new pull request, #3096: URL: https://github.com/apache/parquet-java/pull/3096 ### Rationale for this change If a stream declares in its StreamCapabilities that it supports `ByteBufferPositionedReadable`, then use that API for `readFully(ByteBuffer)`

Re: [PR] MINOR: Bump version to 1.16.0-SNAPSHOT [parquet-java]

2024-12-03 Thread via GitHub
wgtmac commented on PR #3097: URL: https://github.com/apache/parquet-java/pull/3097#issuecomment-2516205281 Is it time to bump the version to 2.0-SNAPSHOT? @Fokko @gszadovszky

[PR] MINOR: Bump version to 1.16.0-SNAPSHOT [parquet-java]

2024-12-03 Thread via GitHub
wgtmac opened a new pull request, #3097: URL: https://github.com/apache/parquet-java/pull/3097 ### Rationale for this change The snapshot version should be bumped after releasing 1.15.0. ### What changes are included in this PR? Bump version to 1.16.0-SNAPSHOT and update

Re: [PR] MINOR: Bump version to 1.16.0-SNAPSHOT [parquet-java]

2024-12-03 Thread via GitHub
Fokko commented on PR #3097: URL: https://github.com/apache/parquet-java/pull/3097#issuecomment-2516246355 The docs suggest running the major release in October: https://parquet.apache.org/docs/contribution-guidelines/releasing/#release-cadence

Re: [PR] MINOR: Bump version to 1.16.0-SNAPSHOT [parquet-java]

2024-12-03 Thread via GitHub
Fokko commented on PR #3097: URL: https://github.com/apache/parquet-java/pull/3097#issuecomment-2516239862 Ah, I wanted to do this as well, thanks for picking this up @wgtmac. Regarding `2.0-SNAPSHOT`, that might be a good question to discuss at the sync later today.