[I] AvroParquetWriter cache old aws config even after close, clean and create new writer [parquet-java]

2024-12-20 Thread via GitHub
CandiceSu opened a new issue, #3106: URL: https://github.com/apache/parquet-java/issues/3106 ### Describe the bug, including details regarding any error messages, version, and platform. Hi, we created an AvroParquetWriter to write to AWS S3. This is how we create the writer:

Re: [I] How to disable statistics in version 1.13.1? [parquet-java]

2024-12-20 Thread via GitHub
felipepessoto commented on issue #3103: URL: https://github.com/apache/parquet-java/issues/3103#issuecomment-2557895492 Thanks. I've set `parquet.statistics.truncate.length` to 1. I wasn't sure how `parquet.columnindex.truncate.length` works, but using the `parquet.statistics.truncate

Re: [PR] PARQUET-34: implement Size() filter for repeated columns [parquet-java]

2024-12-20 Thread via GitHub
emkornfield commented on PR #3098: URL: https://github.com/apache/parquet-java/pull/3098#issuecomment-2557642415 I can try to look in more detail, but stats can certainly be used here. I imagine they are most useful for repeated fields when trying to discriminate between repeated fields that

Re: [PR] Simplify Variant shredding and refactor for clarity [parquet-format]

2024-12-20 Thread via GitHub
emkornfield commented on code in PR #461: URL: https://github.com/apache/parquet-format/pull/461#discussion_r1894292367 ## VariantShredding.md: ## @@ -25,290 +25,320 @@ The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous

Re: [PR] GH-3080: HadoopStreams to support ByteBufferPositionedReadable [parquet-java]

2024-12-20 Thread via GitHub
steveloughran commented on PR #3096: URL: https://github.com/apache/parquet-java/pull/3096#issuecomment-2557662530 I'm away until 2025; will reply to comments then. Thanks for the review.