Re: [PR] Add fixed(L) type to variant spec [parquet-format]

2025-01-17 Thread via GitHub
aihuaxu commented on PR #481: URL: https://github.com/apache/parquet-format/pull/481#issuecomment-2599507994 > @aihuaxu I am not sure why we need this type in the Variant binary encoding. Doesn't this just duplicate the `binary` type? We don't need two different ways to store binary data.

Re: [PR] PARQUET-2471: Add GEOMETRY and GEOGRAPHY logical types [parquet-format]

2025-01-17 Thread via GitHub
wgtmac commented on code in PR #240: URL: https://github.com/apache/parquet-format/pull/240#discussion_r1920929554 ## src/main/thrift/parquet.thrift: ## @@ -417,6 +498,7 @@ union LogicalType { 14: UUIDType UUID // no compatible ConvertedType 15: Float16Type FLOAT

Re: [PR] GH-3123: Omit level histogram for some max levels [parquet-java]

2025-01-17 Thread via GitHub
wgtmac commented on code in PR #3124: URL: https://github.com/apache/parquet-java/pull/3124#discussion_r1920927836 ## parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java: ## @@ -67,8 +67,16 @@ public static class Builder { private Builder(P

Re: [PR] GH-1452: implement Size() filter for repeated columns [parquet-java]

2025-01-17 Thread via GitHub
clairemcginty commented on code in PR #3098: URL: https://github.com/apache/parquet-java/pull/3098#discussion_r1920679469 ## parquet-column/src/main/java/org/apache/parquet/filter2/predicate/Operators.java: ## @@ -505,6 +505,82 @@ public R filter( } } + public static

Re: [PR] Add fixed(L) type to variant spec [parquet-format]

2025-01-17 Thread via GitHub
gene-db commented on PR #481: URL: https://github.com/apache/parquet-format/pull/481#issuecomment-2599055117 @aihuaxu I am not sure why we need this type in the Variant binary encoding. Doesn't this just duplicate the `binary` type? We don't need two different ways to store binary data.

Re: [PR] GH-1452: implement Size() filter for repeated columns [parquet-java]

2025-01-17 Thread via GitHub
clairemcginty commented on code in PR #3098: URL: https://github.com/apache/parquet-java/pull/3098#discussion_r1920641849 ## parquet-hadoop/src/main/java/org/apache/parquet/filter2/dictionarylevel/DictionaryFilter.java: ## @@ -493,6 +494,39 @@ public > Boolean visit(Contains co

Re: [PR] GH-1452: implement Size() filter for repeated columns [parquet-java]

2025-01-17 Thread via GitHub
clairemcginty commented on code in PR #3098: URL: https://github.com/apache/parquet-java/pull/3098#discussion_r1920639113 ## parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/ColumnIndexBuilder.java: ## @@ -378,6 +379,11 @@ public > PrimitiveIterator.Of

Re: [PR] PARQUET-2471: Add GEOMETRY and GEOGRAPHY logical types [parquet-format]

2025-01-17 Thread via GitHub
paleolimbot commented on code in PR #240: URL: https://github.com/apache/parquet-format/pull/240#discussion_r1920531034 ## Geospatial.md: ## @@ -0,0 +1,151 @@ + + +Geospatial Definitions + + +This document contains the specification of geospatial types and statistics. + +# B

Re: [PR] GH-3123: Omit level histogram for some max levels [parquet-java]

2025-01-17 Thread via GitHub
etseidl commented on code in PR #3124: URL: https://github.com/apache/parquet-java/pull/3124#discussion_r1920464511 ## parquet-column/src/main/java/org/apache/parquet/column/statistics/SizeStatistics.java: ## @@ -67,8 +67,16 @@ public static class Builder { private Builder(

Re: [PR] GH-3123: Omit level histogram for some max levels [parquet-java]

2025-01-17 Thread via GitHub
wgtmac commented on PR #3124: URL: https://github.com/apache/parquet-java/pull/3124#issuecomment-2598720818 @emkornfield Could you please take a look? cc @etseidl

[PR] GH-3123: Omit level histogram for some max levels [parquet-java]

2025-01-17 Thread via GitHub
wgtmac opened a new pull request, #3124: URL: https://github.com/apache/parquet-java/pull/3124 ### Rationale for this change The level histogram of size statistics can be omitted without loss of precision if its max_definition_level is 1 or 0, or max_repetition_level is 0. ###
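The rule stated in the rationale above can be sketched as a pair of predicates. This is only an illustration, not the code from the PR: the class `LevelHistogramRule` and its methods are hypothetical names, and applying the `max_repetition_level` condition to the repetition histogram and the `max_definition_level` condition to the definition histogram is an assumed reading of the SizeStatistics spec. The reconstruction helper also assumes that definition level 0 marks a null when `max_definition_level` is 1.

```java
// Hypothetical helper for illustration; not part of parquet-java.
final class LevelHistogramRule {
    private LevelHistogramRule() {}

    // Assumption: with max_repetition_level == 0 every value has repetition
    // level 0, so the histogram would hold a single count and adds nothing.
    static boolean canOmitRepetitionHistogram(int maxRepetitionLevel) {
        return maxRepetitionLevel == 0;
    }

    // Assumption: with max_definition_level of 0 or 1 the histogram can be
    // derived from the value count and the null count.
    static boolean canOmitDefinitionHistogram(int maxDefinitionLevel) {
        return maxDefinitionLevel == 0 || maxDefinitionLevel == 1;
    }

    // Rebuilds the definition level histogram for max_definition_level == 1:
    // index 0 counts nulls, index 1 counts non-null values.
    static long[] rebuildDefinitionHistogram(long numValues, long nullCount) {
        return new long[] {nullCount, numValues - nullCount};
    }
}
```

For example, under these assumptions a column chunk with 10 values and 3 nulls at `max_definition_level` 1 would rebuild to a histogram of `[3, 7]`, which is why storing it would be redundant.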

[I] Omit level histogram for some max levels without loss [parquet-java]

2025-01-17 Thread via GitHub
wgtmac opened a new issue, #3123: URL: https://github.com/apache/parquet-java/issues/3123 ### Describe the enhancement requested According to the spec of SizeStatistics, we can omit the level histogram without loss of precision when max_repetition_level is 0 or max_definition_level is 0 or