etseidl commented on code in PR #576:
URL: https://github.com/apache/parquet-format/pull/576#discussion_r3344273125
##########
Encodings.md:
##########
@@ -82,7 +83,7 @@ written first, before the data pages of the column chunk.
Dictionary page format: the entries in the dictionary using the
[plain](#PLAIN) encoding.
Data page format: the bit width used to encode the entry ids stored as 1 byte
(max bit width = 32),
-followed by the values encoded using RLE/Bit packed described above (with the
given bit width).
+followed by the values encoded using RLE/Bit-Packed described above (with the
given bit width).
Review Comment:
```suggestion
followed by the values encoded using the RLE/Bit-Packing described above
(with the given bit width).
```
##########
Encodings.md:
##########
@@ -68,7 +69,7 @@ stores the data in the following format:
For native types, this outputs the data as little endian. Floating
point types are encoded in IEEE.
-For the byte array type, it encodes the length as a 4 byte little
+For the byte array type, it encodes the length as a 4-byte little
endian, followed by the bytes.
Review Comment:
```suggestion
endian integer, followed by the bytes.
```
##########
LogicalTypes.md:
##########
@@ -544,7 +545,8 @@ are found during reading, they must be ignored.
## Embedded Types
-Embedded types do not have type-specific orderings.
+Embedded types do not have type-specific orderings beyond the unsigned
+byte-wise comparison of their physical type (`BYTE_ARRAY`).
Review Comment:
This isn't correct since `VARIANT`, `GEOMETRY`, and `GEOGRAPHY` have
undefined orderings. Maybe instead
```suggestion
Embedded types do not have type-specific orderings unless otherwise
specified.
```
##########
LogicalTypes.md:
##########
@@ -606,7 +608,7 @@ optional group variant_shredded (VARIANT(1)) {
### GEOMETRY
`GEOMETRY` is used for geospatial features in the Well-Known Binary (WKB)
format
-with linear/planar edges interpolation. It must annotate a `BYTE_ARRAY`
Review Comment:
As with the other PR, I think "edges" is deliberate
##########
Encodings.md:
##########
@@ -322,8 +323,6 @@ The delta encoding algorithm described above stores a bit
width per miniblock an
Supported Types: BYTE_ARRAY
-This encoding is always preferred over PLAIN for byte array columns.
-
Review Comment:
This isn't a typo, you're actually changing the spec here. I think this
needs an actual discussion.
##########
LogicalTypes.md:
##########
@@ -243,7 +243,7 @@ comparison.
*Compatibility*
-To support compatibility with older readers, implementations of parquet-format
should
+To support compatibility with older readers, implementations of parquet-format
must
Review Comment:
Another spec change.
##########
Compression.md:
##########
@@ -89,7 +89,7 @@ switch to the newer, interoperable `LZ4_RAW` codec.
### ZSTD
A codec based on the Zstandard format defined by
-[RFC 8478](https://tools.ietf.org/html/rfc8478). If any ambiguity arises
+[RFC 8878](https://tools.ietf.org/html/rfc8878). If any ambiguity arises
Review Comment:
Nice catch!
##########
LogicalTypes.md:
##########
@@ -219,15 +219,15 @@ scale stores the number of digits of that value that are
to the right of the
decimal point, and the precision stores the maximum number of digits supported
in the unscaled value.
-If not specified, the scale is 0. Scale must be zero or a positive integer less
-than or equal to the precision. Precision is required and must be a non-zero
positive
+If not specified, the scale is 0. Scale must be a non-negative integer less
+than or equal to the precision. Precision is required and must be a positive
Review Comment:
I think this change is less clear, even if correct.
##########
README.md:
##########
@@ -172,7 +172,7 @@ be computed from the schema (i.e. how much nesting there
is). This defines the
maximum number of bits required to store the levels (levels are defined for all
values in the column).
-Two encodings for the levels are supported BIT_PACKED and RLE. Only RLE is now
used as it supersedes BIT_PACKED.
+Two encodings for the levels are supported: `BIT_PACKED` and `RLE`. Only `RLE`
is now used as it supersedes `BIT_PACKED`.
Review Comment:
```suggestion
Two encodings for the levels are supported: `BIT_PACKED` and `RLE`. Only
`RLE` is currently used as it supersedes `BIT_PACKED`.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]