aihuaxu commented on code in PR #481:
URL: https://github.com/apache/parquet-format/pull/481#discussion_r1918976484
##########
VariantEncoding.md:
##########
@@ -399,6 +399,7 @@ The Decimal type contains a scale, but no precision. The implied precision of a
| Timestamp | timestamp with time zone | `22` | TIMESTAMP(isAdjustedToUTC=true, NANOS) | 8-byte little-endian |
| TimestampNTZ | timestamp without time zone | `23` | TIMESTAMP(isAdjustedToUTC=false, NANOS) | 8-byte little-endian |
| UUID | uuid | `24` | UUID | 16-byte big-endian |
+| Fixed(L) | Byte array of length L | `25` | FIXED_LEN_BYTE_ARRAY[L] | 4 byte little-endian size L, followed by length-L big-endian bytes |

Review Comment:
Big-endian bytes: this keeps Fixed(L) consistent with existing types such as UUID, which is effectively a fixed(16). Writing the bytes in big-endian order also makes sense because the engine can write them into the buffer in order, without having to buffer the whole byte string first.

The required size: I initially avoided adding a fixed(L) type because I believed L couldn't be included in the type description: only 5 bits are allocated for the type, which leaves too few bits to represent the length. The approach taken here is to store the length in the value itself. This duplicates the length in every value, but I don't see another way to do it.
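To make the layout in the new table row concrete, here is a minimal Python sketch of writing and reading a Fixed(L) value payload. It assumes the bytes after the value header are exactly a 4-byte little-endian size L followed by the L bytes in their given order; the header byte that carries type ID `25` is deliberately not encoded, and `write_fixed_payload` / `read_fixed_payload` are hypothetical names, not part of any Parquet library.

```python
import io
import struct
from typing import BinaryIO, Tuple

# Type ID from the proposed table row; the header byte itself is NOT written here.
FIXED_TYPE_ID = 25


def write_fixed_payload(out: BinaryIO, value: bytes) -> None:
    """Write a Fixed(L) payload: 4-byte little-endian size L, then the L bytes."""
    length = len(value)
    if length > 0xFFFFFFFF:
        raise ValueError("length does not fit in a 4-byte size field")
    out.write(struct.pack("<I", length))  # size L, little-endian
    # "Big-endian bytes" means the bytes are copied in their given order,
    # so they can be streamed into the buffer without reordering.
    out.write(value)


def read_fixed_payload(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read a Fixed(L) payload at `offset`; return (value, offset after it)."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated Fixed(L) payload")
    return buf[start:end], end


if __name__ == "__main__":
    out = io.BytesIO()
    write_fixed_payload(out, b"\x01\x02\x03")
    write_fixed_payload(out, bytes(16))  # same size as a UUID
    data = out.getvalue()
    first, pos = read_fixed_payload(data)
    second, pos = read_fixed_payload(data, pos)
    print(len(data), first, second == bytes(16))
```

Because every value carries its own 4-byte size, two values of lengths 3 and 16 take 27 bytes rather than 19; that per-value overhead is the length duplication the comment accepts.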