emkornfield commented on code in PR #481: URL: https://github.com/apache/parquet-format/pull/481#discussion_r1918991446
##########
VariantEncoding.md:
##########
@@ -399,6 +399,7 @@ The Decimal type contains a scale, but no precision. The implied precision of a
 | Timestamp | timestamp with time zone | `22` | TIMESTAMP(isAdjustedToUTC=true, NANOS) | 8-byte little-endian |
 | TimestampNTZ | timestamp without time zone | `23` | TIMESTAMP(isAdjustedToUTC=false, NANOS) | 8-byte little-endian |
 | UUID | uuid | `24` | UUID | 16-byte big-endian |
+| Fixed(L) | Byte array of length L | `25` | FIXED_LEN_BYTE_ARRAY[L] | 4 byte little-endian size L, followed by length-L big-endian bytes |

Review Comment:
   - I'm not sure endianness makes sense for Fixed(L). Doesn't endianness only apply to multi-byte structures? In Fixed(L), each byte is independent.
   - I think the current proposal is reasonable and matches how things like decimals with arbitrary precision are encoded. It is also consistent with the string representation. If we are worried about the overhead of 4 bytes, then we could use a variable-width encoding scheme, or have two types: Short-fixed(L) with a 1-byte size and Fixed(L) with a 4-byte size. Unfortunately, IIUC we can't have a 'short-fixed L' like the one we have for string, because I think we already use the entire number range there.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: issues-h...@parquet.apache.org
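
As a rough illustration of the layout discussed in the review comment above, here is a minimal Python sketch of the value payload only: a 4-byte little-endian size L followed by the L bytes written in their original order. It also shows why byte order matters for the size field but not for the payload. The function names are hypothetical, the Variant header byte is deliberately left out, and `encode_short_fixed` only models the 1-byte-size alternative floated in the comment; none of this is taken from the spec or from any Parquet library.

```python
import struct


def encode_fixed(payload: bytes) -> bytes:
    """Proposed layout (payload only): 4-byte little-endian size L, then L raw bytes."""
    # "<I" is an unsigned 32-bit little-endian integer. Only this size field
    # has a byte order; the payload bytes are copied through unchanged.
    return struct.pack("<I", len(payload)) + payload


def decode_fixed(buf: bytes) -> bytes:
    """Inverse of encode_fixed; rejects truncated input."""
    if len(buf) < 4:
        raise ValueError("buffer too short for the 4-byte size field")
    (length,) = struct.unpack_from("<I", buf, 0)
    data = buf[4:4 + length]
    if len(data) != length:
        raise ValueError("truncated Fixed(L) value")
    return data


def encode_short_fixed(payload: bytes) -> bytes:
    """Hypothetical 'short' variant: 1-byte size, so L is limited to 255."""
    if len(payload) > 0xFF:
        raise ValueError("payload too long for a 1-byte size")
    return bytes([len(payload)]) + payload
```

For a 3-byte value, `encode_fixed` spends 4 bytes on the size and `encode_short_fixed` spends 1, which is the overhead trade-off raised in the second bullet.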