etseidl commented on code in PR #578: URL: https://github.com/apache/parquet-format/pull/578#discussion_r3344520269
########## BinaryProtocolExtensions.md: ########## @@ -26,11 +26,11 @@ The extension mechanism of the `binary` Thrift field-id `32767` has some desirab * The content of the extension is freeform and can be encoded in any format. This format is not restricted to Thrift. * Extensions can be appended to existing Thrift serialized structs [without requiring Thrift libraries](#appending-extensions-to-thrift) for manipulation (or changes to the thrift IDL). -Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers which MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers. +Because only one field-id is reserved the extension bytes themselves require disambiguation; otherwise readers will not be able to decode extensions safely. This is left to implementers who MUST put enough unique state in their extension bytes for disambiguation. This can be relatively easily achieved by adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) at the start or end of the extension bytes. The extension does not specify a disambiguation mechanism to allow more flexibility to implementers. Review Comment: Gads this could use some line breaks 😅 ########## CONTRIBUTING.md: ########## @@ -73,35 +73,35 @@ The general steps for adding features to the format are as follows: fit for inclusion (for example, they were submitted as a pull request against the target repository and committers gave positive reviews). Reports on the benefits from closed source implementations are welcome and can help lend - weight to features desirability but are not sufficient for acceptance of a + weight to a feature's desirability but are not sufficient for acceptance of a new feature. Unless otherwise discussed, it is expected the implementations will be developed from their respective main branch (i.e. backporting is not required), to demonstrate that the feature is mergeable to its implementation. -3. Ratification: After the first two steps are complete a formal vote is held on +3. Ratification: After the first two steps are complete, a formal vote is held on [email protected] to officially ratify the feature. After the vote - passes the format change is merged into the `parquet-format` repository and + passes, the format change is merged into the `parquet-format` repository and it is expected the changes from step 2 will also be merged soon after (implementations should not be merged until the addition has been merged to `parquet-format`). -#### General guidelines/preferences on additions. +#### General guidelines/preferences on additions 1. To the greatest extent possible changes should have an option for forward compatibility (old readers can still read files). The [compatibility and feature enablement](#compatibility-and-feature-enablement) section below provides more details on expectations for changes that break compatibility. 2. New encodings should be fully specified in this repository and not - rely on an external dependencies for implementation (i.e. `parquet-format` is + rely on an external dependency for implementation (i.e. `parquet-format` is Review Comment: ```suggestion rely on external dependencies for implementation (i.e. `parquet-format` is ``` ########## Encryption.md: ########## @@ -136,9 +136,9 @@ one IV is ever repeated, then the implementation may be vulnerable"*. *"Complian requirement is crucial to the security of GCM"*. The bulk of modules in a Parquet file are page headers and data pages. Therefore, one encryption -key shall not not be used for more than 2^31 (~2 billion) pages. In Parquet files encrypted with -multiple keys (footer and column keys), the constraint on the number of invocations is applied -to each key separately. +key shall not be used for more than 2^32 total module encryptions, as per the NIST specification. Review Comment: Perhaps we should still point out that 2^32 modules means in practice 2^31 pages. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
