iemejia commented on code in PR #578:
URL: https://github.com/apache/parquet-format/pull/578#discussion_r3345418382


##########
BinaryProtocolExtensions.md:
##########
@@ -26,11 +26,11 @@ The extension mechanism of the `binary` Thrift field-id 
`32767` has some desirab
 * The content of the extension is freeform and can be encoded in any format. 
This format is not restricted to Thrift.  
 * Extensions can be appended to existing Thrift serialized structs [without 
requiring Thrift libraries](#appending-extensions-to-thrift) for manipulation 
(or changes to the thrift IDL).
 
-Because only one field-id is reserved the extension bytes themselves require 
disambiguation; otherwise readers will not be able to decode extensions safely. 
This is left to implementers which MUST put enough unique state in their 
extension bytes for disambiguation. This can be relatively easily achieved by 
adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) 
at the start or end of the extension bytes. The extension does not specify a 
disambiguation mechanism to allow more flexibility to implementers.
+Because only one field-id is reserved the extension bytes themselves require 
disambiguation; otherwise readers will not be able to decode extensions safely. 
This is left to implementers who MUST put enough unique state in their 
extension bytes for disambiguation. This can be relatively easily achieved by 
adding a [UUID](https://en.wikipedia.org/wiki/Universally\_unique\_identifier) 
at the start or end of the extension bytes. The extension does not specify a 
disambiguation mechanism to allow more flexibility to implementers.

Review Comment:
   Done, broke the paragraph into shorter lines.



##########
Encryption.md:
##########
@@ -136,9 +136,9 @@ one IV is ever repeated, then the implementation may be 
vulnerable"*. *"Complian
 requirement is crucial to the security of GCM"*.
 
 The bulk of modules in a Parquet file are page headers and data pages. 
Therefore, one encryption 
-key shall not not be used for more than 2^31 (~2 billion) pages. In Parquet 
files encrypted with 
-multiple keys (footer and column keys), the constraint on the number of 
invocations is applied 
-to each key separately.
+key shall not be used for more than 2^32 total module encryptions, as per the 
NIST specification.

Review Comment:
   Good point. Added a sentence clarifying that since each data page requires 
two module encryptions (header + data), 2^32 modules means in practice no more 
than 2^31 pages per key.



##########
CONTRIBUTING.md:
##########
@@ -73,35 +73,35 @@ The general steps for adding features to the format are as 
follows:
    fit for inclusion (for example, they were submitted as a pull request 
against
    the target repository and committers gave positive reviews). Reports on the
    benefits from closed source implementations are welcome and can help lend
-   weight to features desirability but are not sufficient for acceptance of a
+   weight to a feature's desirability but are not sufficient for acceptance of 
a
    new feature.
 
 Unless otherwise discussed, it is expected the implementations will be 
developed
 from their respective main branch (i.e. backporting is not required), to
 demonstrate that the feature is mergeable to its implementation.
 
-3. Ratification: After the first two steps are complete a formal vote is held 
on
+3. Ratification: After the first two steps are complete, a formal vote is held 
on
    [email protected] to officially ratify the feature.  After the vote
-   passes the format change is merged into the `parquet-format` repository and
+   passes, the format change is merged into the `parquet-format` repository and
    it is expected the changes from step 2 will also be merged soon after
    (implementations should not be merged until the addition has been merged to
    `parquet-format`).
 
-#### General guidelines/preferences on additions.
+#### General guidelines/preferences on additions
 
 1. To the greatest extent possible changes should have an option for forward
    compatibility (old readers can still read files). The [compatibility and
    feature enablement](#compatibility-and-feature-enablement) section below 
    provides more details on expectations for changes that break compatibility.
 
 2. New encodings should be fully specified in this repository and not
-   rely on an external dependencies for implementation (i.e. `parquet-format` 
is
+   rely on an external dependency for implementation (i.e. `parquet-format` is

Review Comment:
   Applied, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to