iemejia opened a new pull request, #572:
URL: https://github.com/apache/parquet-format/pull/572

   ## Summary
   
   This PR fixes numerous typos, grammar issues, inconsistencies, and minor 
errors across the Parquet format specification documents. The changes span 13 
files across 4 focused cleanup commits.
   
   ## Changes
   
   ### Commit 1: Fix specification inconsistencies, typos, and errors
   - **BloomFilter.md**: Fix `block_check` pseudocode (`setBit` -> `isSet`); 
fix struct name to match thrift
   - **parquet.thrift**: Fix typos ("to be be", "documention", "not 
necessary"); fix off-by-one error in the DataPageHeaderV2 comment
   - **README.md**: Fix repetition level value for non-nested columns (1 -> 0); 
replace defunct Twitter Code of Conduct links with ASF links
   - **LogicalTypes.md**: Fix embedded types ordering contradiction; add 
nanosecond to TIME precision
   - **VariantEncoding.md**: Fix BINARY -> BYTE_ARRAY; add decimal endianness 
note
   - **Compression.md**: Fix ZSTD RFC reference (8478 -> 8878)
   - **Encryption.md**: Fix double-negative; align GCM invocation limit to NIST
   - **Encodings.md**: Remove misleading "always preferred" claim for 
DELTA_LENGTH_BYTE_ARRAY
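
For context on the `block_check` fix in BloomFilter.md: a membership check should only read bits, so it needs `isSet` rather than `setBit`. Below is a minimal Python sketch of a split-block insert and check, assuming eight 32-bit words per block and the salt constants listed in BloomFilter.md (reproduced here from the spec, so verify them against the document before relying on this). The names `mask`, `block_insert`, and `block_check` follow the spec's pseudocode; the list-of-ints block representation is purely illustrative.

```python
# Illustrative sketch only: a block is modelled as a list of eight 32-bit ints.
# Salt constants are assumed to match those listed in BloomFilter.md.
SALT = [0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
        0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31]


def mask(x):
    """Return eight words, each with exactly one bit set."""
    words = []
    for salt in SALT:
        y = (x * salt) & 0xFFFFFFFF      # emulate unsigned 32-bit multiplication
        words.append(1 << (y >> 27))     # top 5 bits pick one of 32 positions
    return words


def block_insert(block, x):
    """Set the masked bits in the block (mutates the block)."""
    for i, m in enumerate(mask(x)):
        block[i] |= m


def block_check(block, x):
    """Read-only test: every masked bit must already be set in the block."""
    return all((block[i] & m) == m for i, m in enumerate(mask(x)))
```

Because `block_check` never writes to the block, repeated lookups cannot turn a later miss into a false positive, which is the behavior the corrected pseudocode describes.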
   
   ### Commit 2: Fix more specification inconsistencies and clarify ambiguous 
descriptions
   - **PageIndex.md, parquet.thrift**: Fix double-quote typo
   - **VariantShredding.md**: Fix Python syntax error; replace BINARY with 
BYTE_ARRAY
   - **BloomFilter.md**: Include missing `bloom_filter_length` field
   - **Encodings.md**: "bitwidth of each block" -> "each miniblock"
   - **LogicalTypes.md**: Align DECIMAL precision/scale wording with thrift
   - **Geospatial.md**: Use uppercase edge-interpolation algorithm names to 
match thrift enum
   - **VariantEncoding.md**: Label undocumented reserved bits; fix decimal 
implied-precision formula
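
To illustrate the "each miniblock" wording in Encodings.md: in DELTA_BINARY_PACKED, a block stores one minimum delta, and each miniblock inside it is bit-packed with its own width. The following is a rough Python sketch, assuming the deltas for one block are already computed. `miniblock_bit_widths` and its parameters are hypothetical names for illustration, and the tiny miniblock sizes are toy values rather than sizes the format allows.

```python
def miniblock_bit_widths(deltas, miniblock_size):
    """Return (min_delta, widths) for one block of deltas.

    Hypothetical helper: one bit width is computed per miniblock,
    not one per block.
    """
    min_delta = min(deltas)
    widths = []
    for start in range(0, len(deltas), miniblock_size):
        chunk = deltas[start:start + miniblock_size]
        # Deltas are stored relative to the block-wide minimum, so they are >= 0.
        max_relative = max(d - min_delta for d in chunk)
        # Number of bits needed to represent the largest relative delta.
        widths.append(max_relative.bit_length())
    return min_delta, widths
```

With deltas `[1, 1, 1, 1, 1, 1, 1, 100]` and a toy miniblock size of 4, the first miniblock needs 0 bits and the second needs 7, which is why a single per-block width would be wasteful.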
   
   ### Commit 3: Fix additional typos, grammar, invalid HTML, and consistency 
issues (28 fixes)
   - **CONTRIBUTING.md**: Fix 7 typos (docuemnt, interopability, libaries, etc.)
   - **Encryption.md**: 6 fixes (plural agreement, explictly, smart quotes, 
double spaces)
   - **LogicalTypes.md**: 7 fixes (invalid `<tr colspan=3>`, NaN casing, 
grammar)
   - **parquet.thrift**: 4 fixes (article agreement, terminal periods, 
BIT_PACKED comment)
   - **Encodings.md, Compression.md, PageIndex.md, VariantEncoding.md**: Minor 
fixes
   
   ### Commit 4: Fix additional typos, grammar, hyphenation, and consistency 
issues (52 fixes)
   - **parquet.thrift**: Article agreement, edge interpolation, proper noun 
capitalization
   - **Geospatial.md**: Compound adjectives, comma splices, heading formatting
   - **LogicalTypes.md**: Grammar, Oxford commas, "can not" -> "cannot"
   - **README.md**: Plural agreement, compound adjective hyphenation, proper 
nouns
   - **BinaryProtocolExtensions.md**: FileMetaData casing, FlatBuffers 
capitalization
   - **Encodings.md, Compression.md, Encryption.md, BloomFilter.md, 
PageIndex.md, VariantShredding.md, VariantEncoding.md**: Various grammar, 
punctuation, and consistency fixes
   
   ## Validation
   - Thrift definition compiles cleanly after all changes
   - No semantic/behavioral changes to the format specification
   - All fixes are documentation-only (typos, grammar, consistency, correctness 
of descriptions)

