alamb commented on code in PR #577:
URL: https://github.com/apache/parquet-format/pull/577#discussion_r3350059798
##########
BloomFilter.md:
##########
@@ -307,21 +307,29 @@ union BloomFilterCompression {
* Bloom filter header is stored at beginning of Bloom filter data of each column
* and followed by its bitset.
**/
-struct BloomFilterPageHeader {
- /** The size of bitset in bytes **/
+struct BloomFilterHeader {
+ /** The size of bitset in bytes. **/
1: required i32 numBytes;
/** The algorithm for setting bits. **/
2: required BloomFilterAlgorithm algorithm;
/** The hash function used for Bloom filter. **/
3: required BloomFilterHash hash;
- /** The compression used in the Bloom filter **/
+ /** The compression used in the Bloom filter. **/
4: required BloomFilterCompression compression;
}
struct ColumnMetaData {
...
/** Byte offset from beginning of file to Bloom filter data. **/
14: optional i64 bloom_filter_offset;
+
Review Comment:
Verified it is in
https://github.com/apache/parquet-format/blob/a7d9dd9bbffb4e45838d8e51747a4d48055d3d0a/src/main/thrift/parquet.thrift#L934-L940
##########
BloomFilter.md:
##########
@@ -122,7 +122,7 @@ boolean block_check(block b, unsigned int32 x) {
for i in [0..7] {
for j in [0..31] {
if (masked.getWord(i).isSet(j)) {
- if (not b.getWord(i).setBit(j)) {
+ if (not b.getWord(i).isSet(j)) {
Review Comment:
That is a nice find.
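For context, here is a minimal Python sketch of why the check must use a read-only test rather than `setBit`: a lookup that sets bits would modify the filter on every probe. The function names `mask`, `block_insert`, and `block_check` follow the pseudocode in BloomFilter.md. The list-of-eight-integers block representation is an assumption made for illustration, and the salt constants are quoted from memory of the spec, so treat them as an assumption too.

```python
# Salt constants for the split block Bloom filter.
# Assumption: quoted from memory of the spec; verify against BloomFilter.md.
SALT = (0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
        0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31)


def mask(x):
    """Return eight 32-bit words, each with exactly one bit set."""
    # Multiply modulo 2**32, then use the top 5 bits as the bit index.
    return [1 << (((x * s) & 0xFFFFFFFF) >> 27) for s in SALT]


def block_insert(block, x):
    """Set the masked bit in each of the eight words of the block."""
    for i, m in enumerate(mask(x)):
        block[i] |= m


def block_check(block, x):
    """Return True only if every masked bit is already set.

    This only reads the block (the isSet behaviour). Using setBit here
    would change the filter during a lookup.
    """
    return all((block[i] & m) == m for i, m in enumerate(mask(x)))
```

With this sketch, a check on an empty block returns False and leaves the block unchanged, which is the behaviour the `isSet` fix restores.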
##########
BloomFilter.md:
##########
@@ -307,21 +307,29 @@ union BloomFilterCompression {
* Bloom filter header is stored at beginning of Bloom filter data of each column
* and followed by its bitset.
**/
-struct BloomFilterPageHeader {
- /** The size of bitset in bytes **/
+struct BloomFilterHeader {
Review Comment:
I verified this matches what is in
https://github.com/apache/parquet-format/blob/a7d9dd9bbffb4e45838d8e51747a4d48055d3d0a/src/main/thrift/parquet.thrift#L798
##########
BloomFilter.md:
##########
@@ -307,21 +307,29 @@ union BloomFilterCompression {
* Bloom filter header is stored at beginning of Bloom filter data of each column
* and followed by its bitset.
**/
-struct BloomFilterPageHeader {
- /** The size of bitset in bytes **/
+struct BloomFilterHeader {
+ /** The size of bitset in bytes. **/
Review Comment:
Any chance you can update parquet.thrift to match these changes?
https://github.com/apache/parquet-format/blob/a7d9dd9bbffb4e45838d8e51747a4d48055d3d0a/src/main/thrift/parquet.thrift#L798-L806
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]