linliu-code commented on code in PR #13563:
URL: https://github.com/apache/hudi/pull/13563#discussion_r2237332023
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -455,6 +456,45 @@ public final class HoodieMetadataConfig extends HoodieConfig {
+ "The index name either starts with or matches exactly can be one of the following: "
+ StringUtils.join(Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toList()), ", "));
+ // Configs that control the bloom filter that is written to the file footer
+ public static final ConfigProperty<String> BLOOM_FILTER_TYPE = ConfigProperty
+ .key(String.format("%s.%s", METADATA_PREFIX, "bloom.index.filter.type"))
+ .defaultValue(BloomFilterTypeCode.DYNAMIC_V0.name())
+ .withValidValues(BloomFilterTypeCode.SIMPLE.name(), BloomFilterTypeCode.DYNAMIC_V0.name())
+ .markAdvanced()
+ .withDocumentation(BloomFilterTypeCode.class);
+
+ public static final ConfigProperty<String> BLOOM_FILTER_NUM_ENTRIES_VALUE = ConfigProperty
+ .key(String.format("%s.%s", METADATA_PREFIX, "index.bloom.num_entries"))
+ .defaultValue("60000")
+ .markAdvanced()
+ .withDocumentation("Only applies if index type is BLOOM. "
+ + "This is the number of entries to be stored in the bloom filter. "
+ + "The rationale for the default: Assume the maxParquetFileSize is 128MB and averageRecordSize is 1kb and "
+ + "hence we approx a total of 130K records in a file. The default (60000) is roughly half of this approximation. "
+ + "Warning: Setting this very low, will generate a lot of false positives and index lookup "
+ + "will have to scan a lot more files than it has to and setting this to a very high number will "
+ + "increase the size every base file linearly (roughly 4KB for every 50000 entries). "
+ + "This config is also used with DYNAMIC bloom filter which determines the initial size for the bloom.");
+
+ public static final ConfigProperty<String> BLOOM_FILTER_FPP_VALUE = ConfigProperty
+ .key(String.format("%s.%s", METADATA_PREFIX, "index.bloom.fpp"))
+ .defaultValue("0.000000001")
Review Comment:
How is the value calculated?
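For context on this question: under the textbook Bloom filter sizing formulas, the bit count for n entries at false-positive probability p is m = ceil(-n * ln(p) / (ln 2)^2), and the optimal number of hash functions is k ~ (m / n) * ln 2. The sketch below plugs in the two defaults from this hunk (60000 entries, fpp 0.000000001) and the 128MB / 1KB record estimate from the num_entries documentation. It assumes the filter is sized with these standard formulas; whether Hudi's bloom filter implementations do exactly this is an assumption, and the class and method names are hypothetical, not Hudi APIs.

```java
public class BloomSizingSketch {
    // ln(2) and ln(2)^2, used by the textbook Bloom filter sizing formulas (assumption: Hudi uses the same).
    private static final double LN2 = Math.log(2);
    private static final double LN2_SQUARED = LN2 * LN2;

    /** Bits needed for numEntries at false-positive probability fpp: ceil(-n * ln(p) / (ln 2)^2). */
    static long numBits(long numEntries, double fpp) {
        return (long) Math.ceil(numEntries * (-Math.log(fpp) / LN2_SQUARED));
    }

    /** Optimal hash-function count k = (m / n) * ln 2, rounded, and never below 1. */
    static int numHashes(long numBits, long numEntries) {
        return Math.max(1, (int) Math.round((double) numBits / numEntries * LN2));
    }

    /** Rough records-per-file estimate used in the num_entries doc: file size / average record size. */
    static long approxRecordsPerFile(long fileBytes, long recordBytes) {
        return fileBytes / recordBytes;
    }

    public static void main(String[] args) {
        long numEntries = 60_000L;  // default of the metadata index.bloom.num_entries config in this hunk
        double fpp = 0.000000001;   // default of the metadata index.bloom.fpp config in this hunk (1e-9)

        long bits = numBits(numEntries, fpp);
        int hashes = numHashes(bits, numEntries);
        System.out.printf("bits=%d (~%.0f KB), hashes=%d%n", bits, bits / 8.0 / 1024, hashes);

        // 128MB max parquet file size with ~1KB average records, as assumed in the num_entries doc.
        long records = approxRecordsPerFile(128L * 1024 * 1024, 1024L);
        System.out.println("approx records per file = " + records);
    }
}
```

Under these formulas the filter size grows linearly with num_entries and only logarithmically with 1/fpp: each 10x reduction in fpp adds about ln(10) / (ln 2)^2 ~ 4.8 bits per entry, so the printed size can be compared directly against the "roughly 4KB for every 50000 entries" estimate in the num_entries documentation.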