hoshinojyunn opened a new pull request, #64652:
URL: https://github.com/apache/doris/pull/64652
Assumption / semantic boundary:
A named bloom filter index and a property-managed bloom filter index cannot
be defined on the same column. The two metadata forms are managed separately,
and users must use `DROP INDEX` for named bloom filter indexes and
`ALTER TABLE SET("bloom_filter_columns" = ...)` for property-managed bloom filter indexes.
Problem Summary:
Previously Doris only supported bloom filter indexes through table
properties such as `"bloom_filter_columns"` and `"bloom_filter_fpp"`. That made
bloom filter indexes behave differently from other index types and prevented
users from managing them with named `INDEX` / `CREATE INDEX` / `DROP INDEX`
syntax.
This change adds named `USING BLOOMFILTER` syntax in the Nereids parser and
FE DDL pipeline, keeps the existing table-property bloom filter behavior for
legacy metadata, and enforces that legacy bloom filter columns and named bloom
filter indexes cannot be defined on the same column. The schema change path now
distinguishes legacy bloom filter management from named bloom filter
management, while FE->BE materialization continues to apply the table-level
bloom filter fpp to both forms.
The patch also fixes bloom filter materialization on shadow columns and
ensures tablet metadata marks named bloom filter columns correctly on the BE
side. FE unit tests and bloom filter regression tests are added to cover parser
analysis, semantic validation, schema change checks, FE->BE task generation,
and named bloom filter DDL behavior.
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
**Assumption / semantic boundary:**
A named bloom filter index and a property-managed bloom filter index cannot be
defined on the same column. They are tracked as two separate metadata forms.
`DROP INDEX` only applies to named bloom filter indexes, while
`ALTER TABLE SET("bloom_filter_columns" = ...)` only applies to
property-managed bloom filter indexes.
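
The boundary above can be sketched as a small model. This is only an illustration in Python, not the FE code from this PR: `TableMeta`, `drop_index`, `DdlError`, and the error message text are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class DdlError(Exception):
    """Raised when a DDL statement targets the wrong metadata form."""


@dataclass
class TableMeta:
    # Hypothetical model: columns listed in the "bloom_filter_columns" property.
    legacy_bf_columns: Set[str] = field(default_factory=set)
    # Hypothetical model: named bloom filter indexes, index name -> column name.
    named_bf_indexes: Dict[str, str] = field(default_factory=dict)


def drop_index(table: TableMeta, target: str) -> None:
    """Model of DROP INDEX: it removes named bloom filter indexes only."""
    if target in table.named_bf_indexes:
        del table.named_bf_indexes[target]
        return
    if target in table.legacy_bf_columns:
        # The target is a legacy column, so point the user at the property path.
        raise DdlError(
            f"'{target}' is managed by the bloom_filter_columns property; "
            "use ALTER TABLE SET instead of DROP INDEX"
        )
    raise DdlError(f"index '{target}' does not exist")
```

In this model, dropping a named index never touches the legacy column set, which mirrors the separation of the two metadata forms described above.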
Bloom filter indexes in Doris were historically managed only by table
properties:
- `"bloom_filter_columns"`
- `"bloom_filter_fpp"`
That created three problems:
1. Bloom filter indexes behaved differently from other index types and could not use
   named `INDEX` / `CREATE INDEX` / `DROP INDEX` syntax.
2. FE schema change logic only tracked the legacy property-based bloom filter
   definition, which made the metadata model awkward once named bloom filter indexes
   were introduced.
3. FE->BE materialization needed to recognize both legacy bloom filter columns and
   named bloom filter indexes, including schema change shadow columns.
This PR introduces named bloom filter indexes with `USING BLOOMFILTER`, supports both
inline table-definition syntax and standalone `CREATE INDEX`, and keeps compatibility
with legacy table-property bloom filters. To avoid semantic ambiguity, a column cannot
be managed by both legacy bloom filter properties and a named bloom filter index at
the same time.
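
A minimal sketch of that ambiguity check, written in Python for brevity. The function names and the error text are hypothetical and do not come from the patch; the sketch assumes the legacy columns and the named index columns are available as plain collections.

```python
from typing import Dict, Iterable, Set


def find_bloom_filter_conflicts(legacy_columns: Iterable[str],
                                named_indexes: Dict[str, str]) -> Set[str]:
    """Return columns claimed by both the legacy property and a named index.

    named_indexes maps index name -> indexed column (hypothetical shape).
    """
    named_columns = set(named_indexes.values())
    return set(legacy_columns) & named_columns


def validate_bloom_filter_definitions(legacy_columns: Iterable[str],
                                      named_indexes: Dict[str, str]) -> None:
    """Reject a create/alter request that defines both forms on one column."""
    conflicts = find_bloom_filter_conflicts(legacy_columns, named_indexes)
    if conflicts:
        raise ValueError(
            "columns managed by both bloom_filter_columns and a named "
            "bloom filter index: " + ", ".join(sorted(conflicts))
        )
```

The same check would apply on both the create and the alter path, since either one can introduce the overlap.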
### Main changes
1. Parser and analysis
   - Add `BLOOMFILTER` keyword in the Nereids lexer/parser.
   - Parse `USING BLOOMFILTER` in table index definitions and `CREATE INDEX`.
   - Extend `IndexDefinition` semantic checks for bloom filter indexes.
   - Reject index-level properties on bloom filter indexes and keep bloom filter fpp
     as a table-level property.
2. FE DDL and schema change semantics
   - Add helpers to extract named bloom filter columns from index metadata.
   - Keep legacy property-managed bloom filter columns and named bloom filter indexes
     as separate metadata sources.
   - Reject defining both forms on the same column during create/alter.
   - Preserve legacy `ALTER TABLE SET("bloom_filter_columns" = ...)` semantics for
     property-managed bloom filter creation and deletion.
   - Restrict `DROP INDEX` to named bloom filter indexes and return a clearer error if
     the target refers to a legacy property-managed bloom filter column.
3. FE->BE materialization
   - Materialize bloom filter flags when either legacy or named bloom filter metadata
     applies to a column.
   - Reuse the table-level bloom filter fpp for both forms.
   - Normalize shadow column names before bloom filter matching so schema-changed
     columns inherit the expected bloom filter metadata.
   - Mark named bloom filter columns correctly in tablet metadata on the BE side.
4. Tests
   - Add FE parser/analyzer/DDL/materialization unit tests.
   - Add regression coverage for named bloom filter DDL and schema change behavior.
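
The FE->BE materialization rules in item 3 can be sketched roughly as follows. This is an illustrative Python model, not the actual FE implementation: the function names and the output dictionary shape are hypothetical, and the shadow prefix `__doris_shadow_` is assumed here rather than taken from the patch.

```python
from typing import Dict, Iterable, Optional

# Assumed prefix for schema change shadow columns; not confirmed by this PR text.
SHADOW_PREFIX = "__doris_shadow_"


def normalize_column_name(name: str) -> str:
    """Strip the shadow prefix so a shadow column matches its base column."""
    if name.startswith(SHADOW_PREFIX):
        return name[len(SHADOW_PREFIX):]
    return name


def materialize_bloom_filter(column_names: Iterable[str],
                             legacy_columns: Iterable[str],
                             named_columns: Iterable[str],
                             table_fpp: float) -> Dict[str, Dict[str, Optional[float]]]:
    """Compute per-column bloom filter flags for a hypothetical FE->BE task."""
    # Either metadata source is enough to enable the bloom filter on a column.
    bf_columns = set(legacy_columns) | set(named_columns)
    result = {}
    for name in column_names:
        is_bf = normalize_column_name(name) in bf_columns
        result[name] = {
            "is_bf_column": is_bf,
            # Both forms reuse the single table-level fpp.
            "bf_fpp": table_fpp if is_bf else None,
        }
    return result
```

For example, with a legacy column `k1`, a named index on `k2`, and a table-level fpp of `0.03`, a shadow column for `k2` would be flagged as a bloom filter column with the same fpp as `k1`.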
### Release note
Users can now create and manage bloom filter indexes with named index syntax while
continuing to use legacy table-property bloom filter definitions. Doris rejects
conflicting legacy and named bloom filter definitions on the same column.
## Test Execution
1. Build
   - Command: `./build.sh --be --fe`
   - Result: success, `Successfully build Doris`
2. FE unit tests
   - Command:
     `./run-fe-ut.sh --run org.apache.doris.catalog.ColumnBloomFilterMaterializationTest,org.apache.doris.catalog.CreateTableWithBloomFilterIndexTest,org.apache.doris.cloud.datasource.CloudInternalCatalogBloomFilterMaterializationTest,org.apache.doris.nereids.parser.NereidsParserTest,org.apache.doris.nereids.trees.plans.commands.IndexDefinitionTest,org.apache.doris.task.AgentTaskTest,org.apache.doris.alter.SchemaChangeHandlerTest`
   - Result: `Tests run: 191, Failures: 0, Errors: 0, Skipped: 0`
3. BE unit tests: bloom filter focused
   - Command:
     `./run-be-ut.sh --run --filter='*BloomFilter*:*bloom_filter*:*test_write_bf_with_finalize'`
   - Result: `Running 90 tests from 12 test suites`, `90 tests passed`
4. BE unit tests: tablet metadata / schema / protobuf conversion
   - Command:
     `./run-be-ut.sh --run --filter='TabletSchemaTest.*:TabletMetaTest.*:PbConvert.*:TabletIndexTest.*:TabletSchemaIndexTest.*'`
   - Result: `Running 48 tests from 5 test suites`, `48 tests passed`
5. Regression tests
   - Command:
     `./run-regression-test.sh --run -d bloom_filter_p0 -s test_bloom_filter`
   - Result: `Test 1 suites, failed 0 suites, fatal 0 scripts, skipped 0 scripts`
   - Coverage note: verified legacy bloom filter DDL plus
     `ALTER TABLE SET("bloom_filter_fpp"=...)` by asserting BE
     `BloomFilterIndexWriter::create` receives `0.03`, then `0.02` during schema
     change rewrite, then `0.03` again for newly inserted rowsets after the fpp restore.
   - Command:
     `./run-regression-test.sh --run -d bloom_filter_p0 -s test_bloom_filter_named_index`
   - Result: `Test 1 suites, failed 0 suites, fatal 0 scripts, skipped 0 scripts`
   - Coverage note: verified named `USING BLOOMFILTER` inline DDL, standalone
     `CREATE INDEX`, `DROP INDEX`, conflict checks with legacy `bloom_filter_columns`,
     and table-level `bloom_filter_fpp` propagation to named bloom filter indexes.
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->