hoshinojyunn opened a new pull request, #64652:
URL: https://github.com/apache/doris/pull/64652

   Assumption / semantic boundary:
   A named bloom filter index and a property-managed bloom filter index cannot
   be defined on the same column. The two metadata forms are managed separately,
   and users must use `DROP INDEX` for named bloom filter indexes and
   `ALTER TABLE SET("bloom_filter_columns" = ...)` for property-managed bloom
   filter indexes.
   
   Problem Summary:
   Previously Doris only supported bloom filter indexes through table
   properties such as `"bloom_filter_columns"` and `"bloom_filter_fpp"`. That made
   bloom filter indexes behave differently from other index types and prevented
   users from managing them with named `INDEX` / `CREATE INDEX` / `DROP INDEX`
   syntax.
   
   This change adds named `USING BLOOMFILTER` syntax in the Nereids parser and
   FE DDL pipeline, keeps the existing table-property bloom filter behavior for
   legacy metadata, and enforces that legacy bloom filter columns and named bloom
   filter indexes cannot be defined on the same column. The schema change path now
   distinguishes legacy bloom filter management from named bloom filter
   management, while FE->BE materialization continues to apply the table-level
   bloom filter fpp to both forms.
   
   The patch also fixes bloom filter materialization on shadow columns and
   ensures tablet metadata marks named bloom filter columns correctly on the BE
   side. FE unit tests and bloom filter regression tests are added to cover parser
   analysis, semantic validation, schema change checks, FE->BE task generation,
   and named bloom filter DDL behavior.
   
   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   **Assumption / semantic boundary:**
   
   A named bloom filter index and a property-managed bloom filter index cannot be
   defined on the same column. They are tracked as two separate metadata forms.
   `DROP INDEX` only applies to named bloom filter indexes, while
   `ALTER TABLE SET("bloom_filter_columns" = ...)` only applies to
   property-managed bloom filter indexes.
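
The boundary above can be illustrated with a small model. This is only a sketch in Python, not the FE implementation: `TableBloomFilterMeta`, `drop_index`, and the error messages are hypothetical names introduced for illustration, and the real logic lives in the Java schema change path.

```python
from dataclasses import dataclass, field


@dataclass
class TableBloomFilterMeta:
    """Hypothetical stand-in for the two bloom filter metadata forms of a table."""
    # Columns listed in the "bloom_filter_columns" table property (legacy form).
    legacy_columns: set = field(default_factory=set)
    # Named indexes created with USING BLOOMFILTER: index name -> column name.
    named_indexes: dict = field(default_factory=dict)


def drop_index(meta, index_name):
    """Model DROP INDEX: it only ever removes a named bloom filter index."""
    if index_name in meta.named_indexes:
        # Named form: remove the index and report which column it covered.
        return meta.named_indexes.pop(index_name)
    if index_name in meta.legacy_columns:
        # Legacy form: point the user at the table property instead.
        raise ValueError(
            f"'{index_name}' is a property-managed bloom filter column; "
            'use ALTER TABLE SET("bloom_filter_columns" = ...) instead'
        )
    raise KeyError(f"index '{index_name}' does not exist")
```

In this model, dropping a named index leaves the legacy column set untouched, and a `DROP INDEX` whose target is a legacy column fails with a message that names the property to change.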
   
   
   Bloom filter indexes in Doris were historically managed only by table properties:
   
   - `"bloom_filter_columns"`
   - `"bloom_filter_fpp"`
   
   That created three problems:
   
   1. Bloom filter indexes behaved differently from other index types and could not
      use named `INDEX` / `CREATE INDEX` / `DROP INDEX` syntax.
   2. FE schema change logic only tracked the legacy property-based bloom filter
      definition, which made the metadata model awkward once named bloom filter
      indexes were introduced.
   3. FE->BE materialization needed to recognize both legacy bloom filter columns
      and named bloom filter indexes, including schema change shadow columns.
   
   This PR introduces named bloom filter indexes with `USING BLOOMFILTER`, supports
   both inline table-definition syntax and standalone `CREATE INDEX`, and keeps
   compatibility with legacy table-property bloom filters. To avoid semantic
   ambiguity, a column cannot be managed by both legacy bloom filter properties and
   a named bloom filter index at the same time.
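
The mutual-exclusion rule can be sketched as a set intersection over the two metadata sources. The Python below is an illustrative model only: the function names are hypothetical, and the case-insensitive column comparison is an assumption rather than something this description states.

```python
def find_bloom_filter_conflicts(legacy_columns, named_index_columns):
    """Return the columns claimed by both forms, lower-cased and sorted."""
    # Assumption: column names are compared case-insensitively.
    legacy = {c.lower() for c in legacy_columns}
    named = {c.lower() for c in named_index_columns}
    return sorted(legacy & named)


def validate_bloom_filter_definition(legacy_columns, named_indexes):
    """Reject a table definition where one column uses both forms.

    named_indexes maps a (hypothetical) index name to its column name.
    """
    conflicts = find_bloom_filter_conflicts(legacy_columns, named_indexes.values())
    if conflicts:
        raise ValueError(
            "columns managed by both bloom_filter_columns and a named "
            "bloom filter index: " + ", ".join(conflicts)
        )
```

For example, `validate_bloom_filter_definition(["k1"], {"idx_v1": "v1"})` passes because the two forms cover different columns, while listing `v1` in both places raises an error.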
   
   ### Main changes
   
   1. Parser and analysis
      - Add `BLOOMFILTER` keyword in the Nereids lexer/parser.
      - Parse `USING BLOOMFILTER` in table index definitions and `CREATE INDEX`.
      - Extend `IndexDefinition` semantic checks for bloom filter indexes.
      - Reject index-level properties on bloom filter indexes and keep bloom filter
        fpp as a table-level property.
   
   2. FE DDL and schema change semantics
      - Add helpers to extract named bloom filter columns from index metadata.
      - Keep legacy property-managed bloom filter columns and named bloom filter
        indexes as separate metadata sources.
      - Reject defining both forms on the same column during create/alter.
      - Preserve legacy `ALTER TABLE SET("bloom_filter_columns" = ...)` semantics
        for property-managed bloom filter creation and deletion.
      - Restrict `DROP INDEX` to named bloom filter indexes and return a clearer
        error if the target refers to a legacy property-managed bloom filter column.
   
   3. FE->BE materialization
      - Materialize bloom filter flags when either legacy or named bloom filter
        metadata applies to a column.
      - Reuse the table-level bloom filter fpp for both forms.
      - Normalize shadow column names before bloom filter matching so
        schema-changed columns inherit the expected bloom filter metadata.
      - Mark named bloom filter columns correctly in tablet metadata on the BE side.
   
   4. Tests
      - Add FE parser/analyzer/DDL/materialization unit tests.
      - Add regression coverage for named bloom filter DDL and schema change behavior.
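
The FE->BE materialization rules in item 3 can be sketched as follows. This is a simplified Python model, not the patch itself: `materialize_bloom_filter` and its return shape are hypothetical, and the `__doris_shadow_` prefix is an assumption about how shadow columns are named during schema change.

```python
# Assumed shadow-column prefix; treat this constant as illustrative.
SHADOW_NAME_PREFIX = "__doris_shadow_"


def normalize_column_name(name):
    """Strip the shadow prefix so a shadow column matches its base column."""
    if name.startswith(SHADOW_NAME_PREFIX):
        return name[len(SHADOW_NAME_PREFIX):]
    return name


def materialize_bloom_filter(column_name, legacy_columns, named_columns, table_fpp):
    """Decide the bloom filter flag and fpp sent to BE for one column."""
    base = normalize_column_name(column_name).lower()
    legacy = {c.lower() for c in legacy_columns}
    named = {c.lower() for c in named_columns}
    # Either metadata form is enough to mark the column.
    is_bf = base in legacy or base in named
    return {
        "name": column_name,
        "is_bf_column": is_bf,
        # Both forms reuse the table-level fpp; there is no per-index fpp.
        "bf_fpp": table_fpp if is_bf else None,
    }
```

Under these assumptions, a shadow column such as `__doris_shadow_v1` picks up the same flag and fpp as `v1`, regardless of whether `v1` is covered by the legacy property or by a named index.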
   
   ### Release note
   
   Users can now create and manage bloom filter indexes with named index syntax
   while continuing to use legacy table-property bloom filter definitions. Doris
   rejects conflicting legacy and named bloom filter definitions on the same column.
   
   ## Test Execution
   
   1. Build
      - Command: `./build.sh --be --fe`
      - Result: success, `Successfully build Doris`
   
   2. FE unit tests
      - Command:
        `./run-fe-ut.sh --run org.apache.doris.catalog.ColumnBloomFilterMaterializationTest,org.apache.doris.catalog.CreateTableWithBloomFilterIndexTest,org.apache.doris.cloud.datasource.CloudInternalCatalogBloomFilterMaterializationTest,org.apache.doris.nereids.parser.NereidsParserTest,org.apache.doris.nereids.trees.plans.commands.IndexDefinitionTest,org.apache.doris.task.AgentTaskTest,org.apache.doris.alter.SchemaChangeHandlerTest`
      - Result: `Tests run: 191, Failures: 0, Errors: 0, Skipped: 0`
   
   3. BE unit tests: bloom filter focused
      - Command:
        `./run-be-ut.sh --run --filter='*BloomFilter*:*bloom_filter*:*test_write_bf_with_finalize'`
      - Result: `Running 90 tests from 12 test suites`, `90 tests passed`
   
   4. BE unit tests: tablet metadata / schema / protobuf conversion
      - Command:
        `./run-be-ut.sh --run --filter='TabletSchemaTest.*:TabletMetaTest.*:PbConvert.*:TabletIndexTest.*:TabletSchemaIndexTest.*'`
      - Result: `Running 48 tests from 5 test suites`, `48 tests passed`
   
   5. Regression tests
      - Command:
        `./run-regression-test.sh --run -d bloom_filter_p0 -s test_bloom_filter`
      - Result: `Test 1 suites, failed 0 suites, fatal 0 scripts, skipped 0 scripts`
      - Coverage note: verified legacy bloom filter DDL plus
        `ALTER TABLE SET("bloom_filter_fpp"=...)` by asserting BE
        `BloomFilterIndexWriter::create` receives `0.03`, then `0.02` during schema
        change rewrite, then `0.03` again for newly inserted rowsets after the fpp
        restore.
      - Command:
        `./run-regression-test.sh --run -d bloom_filter_p0 -s test_bloom_filter_named_index`
      - Result: `Test 1 suites, failed 0 suites, fatal 0 scripts, skipped 0 scripts`
      - Coverage note: verified named `USING BLOOMFILTER` inline DDL, standalone
        `CREATE INDEX`, `DROP INDEX`, conflict checks with legacy
        `bloom_filter_columns`, and table-level `bloom_filter_fpp` propagation to
        named bloom filter indexes.
   
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
   
   

