Ryan19929 opened a new pull request, #49519:
URL: https://github.com/apache/doris/pull/49519
### What problem does this PR solve?
Issue Number: close #35411
Related PR: #xxx
Problem Summary:
This pull request adds support for the IK tokenizer. Key changes include
porting the IK tokenizer from Java to C++, updating the inverted index parser
to support the IK parser type, and adding new test cases for the IK analyzer.
## IK Integration
- `be/CMakeLists.txt`: Added installation of the IK dictionary files to the
output directory.
## Inverted Index Parser Updates
- `be/src/olap/inverted_index_parser.cpp`: Added support for the `PARSER_IK`
type in the `inverted_index_parser_type_to_string`,
`get_inverted_index_parser_type_from_string`, and
`get_parser_mode_string_from_properties` functions.
- `be/src/olap/inverted_index_parser.h`: Defined `PARSER_IK` in the
`InvertedIndexParserType` enum and added the corresponding string constant.
- `be/src/olap/rowset/segment_v2/inverted_index/analyzer/analyzer.cpp`:
Included the `IKAnalyzer` header and updated the `create_analyzer` function to
handle the `PARSER_IK` type.
- `be/src/vec/functions/function_tokenize.cpp`: Updated an error message.
- `fe/fe-core/src/main/java/org/apache/doris/analysis/InvertedIndexUtil.java`:
Added support for the IK analyzer.
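The parser-type changes above follow a common enum-plus-string-mapping pattern. The C++ sketch here is a simplified, hypothetical model of that plumbing, not the Doris code: `ParserType`, the `"ik"` property value, the other parser names, and the `*Stub` analyzer classes are assumptions for illustration. It shows how a new `PARSER_IK` member has to be wired into both string conversions and the analyzer factory so that it round-trips.

```cpp
#include <cassert>  // used by the accompanying checks
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, simplified model of the parser-type plumbing this PR touches.
// Enum members, string values, and analyzer stubs are assumptions for
// illustration; they are not the actual Doris definitions.
enum class ParserType {
    PARSER_UNKNOWN,
    PARSER_NONE,
    PARSER_STANDARD,
    PARSER_ENGLISH,
    PARSER_CHINESE,
    PARSER_UNICODE,
    PARSER_IK,  // the new type introduced by this change
};

// Assumed property value for the IK parser.
const std::string INVERTED_INDEX_PARSER_IK = "ik";

// Enum -> string, analogous in role to inverted_index_parser_type_to_string.
std::string parser_type_to_string(ParserType type) {
    switch (type) {
        case ParserType::PARSER_NONE: return "none";
        case ParserType::PARSER_STANDARD: return "standard";
        case ParserType::PARSER_ENGLISH: return "english";
        case ParserType::PARSER_CHINESE: return "chinese";
        case ParserType::PARSER_UNICODE: return "unicode";
        case ParserType::PARSER_IK: return INVERTED_INDEX_PARSER_IK;
        default: return "unknown";
    }
}

// String -> enum, analogous in role to
// get_inverted_index_parser_type_from_string. Unknown names map to
// PARSER_UNKNOWN.
ParserType parser_type_from_string(const std::string& name) {
    static const std::map<std::string, ParserType> kByName = {
        {"none", ParserType::PARSER_NONE},
        {"standard", ParserType::PARSER_STANDARD},
        {"english", ParserType::PARSER_ENGLISH},
        {"chinese", ParserType::PARSER_CHINESE},
        {"unicode", ParserType::PARSER_UNICODE},
        {INVERTED_INDEX_PARSER_IK, ParserType::PARSER_IK},
    };
    auto it = kByName.find(name);
    return it == kByName.end() ? ParserType::PARSER_UNKNOWN : it->second;
}

// Hypothetical stand-ins for the real analyzer classes.
struct Analyzer {
    virtual ~Analyzer() = default;
    virtual std::string name() const = 0;
};
struct StandardAnalyzerStub : Analyzer {
    std::string name() const override { return "standard"; }
};
struct IKAnalyzerStub : Analyzer {
    std::string name() const override { return "ik"; }
};

// Factory dispatch, analogous in role to create_analyzer gaining a
// PARSER_IK branch. Types without a stub here are rejected.
std::unique_ptr<Analyzer> create_analyzer(ParserType type) {
    switch (type) {
        case ParserType::PARSER_STANDARD:
            return std::make_unique<StandardAnalyzerStub>();
        case ParserType::PARSER_IK:
            return std::make_unique<IKAnalyzerStub>();
        default:
            throw std::invalid_argument("unsupported parser type: " +
                                        parser_type_to_string(type));
    }
}
```

The invariant worth checking in review is that the enum, both conversion functions, and the factory stay in sync: in this sketch, if `PARSER_IK` were added to the enum but left out of the string-to-enum table, an index declared with `"ik"` would silently resolve to `PARSER_UNKNOWN`.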
## Test Cases
- `regression-test/suites/inverted_index_p0/test_ik_analyzer.groovy`: Added
a new test suite for the IK analyzer, including table creation, data insertion,
and query validation.
- `regression-test/suites/inverted_index_p0/test_tokenize.groovy`: Added
test cases for tokenizing text using the IK parser.
- `regression-test/data/inverted_index_p0/test_ik_analyzer.out`: Added
expected output for the IK analyzer test cases.
- `regression-test/data/inverted_index_p0/test_tokenize.out`: Added expected
output for the tokenization test cases using the IK parser.
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
    - [X] Regression test
    - [X] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [X] No.
    - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
    - [ ] No.
    - [X] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
--
This is an automated message from the Apache Git Service.