zhiqiang-hhhh opened a new issue, #54859:
URL: https://github.com/apache/doris/issues/54859

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Description
   
   To better track the development progress of the ANN index, here is a list of open TODOs for it:
   
   1. **Support IVF algorithm** — Requires research on how to implement the 
first training phase of the IVF algorithm; high difficulty.
   2. **Implement Rerank feature** — Requires research on how this feature is 
implemented in other systems and how to integrate it into Doris; high 
difficulty.
   3. **CREATE INDEX / BUILD INDEX** — Development difficulty is low, but 
requires debugging and extensive testing, especially in the Cloud environment.
   4. **Performance: AnnIndexCache** — Needs a detailed cache strategy design. 
For performance reasons, AnnIndex must be fully in-memory, and the cache must 
be updated immediately after data import or compaction.
   5. **Performance: IndexOnly-Scan** — After index analysis, eliminate 
unnecessary DataPage I/O. Implementation difficulty is high: the Doris code 
here is complex, and the work requires understanding `SegmentIterator`’s late 
materialization and virtual columns.
   6. **Performance: Change return type of `l2_distance_approximate` and 
`inner_product_approximate`** — Currently the return type is `NullableDouble` 
to reuse existing distance function implementations. Ideally, it should be 
changed to `Float` as the return type; low difficulty.
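
For item 1, the training phase of IVF is usually a k-means pass that learns `nlist` centroids, after which every vector is assigned to the inverted list of its nearest centroid. The sketch below illustrates that idea in plain Python. It is not Doris code and makes no claim about how Doris will implement it: the function names are hypothetical, and `nlist` / `nprobe` are borrowed from common IVF terminology.

```python
import random


def l2_sq(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v, centroids):
    """Index of the centroid closest to v (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda c: l2_sq(v, centroids[c]))


def train_centroids(vectors, nlist, iters=10, seed=0):
    """Training phase: plain k-means (Lloyd's algorithm) over the sample."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[c] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centroids


def build_inverted_lists(vectors, centroids):
    """Add phase: store each row id in the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for row_id, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(row_id)
    return lists


def search(query, vectors, centroids, lists, k, nprobe):
    """Query phase: scan only the nprobe lists closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2_sq(query, centroids[c]))
    candidates = [rid for c in probe[:nprobe] for rid in lists[c]]
    candidates.sort(key=lambda rid: l2_sq(query, vectors[rid]))
    return candidates[:k]


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    cents = train_centroids(data, nlist=2)
    inv = build_inverted_lists(data, cents)
    print(search([9.5, 10.0], data, cents, inv, k=2, nprobe=1))
```

With `nprobe` equal to `nlist` the search degenerates to an exhaustive scan, which is a convenient way to check recall against a brute-force baseline.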
   
   
   
   
   ### Solution
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

