Thank you, Zhao Yang, for starting this topic.

After reading the short design doc, I have a few questions:
1) SASI was pretty inefficient at indexing wide partitions because the index structure only retains the partition token, not the clustering columns. According to the design doc, SAI maps row IDs to partition offsets, so can we expect indexing wide partitions to be more efficient with SAI? One detail that worries me is that, at the beginning of the design doc, it is said that the matching rows are post-filtered while scanning the partition. Can you confirm or refute that SAI is efficient with wide partitions and provides the partition offsets of the matching rows?

2) About space efficiency: one of the biggest drawbacks of SASI was the huge space required for the index structure when using CONTAINS logic, because of the decomposition of text columns into n-grams. Will SAI suffer from the same issue in future iterations? I'm anticipating a bit here.

3) If I query using SAI and provide the complete partition key, will it be more efficient than querying without the partition key? In other words, does SAI provide any optimisation when the partition key is specified?

Regards,

Duy Hai DOAN

On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <m...@apache.org> wrote:

> > We are looking forward to the community's feedback and suggestions.
>
> What comes immediately to mind is testing requirements. It has been
> mentioned already that the project's testability and QA guidelines are
> inadequate to successfully introduce new features and refactorings to the
> codebase. During the 4.0 beta phase this was intended to be addressed, i.e.
> defining more specific QA guidelines for 4.0-rc. This would be an important
> step towards QA guidelines for all changes and CEPs post-4.0.
>
> Questions from me:
> - How will this be tested, and how will its QA status and lifecycle be
>   defined? (per above)
> - With existing C* code needing to be changed, what is the proposed plan
>   for making those changes while ensuring maintained QA, e.g. are there
>   separate QA cycles planned for altering the SPI before adding a new SPI
>   implementation?
> - Despite being out of scope, it would be nice to have some idea from the
>   CEP author of when users might still choose afresh 2i or SASI over SAI.
> - Who fills the roles involved? Who are the contributors in this DataStax
>   team? Who is the shepherd? Are there other stakeholders willing to be
>   involved?
> - Is there a preference to use gdoc instead of the project's wiki, and
>   why? (The CEP process suggests a wiki page, and feedback on why another
>   approach is considered better helps evolve the CEP process itself.)
>
> cheers,
> Mick