Hi, I’d like to restart this thread.
We merged the row-aware branch into the SAI codebase just before Christmas and have since updated the CEP to reflect these changes. I would like to move the discussion forward on how we take this CEP to a vote.

MikeA

On 16 Sep 2021, at 19:49, DuyHai Doan <doanduy...@gmail.com> wrote:

Good news, Mike, that row-based indexing will be available; this was a major gap in SASI at the time!

On Thu, 16 Sep 2021 at 15:38, Mike Adamson <madam...@datastax.com> wrote:

Hi,

Just to keep this thread up to date with development progress, we will be adding row-aware support to SAI in the next few weeks. This is currently going through the final stages of review and testing.

This feature also adds on-disk versioning to SAI, which allows SAI to support multiple on-disk formats during upgrades.

I am mentioning this now because the CEP lists "Partition Based Iteration" as an initial feature. We will change that to "Row Based Iteration" when the feature is merged.

MikeA

On 15 Sep 2021, at 19:42, Caleb Rackliffe <calebrackli...@gmail.com> wrote:

Hey there,

In the spirit of trying to get as many potential objections to a successful vote out of the way as possible, I've added a "Challenges" section to the CEP:

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index#CEP7:StorageAttachedIndex-Challenges

Most of you will be familiar with these, but I think we need to be as open/candid as possible about the potential risk they pose to SAI's broader usability.
I've described them from the point of view that they are not intractable, but if anyone thinks they are, let's hash that disagreement out.

Thanks!

On Thu, Sep 9, 2021 at 11:13 AM Patrick McFadin <pmcfa...@gmail.com> wrote:

+1 on introducing this in an incremental manner, and after reading through CASSANDRA-16092 that seems like a perfect place to start. I see that work on that Jira has stopped until direction for CEP-7 has been voted in.

I say start the vote and let's get this really valuable developer feature underway.

Patrick

On Tue, Sep 7, 2021 at 10:40 AM Caleb Rackliffe <calebrackli...@gmail.com> wrote:

So this thread stalled almost a year ago. (Wow, time flies when you're trying to release 4.0.) My synthesis of the conversation to this point is that while there are some open questions about testing methodology/"definition of done" and our choice of particular on-disk data structures, neither of these should be a serious obstacle to moving forward w/ a vote. Having said that, is there anything left around the CEP that we feel should prevent it from moving to a vote?

In terms of how we would proceed from the point a vote passes, it seems like there have been enough concerns around the proposed/necessary breaking changes to the 2i API that we will start development by introducing components as incrementally as possible into a long-running feature branch off trunk.
(This work would likely start w/ CASSANDRA-16092 <https://issues.apache.org/jira/browse/CASSANDRA-16092>, which we could resolve as a sub-task of the SAI epic without interfering with other trunk development likely destined for a 4.x minor, etc.)

On Thu, Sep 24, 2020 at 2:47 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:

> Question is: is this planned as a next step? If yes, how are we going to mark SAI as experimental until it gets row offsets? Also, it is likely that index format is going to change when row offsets are added, so my concern is that we may have to support two versions of a format for a smooth migration.

The goal is to support a row-level index when merging SAI; I will update the CEP accordingly.

> I think switching to row offsets also has a huge impact on interaction with SPRC and has some potential for optimisations.

Can you share more details on the optimizations?

On Thu, 24 Sep 2020 at 15:20, Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:

> But for improving overall index read performance, I think improving base table read perf (because SAI/SASI executes LOTS of SinglePartitionReadCommand after searching on-disk index) is more effective than switching from Trie to Prefix BTree.

I haven't suggested switching to a Prefix B-Tree or any other structure; the question was about the rationale and motivation for picking one over the other, which I am curious about for personal reasons/interests that lie outside of Cassandra.
Having this listed in the CEP could have been helpful for future guidance. It's OK if this question is outside of the CEP scope.

I also agree that there are many areas that require improvement around the read/write path and 2i, many of which (even outside of base table format or read perf) can yield positive performance results.

> FWIW, I personally look forward to receiving that contribution when the time is right.

I am very excited for this contribution, too, and it looks like very solid work.

I have one more question, about "Upon resolving partition keys, rows are loaded using Cassandra's internal partition read command across SSTables and are post filtered". One of the criticisms of SASI and reasons for marking it as experimental was CASSANDRA-11990. I think switching to row offsets also has a huge impact on interaction with SPRC and has some potential for optimisations. Question is: is this planned as a next step? If yes, how are we going to mark SAI as experimental until it gets row offsets? Also, it is likely that index format is going to change when row offsets are added, so my concern is that we may have to support two versions of a format for a smooth migration.

On Thu, Sep 24, 2020 at 6:53 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:

> I think CEP should be more upfront with "eventually replace it" bit, since it raises the question about what the people who are using other index implementations can expect.

Will update the CEP to emphasize: SAI will replace other indexes.
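Alex's row-offsets question above, and DuyHai's wide-partition question further down the thread, both hinge on whether the index resolves a match to a partition or to a row. The toy Python sketch below is only an illustration under stated assumptions: the in-memory data model and the names `build_postings`, `partition_based_search`, and `row_based_search` are hypothetical and do not reflect SAI's actual code, postings layout, or on-disk format. It counts how many rows each approach reads when one row in a 1,000-row partition matches: resolving to partitions forces a full partition read followed by post-filtering, while resolving to rows reads only the candidate row.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy data model (hypothetical, not SAI's): partition key -> {clustering key -> indexed value}.
Table = Dict[str, Dict[int, str]]
Row = Tuple[str, int]  # (partition key, clustering key)


def build_postings(table: Table) -> Tuple[Dict[str, Set[str]], Dict[str, Set[Row]]]:
    """Build a partition-granularity and a row-granularity index over the same data."""
    by_partition: Dict[str, Set[str]] = defaultdict(set)
    by_row: Dict[str, Set[Row]] = defaultdict(set)
    for pk, rows in table.items():
        for ck, value in rows.items():
            by_partition[value].add(pk)
            by_row[value].add((pk, ck))
    return by_partition, by_row


def partition_based_search(table: Table, postings: Dict[str, Set[str]], value: str) -> Tuple[List[Row], int]:
    """Index yields partition keys: each partition is read in full, then post-filtered."""
    matches: List[Row] = []
    rows_read = 0
    for pk in sorted(postings.get(value, set())):
        for ck, v in sorted(table[pk].items()):
            rows_read += 1
            if v == value:  # post-filtering step
                matches.append((pk, ck))
    return matches, rows_read


def row_based_search(table: Table, postings: Dict[str, Set[Row]], value: str) -> Tuple[List[Row], int]:
    """Index yields row identifiers: only the candidate rows are read."""
    candidates = sorted(postings.get(value, set()))
    # One point read per candidate; other rows in the partition are never touched.
    matches = [(pk, ck) for pk, ck in candidates if table[pk][ck] == value]
    return matches, len(candidates)


# One wide partition of 1,000 rows in which only clustering key 42 matches "red".
wide: Table = {"p1": {ck: ("red" if ck == 42 else "blue") for ck in range(1000)}}
by_partition, by_row = build_postings(wide)
print(partition_based_search(wide, by_partition, "red"))  # ([('p1', 42)], 1000)
print(row_based_search(wide, by_row, "red"))  # ([('p1', 42)], 1)
```

The gap grows with partition width, which is presumably why wide partitions keep coming up in this thread; the sketch deliberately ignores SSTable layout, tombstones, and merging results across multiple SSTables.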
> Unfortunately, I do not have an implementation sitting around for a direct comparison, but I can imagine situations when B-Trees may perform better because of simpler construction. Maybe we should even consider prototyping a prefix B-Tree to have a more fair comparison.

As long as the prefix B-Tree supports range/prefix aggregation (which is used to speed up range/prefix queries when matching an entire subtree), we can plug it in and compare. It won't affect the CEP design, which focuses on sharing data across indexes and posting aggregation.

But for improving overall index read performance, I think improving base table read perf (because SAI/SASI executes LOTS of SinglePartitionReadCommands after searching the on-disk index) is more effective than switching from Trie to Prefix BTree.

On Thu, 24 Sep 2020 at 05:33, Benedict Elliott Smith <bened...@apache.org> wrote:

FWIW, I personally look forward to receiving that contribution when the time is right.

On 23/09/2020, 18:45, "Josh McKenzie" <jmcken...@apache.org> wrote:

> talking about that would involve some bits of information DataStax might not be ready to share?

At the risk of derailing, I've spent this week poking and prodding at us contributors at DS to get our act together w/ a draft CEP for donating the trie-based indices to the ASF project.

More to come; the intention is certainly to contribute that code.
The lack of a destination to merge it into (i.e. no 5.0-dev branch) is removing significant urgency from the process as well (not to open a 3rd Pandora's box), but there's certainly an interrelatedness to the conversations going on.

---
Josh McKenzie

On Wed, Sep 23, 2020 at 12:48 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:

As long as we can construct the on-disk indexes efficiently/directly from a Memtable-attached index on flush, there's room to try other data structures.
Most of the innovation in SAI is around the layout of postings (something we can expand on if people are interested) and having a natively row-oriented design that scales w/ multiple indexed columns on single SSTables. There are some broader implications of using the trie that reach outside SAI itself, but talking about that would involve some bits of information DataStax might not be ready to share?

On Wed, Sep 23, 2020 at 11:00 AM Jeremiah D Jordan <jeremiah.jordan@gmail.com> wrote:

> Short question: looking forward, how are we going to maintain three 2i implementations: SASI, SAI, and 2i?

I think one of the goals stated in the CEP is for SAI to have parity with 2i such that it could eventually replace it.

On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:

Short question: looking forward, how are we going to maintain three 2i implementations: SASI, SAI, and 2i?

Another thing I think this CEP is missing is the rationale and motivation for choosing trie-based indexes over, say, a B-Tree. We did have a short discussion about this on Slack, but both arguments that I've heard (space-saving and keeping a small subset of nodes in memory) work only for the most primitive implementation of a B-Tree.
A fully-occupied prefix B-Tree can have similar properties. There's been a lot of research on B-Trees and optimisations in those. Unfortunately, I do not have an implementation sitting around for a direct comparison, but I can imagine situations when B-Trees may perform better because of simpler construction. Maybe we should even consider prototyping a prefix B-Tree to have a fairer comparison.

Thank you,
-- Alex

On Thu, Sep 10, 2020 at 9:12 AM Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:

Thank you, Patrick, for hosting the Cassandra Contributor Meeting for CEP-7 SAI.

The recorded video is available here:
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting

On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:

Thank you, Charles and Patrick

On Tue, 1 Sep 2020 at 04:56, Charles Cao <caohair...@gmail.com> wrote:

Thank you, Patrick!

On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

I just moved it to 8AM for this meeting to better accommodate APAC.
Please see the update here:
https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting

Patrick

On Mon, Aug 31, 2020 at 10:04 AM Charles Cao <caohair...@gmail.com> wrote:

Patrick,

11AM PST is a bad time for people in the APAC time zone. Can we move it to 7 or 8AM PST to accommodate them?

~Charles

On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

Meeting scheduled.

https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting

Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda, but if there is more, edit away.
Patrick

On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <jasonstack.zhao@gmail.com> wrote:

+1

On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:

+1

On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe <calebrackli...@gmail.com> wrote:

+1

On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

This is related to the discussion Jordan and I had about the contributor Zoom call. Instead of an open mic for any issue, call it based on a discussion thread or threads for higher-bandwidth discussion.

I would be happy to schedule one for next week to specifically discuss CEP-7. I can attach the recorded call to the CEP after.

+1 or -1?

Patrick

On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <jmcken...@apache.org> wrote:

> Does the community plan to open another discussion or CEP on modularization?
We probably should have a discussion on the ML or monthly contrib call about it first to see how aligned the interested contributors are. Could do that through a CEP as well, but CEPs (at least thus far, sans the k8s operator) tend to start with a strong, deeply thought-out point of view being expressed.

On Tue, Aug 25, 2020 at 3:26 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:

> SASI's performance, specifically the search in the B+ tree component, depends a lot on the component file's header being available in the pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound to this same or similar limitation?

SAI also benefits from larger memory, because SAI puts block info on heap for searching on-disk components, and having cross-index files in the page cache improves read performance of different indexes on the same table.
> Flushing of SASI can be CPU+IO intensive, to the point of saturation, pauses, and crashes on the node. SSDs are a must, along with a bit of tuning, just to avoid bringing down your cluster. Beyond reducing space requirements, does SAI improve on these things? Like SASI, how does SAI, in its own way, change/narrow the recommendations on node hardware specs?

SAI won't crash the node during compaction and requires less CPU/IO.

* SAI defines a global memory limit for compaction instead of the per-index memory limit used by SASI. For example, if compactions are running on 10 tables and each has 10 indexes, SAI will cap the memory usage with the global limit, while SASI may use up to 100 * the per-index limit.
* After flushing in-memory segments to disk, SAI won't merge on-disk segments, while SASI attempts to merge them at the end.
  There are pros and cons of not merging segments:
  ** Pros: compaction runs faster and requires fewer resources.
  ** Cons: small segments reduce the compression ratio.
* SAI's on-disk format with row ids compresses better.

> I understand the desire in keeping out of scope the longer term deprecation and migration plan, but… if SASI provides functionality that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet introduces a body of code ~somewhat similar, shouldn't we be roughly sketching out how to reduce the maintenance surface area?

Agreed that we should reduce the maintenance surface area if possible, but only a very limited part of the code base (e.g. RangeIterator, QueryPlan) can be shared. The rest of the code base is quite different because of the on-disk format and cross-index files. The goal of this CEP is to get community buy-in on SAI's design.
Tokenization and DelimiterAnalyzer should be straightforward to implement on top of SAI.

> Can we list what configurations of SASI will become deprecated once SAI becomes non-experimental?

Except for "Like", "Tokenisation", and "DelimiterAnalyzer", the rest of SASI can be replaced by SAI.

> Given a few bugs are open against 2i and SASI, can we provide some overview, or rough indication, of how many of them we could "triage away"?

I believe most of the known bugs in 2i/SASI either have been addressed in SAI or don't apply to SAI.

> And, is it time for the project to start introducing new SPI implementations as separate sub-modules and jar files that are only loaded at runtime based on configuration settings?
> (sorry for the conflation on this one, but maybe it's the right time to raise it :shrug:)

Agreed that modularization is the way to go and will speed up module development.

Does the community plan to open another discussion or CEP on modularization?

On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever <m...@apache.org> wrote:

Adding to Duy's questions…

* Hardware specs

SASI's performance, specifically the search in the B+ tree component, depends a lot on the component file's header being available in the pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound to this same or similar limitation?

Flushing of SASI can be CPU+IO intensive, to the point of saturation, pauses, and crashes on the node. SSDs are a must, along with a bit of tuning, just to avoid bringing down your cluster. Beyond reducing space requirements, does SAI improve on these things?
Like SASI, how does SAI, in its own way, change/narrow the recommendations on node hardware specs?

* Code Maintenance

I understand the desire in keeping out of scope the longer term deprecation and migration plan, but… if SASI provides functionality that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet introduces a body of code ~somewhat similar, shouldn't we be roughly sketching out how to reduce the maintenance surface area?

Can we list what configurations of SASI will become deprecated once SAI becomes non-experimental?

Given a few bugs are open against 2i and SASI, can we provide some overview, or rough indication, of how many of them we could "triage away"?
And, is it time for the project to start introducing new SPI implementations as separate sub-modules and jar files that are only loaded at runtime based on configuration settings? (sorry for the conflation on this one, but maybe it's the right time to raise it :shrug:)

regards,
Mick

On Tue, 18 Aug 2020 at 13:05, DuyHai Doan <doanduy...@gmail.com> wrote:

Thank you, Zhao Yang, for starting this topic.

After reading the short design doc, I have a few questions:

1) SASI was pretty inefficient at indexing wide partitions because the index structure only retains the partition token, not the clustering columns. As per the design doc, SAI has a row id mapping to partition offset; can we hope that indexing wide partitions will be more efficient with SAI?
>>>>>>>>>> One detail that worries me is that the beginning of the design doc
>>>>>>>>>> says the matching rows are post-filtered while scanning the
>>>>>>>>>> partition. Can you confirm or refute that SAI is efficient with wide
>>>>>>>>>> partitions and provides the partition offsets to the matching rows?
>>>>>>>>>>
>>>>>>>>>> 2) About space efficiency: one of the biggest drawbacks of SASI was the
>>>>>>>>>> huge space required for the index structure when using CONTAINS logic,
>>>>>>>>>> because of the decomposition of text columns into n-grams. Will SAI
>>>>>>>>>> suffer from the same issue in future iterations? I'm getting a bit
>>>>>>>>>> ahead of myself here.
>>>>>>>>>>
>>>>>>>>>> 3) If I'm querying using SAI and providing the complete partition key,
>>>>>>>>>> will it be more efficient than querying without the partition key? In
>>>>>>>>>> other words, does SAI provide any optimisation when the partition key
>>>>>>>>>> is specified?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Duy Hai DOAN
>>>>>>>>>>
>>>>>>>>>> On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <m...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> We are looking forward to the community's feedback and suggestions.
>>>>>>>>>>
>>>>>>>>>> What comes immediately to mind is testing requirements. It has been
>>>>>>>>>> mentioned already that the project's testability and QA guidelines are
>>>>>>>>>> inadequate to successfully introduce new features and refactorings to
>>>>>>>>>> the codebase. During the 4.0 beta phase this was intended to be
>>>>>>>>>> addressed, i.e. by defining more specific QA guidelines for 4.0-rc.
>>>>>>>>>> This would be an important step towards QA guidelines for all changes
>>>>>>>>>> and CEPs post-4.0.
>>>>>>>>>>
>>>>>>>>>> Questions from me:
>>>>>>>>>>
>>>>>>>>>> - How will this be tested, and how will its QA status and lifecycle be
>>>>>>>>>>   defined? (per above)
>>>>>>>>>> - With existing C* code needing to be changed, what is the proposed
>>>>>>>>>>   plan for making those changes while maintaining QA? E.g. are there
>>>>>>>>>>   separate QA cycles planned for altering the SPI before adding a new
>>>>>>>>>>   SPI implementation?
>>>>>>>>>> - Despite being out of scope, it would be nice to have some idea from
>>>>>>>>>>   the CEP author of when users starting afresh might still choose 2i
>>>>>>>>>>   or SASI over SAI.
>>>>>>>>>> - Who fills the roles involved? Who are the contributors in this
>>>>>>>>>>   DataStax team? Who is the shepherd? Are there other stakeholders
>>>>>>>>>>   willing to be involved?
>>>>>>>>>> - Is there a preference to use gdoc instead of the project's wiki, and
>>>>>>>>>>   why?
>>>>>>>>>>   (the CEP process suggests a wiki page, and feedback on why another
>>>>>>>>>>   approach is considered better helps evolve the CEP process itself)
>>>>>>>>>>
>>>>>>>>>> cheers,
>>>>>>>>>> Mick
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> alex p
>>>>>>>
>>>>>>> --
>>>>>>> alex p