I just moved it to 8AM for this meeting to better accommodate APAC. Please see the update here: https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
Patrick

On Mon, Aug 31, 2020 at 10:04 AM Charles Cao <caohair...@gmail.com> wrote:

> Patrick,
>
> 11AM PST is a bad time for the people in the APAC timezone. Can we move it to 7 or 8AM PST in the morning to accommodate their needs?
>
> ~Charles
>
> On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> > Meeting scheduled.
> >
> > https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
> >
> > Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda but if there is more, edit away.
> >
> > Patrick
> >
> > On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
> >
> > > +1
> > >
> > > On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, 26 Aug 2020 at 16:48, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Wed, Aug 26, 2020, 3:45 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
> > > > >
> > > > > > This is related to the discussion Jordan and I had about the contributor Zoom call. Instead of open mic for any issue, call it based on a discussion thread or threads for higher bandwidth discussion.
> > > > > >
> > > > > > I would be happy to schedule one for next week to specifically discuss CEP-7. I can attach the recorded call to the CEP after.
> > > > > >
> > > > > > +1 or -1?
> > > > > >
> > > > > > Patrick
> > > > > >
> > > > > > On Tue, Aug 25, 2020 at 7:03 AM Joshua McKenzie <jmcken...@apache.org> wrote:
> > > > > >
> > > > > > > > Does community plan to open another discussion or CEP on modularization?
> > > > > > >
> > > > > > > We probably should have a discussion on the ML or monthly contrib call about it first to see how aligned the interested contributors are. Could do that through CEP as well but CEP's (at least thus far sans k8s operator) tend to start with a strong, deeply thought out point of view being expressed.
> > > > > > >
> > > > > > > On Tue, Aug 25, 2020 at 3:26 AM Jasonstack Zhao Yang <jasonstack.z...@gmail.com> wrote:
> > > > > > >
> > > > > > > > >>> SASI's performance, specifically the search in the B+ tree component, depends a lot on the component file's header being available in the pagecache. SASI benefits from (needs) nodes with lots of RAM. Is SAI bound to this same or similar limitation?
> > > > > > > >
> > > > > > > > SAI also benefits from larger memory because SAI puts block info on heap for searching on-disk components and having cross-index files on page cache improves read performance of different indexes on the same table.
> > > > > > > >
> > > > > > > > >>> Flushing of SASI can be CPU+IO intensive, to the point of saturation, pauses, and crashes on the node. SSDs are a must, along with a bit of tuning, just to avoid bringing down your cluster. Beyond reducing space requirements, does SAI improve on these things? Like SASI how does SAI, in its own way, change/narrow the recommendations on node hardware specs?
> > > > > > > >
> > > > > > > > SAI won't crash the node during compaction and requires less CPU/IO.
> > > > > > > >
> > > > > > > > * SAI defines global memory limit for compaction instead of per-index memory limit used by SASI. For example, compactions are running on 10 tables and each has 10 indexes. SAI will cap the memory usage with global limit while SASI may use up to 100 * per-index limit.
> > > > > > > >
> > > > > > > > * After flushing in-memory segments to disk, SAI won't merge on-disk segments while SASI attempts to merge them at the end.
> > > > > > > >
> > > > > > > > There are pros and cons of not merging segments:
> > > > > > > > ** Pros: compaction runs faster and requires fewer resources.
> > > > > > > > ** Cons: small segments reduce compression ratio.
> > > > > > > >
> > > > > > > > * SAI on-disk format with row ids compresses better.
> > > > > > > >
> > > > > > > > >>> I understand the desire in keeping out of scope the longer term deprecation and migration plan, but… if SASI provides functionality that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet introduces a body of code ~somewhat similar, shouldn't we be roughly sketching out how to reduce the maintenance surface area?
> > > > > > > >
> > > > > > > > Agreed that we should reduce maintenance area if possible, but only very limited code base (eg. RangeIterator, QueryPlan) can be shared. The rest of the code base is quite different because of on-disk format and cross-index files.
> > > > > > > >
> > > > > > > > The goal of this CEP is to get community buy-in on SAI's design. Tokenization, DelimiterAnalyzer should be straightforward to implement on top of SAI.
> > > > > > > >
> > > > > > > > >>> Can we list what configurations of SASI will become deprecated once SAI becomes non-experimental?
> > > > > > > >
> > > > > > > > Except for "Like", "Tokenisation", "DelimiterAnalyzer", the rest of SASI can be replaced by SAI.
> > > > > > > >
> > > > > > > > >>> Given a few bugs are open against 2i and SASI, can we provide some overview, or rough indication, of how many of them we could "triage away"?
> > > > > > > >
> > > > > > > > I believe most of the known bugs in 2i/SASI either have been addressed in SAI or don't apply to SAI.
> > > > > > > >
> > > > > > > > >>> And, is it time for the project to start introducing new SPI implementations as separate sub-modules and jar files that are only loaded at runtime based on configuration settings? (sorry for the conflation on this one, but maybe it's the right time to raise it :shrug:)
> > > > > > > >
> > > > > > > > Agreed that modularization is the way to go and will speed up module development speed.
> > > > > > > >
> > > > > > > > Does community plan to open another discussion or CEP on modularization?
> > > > > > > >
> > > > > > > > On Mon, 24 Aug 2020 at 16:43, Mick Semb Wever <m...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Adding to Duy's questions…
> > > > > > > > >
> > > > > > > > > * Hardware specs
> > > > > > > > >
> > > > > > > > > SASI's performance, specifically the search in the B+ tree component, depends a lot on the component file's header being available in the pagecache.
> > > > > > > > > SASI benefits from (needs) nodes with lots of RAM. Is SAI bound to this same or similar limitation?
> > > > > > > > >
> > > > > > > > > Flushing of SASI can be CPU+IO intensive, to the point of saturation, pauses, and crashes on the node. SSDs are a must, along with a bit of tuning, just to avoid bringing down your cluster. Beyond reducing space requirements, does SAI improve on these things? Like SASI how does SAI, in its own way, change/narrow the recommendations on node hardware specs?
> > > > > > > > >
> > > > > > > > > * Code Maintenance
> > > > > > > > >
> > > > > > > > > I understand the desire in keeping out of scope the longer term deprecation and migration plan, but… if SASI provides functionality that SAI doesn't, like tokenisation and DelimiterAnalyzer, yet introduces a body of code ~somewhat similar, shouldn't we be roughly sketching out how to reduce the maintenance surface area?
> > > > > > > > >
> > > > > > > > > Can we list what configurations of SASI will become deprecated once SAI becomes non-experimental?
> > > > > > > > >
> > > > > > > > > Given a few bugs are open against 2i and SASI, can we provide some overview, or rough indication, of how many of them we could "triage away"?
> > > > > > > > >
> > > > > > > > > And, is it time for the project to start introducing new SPI implementations as separate sub-modules and jar files that are only loaded at runtime based on configuration settings? (sorry for the conflation on this one, but maybe it's the right time to raise it :shrug:)
> > > > > > > > >
> > > > > > > > > regards,
> > > > > > > > > Mick
> > > > > > > > >
> > > > > > > > > On Tue, 18 Aug 2020 at 13:05, DuyHai Doan <doanduy...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thank you Zhao Yang for starting this topic
> > > > > > > > > >
> > > > > > > > > > After reading the short design doc, I have a few questions
> > > > > > > > > >
> > > > > > > > > > 1) SASI was pretty inefficient indexing wide partitions because the index structure only retains the partition token, not the clustering columns. As per design doc SAI has row id mapping to partition offset, can we hope that indexing wide partition will be more efficient with SAI?
> > > > > > > > > > One detail that worries me is that in the beginning of the design doc, it is said that the matching rows are post filtered while scanning the partition. Can you confirm or deny that SAI is efficient with wide partitions and provides the partition offsets to the matching rows?
> > > > > > > > > >
> > > > > > > > > > 2) About space efficiency, one of the biggest drawbacks of SASI was the huge space required for index structure when using CONTAINS logic because of the decomposition of text columns into n-grams. Will SAI suffer from the same issue in future iterations? I'm anticipating a bit
> > > > > > > > > >
> > > > > > > > > > 3) If I'm querying using SAI and providing complete partition key, will it be more efficient than querying without partition key? In other words, does SAI provide any optimisation when partition key is specified?
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > >
> > > > > > > > > > Duy Hai DOAN
> > > > > > > > > >
> > > > > > > > > > On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <m...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > > We are looking forward to the community's feedback and suggestions.
> > > > > > > > > > >
> > > > > > > > > > > What comes immediately to mind is testing requirements. It has been mentioned already that the project's testability and QA guidelines are inadequate to successfully introduce new features and refactorings to the codebase. During the 4.0 beta phase this was intended to be addressed, i.e. defining more specific QA guidelines for 4.0-rc. This would be an important step towards QA guidelines for all changes and CEPs post-4.0.
> > > > > > > > > > >
> > > > > > > > > > > Questions from me
> > > > > > > > > > > - How will this be tested, how will its QA status and lifecycle be defined? (per above)
> > > > > > > > > > > - With existing C* code needing to be changed, what is the proposed plan for making those changes ensuring maintained QA, e.g. are there separate QA cycles planned for altering the SPI before adding a new SPI implementation?
> > > > > > > > > > > - Despite being out of scope, it would be nice to have some idea from the CEP author of when users might still choose afresh 2i or SASI over SAI.
> > > > > > > > > > > - Who fills the roles involved? Who are the contributors in this DataStax team? Who is the shepherd? Are there other stakeholders willing to be involved?
> > > > > > > > > > > - Is there a preference to use gdoc instead of the project's wiki, and why? (the CEP process suggests a wiki page, and feedback on why another approach is considered better helps evolve the CEP process itself)
> > > > > > > > > > >
> > > > > > > > > > > cheers,
> > > > > > > > > > > Mick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org