Hey Jon,

The following quick test shows me that vector search is marked as experimental (it just isn't flagged in cassandra.yaml the way materialized views, etc. are):
cqlsh:k> CREATE TABLE t (pk int, str_val text, val vector<float, 3>, PRIMARY KEY(pk));
cqlsh:k> CREATE CUSTOM INDEX ON t(val) USING 'StorageAttachedIndex';

Warnings :
SAI ANN indexes on vector columns are experimental and are not recommended for production use.
They don't yet support SELECT queries with:
 * Consistency level higher than ONE/LOCAL_ONE.
 * Paging.
 * No LIMIT clauses.
 * PER PARTITION LIMIT clauses.
 * GROUP BY clauses.
 * Aggregation functions.
 * Filters on columns without a SAI index.

I do agree that there is differentiation also between experimental and beta. But I need to think more before expressing concrete opinion/suggestions here. Though I believe this conversation is healthy to have and shows the maturity of our project. Thank you, Josh!

Best regards,
Ekaterina

On Mon, 9 Dec 2024 at 13:21, Jon Haddad <j...@rustyrazorblade.com> wrote:

> The tough thing here is that MVs are marked experimental retroactively,
> because by the time the problems were known, there wasn't much anyone could
> do. Experimental was our way of saying "oops, we screwed up, let's put a
> label on it" and the same label got applied to a bunch of new stuff
> including Java 17. They're not even close to being in the same category,
> but they're labeled the same and people treat them as equivalent.
>
> If we knew MVs were so broken before they were merged, they would have
> been -1'ed. Same with incremental repair (till 4.0), and vector search
> today. I would have -1'ed all three of these if it was known how poorly
> they actually performed at the time they were committed.
>
> Side note, vector search isn't marked as experimental today, but it's not
> even usable for non-trivial datasets out of the box, so it should be marked
> as such at this point.
>
> I really wish this stuff was tested at a reasonable scale across various
> failure modes before merging, because the harm it does to the community is
> real.
> We really shouldn't be put in a position where stuff gets released,
> hyped up, then we find it's obviously not ready for real world use. I
> built my tooling (tlp-cluster, now easy-cass-lab, and tlp-stress, now
> easy-cass-stress), with this in mind, but sadly I haven't seen much use of
> it to verify patches. The only reason I found a memory leak in
> CASSANDRA-15452 was because I used these tools on multi-TB datasets over
> several days.
>
> Jon
>
> On Mon, Dec 9, 2024 at 9:55 AM Slater, Ben via dev <
> dev@cassandra.apache.org> wrote:
>
>> I'm a little worried by the idea of grouping in MVs with things like a
>> Java version under the same "beta" label (acknowledging that they are
>> currently grouped under the same "experimental" label).
>>
>> To me, "beta" implies it's pretty close to production ready and there is
>> an intention to get it to production ready in the near future. I don't
>> think this really describes MVs as I don't see anyone looking like they are
>> trying to get them to really production ready (although I could easily be
>> wrong on that).
>>
>> Maybe there is an argument for "experimental"=this is here to get
>> feedback but there's no commitment it will make it to production ready and
>> "beta"=we think this is done but we'd like to see some production use
>> before declaring it stable. For beta, we'll treat bugs with the same
>> priority as "stable" (or at least close to)?
>>
>> Cheers
>> Ben
>>
>> ------------------------------
>> *From:* Jon Haddad <j...@rustyrazorblade.com>
>> *Sent:* 09 December 2024 09:43
>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>> *Subject:* Re: [DISCUSS] Experimental flagging (fork from Re-evaluate
>> compaction defaults in 5.1/trunk)
>>
>> I like this.
>> There's a few things marked as experimental today, so I'll
>> take a stab at making this more concrete, and I think we should be open to
>> graduating certain things out of beta to GA at a faster cycle than a major
>> release.
>>
>> Java versions, for example, should really move out of "beta" quickly. We
>> test against it, and we're not going to drop new versions. So if we're
>> looking at C* 5.0, we should move Java 17 out of experimental / beta
>> immediately and call it GA.
>>
>> SAI and UCS should probably graduate no later than 5.1.
>>
>> On the other hand, MVs have enough warts I actively recommend against
>> using them and should be in beta till we can actually repair them.
>>
>> I don't know if anyone's actually used transient replication and if it's
>> even beta quality... that might actually warrant being called experimental
>> still.
>>
>> 'ALTER ... DROP COMPACT STORAGE' is flagged as experimental. I'm not
>> sure what to do with this. I advise people migrate their data for any
>> Thrift -> CQL cases, mostly because the edge cases are so hard to know in
>> advance, especially since by now these codebases are ancient and the
>> original developers are long gone.
>>
>> Thoughts?
>>
>> Jon
>>
>> On Mon, Dec 9, 2024 at 6:28 AM Josh McKenzie <jmcken...@apache.org>
>> wrote:
>>
>> Jon stated:
>>
>> Side note: I think experimental has been over-used and has lost all
>> meaning. How is Java 17 experimental? Very confusing for the community.
>>
>> Dinesh followed with:
>>
>> Philosophically, as a project, we should wait until critical features
>> like these reach a certain level of maturity prior to recommending it as a
>> default. For me maturity is a function of adoption by diverse use-cases in
>> production and scale.
>>
>> I'd like to discuss 2 ideas related to the above:
>>
>> 1. We rename / alias "experimental" to "beta".
>>    It's a word that's
>>    ubiquitous in our field and communicates the correct level of expectation
>>    to our users (API stable, may have bugs)
>> 2. *All new features* go through one major (either semver MAJOR or
>>    MINOR) as "beta"
>>
>> To Jon's point, "experimental" was really a kludge to work around
>> Materialized Views having some very sharp edges that users had to be very
>> aware of. We haven't really used the flagging much (at all?) since then,
>> and we don't have a formalized way to shepherd a new feature through a
>> "soak" period where it can "reach a certain level of maturity". We're
>> caught in a chicken-or-egg scenario with our current need to get a feature
>> released more broadly to have confidence in its stability (to Dinesh's
>> point).
>>
>> In my mind, the following feature evolution would be healthy for us and
>> good for our users:
>>
>> 1. Beta
>> 2. Generally Available
>> 3. Default (where appropriate)
>>
>> To graduate from Beta -> GA, good UX, user facing documentation, a
>> [DISCUSS] thread where we have a clear consensus of readiness, all seem
>> like healthy and good steps. From GA -> Default, [DISCUSS] like we're
>> having re: compaction strategies, unearthing shortcomings, edge-cases,
>> documentation needs, etc.
>>
>> Curious what others think.
>>
>> ~Josh