It might be a fun experiment to retrofit the Harry tests we're currently
using (once Harry 2.0 lands in trunk from cep-15-accord) to fuzz SAI and
point them at legacy 2i (i.e. the subset of query types legacy 2i supports)
and see if we find anything interesting, but I don't even know if that
rises above something like CASSANDRA-19007
<https://issues.apache.org/jira/browse/CASSANDRA-19007> on the backlog of
things around indexes/filtering I would fix...

On Tue, Dec 10, 2024 at 10:35 AM Benedict Elliott Smith <bened...@apache.org>
wrote:

> I agree with Aleksey on how we should approach feature flags, and if we
> think 2i simply *don’t work* we should make that determination and mark
> them *broken* not *deprecated*.
>
> The only bug mentioned so far is 18656, which doesn’t clearly argue that
> the behaviour is *incorrect* rather than just undesired. The only
> breaking scenario I can think of is if we complete a bootstrap before the
> index build is complete. I am not sure if this is possible, but if it is we
> should probably fix that, and in the meantime perhaps document the flaw and
> describe workarounds (such as repairing after stopping a replica to be
> replaced). This isn’t a “remove the feature” level bug though, given my
> current understanding of it. If anything, it would be much more work than
> just fixing the bug.
>
> If there’s a longer litany of breaking behaviours, let’s enumerate them
> and consider marking the feature as unsafe.
>
> On 10 Dec 2024, at 10:29, Caleb Rackliffe <calebrackli...@gmail.com>
> wrote:
>
> I think my point here is that the hidden table 2i implementation has known
> correctness/availability/operational/resource usage issues whether it has a
> theoretical niche use-case or not from a query performance perspective.
>
> To Štefan’s question, yes, more or less. I’d like to at least see some
> success in production for the cases it was primarily designed for. That
> might not be enough to make it the default if it needs to perform better
> than the (broken) legacy 2i in global query situations. SAI is currently
> bad by design for global queries across 1000s of SSTables (LCS), so it
> would either need to be used in conjunction with a compaction strategy that
> aggressively limits the number of live SSTables, otherwise modified to
> handle that case better, or simply made the default w/ the guardrails it
> already has around these things because there simply isn’t a usable
> alternative.
>
> On Dec 10, 2024, at 9:13 AM, Benedict Elliott Smith <bened...@apache.org>
> wrote:
>
> 
>
> There is no reason it should ever be more capable than SAI for any
> partition/token-restricted query use-case, and I don't really see how
> there's any short-term path for any local 2i implementation in C* to be
> efficient for anything else
>
>
> While I am not personally aware of much evidence presented that SAI
> performs better than 2i for the partition-restricted case, I do believe it
> is theoretically likely to. But any deprecation discussion should include
> evidence of this as a preamble.
>
> However, there are users that want queries *not* restricted by partition
> or token, and SAI is unlikely to serve these use cases as well. Yes,
> neither serves this use case *well*, but I cannot support deprecating a
> feature when its replacement is very likely inferior for some workloads.
> Since it is hard to prove that nobody is using 2i this way (and I recall
> from the distant past that such users were known to exist), we need instead
> to prove SAI can serve these workloads acceptably before we declare it a
> suitable replacement.
>
> I think there exists a near future world where we can offer proper
> *global* secondary indexes, at which point it would be acceptable to
> deprecate 2i and recommend users switch to either global secondary indexes
> or SAI. Until then, I cannot see a good argument for it if we want to be
> considered a stable and mature product.
>
>
> On 10 Dec 2024, at 09:28, Caleb Rackliffe <calebrackli...@gmail.com>
> wrote:
>
> > I’m not convinced SAI has demonstrated a practical or theoretical
> capability to fully replace secondary indexes anyway. So it would be very
> premature to mark them deprecated.
>
> > If 2i indexes are to be marked as deprecated and SAI is beta, then what
> is actually the index implementation we stand behind in the production? It
> is like we are "abandoning" the former but the latter is not bullet-proof
> yet.
>
> The table-based 2i implementation has never been safe to use, and I don't
> think it ever will be, however we label it. (ex. CASSANDRA-18656, its
> on-disk bloat, post-streaming rebuilds, etc.) There is no reason it should
> ever be more capable than SAI for any partition/token-restricted query
> use-case, and I don't really see how there's any short-term path for any
> local 2i implementation in C* to be efficient for anything else. There are
> presently no feature gaps on the query side.
>
> Anyway, there are still a lot of things we can improve about SAI (and
> things that already exist and are just waiting in the DS public fork)...I'm
> just not sure what reasonable use case the old 2i will be able to serve
> better.
>
> On Tue, Dec 10, 2024 at 5:41 AM Benedict <bened...@apache.org> wrote:
>
>> I’m not convinced SAI has demonstrated a practical or theoretical
>> capability to fully replace secondary indexes anyway. So it would be very
>> premature to mark them deprecated.
>>
>> On 10 Dec 2024, at 06:29, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> 
>>  ... then we should NOT mark it to be deprecated.
>>
>> On Tue, Dec 10, 2024 at 12:27 PM Štefan Miklošovič <
>> smikloso...@apache.org> wrote:
>>
>>> I have a hard time getting used to the "terminology" here. If 2i indexes
>>> are to be marked as deprecated and SAI is beta, then what is actually the
>>> index implementation we stand behind in the production? It is like we are
>>> "abandoning" the former but the latter is not bullet-proof yet. The signal
>>> it sends is that we don't have a non-deprecated bullet-proof index impl.
>>>
>>> Maybe it is just about the wording and people are just fine running
>>> deprecated things knowing they are production-ready, what I am used to is
>>> that if something is deprecated, then there is always a replacement which
>>> is recommended. If there isn't a recommended replacement which can fully
>>> supersede the current implementation then we should mark it to be
>>> deprecated.
>>>
>>> I understand that you are trying to find some "common ground" /
>>> expressing that we are moving towards SAI but I am not sure the wording is
>>> entirely correct or we should be careful how we frame it.
>>>
>>> On Tue, Dec 10, 2024 at 12:01 PM Mick Semb Wever <m...@apache.org> wrote:
>>>
>>>> > A possibility with SAI is to mark it beta while also marking 2i as
>>>> > deprecated (and leaving SASI as marked).  This sends a clear signal
>>>> > (imho) that SAI is the recommended solution forward but also being
>>>> > honest about its maturity and QA.
>>>>
>>>>
>>>>  (and leaving SASI as marked *experimental*)
>>>>
>>>
>
>
