I agree with Aleksey on how we should approach feature flags, and if we think 
2i simply don’t work, we should make that determination and mark them broken, 
not deprecated.

The only bug mentioned so far is CASSANDRA-18656, which doesn’t clearly argue 
that the behaviour is incorrect rather than just undesired. The only breaking 
scenario I can think of is if we complete a bootstrap before the index build is 
complete. 
I am not sure if this is possible, but if it is, we should probably fix that 
and in the meantime perhaps document the flaw and describe workarounds (such 
as repairing after stopping a replica to be replaced). This isn’t a “remove the 
feature” level bug though, given my current understanding of it. If anything, 
removing the feature would be much more work than just fixing the bug.

If there’s a longer litany of breaking behaviours, let’s enumerate them and 
consider marking the feature as unsafe.

> On 10 Dec 2024, at 10:29, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
> 
> I think my point here is that the hidden table 2i implementation has known 
> correctness/availability/operational/resource usage issues whether it has a 
> theoretical niche use-case or not from a query performance perspective.
> 
> To Štefan’s question, yes, more or less. I’d like to at least see some 
> success in production for the cases it was primarily designed for. That might 
> not be enough to make it the default if it needs to perform better than the 
> (broken) legacy 2i in global query situations. SAI is currently bad by design 
> for global queries across 1000s of SSTables (LCS), so it would either need to 
> be used in conjunction with a compaction strategy that aggressively limits 
> the number of live SSTables, otherwise modified to handle that case better, 
> or simply made the default w/ the guardrails it already has around these 
> things because there simply isn’t a usable alternative.
> 
>> On Dec 10, 2024, at 9:13 AM, Benedict Elliott Smith <bened...@apache.org> 
>> wrote:
>> 
>> 
>>> There is no reason it should ever be more capable than SAI for any 
>>> partition/token-restricted query use-case, and I don't really see how 
>>> there's any short-term path for any local 2i implementation in C* to be 
>>> efficient for anything else
>> 
>> While I am not personally aware of much evidence presented that SAI performs 
>> better than 2i for the partition-restricted case, I do believe it is 
>> theoretically likely to. But any deprecation discussion should include 
>> evidence of this as a preamble.
>> 
>> However, there are users that want queries not restricted by partition or 
>> token, and SAI is unlikely to serve these use cases as well. Yes, neither 
>> serves this use case well, but I cannot support deprecating a feature when 
>> its replacement is very likely inferior for some workloads. Since it is hard 
>> to prove that nobody is using 2i this way (and I recall from the distant 
>> past that such users were known to exist), we need instead to prove SAI can 
>> serve these workloads acceptably before we declare it a suitable replacement.
>> 
>> I think there exists a near future world where we can offer proper global 
>> secondary indexes, at which point it would be acceptable to deprecate 2i and 
>> recommend users switch to either global secondary indexes or SAI. Until 
>> then, I cannot see a good argument for it if we want to be considered a 
>> stable and mature product.
>> 
>> 
>>> On 10 Dec 2024, at 09:28, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>> 
>>> > I’m not convinced SAI has demonstrated a practical or theoretical 
>>> > capability to fully replace secondary indexes anyway. So it would be very 
>>> > premature to mark them deprecated.
>>> 
>>> > If 2i indexes are to be marked as deprecated and SAI is beta, then what 
>>> > is actually the index implementation we stand behind in the production? 
>>> > It is like we are "abandoning" the former but the latter is not 
>>> > bullet-proof yet.
>>> 
>>> The table-based 2i implementation has never been safe to use, and I don't 
>>> think it ever will be, however we label it. (e.g. CASSANDRA-18656, its 
>>> on-disk bloat, post-streaming rebuilds, etc.) There is no reason it should 
>>> ever be more capable than SAI for any partition/token-restricted query 
>>> use-case, and I don't really see how there's any short-term path for any 
>>> local 2i implementation in C* to be efficient for anything else. There are 
>>> presently no feature gaps on the query side.
>>> 
>>> Anyway, there are still a lot of things we can improve about SAI (and 
>>> things that already exist and are just waiting in the DS public fork)...I'm 
>>> just not sure what reasonable use case the old 2i will be able to serve 
>>> better.
>>> 
>>> On Tue, Dec 10, 2024 at 5:41 AM Benedict <bened...@apache.org> wrote:
>>>> I’m not convinced SAI has demonstrated a practical or theoretical 
>>>> capability to fully replace secondary indexes anyway. So it would be very 
>>>> premature to mark them deprecated.
>>>> 
>>>>> On 10 Dec 2024, at 06:29, Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>> 
>>>>> 
>>>>>  ... then we should NOT mark it to be deprecated. 
>>>>> 
>>>>> On Tue, Dec 10, 2024 at 12:27 PM Štefan Miklošovič <smikloso...@apache.org> wrote:
>>>>>> I have a hard time getting used to the "terminology" here. If 2i indexes 
>>>>>> are to be marked as deprecated and SAI is beta, then what is actually 
>>>>>> the index implementation we stand behind in the production? It is like 
>>>>>> we are "abandoning" the former but the latter is not bullet-proof yet. 
>>>>>> The signal it sends is that we don't have a non-deprecated bullet-proof 
>>>>>> index impl.
>>>>>> 
>>>>>> Maybe it is just about the wording and people are just fine running 
>>>>>> deprecated things knowing they are production-ready, what I am used to 
>>>>>> is that if something is deprecated, then there is always a replacement 
>>>>>> which is recommended. If there isn't a recommended replacement which can 
>>>>>> fully supersede the current implementation then we should mark it to be 
>>>>>> deprecated. 
>>>>>> 
>>>>>> I understand that you are trying to find some "common ground" / 
>>>>>> expressing that we are moving towards SAI but I am not sure the wording 
>>>>>> is entirely correct or we should be careful how we frame it. 
>>>>>> 
>>>>>> On Tue, Dec 10, 2024 at 12:01 PM Mick Semb Wever <m...@apache.org> wrote:
>>>>>>> > A possibility with SAI is to mark it beta while also marking 2i as
>>>>>>> > deprecated (and leaving SASI as marked).  This sends a clear signal
>>>>>>> > (imho) that SAI is the recommended solution forward but also being
>>>>>>> > honest about its maturity and QA.
>>>>>>> 
>>>>>>> 
>>>>>>>  (and leaving SASI as marked *experimental*)
>> 
