Re: [DISCUSS] The future of CREATE INDEX

Caleb Rackliffe Wed, 17 May 2023 14:38:27 -0700

> 1. What's up with naming anything "legacy". Calling the current index
> type "2i" seems perfectly fine with me. From what I've heard it can work
> great for many users?


We can give the existing default secondary index any public-facing name we
like, but "2i" is too broad. It just stands for "secondary index", which is
obviously broad enough to cover anything. The use of "legacy" is
conversational, and it reflects the assertion that SAI should, when at
feature parity, be superior to the existing default 2i implementation for
any workload w/ partition-restricted queries. It will surely be possible to
construct a scenario where SAI's SSTable-attached design, combined with
global scatter/gather queries and a huge number of local/per-node SSTables,
causes it to perform worse than the existing default 2i, which is just an
inverted index implemented as a hidden table w/ search terms as partition
keys.
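
To make the "hidden table w/ search terms as partition keys" description concrete, here is a toy Python sketch of an inverted index keyed by the indexed value. It only illustrates the idea and is not how Cassandra implements it; the class and method names (HiddenTableIndex, insert, remove, lookup) are made up for this example.

```python
from collections import defaultdict


class HiddenTableIndex:
    """Toy inverted index: indexed value -> primary keys of matching base rows.

    Hypothetical illustration only; names and structure are assumptions,
    not Cassandra internals.
    """

    def __init__(self):
        # Each indexed term acts like a partition key in a hidden table;
        # the "rows" in that partition are primary keys of the base table.
        self._partitions = defaultdict(set)

    def insert(self, base_key, term):
        self._partitions[term].add(base_key)

    def remove(self, base_key, term):
        keys = self._partitions.get(term)
        if keys is not None:
            keys.discard(base_key)
            if not keys:
                # Drop empty partitions so stale terms do not accumulate.
                del self._partitions[term]

    def lookup(self, term):
        # An equality query reads exactly one index partition.
        return sorted(self._partitions.get(term, ()))


index = HiddenTableIndex()
index.insert("row-1", "apple")
index.insert("row-2", "banana")
index.insert("row-3", "apple")
print(index.lookup("apple"))  # ['row-1', 'row-3']
```

The point of the sketch is that an equality lookup touches a single index partition, which is the access pattern the hidden-table design is built around.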

> 2. It should be possible to always specify the index type explicitly. In
> other words, it should be possible to CREATE CUSTOM INDEX ... USING "2i"
> (if it isn't already)

Yes. It should be possible to specify the type no matter what syntax we
use. However, if we started this project from scratch, I don't think we
would build CREATE CUSTOM INDEX in the first place.

> 2b) It should be possible to just say "SAI" or "SASIIndex", not the full
> Java path.
> 3. It's a fair point that the "CUSTOM" word may make this sound a bit too
> special... The simplest change IMO is to just make the CUSTOM word optional.

Agreed on both, and 2b (aliasing) is already supported for CREATE CUSTOM
INDEX. (It may be that we should move toward something like a
ServiceLoader-enabled set of named 2i's.)
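
A rough sketch of what resolving a named index implementation could look like: the Python below checks the USING argument against a small alias registry and otherwise passes a fully-qualified class name through unchanged. The registry contents, the case-insensitive matching, and the function name resolve_index_class are all assumptions for illustration, not the actual Cassandra code path.

```python
# Hypothetical alias registry; the entries and lookup rules are assumptions
# for illustration, not the actual Cassandra implementation.
INDEX_ALIASES = {
    "storageattachedindex": "org.apache.cassandra.index.sai.StorageAttachedIndex",
    "sasiindex": "org.apache.cassandra.index.sasi.SASIIndex",
}


def resolve_index_class(using):
    """Map a USING argument to a fully-qualified class name."""
    name = using.strip()
    if not name:
        raise ValueError("index implementation name must not be empty")
    # Short names are matched case-insensitively against the registry.
    alias_target = INDEX_ALIASES.get(name.lower())
    if alias_target is not None:
        return alias_target
    # Anything containing a package separator is passed through unchanged.
    if "." in name:
        return name
    raise ValueError(f"unknown index implementation: {name!r}")


print(resolve_index_class("StorageAttachedIndex"))
```

Rejecting unknown short names up front, rather than attempting to load them as classes, keeps a typo in an alias from turning into a confusing class-loading error.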

> 4. Benedict's point that a YAML option is per node is a good one... For
> example, you wouldn't want some nodes to create a 2i index and other nodes
> a SAI index for the same index.... That said, how many other YAML options
> can you think of that would create total chaos if different nodes actually
> had different values for them? For example what if a guardrail allowed some
> action on some nodes but not others?  Maybe what we need is a jira ticket
> to enforce that certain sections of the config must not differ?

At some point, my guess is that TCM will give us the ability to have
consistent, cluster-wide metadata/configuration. Right now, we have quite a
few YAML options that control cluster-wide behavior including our
prohibition on creating experimental SASI indexes and our option to disable
2i creation. None of the options we've discussed should make it possible
for a single secondary index on a column of a table to have differing local
implementations.

> 6. MySQL allows the DBA to determine the default engine. This seems to
> work well. If the user doesn't care, they don't care, if they do, they use
> the explicit syntax.

Sounds like option #3 on the 3rd POLL.

On Wed, May 17, 2023 at 3:33 PM Henrik Ingo <henrik.i...@datastax.com>
wrote:

> I have read the thread but chose to reply to the top message...
>
> I'm coming to this with the background of having worked with MySQL, where
> both the storage engine and index implementation had many options, and
> often of course some index types were only available in some engines.
>
> I would humbly suggest:
>
> 1. What's up with naming anything "legacy". Calling the current index type
> "2i" seems perfectly fine with me. From what I've heard it can work great
> for many users?
>
> 2. It should be possible to always specify the index type explicitly. In
> other words, it should be possible to CREATE CUSTOM INDEX ... USING "2i"
> (if it isn't already)
>
> 2b) It should be possible to just say "SAI" or "SASIIndex", not the full
> Java path.
>
> 3. It's a fair point that the "CUSTOM" word may make this sound a bit too
> special... The simplest change IMO is to just make the CUSTOM word optional.
>
> 4. Benedict's point that a YAML option is per node is a good one... For
> example, you wouldn't want some nodes to create a 2i index and other nodes
> a SAI index for the same index.... That said, how many other YAML options
> can you think of that would create total chaos if different nodes actually
> had different values for them? For example what if a guardrail allowed some
> action on some nodes but not others?  Maybe what we need is a jira ticket
> to enforce that certain sections of the config must not differ?
>
> 5. That said, the default index type could also be a property of the
> keyspace
>
> 6. MySQL allows the DBA to determine the default engine. This seems to
> work well. If the user doesn't care, they don't care, if they do, they use
> the explicit syntax.
>
> henrik
>
>
> On Wed, May 10, 2023 at 12:45 AM Caleb Rackliffe <calebrackli...@gmail.com>
> wrote:
>
>> Earlier today, Mick started a thread on the future of our index creation
>> DDL on Slack:
>>
>> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
>>
>> At the moment, there are two ways to create a secondary index.
>>
>> *1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*
>>
>> This creates an optionally named legacy 2i on the provided table and
>> column.
>>
>>     ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
>>
>> *2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
>> USING <class|alias> [WITH OPTIONS = <options>]*
>>
>> This creates a secondary index on the provided table and column using the
>> specified 2i implementation class and (optional) parameters.
>>
>>     ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 'StorageAttachedIndex'
>>
>> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
>> shorthand for the fully-qualified class name, which is also valid.)
>>
>> So what is there to discuss?
>>
>> The concern Mick raised is...
>>
>> "...just folk continuing to use CREATE INDEX  because they think CREATE
>> CUSTOM INDEX is advanced (or just don't know of it), and we leave users
>> doing 2i (when they think they are, and/or we definitely want them to be,
>> using SAI)"
>>
>> To paraphrase, we want people to use SAI once it's available where
>> possible, and the default behavior of CREATE INDEX could be at odds w/
>> that.
>>
>> The proposal we seem to have landed on is something like the following:
>>
>> For 5.0:
>>
>> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
>> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
>>
>> (Note: How this would interact w/ the existing secondary_indexes_enabled
>> YAML options isn't clear yet.)
>>
>> Post-5.0:
>>
>> 1.) Deprecate and eventually remove SASI when SAI hits full feature
>> parity w/ it.
>> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
>> hybrid between the two. For example, CREATE INDEX...USING...WITH. This
>> would both be flexible enough to accommodate index implementation selection
>> and prescriptive enough to force the user to make a decision (and wouldn't
>> change the legacy behavior of the existing CREATE INDEX). In this world,
>> creating a legacy 2i might look something like CREATE INDEX...USING
>> `legacy`.
>> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
>>
>> Eventually we would have a single enabled DDL statement for index
>> creation that would be minimal but also explicit/able to handle some
>> evolution.
>>
>> What does everyone think?
>>
>
>
> --
>
> Henrik Ingo
>
