If we assume SAI is what we should use by default for the cluster, would it 
make sense to allow

CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)

But use a new YAML config option that switches it from legacy 2i to SAI?

default_2i_impl: sai

For 5.0 we can default to “legacy” (new features disabled by default), but 
allow operators to change this to SAI if they desire?
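A minimal Python sketch of how a plain CREATE INDEX could pick its implementation from the proposed `default_2i_impl` option. It assumes the option is read from cassandra.yaml into a plain mapping and defaults to "legacy" for 5.0. The function name, constants, and set of accepted values are hypothetical and are not Cassandra's actual config code.

```python
# Hypothetical sketch: resolve which 2i implementation a plain CREATE INDEX uses.
# `default_2i_impl` is the YAML option proposed above; all other names are illustrative.

KNOWN_IMPLS = {"legacy", "sai"}  # assumed set of accepted values
DEFAULT_IMPL_5_0 = "legacy"      # 5.0 default: new features disabled by default


def resolve_default_impl(yaml_config: dict) -> str:
    """Return the implementation a plain CREATE INDEX should be rewritten to use."""
    impl = str(yaml_config.get("default_2i_impl", DEFAULT_IMPL_5_0)).lower()
    if impl not in KNOWN_IMPLS:
        raise ValueError(f"unknown default_2i_impl: {impl!r}")
    return impl


# An operator who leaves the option unset keeps legacy 2i; one who opts in gets SAI.
print(resolve_default_impl({}))                          # legacy
print(resolve_default_impl({"default_2i_impl": "sai"}))  # sai
```

With this approach the user-facing DDL stays the same. Only the operator-controlled default changes, so flipping the default in a later release requires no change to the DDL.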

> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.

For 5.0, I would argue all indexes should be disabled by default and require 
operators to opt in… I am totally cool with a new allow list that permits 
specific implementations.

secondary_indexes_enabled: false
secondary_indexes_impl_allowed: [] # default; users could set ['sai'] to allow SAI

This does have weird semantics, as it causes _enabled to be ignored… The list 
could also replace _enabled, but then what is allowed in the true case isn’t 
100% clear. Maybe you need _enabled=true, and this allow list limits what is 
actually allowed (probably way clearer)?
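Here is a rough Python sketch of the last option, where `secondary_indexes_enabled` must be true and `secondary_indexes_impl_allowed` then narrows what may be created. The two option names come from the proposal. The exception class and function are hypothetical stand-ins. The sketch also assumes that an empty allow list permits nothing, so operators must opt in to each implementation explicitly.

```python
# Hypothetical sketch of the guardrail semantics floated above:
# secondary_indexes_enabled must be true AND the implementation must be listed in
# secondary_indexes_impl_allowed. Option names are from the proposal; the logic is assumed.


class IndexCreationRejected(Exception):
    """Stand-in for the error a node would return to the client."""


def check_index_allowed(yaml_config: dict, impl: str) -> None:
    enabled = yaml_config.get("secondary_indexes_enabled", False)
    allowed = {name.lower() for name in yaml_config.get("secondary_indexes_impl_allowed", [])}
    if not enabled:
        raise IndexCreationRejected(
            "secondary indexes are disabled (secondary_indexes_enabled: false)")
    # Assumption: an empty allow list permits nothing; each implementation is opt-in.
    if impl.lower() not in allowed:
        raise IndexCreationRejected(
            f"index implementation {impl!r} is not in secondary_indexes_impl_allowed")


# An operator enables indexes but only allows SAI.
sai_only = {"secondary_indexes_enabled": True, "secondary_indexes_impl_allowed": ["sai"]}
check_index_allowed(sai_only, "sai")  # accepted, no exception
try:
    check_index_allowed(sai_only, "legacy")
except IndexCreationRejected as err:
    print(err)  # index implementation 'legacy' is not in secondary_indexes_impl_allowed
```

Under this reading the allow list never overrides `_enabled`. That avoids the case where one option silently makes the other irrelevant.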


> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a 
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This would 
> both be flexible enough to accommodate index implementation selection and 
> prescriptive enough to force the user to make a decision (and wouldn't change 
> the legacy behavior of the existing CREATE INDEX). In this world, creating a 
> legacy 2i might look something like CREATE INDEX...USING `legacy`.

I do not mind a new syntax that tries to be clearer, but the “replace” part is 
what I would push back against… we should keep the two existing syntaxes and 
not force users to migrate… we can logically merge the three syntaxes, but we 
should not remove the two existing ones.

CREATE INDEX - gets rewritten to CREATE INDEX… USING config.default_2i_impl
CREATE CUSTOM INDEX - gets rewritten to the new USING syntax

> 3.) Eventually deprecate CREATE CUSTOM INDEX…USING.

I don’t mind producing a warning telling users it’s best to use the new 
syntax, but if it’s low effort for us to maintain, we should keep supporting 
it… and since this can be rewritten to the new format in the parser, it should 
be low effort to support, so we should?
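To show why the parser rewrite keeps maintenance cheap, here is a Python sketch in which both existing forms collapse into one canonical CREATE INDEX ... USING statement. The CUSTOM form is still accepted but attaches a client warning. `ParsedCreateIndex` and `canonicalize` are hypothetical names and are not Cassandra's parser classes. Quoting the USING value mirrors today's CREATE CUSTOM INDEX syntax, which is an assumption about the final form.

```python
# Hypothetical sketch: after parsing, both existing forms are rewritten into one
# canonical "CREATE INDEX ... USING ..." statement, so a single code path executes them.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ParsedCreateIndex:
    table: str
    column: str
    name: Optional[str] = None
    custom: bool = False            # True when the CREATE CUSTOM INDEX form was used
    using: Optional[str] = None     # class name or alias from USING, if present
    options: Dict[str, str] = field(default_factory=dict)


def canonicalize(stmt: ParsedCreateIndex, default_impl: str) -> Tuple[str, List[str]]:
    """Return the canonical statement text plus any client warnings."""
    warnings: List[str] = []
    if stmt.custom:
        if stmt.using is None:
            raise ValueError("CREATE CUSTOM INDEX requires a USING clause")
        # The old syntax keeps working; it only earns a nudge toward the new form.
        warnings.append("CREATE CUSTOM INDEX is supported, but CREATE INDEX ... USING is preferred")
    # Plain CREATE INDEX has no USING clause, so fall back to the configured default.
    impl = stmt.using if stmt.using is not None else default_impl
    name = f" {stmt.name}" if stmt.name else ""
    text = f"CREATE INDEX{name} ON {stmt.table}({stmt.column}) USING '{impl}'"
    if stmt.options:
        opts = ", ".join(f"'{k}': '{v}'" for k, v in sorted(stmt.options.items()))
        text += f" WITH OPTIONS = {{{opts}}}"
    return text, warnings


plain = ParsedCreateIndex(table="ks.tbl", column="my_text_col", name="my_index")
print(canonicalize(plain, "sai")[0])
# CREATE INDEX my_index ON ks.tbl(my_text_col) USING 'sai'
```

Because every form ends up in the same canonical statement, keeping CREATE CUSTOM INDEX costs roughly one branch in the rewrite rather than a second execution path.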

> On May 9, 2023, at 2:44 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
> 
> Earlier today, Mick started a thread on the future of our index creation DDL 
> on Slack:
> 
> https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019
> 
> At the moment, there are two ways to create a secondary index.
> 
> 1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)
> 
> This creates an optionally named legacy 2i on the provided table and column.
> 
>     ex. CREATE INDEX my_index ON ks.tbl(my_text_col)
> 
> 2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING 
> <class|alias> [WITH OPTIONS = <options>]
> 
> This creates a secondary index on the provided table and column using the 
> specified 2i implementation class and (optional) parameters.
> 
>     ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 
> 'StorageAttachedIndex'
> 
> (Note that the work on SAI added aliasing, so `StorageAttachedIndex` is 
> shorthand for the fully-qualified class name, which is also valid.)
> 
> So what is there to discuss?
> 
> The concern Mick raised is...
> 
> "...just folk continuing to use CREATE INDEX because they think CREATE 
> CUSTOM INDEX is advanced (or just don't know of it), and we leave users doing 
> 2i (when they think they are, and/or we definitely want them to be, using 
> SAI)"
> 
> To paraphrase, we want people to use SAI once it's available where possible, 
> and the default behavior of CREATE INDEX could be at odds w/ that.
> 
> The proposal we seem to have landed on is something like the following:
> 
> For 5.0:
> 
> 1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
> 2.) Leave CREATE CUSTOM INDEX...USING... available by default.
> 
> (Note: How this would interact w/ the existing secondary_indexes_enabled YAML 
> options isn't clear yet.)
> 
> Post-5.0:
> 
> 1.) Deprecate and eventually remove SASI when SAI hits full feature parity w/ 
> it.
> 2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a 
> hybrid between the two. For example, CREATE INDEX...USING...WITH. This would 
> both be flexible enough to accommodate index implementation selection and 
> prescriptive enough to force the user to make a decision (and wouldn't change 
> the legacy behavior of the existing CREATE INDEX). In this world, creating a 
> legacy 2i might look something like CREATE INDEX...USING `legacy`.
> 3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
> 
> Eventually we would have a single enabled DDL statement for index creation 
> that would be minimal but also explicit/able to handle some evolution.
> 
> What does everyone think?
