Earlier today, Mick started a thread on the future of our index creation
DDL on Slack:

https://the-asf.slack.com/archives/C018YGVCHMZ/p1683527794220019

At the moment, there are two ways to create a secondary index.

*1.) CREATE INDEX [IF NOT EXISTS] [name] ON <table> (<column>)*

This creates an optionally named legacy 2i on the provided table and column.

    ex. CREATE INDEX my_index ON ks.tbl(my_text_col)

*2.) CREATE CUSTOM INDEX [IF NOT EXISTS] [name] ON <table> (<column>) USING
<class|alias> [WITH OPTIONS = <options>]*

This creates a secondary index on the provided table and column using the
specified 2i implementation class and (optional) parameters.

    ex. CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col) USING 'StorageAttachedIndex'

(Note that the work on SAI added aliasing, so `StorageAttachedIndex` is
shorthand for the fully-qualified class name, which is also valid.)
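To make the aliasing concrete, here is a rough sketch of the two equivalent forms. The package path in the second statement is my assumption based on where the SAI code currently lives and may not match what finally ships; the keyspace, table, and index names are just the placeholders from the example above.

```cql
-- Alias form, as in the example above.
CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col)
    USING 'StorageAttachedIndex';

-- Fully-qualified form (an alternative to the statement above, not an addition to it).
-- Assumption: the class lives under org.apache.cassandra.index.sai.
CREATE CUSTOM INDEX my_index ON ks.tbl(my_text_col)
    USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
```

Either statement on its own should produce the same SAI index; running both would fail on the duplicate index name unless IF NOT EXISTS is added.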

So what is there to discuss?

The concern Mick raised is...

"...just folk continuing to use CREATE INDEX  because they think CREATE
CUSTOM INDEX is advanced (or just don't know of it), and we leave users
doing 2i (when they think they are, and/or we definitely want them to be,
using SAI)"

To paraphrase, we want people to use SAI once it's available where
possible, and the default behavior of CREATE INDEX could be at odds w/ that.

The proposal we seem to have landed on is something like the following:

For 5.0:

1.) Disable by default the creation of new legacy 2i via CREATE INDEX.
2.) Leave CREATE CUSTOM INDEX...USING... available by default.

(Note: How this would interact w/ the existing secondary_indexes_enabled
YAML option isn't clear yet.)

Post-5.0:

1.) Deprecate and eventually remove SASI when SAI hits full feature parity
w/ it.
2.) Replace both CREATE INDEX and CREATE CUSTOM INDEX w/ something of a
hybrid between the two. For example, CREATE INDEX...USING...WITH. This
would both be flexible enough to accommodate index implementation selection
and prescriptive enough to force the user to make a decision (and wouldn't
change the legacy behavior of the existing CREATE INDEX). In this world,
creating a legacy 2i might look something like CREATE INDEX...USING `legacy`.
3.) Eventually deprecate CREATE CUSTOM INDEX...USING.
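As a purely illustrative sketch of what the hybrid statement in item 2 might look like, assuming it borrows the USING and WITH OPTIONS clauses from today's CREATE CUSTOM INDEX: none of this syntax exists yet, the `sai` identifier is hypothetical (only `legacy` appears in the proposal), and the option shown is just an example of an implementation-specific parameter.

```cql
-- HYPOTHETICAL syntax: not valid in any released version.

-- Explicitly selecting SAI; 'sai' is an assumed identifier.
CREATE INDEX IF NOT EXISTS my_sai_index ON ks.tbl(my_text_col)
    USING 'sai'
    WITH OPTIONS = {'case_sensitive': 'false'};

-- Explicitly selecting the legacy 2i implementation.
CREATE INDEX IF NOT EXISTS my_legacy_index ON ks.tbl(my_other_col)
    USING 'legacy';
```

The point of the sketch is only that the implementation choice is always spelled out in the statement, so a bare CREATE INDEX never silently picks one for the user.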

Eventually we would have a single enabled DDL statement for index creation
that would be minimal but also explicit and able to handle some evolution.

What does everyone think?
