Maybe there could be a separator char as one of the adapter’s parameters. People should choose a value, say ‘$’ or ‘#’, that is legal in an unquoted SQL identifier but does not occur in any of their index or type names.
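As a rough illustration of the separator idea (a sketch in Python for brevity, not adapter code; `make_table_name` and `split_table_name` are hypothetical helpers, not part of Calcite): with '_' as the separator, an index "x_y" with type "z" and an index "x" with type "y_z" produce the same table name, whereas a separator that occurs in none of the names can always be split back unambiguously.

```python
from typing import Tuple


def make_table_name(index: str, type_: str, sep: str) -> str:
    """Join index and type with a separator, refusing names that contain it."""
    for name in (index, type_):
        if sep in name:
            raise ValueError(f"separator {sep!r} occurs in {name!r}")
    return f"{index}{sep}{type_}"


def split_table_name(table: str, sep: str) -> Tuple[str, str]:
    """Recover (index, type); only possible if the separator occurs exactly once."""
    parts = table.split(sep)
    if len(parts) != 2:
        raise ValueError(f"cannot split {table!r} on {sep!r} unambiguously")
    return parts[0], parts[1]


pairs = [("x_y", "z"), ("x", "y_z")]

# Naive '_' joining: two different (index, type) pairs collapse to one name.
naive = {f"{i}_{t}" for i, t in pairs}

# '$' does not occur in any index or type name, so the names stay distinct.
safe = [make_table_name(i, t, "$") for i, t in pairs]
```

The check inside `make_table_name` is one possible way to enforce the "does not occur in any of their index or type names" condition at configuration time rather than leaving it to the user.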
If not specified, the adapter would end up in a simple mode, say looking for indexes first, then looking for types, and people would need to make sure indexes and types have distinct names. After the transition to single-type indexes, people could stop using the parameter.

Julian

> On Jun 29, 2018, at 4:43 PM, Andrei Sereda <and...@sereda.cc> wrote:
>
> That's a valid point. Then user would define a different pattern like "i$index_t$type" for his cluster.
>
> I think we should first answer whether such scenarios should be supported by calcite (given that they're already deprecated by the vendor). If yes, what should be collision strategy? User defined pattern like above or failure or auto generated name?
>
> On Fri, Jun 29, 2018, 19:14 Julian Hyde <jh...@apache.org> wrote:
>
>>> In elastic (index/type) pair is guaranteed to be unique therefore "${index}_${type}" will be also unique (as string). This is only necessary when we have several types per index. Valid question is whether user should be allowed such flexibility.
>>
>> Uniqueness is not my concern.
>>
>> Suppose there is an index called "x_y" with a type called "z", and another index called "x" with a type called "y_z". If I write "x_y_z" it's not clear how it should be broken into index/type.
>>
>> On Fri, Jun 29, 2018 at 3:15 PM, Andrei Sereda <and...@sereda.cc> wrote:
>>>> Can you show how those examples affect SQL against the ES adapter and/or how they affect JSON models?
>>>
>>> The discussion is how to properly bridge (index/type) concept from ES into relational world. Proposal to use placeholders ($index / $type) affects only how table is named in calcite. They're not used as SQL literals. I.e., it affects only configuration phase of the schema.
>>> Pretty much we're doing string/replace to derive table name from ($index/$type).
>>>
>>>> You seem to be using '_' as a separator character.
>>>> Are we sure that people will never use it in index or type name? Separator characters often cause problems.
>>>
>>> In elastic (index/type) pair is guaranteed to be unique therefore "${index}_${type}" will be also unique (as string). This is only necessary when we have several types per index. Valid question is whether user should be allowed such flexibility.
>>>
>>> On Fri, Jun 29, 2018 at 2:19 PM Julian Hyde <jh...@apache.org> wrote:
>>>
>>>> Andrei,
>>>>
>>>> I'm not an ES user so I don't fully understand this issue, but my two cents anyway...
>>>>
>>>> Can you show how those examples affect SQL against the ES adapter and/or how they affect JSON models?
>>>>
>>>> You seem to be using '_' as a separator character. Are we sure that people will never use it in index or type name? Separator characters often cause problems.
>>>>
>>>> Julian
>>>>
>>>> On Fri, Jun 29, 2018 at 10:58 AM, Andrei Sereda <and...@sereda.cc> wrote:
>>>>> I agree there should be a configuration option. How about the following approach.
>>>>>
>>>>> Expose both variables ${index} and ${type} in configuration (JSON) and user will use them to generate table name in calcite schema.
>>>>>
>>>>> Example
>>>>> "table_name": "${type}" // current
>>>>> "table_name": "${index}" // new (default?)
>>>>> "table_name": "${index}_${type}" // most generic. supports multiple types per index
>>>>>
>>>>> On Fri, Jun 29, 2018 at 9:26 AM Michael Mior <mm...@apache.org> wrote:
>>>>>
>>>>>> I think it sounds like you and Andrei are in a good position to tackle this one so I'm happy to have you both work on whatever solution you think is best.
>>>>>>
>>>>>> --
>>>>>> Michael Mior
>>>>>> mm...@apache.org
>>>>>>
>>>>>> On Fri, Jun 29, 2018 at 04:19, Christian Beikov <christian.bei...@gmail.com> wrote:
>>>>>>
>>>>>>> IMO the best solution would be to make it configurable by introducing a "table_mapping" config with values
>>>>>>>
>>>>>>> * type - every type in the known indices is mapped as table
>>>>>>> * index - every known index is mapped as table
>>>>>>>
>>>>>>> We'd probably also need a "type_field" configuration for defining which field to use for the type determination as one of the possible future ways to do things is to introduce a custom field:
>>>>>>>
>>>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html#_custom_type_field_2
>>>>>>>
>>>>>>> We already detect the ES version, so we can set a smart default for this setting. Let's make the index config param optional.
>>>>>>>
>>>>>>> * When no index is given, we discover indexes, the default for "table_mapping" then is "index"
>>>>>>> * When index is given, then we only discover types according to the "type_field" configuration and the default for "table_mapping" is "type"
>>>>>>>
>>>>>>> This would also allow to discover indexes but still use "type" as "table_mapping".
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *Christian Beikov*
>>>>>>> On 29.06.2018 at 02:41, Andrei Sereda wrote:
>>>>>>>> Yes. There is an API to list all indexes / types in elastic. They can be automatically imported into a schema.
>>>>>>>>
>>>>>>>> What needs to be agreed upon is how to expose those elements in calcite schema (naming / behaviour).
>>>>>>>>
>>>>>>>> 1) Many (most?) of setups are single type per index. Natural way to name would be "elastic.$index" (elastic being schema name).
>>>>>>>> Multiple indexes would be under same schema "elastic.index1" "elastic.index2" etc.
>>>>>>>>
>>>>>>>> 2) What if index has several types, should they be exported as calcite tables: "elastic.$index_type1" "elastic.$index_type2"? Or (current behaviour) as "elastic.type1" and "elastic.type2". Or as subschema "elastic.$index.type1"?
>>>>>>>>
>>>>>>>> Now what if one has combination of (1) and (2)?
>>>>>>>> Setup (2) is already deprecated (and will be unsupported in next version)
>>>>>>>>
>>>>>>>> On Thu, Jun 28, 2018 at 7:31 PM Christian Beikov <christian.bei...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Is there an API to discover indexes? If there is, I'd suggest we allow a config option to make the adapter discover the possible indexes. We'd still have to adapt the code a bit, but internally, the schema could just keep a cache of type name to index name map and be able to support both scenarios.
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> *Christian Beikov*
>>>>>>>>> On 29.06.2018 at 00:12, Andrei Sereda wrote:
>>>>>>>>>>> 1) What's the time horizon for the current adapter no longer working with these changes to ES?
>>>>>>>>>> Current adapter will be working for a while with existing setup. The problem is nomenclature and ease of use.
>>>>>>>>>>
>>>>>>>>>> Their new SQL concepts mapping <https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html> drops the notion of ES type (which before was equivalent of RDBMS table) and uses ES index as new table equivalent (before ES index was equal to database). Most users use elastic this way (one type, one index) index == table.
>>>>>>>>>>
>>>>>>>>>> Currently calcite requires schema per index. In RDBMS parlance database per table (I'd like to change that).
>>>>>>>>>>
>>>>>>>>>>> 2) Any guess how complicated it would be to maintain code paths for both behaviours? I know this is probably really challenging to estimate, but I really have no idea of the scope of these changes. Would it mean two different ES adapters?
>>>>>>>>>> One can have just a separate calcite schema implementations (same adapter / module):
>>>>>>>>>> 1) LegacySchema (old). Schema can have only one index (but multiple types). Type == table in this case.
>>>>>>>>>> 2) NewSchema (new). Single schema can have multiple indexes (type is dropped). Index == table in this case
>>>>>>>>>>
>>>>>>>>>>> 3) Do we really need compatibility with the current version of the adapter?
>>>>>>>>>>> IMO this depends on what versions of ES we would lose support for and how complex it would be for users of the current ES adapter to make updates for any Calcite API changes.
>>>>>>>>>> The issue is not in adapter but how calcite schema exposes tables. Should it expose index as individual table (new), or ES type (old)?
>>>>>>>>>>
>>>>>>>>>> Andrei.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 28, 2018 at 5:23 PM Michael Mior <mm...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately I know very little about ES so I'm not in a great position to assess the impact of these changes. I will say that legacy compatibility is great, but maintaining two sets of logic is always a challenge. A few follow up questions:
>>>>>>>>>>>
>>>>>>>>>>> 1) What's the time horizon for the current adapter no longer working with these changes to ES?
>>>>>>>>>>>
>>>>>>>>>>> 2) Any guess how complicated it would be to maintain code paths for both behaviours? I know this is probably really challenging to estimate, but I really have no idea of the scope of these changes. Would it mean two different ES adapters?
>>>>>>>>>>>
>>>>>>>>>>> 3) Do we really need compatibility with the current version of the adapter? IMO this depends on what versions of ES we would lose support for and how complex it would be for users of the current ES adapter to make updates for any Calcite API changes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your continued work on the ES adapter Andrei!
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Michael Mior
>>>>>>>>>>> mm...@apache.org
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 28, 2018 at 12:57, Andrei Sereda <and...@sereda.cc> wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> Elastic announced <https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html> that they will be deprecating mapping types in ES6 and indexes will be single-typed only.
>>>>>>>>>>>>
>>>>>>>>>>>> Historical analogy <https://www.elastic.co/blog/index-vs-type> between RDBMS and elastic was that index is equivalent to a database and type corresponds to table in that database. In a couple of releases (ES6-8) this shall no longer be true.
>>>>>>>>>>>>
>>>>>>>>>>>> Recent SQL addition <https://www.elastic.co/blog/elasticsearch-6-3-0-released> to elastic confirms this trend <https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>. Index is equivalent to a table and there are no more ES types.
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to propose to include this logic in Calcite ES adapter. I.e., expose each ES single-typed index as a separate table inside calcite schema. This is in contrast to current integration where schema can only have a single index. Current approach forces you to create multiple schemas to query single-typed indexes (on the same ES cluster).
>>>>>>>>>>>>
>>>>>>>>>>>> Legacy compatibility can always be controlled with configuration parameters.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you agree with such changes? If yes, would you consider a PR?
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Andrei.
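
For reference, the "table_name" patterns discussed in this thread ("${type}", "${index}", "${index}_${type}") amount to plain placeholder substitution. A minimal sketch, assuming Python's `string.Template` as a stand-in for whatever mechanism the adapter would actually use; `table_name` is a hypothetical helper and the index and type names are made up for illustration:

```python
from string import Template


def table_name(pattern: str, index: str, type_: str) -> str:
    """Derive a table name by filling ${index} / ${type} in the pattern."""
    # substitute() raises KeyError if the pattern names an unknown placeholder.
    return Template(pattern).substitute(index=index, type=type_)


# Hypothetical cluster: two single-typed indexes that share the type name "doc".
pairs = [("orders", "doc"), ("users", "doc")]

# Pattern labelled "current" in the thread: table named after the type.
by_type = [table_name("${type}", i, t) for i, t in pairs]

# Pattern labelled "new (default?)": table named after the index.
by_index = [table_name("${index}", i, t) for i, t in pairs]

# Most generic pattern, for indexes with several types.
generic = [table_name("${index}_${type}", i, t) for i, t in pairs]
```

In this made-up setup the "${type}" pattern maps both indexes to the same table name, while "${index}" keeps them apart, which is the case the proposal to expose each single-typed index as its own table is aimed at.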