Sorry, maybe I didn't make myself clear. I think some format properties
are well suited to being hinted, such as "ignore errors during parsing".
Maybe we should have a dedicated Hintable interface with a
`supportedHintOptions` method inside. All factories that support hints
could implement it.
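
To make the idea concrete, here is a minimal sketch in Java. It is only an
illustration under assumptions: `Hintable`, `CsvFormatFactory`, `applyHints`
and the option key `format.ignore-parse-errors` are hypothetical names, and
plain string keys stand in for the `Set<ConfigOption>` discussed in this
thread so that the example compiles without Flink on the classpath.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class HintableSketch {

    /** Hypothetical interface: a factory that accepts hints declares which option keys may be hinted. */
    interface Hintable {
        Set<String> supportedHintOptions();
    }

    /** Hypothetical format factory that only allows "format.ignore-parse-errors" to be hinted. */
    static class CsvFormatFactory implements Hintable {
        @Override
        public Set<String> supportedHintOptions() {
            return Collections.singleton("format.ignore-parse-errors");
        }
    }

    /**
     * Returns a copy of the DDL options with the hints applied on top.
     * Any hint key the factory did not declare is rejected with an exception.
     */
    static Map<String, String> applyHints(
            Hintable factory, Map<String, String> ddlOptions, Map<String, String> hints) {
        Map<String, String> merged = new LinkedHashMap<>(ddlOptions);
        for (Map.Entry<String, String> hint : hints.entrySet()) {
            if (!factory.supportedHintOptions().contains(hint.getKey())) {
                throw new IllegalArgumentException("Option is not hintable: " + hint.getKey());
            }
            merged.put(hint.getKey(), hint.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> ddl = new LinkedHashMap<>();
        ddl.put("format.type", "csv");
        ddl.put("format.ignore-parse-errors", "false");

        Map<String, String> hints = new LinkedHashMap<>();
        hints.put("format.ignore-parse-errors", "true");

        // The hinted key overrides the DDL value; all other DDL options are kept.
        System.out.println(applyHints(new CsvFormatFactory(), ddl, hints));
    }
}
```

Keeping the hints in a separate map and merging them inside the factory
leaves the original options untouched, and an unsupported key fails fast
instead of being silently ignored.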

Best,
Kurt


On Wed, Mar 18, 2020 at 9:10 PM Timo Walther <twal...@apache.org> wrote:

> Hi everyone,
>
> +1 to Kurt's suggestion. Let's just have it in source and sink factories
> for now. We can still move this method up in the future. Currently, I
> don't see a need for catalogs or formats. Because how would you target a
> format in the query?
>
> @Danny: Can you send a link to your PoC? I'm very skeptical about
> creating a new CatalogTable in planner. Actually CatalogTable should be
> immutable between Catalog and Factory. Because a catalog can return its
> own factory and fully control the instantiation. Depending on the
> implementation, that means it can be possible that the catalog has
> encoded more information in a concrete subclass implementing the
> interface. I vote for separating the concerns of catalog information and
> hints in the factory explicitly.
>
> Regards,
> Timo
>
>
> On 18.03.20 05:41, Jingsong Li wrote:
> > Hi,
> >
> > I am thinking we can provide hints to *table* related instances.
> > - TableFormatFactory: of course we need hint support; there are many
> > format options in DDL too.
> > - catalog and module: I don't know, maybe in future we can provide some
> > hints for them.
> >
> > Best,
> > Jingsong Lee
> >
> > On Wed, Mar 18, 2020 at 12:28 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >
> >> Yes, I think we should move `supportedHintOptions` from TableFactory
> >> to TableSourceFactory. We also need to add the method to
> >> TableSinkFactory, because a sink target table may also have hints
> >> attached.
> >>
> >> Best,
> >> Danny Chan
> >> On Mar 18, 2020, 11:08 AM +0800, Kurt Young <ykt...@gmail.com> wrote:
> >>> I have one question about adding the `supportedHintOptions` method to
> >>> `TableFactory`. It seems `TableFactory` is a base factory interface for
> >>> all *table module* related instances, such as catalog, module, format
> >>> and so on. It's not created only for *table*. Is it possible to move it
> >>> to `TableSourceFactory`?
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Wed, Mar 18, 2020 at 10:59 AM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>
> >>>> Thanks Timo ~
> >>>>
> >>>> For the naming itself, I also think PROPERTIES is not that concise, so
> >>>> +1 for OPTIONS. (I had thought about that, but a lot of code in
> >>>> current Flink calls them properties, e.g. DescriptorProperties and
> >>>> #getSupportedProperties.) Let's use OPTIONS if this is our new
> >>>> preference.
> >>>>
> >>>> +1 to `Set<ConfigOption> supportedHintOptions()` because a
> >>>> ConfigOption can carry more information. AFAIK, Spark also calls them
> >>>> table options instead of properties. [1]
> >>>>
> >>>> In my local POC, I did create a new CatalogTable, and it works well for
> >>>> the current connectors: all DDL tables finally yield a CatalogTable
> >>>> instance, and we can apply the options to that (in the
> >>>> CatalogSourceTable, when we generate the TableSource). The advantage is
> >>>> that we do not need to modify the code of the connectors themselves. If
> >>>> we split the options from CatalogTable, we may need to add some
> >>>> additional logic in each connector factory in order to merge these
> >>>> properties (and the logic is almost the same everywhere). What do you
> >>>> think about this?
> >>>>
> >>>> [1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
> >>>>
> >>>> Best,
> >>>> Danny Chan
> >>>> On Mar 17, 2020, 10:10 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>> Hi Danny,
> >>>>>
> >>>>> thanks for updating the FLIP. I think your current design is
> >>>>> sufficient to separate hints from result-related properties.
> >>>>>
> >>>>> One remark on the naming itself: I would vote for calling the hints
> >>>>> around table scan `OPTIONS('k'='v')`. We used the term "properties"
> >>>>> in the past but since we want to unify the Flink configuration
> >>>>> experience, we should use consistent naming and classes around
> >>>>> `ConfigOptions`.
> >>>>>
> >>>>> It would be nice to use `Set<ConfigOption> supportedHintOptions();`
> >>>>> to start using config options instead of pure string properties. This
> >>>>> will also allow us to generate documentation in the future around
> >>>>> supported data types, ranges, etc. for options. At some point we would
> >>>>> also like to drop the `DescriptorProperties` class. "Options" is also
> >>>>> used in the documentation [1] and in the SQL/MED standard [2].
> >>>>>
> >>>>> Furthermore, I would still vote for separating CatalogTable and hint
> >>>>> options. Otherwise the planner would need to create a new
> >>>>> CatalogTable instance which might not always be easy. We should offer
> >>>>> them via:
> >>>>>
> >>>>> org.apache.flink.table.factories.TableSourceFactory.Context#getHints:
> >>>>> ReadableConfig
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table
> >>>>> [2] https://wiki.postgresql.org/wiki/SQL/MED
> >>>>>
> >>>>>
> >>>>> On 12.03.20 15:06, Stephan Ewen wrote:
> >>>>>> @Danny sounds good.
> >>>>>>
> >>>>>> Maybe it is worth listing all the classes of problems that you want
> >>>>>> to address and then looking at each class to see if hints are a good
> >>>>>> default solution or a good optional way of simplifying things?
> >>>>>> The discussion has grown a lot and it is starting to be hard to
> >>>>>> distinguish the parts where everyone agrees from the parts where
> >>>>>> there are concerns.
> >>>>>>
> >>>>>> On Thu, Mar 12, 2020 at 2:31 PM Danny Chan <danny0...@apache.org> wrote:
> >>>>>>
> >>>>>>> Thanks Stephan ~
> >>>>>>>
> >>>>>>> We can remove the support for properties that may change the
> >>>>>>> semantics of a query if you think that is a problem.
> >>>>>>>
> >>>>>>> How about we support the /*+ properties() */ hint only for
> >>>>>>> optimization parameters, such as the fetch size of a source or
> >>>>>>> something like that? Does that make sense?
> >>>>>>>
> >>>>>>> On Thu, Mar 12, 2020 at 7:45 PM Stephan Ewen <se...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> I think Bowen has actually put it very well.
> >>>>>>>>
> >>>>>>>> (1) Hints that change semantics look like trouble waiting to
> >>>>>>>> happen. For example, Kafka offset handling should be in filters.
> >>>>>>>> The Kafka source should support predicate pushdown.
> >>>>>>>>
> >>>>>>>> (2) Hints should not be a workaround for current shortcomings. A
> >>>>>>>> lot of what is suggested above sounds exactly like that: working
> >>>>>>>> around catalog/DDL shortcomings, missing exposure of metadata
> >>>>>>>> (offsets), missing predicate pushdown in Kafka. Abusing a feature
> >>>>>>>> like hints now as a quick fix for these issues, rather than fixing
> >>>>>>>> the root causes, will very likely come back to bite us badly in
> >>>>>>>> the future.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Stephan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Mar 12, 2020 at 10:43 AM Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> It seems this FLIP's name is somewhat misleading. From my
> >>>>>>>>> understanding, this FLIP is trying to address the dynamic
> >>>>>>>>> parameter issue, and table hints are the way we want to choose.
> >>>>>>>>> I think we should focus on "what's the right way to solve
> >>>>>>>>> dynamic properties" instead of discussing "whether table hints
> >>>>>>>>> can affect query semantics".
> >>>>>>>>>
> >>>>>>>>> For now, there are three proposed ways to achieve dynamic
> >>>>>>>>> properties:
> >>>>>>>>> 1. FLIP-110: create temporary table xx like xx with (xxx)
> >>>>>>>>> 2. use custom "from t with (xxx)" syntax
> >>>>>>>>> 3. "Borrow" the table hints to have a special PROPERTIES hint.
> >>>>>>>>>
> >>>>>>>>> The first one doesn't break anything; the only problem I see is
> >>>>>>>>> that it is a little more verbose than the table hint approach. I
> >>>>>>>>> can imagine that when someone uses the SQL CLI, they will quite
> >>>>>>>>> often modify table properties. Some use cases I can think of:
> >>>>>>>>> 1. The source contains some corrupted data, and I want to turn
> >>>>>>>>> on the "ignore-error" flag for certain formats.
> >>>>>>>>> 2. I have a Kafka table and want to see some sample data from
> >>>>>>>>> the beginning, so I change the offset to "earliest". Then I want
> >>>>>>>>> to observe the latest data which keeps coming in, so I would
> >>>>>>>>> write another query to select the latest data.
> >>>>>>>>> 3. I want my JDBC sink to flush data more eagerly so that I can
> >>>>>>>>> observe the data from the database side.
> >>>>>>>>>
> >>>>>>>>> Most of such use cases are quite ad hoc. If every time I want a
> >>>>>>>>> different experience I need to create a temporary table and
> >>>>>>>>> then also modify my query, it doesn't feel smooth. Embedding
> >>>>>>>>> such dynamic properties into the query would give a better user
> >>>>>>>>> experience.
> >>>>>>>>>
> >>>>>>>>> Both #2 and #3 can make this happen. The downside of #2 is that
> >>>>>>>>> it breaks SQL compliance; #3 only breaks some unwritten rules,
> >>>>>>>>> and we can provide an explanation for that. I really doubt
> >>>>>>>>> whether users would complain about this when they actually get a
> >>>>>>>>> flexible and good experience using it.
> >>>>>>>>>
> >>>>>>>>> My preference would be #3 > #1 > #2, what do you think?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Kurt
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 12, 2020 at 1:11 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Aljoscha ~
> >>>>>>>>>>
> >>>>>>>>>> I agree that most query hints are optional optimizer
> >>>>>>>>>> instructions, especially in traditional RDBMSs.
> >>>>>>>>>>
> >>>>>>>>>> But, just like BenChao said, Flink as a computation engine has
> >>>>>>>>>> many different kinds of data sources, so dynamic parameters
> >>>>>>>>>> like start_offset can only be bound to the scope of each table.
> >>>>>>>>>> We cannot set a session config the way KSQL does, because in
> >>>>>>>>>> KSQL all tables are about Kafka:
> >>>>>>>>>>> SET 'auto.offset.reset'='earliest';
> >>>>>>>>>>
> >>>>>>>>>> Thus the most flexible way to set up these dynamic params is to
> >>>>>>>>>> bind them to the table scope in the query when we want to
> >>>>>>>>>> override something, so we have the solutions below (with pros
> >>>>>>>>>> and cons from my side):
> >>>>>>>>>>
> >>>>>>>>>> • 1. Select * from t(offset=123) (from Timo)
> >>>>>>>>>>
> >>>>>>>>>> Pros:
> >>>>>>>>>> - Easy to add
> >>>>>>>>>> - Parameters are part of the main query
> >>>>>>>>>> Cons:
> >>>>>>>>>> - Not SQL compliant
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> • 2. Select * from t /*+ PROPERTIES(offset=123) */ (from me)
> >>>>>>>>>>
> >>>>>>>>>> Pros:
> >>>>>>>>>> - Easy to add
> >>>>>>>>>> - SQL compliant because it is nested in the comments
> >>>>>>>>>>
> >>>>>>>>>> Cons:
> >>>>>>>>>> - Parameters are not part of the main query
> >>>>>>>>>> - Cryptic syntax for new users
> >>>>>>>>>>
> >>>>>>>>>> The biggest problem with the hints approach may be the question
> >>>>>>>>>> of whether hints must be optional. We actually thought about
> >>>>>>>>>> option 1 for a while but abandoned it because it breaks the SQL
> >>>>>>>>>> standard too much. We replaced it with option 2, because the
> >>>>>>>>>> hint syntax does not break the SQL standard (it is nested in
> >>>>>>>>>> comments).
> >>>>>>>>>>
> >>>>>>>>>> What if we have a special /*+ PROPERTIES */ hint that allows
> >>>>>>>>>> overriding some properties of a table dynamically? It does not
> >>>>>>>>>> break anything, at least for current Flink use cases.
> >>>>>>>>>>
> >>>>>>>>>> Planner hints are optional simply because they are naturally
> >>>>>>>>>> enforcers of the planner, and most of them aim to instruct the
> >>>>>>>>>> optimizer. Table hints are a little different: they can specify
> >>>>>>>>>> table metadata such as an index column, and they are a very
> >>>>>>>>>> convenient way to specify table properties.
> >>>>>>>>>>
> >>>>>>>>>> Or shall we not call /*+ PROPERTIES(offset=123) */ a table hint
> >>>>>>>>>> at all? We could call it table dynamic parameters.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Danny Chan
> >>>>>>>>>> On Mar 11, 2020, 9:20 PM +0800, Aljoscha Krettek <aljos...@apache.org> wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I don't understand this discussion. Hints, as I understand
> >>>>>>>>>>> them, should work like this:
> >>>>>>>>>>>
> >>>>>>>>>>> - hints are *optional* advice for the optimizer to try and
> >>>>>>>>>>> help it to find a good execution strategy
> >>>>>>>>>>> - hints should not change query semantics, i.e. they should
> >>>>>>>>>>> not change connector properties; executing a query while
> >>>>>>>>>>> taking the hints into account *must* produce the same result
> >>>>>>>>>>> as executing the query without taking them into account
> >>>>>>>>>>>
> >>>>>>>>>>> From these simple requirements you can derive a solution that
> >>>>>>>>>>> makes sense. I don't have a strong preference for the syntax
> >>>>>>>>>>> but we should strive to be in line with prior work.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Aljoscha
> >>>>>>>>>>>
> >>>>>>>>>>> On 11.03.20 11:53, Danny Chan wrote:
> >>>>>>>>>>>> Thanks Timo for summarizing the 3 options ~
> >>>>>>>>>>>>
> >>>>>>>>>>>> I agree with Kurt that option 2 is too complicated to use
> >>>>>>>>>>>> because:
> >>>>>>>>>>>>
> >>>>>>>>>>>> • As a Kafka topic consumer, the user must both define the
> >>>>>>>>>>>> virtual column for the start offset and apply a special
> >>>>>>>>>>>> filter predicate in each query
> >>>>>>>>>>>> • For the internal implementation, metadata column pushdown
> >>>>>>>>>>>> is another hard topic: each kind of message queue may have
> >>>>>>>>>>>> its own offset attribute, so we need to consider the
> >>>>>>>>>>>> expression type for each kind; the source also needs to
> >>>>>>>>>>>> recognize the constant column as a config option (which is
> >>>>>>>>>>>> weird because usually what we push down is a table column)
> >>>>>>>>>>>>
> >>>>>>>>>>>> For option 1 and option 3, I think there is no difference.
> >>>>>>>>>>>> Option 1 is also a hint syntax, one that was introduced in
> >>>>>>>>>>>> Sybase, adopted by MS-SQL, and then deprecated in the 1990s
> >>>>>>>>>>>> because of its ambiguity. Personally I prefer the /*+ */
> >>>>>>>>>>>> style table hint over the WITH keyword for these reasons:
> >>>>>>>>>>>>
> >>>>>>>>>>>> • We do not break standard SQL, since the hints are nested in
> >>>>>>>>>>>> SQL comments
> >>>>>>>>>>>> • We do not need to introduce an additional WITH keyword,
> >>>>>>>>>>>> which could appear anywhere in a query because a table can be
> >>>>>>>>>>>> referenced in all kinds of SQL contexts:
> >>>>>>>>>>>> INSERT/DELETE/FROM/JOIN …. That would make our SQL diverge
> >>>>>>>>>>>> too much from the standard
> >>>>>>>>>>>> • We would have the same syntax for table hints as for query
> >>>>>>>>>>>> hints; one syntax fits all and is easier to use
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> And here is the reason why we chose a uniform Oracle-style
> >>>>>>>>>>>> query hint syntax, as explained by Julian Hyde when we
> >>>>>>>>>>>> designed the syntax in the Calcite community:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I don’t much like the MSSQL-style syntax for table hints. It
> >>>>>>>>>>>> adds a new use of the WITH keyword that is unrelated to the
> >>>>>>>>>>>> use of WITH for common-table expressions.
> >>>>>>>>>>>>
> >>>>>>>>>>>> A historical note. Microsoft SQL Server inherited its hint
> >>>>>>>>>>>> syntax from Sybase a very long time ago. (See “Transact SQL
> >>>>>>>>>>>> Programming”[1], page 632, “Optimizer hints”. The book was
> >>>>>>>>>>>> written in 1999, and covers Microsoft SQL Server 6.5 / 7.0
> >>>>>>>>>>>> and Sybase Adaptive Server 11.5, but the syntax very likely
> >>>>>>>>>>>> predates Sybase 4.3, from which Microsoft SQL Server was
> >>>>>>>>>>>> forked in 1993.)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Microsoft later added the WITH keyword to make it less
> >>>>>>>>>>>> ambiguous, and has now deprecated the syntax that does not
> >>>>>>>>>>>> use WITH.
> >>>>>>>>>>>>
> >>>>>>>>>>>> They are forced to keep the syntax for backwards
> >>>>>>>>>>>> compatibility but that doesn’t mean that we should shoulder
> >>>>>>>>>>>> their burden.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think formatted comments are the right container for hints
> >>>>>>>>>>>> because it allows us to change the hint syntax without
> >>>>>>>>>>>> changing the SQL parser, and makes clear that we are at
> >>>>>>>>>>>> liberty to ignore hints entirely.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://www.amazon.com/s?k=9781565924017
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>> On Mar 11, 2020, 4:03 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> it is true that our DDL is not standard compliant by using
> >>>>>>>>>>>>> the WITH clause. Nevertheless, we aim for not diverging too
> >>>>>>>>>>>>> much, and the LIKE clause is an example of that. It will
> >>>>>>>>>>>>> solve things like overwriting WATERMARKs, adding or
> >>>>>>>>>>>>> modifying properties, and inheriting the schema.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Bowen is right that Flink's DDL is mixing 3 types of
> >>>>>>>>>>>>> definitions together. We are not the first ones that try to
> >>>>>>>>>>>>> solve this. There is also the SQL/MED standard [1] that
> >>>>>>>>>>>>> tried to tackle this problem. I think it was not considered
> >>>>>>>>>>>>> when designing the current DDL.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Currently, I see 3 options for handling Kafka offsets. I
> >>>>>>>>>>>>> will give some examples and look forward to feedback here:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Option 1* Runtime and semantic params as part of the query
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> `SELECT * FROM MyTable('offset'=123)`
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>> - Easy to add
> >>>>>>>>>>>>> - Parameters are part of the main query
> >>>>>>>>>>>>> - No complicated hinting syntax
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>> - Not SQL compliant
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Option 2* Use metadata in query
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> `CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))`
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> `SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'`
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>> - SQL compliant in the query
> >>>>>>>>>>>>> - Access of metadata in the DDL which is required
> >> anyway
> >>>>>>>>>>>>> - Regular pushdown rules apply
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>> - Users need to add an additional column in the DDL
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Option 3*: Use hints for properties
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> `
> >>>>>>>>>>>>> SELECT *
> >>>>>>>>>>>>> FROM MyTable /*+ PROPERTIES('offset'=123) */
> >>>>>>>>>>>>> `
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>> - Easy to add
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>> - Parameters are not part of the main query
> >>>>>>>>>>>>> - Cryptic syntax for new users
> >>>>>>>>>>>>> - Not standard compliant.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If we go with this option, I would suggest making the hints
> >>>>>>>>>>>>> available in a separate map and not mixing them with
> >>>>>>>>>>>>> statically defined properties, so that the factory can
> >>>>>>>>>>>>> decide which properties have the right to be overwritten by
> >>>>>>>>>>>>> the hints:
> >>>>>>>>>>>>> TableSourceFactory.Context.getQueryHints(): ReadableConfig
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] https://en.wikipedia.org/wiki/SQL/MED
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 11.03.20 07:21, Danny Chan wrote:
> >>>>>>>>>>>>>> Thanks Bowen ~
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I agree we should somehow categorize our connector
> >>>>>>>>>>>>>> parameters.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For type 1, I’m already preparing a solution along the
> >>>>>>>>>>>>>> lines of Confluent schema registry + Avro schema inference,
> >>>>>>>>>>>>>> so this may not be a problem in the near future.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For type 3, I have some questions:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> "SELECT * FROM mykafka WHERE offset > 12pm yesterday"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Where does the offset column come from? Is it a virtual
> >>>>>>>>>>>>>> column from the table schema? You said that
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> They change almost every time a query starts and have
> >>>>>>>>>>>>>>> nothing to do with metadata, thus should not be part of
> >>>>>>>>>>>>>>> table definition/DDL
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> But then why can you reference it in the query? I’m
> >>>>>>>>>>>>>> confused by that, can you elaborate a little?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>> On Mar 11, 2020, 12:52 PM +0800, Bowen Li <bowenl...@gmail.com> wrote:
> >>>>>>>>>>>>>>> Thanks Danny for kicking off the effort
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The root cause of too much manual work is that Flink DDL
> >>>>>>>>>>>>>>> has mixed 3 types of params together and doesn't handle
> >>>>>>>>>>>>>>> each of them very well. Below is how I categorize them,
> >>>>>>>>>>>>>>> with the corresponding solutions I have in mind:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - type 1: Metadata of external data, like external
> >>>>>>>>>>>>>>> endpoint/url, username/pwd, schemas, formats.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Such metadata is mostly already accessible in the external
> >>>>>>>>>>>>>>> system as long as endpoints and credentials are provided.
> >>>>>>>>>>>>>>> Flink can get it through catalogs, but we haven't had many
> >>>>>>>>>>>>>>> catalogs yet and thus Flink just hasn't been able to
> >>>>>>>>>>>>>>> leverage that. So the solution should be building more
> >>>>>>>>>>>>>>> catalogs. Such params should be part of a Flink table
> >>>>>>>>>>>>>>> DDL/definition, and not overridable by any means.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - type 2: Runtime params, like the JDBC connector's fetch
> >>>>>>>>>>>>>>> size or the Elasticsearch connector's bulk flush size.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Such params don't affect query results, but affect how
> >>>>>>>>>>>>>>> results are produced (e.g. fast or slow, aka performance);
> >>>>>>>>>>>>>>> they are essentially execution and implementation
> >>>>>>>>>>>>>>> details. They change often in exploration or development
> >>>>>>>>>>>>>>> stages, but not very frequently in well-defined
> >>>>>>>>>>>>>>> long-running pipelines. They should always have default
> >>>>>>>>>>>>>>> values and can be missing in a query. They can be part of
> >>>>>>>>>>>>>>> a table DDL/definition, but should also be replaceable in
> >>>>>>>>>>>>>>> a query - *this is what table "hints" in FLIP-113 should
> >>>>>>>>>>>>>>> cover*.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - type 3: Semantic params, like the Kafka connector's
> >>>>>>>>>>>>>>> start offset.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Such params affect query results, i.e. the semantics. They
> >>>>>>>>>>>>>>> are better expressed as filter conditions in the WHERE
> >>>>>>>>>>>>>>> clause that can be pushed down. They change almost every
> >>>>>>>>>>>>>>> time a query starts and have nothing to do with metadata,
> >>>>>>>>>>>>>>> thus should not be part of table definition/DDL, nor be
> >>>>>>>>>>>>>>> persisted in catalogs. If they need to be, users should
> >>>>>>>>>>>>>>> create views to keep such params around (note this is
> >>>>>>>>>>>>>>> different from variable substitution).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Take Flink-Kafka as an example. Once we get these params
> >>>>>>>>>>>>>>> right, here are the steps users need to take to develop
> >>>>>>>>>>>>>>> and run a Flink job:
> >>>>>>>>>>>>>>> - configure a Flink ConfluentSchemaRegistry with url,
> >>>>>>>>>>>>>>> username, and password
> >>>>>>>>>>>>>>> - run "SELECT * FROM mykafka WHERE offset > 12pm
> >>>>>>>>>>>>>>> yesterday" (simplified timestamp) in SQL CLI; Flink
> >>>>>>>>>>>>>>> automatically retrieves all metadata of schema, file
> >>>>>>>>>>>>>>> format, etc. and starts the job
> >>>>>>>>>>>>>>> - users want to make the job read the Kafka topic faster,
> >>>>>>>>>>>>>>> so it becomes "SELECT * FROM mykafka /*
> >>>>>>>>>>>>>>> faster_read_key=value*/ WHERE offset > 12pm yesterday"
> >>>>>>>>>>>>>>> - done and satisfied, users submit it to production
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2)", I
> >>>>>>>>>>>>>>> think it's a nice-to-have feature, but not a strategically
> >>>>>>>>>>>>>>> critical, long-term solution, because
> >>>>>>>>>>>>>>> 1) It may seem promising at the current stage to solve the
> >>>>>>>>>>>>>>> too-much-manual-work problem, but that's only because
> >>>>>>>>>>>>>>> Flink hasn't leveraged catalogs well and handled the 3
> >>>>>>>>>>>>>>> types of params above properly. Once we get the param
> >>>>>>>>>>>>>>> types right, the LIKE syntax won't be that important, and
> >>>>>>>>>>>>>>> will be just an easier way to create tables without
> >>>>>>>>>>>>>>> retyping long fields like username and pwd.
> >>>>>>>>>>>>>>> 2) Note that only some rare types of catalog can store k-v
> >>>>>>>>>>>>>>> property pairs, so tables created this way often cannot be
> >>>>>>>>>>>>>>> persisted. In the foreseeable future, the only such
> >>>>>>>>>>>>>>> catalog will be HiveCatalog, and not everyone has a Hive
> >>>>>>>>>>>>>>> metastore. To be honest, without persistence, recreating
> >>>>>>>>>>>>>>> tables every time this way is still a lot of keyboard
> >>>>>>>>>>>>>>> typing.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>> Bowen
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If a specific connector wants to have such a parameter
> >>>>>>>>>>>>>>>> and read it out of the configuration, then that's fine.
> >>>>>>>>>>>>>>>> If we are talking about a configuration for all kinds of
> >>>>>>>>>>>>>>>> sources, I would be super careful about that.
> >>>>>>>>>>>>>>>> It's true it can solve maybe 80% of cases, but it will
> >>>>>>>>>>>>>>>> also make the remaining 20% feel weird.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <imj...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Kurt,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> #3 Regarding global offset:
> >>>>>>>>>>>>>>>>> I'm not suggesting that the planner use the global
> >>>>>>>>>>>>>>>>> configuration to override connector properties.
> >>>>>>>>>>>>>>>>> Rather, the connector should take this configuration
> >>>>>>>>>>>>>>>>> and translate it into its client API.
> >>>>>>>>>>>>>>>>> AFAIK, almost all message queues support earliest,
> >>>>>>>>>>>>>>>>> latest, and a timestamp value as start point.
> >>>>>>>>>>>>>>>>> So we can support 3 options for this configuration:
> >>>>>>>>>>>>>>>>> "earliest", "latest" and a timestamp string value.
> >>>>>>>>>>>>>>>>> Of course, this can't solve 100% of cases, but I guess
> >>>>>>>>>>>>>>>>> it can solve 80% or 90% of them.
> >>>>>>>>>>>>>>>>> The remaining cases can be resolved by the LIKE syntax,
> >>>>>>>>>>>>>>>>> and I guess they are not very common.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, 11 Mar 2020 at 10:33, Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Good to have such lovely discussions. I also want to
> >>>>>>>>>>>>>>>>>> share some of my opinions.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> #1 Regarding error handling: I also think ignoring
> >>>>>>>>>>>>>>>>>> invalid hints would be dangerous; maybe the simplest
> >>>>>>>>>>>>>>>>>> solution is to just throw an exception.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> #2 Regarding property replacement: I don't think we
> >>>>>>>>>>>>>>>>>> should constrain ourselves to the meaning of the word
> >>>>>>>>>>>>>>>>>> "hint" and forbid it from modifying any properties
> >>>>>>>>>>>>>>>>>> which can affect query results. IMO `PROPERTIES` is
> >>>>>>>>>>>>>>>>>> one of the table hints, and a powerful one. It can
> >>>>>>>>>>>>>>>>>> modify properties located in the DDL's WITH block. But
> >>>>>>>>>>>>>>>>>> I also see the harm if we make it too flexible, like
> >>>>>>>>>>>>>>>>>> changing the Kafka topic name with a hint. Such a use
> >>>>>>>>>>>>>>>>>> case is not common and sounds very dangerous to me. I
> >>>>>>>>>>>>>>>>>> would propose we have a map of hintable properties for
> >>>>>>>>>>>>>>>>>> each connector, and validate that all passed-in
> >>>>>>>>>>>>>>>>>> properties are actually hintable. Combined with the
> >>>>>>>>>>>>>>>>>> error handling in #1, we can throw an exception once
> >>>>>>>>>>>>>>>>>> we receive an invalid property.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> #3 Regarding global offset: I'm not sure it's
> >>>>>>>>>>>>>>>>>> feasible. Different connectors will have totally
> >>>>>>>>>>>>>>>>>> different properties to represent offset: some might
> >>>>>>>>>>>>>>>>>> be timestamps, some might be string literals like
> >>>>>>>>>>>>>>>>>> "earliest", and others might be just integers.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <imj...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I want to jump into the discussion about the "dynamic start
> >>>>>>>>>>>>>>>>>>> offset" problem.
> >>>>>>>>>>>>>>>>>>> First of all, I share the same concern as Timo and Fabian: the
> >>>>>>>>>>>>>>>>>>> "start offset" affects the query semantics, i.e. the query result,
> >>>>>>>>>>>>>>>>>>> but "hints" are just used for optimization and should not affect
> >>>>>>>>>>>>>>>>>>> the result.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I think the "dynamic start offset" is a very important usability
> >>>>>>>>>>>>>>>>>>> problem that many streaming platforms will face.
> >>>>>>>>>>>>>>>>>>> I also agree that "CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> >>>>>>>>>>>>>>>>>>> ('connector.startup-timestamp-millis' = '1578538374471')" is
> >>>>>>>>>>>>>>>>>>> verbose. What if we have 10 tables to join?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> However, what I want to propose (this should be another thread) is
> >>>>>>>>>>>>>>>>>>> a global configuration to reset the start offsets of all the
> >>>>>>>>>>>>>>>>>>> source connectors in the query session, e.g.
> >>>>>>>>>>>>>>>>>>> "table.sources.start-offset". This is possible now because
> >>>>>>>>>>>>>>>>>>> `TableSourceFactory.Context` has a `getConfiguration` method to
> >>>>>>>>>>>>>>>>>>> get the session configuration, which can be used to create an
> >>>>>>>>>>>>>>>>>>> adapted TableSource.
> >>>>>>>>>>>>>>>>>>> Then we can also expose it to the SQL CLI via the SET command,
> >>>>>>>>>>>>>>>>>>> e.g. `SET 'table.sources.start-offset'='earliest';`, which is
> >>>>>>>>>>>>>>>>>>> pretty simple and straightforward.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'`,
> >>>>>>>>>>>>>>>>>>> which is very helpful IMO.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 22:29, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> compared to the hints, FLIP-110 is fully compliant with the SQL
> >>>>>>>>>>>>>>>>>>>> standard.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> >>>>>>>>>>>>>>>>>>>> (k=v)` is too verbose or awkward for the power of basically
> >>>>>>>>>>>>>>>>>>>> changing the entire connector. Usually, this statement would just
> >>>>>>>>>>>>>>>>>>>> precede the query in a multiline file. So it can be changed
> >>>>>>>>>>>>>>>>>>>> "in-place" like the hints you proposed.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Many companies have a well-defined set of tables that should be
> >>>>>>>>>>>>>>>>>>>> used. It would be dangerous if users could change the path or
> >>>>>>>>>>>>>>>>>>>> topic in a hint. The catalog/catalog manager should be the entity
> >>>>>>>>>>>>>>>>>>>> that controls which tables exist and how they can be accessed.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> what’s the problem there if we use the table hints to support
> >>>>>>>>>>>>>>>>>>>>> “start offset”?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> IMHO it violates the meaning of a hint. According to the
> >>>>>>>>>>>>>>>>>>>> dictionary, a hint is "a statement that expresses indirectly what
> >>>>>>>>>>>>>>>>>>>> one prefers not to say explicitly". But offsets are a property
> >>>>>>>>>>>>>>>>>>>> that is very explicit.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> If we go with the hint approach, it should be expressible in the
> >>>>>>>>>>>>>>>>>>>> TableSourceFactory which properties are supported for hinting. Or
> >>>>>>>>>>>>>>>>>>>> do you plan to offer those hints in a separate Map<String, String>
> >>>>>>>>>>>>>>>>>>>> that cannot overwrite existing properties? I think this would be
> >>>>>>>>>>>>>>>>>>>> a different story...
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On 10.03.20 10:34, Danny Chan wrote:
> >>>>>>>>>>>>>>>>>>>>> Thanks Timo ~
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Personally I would say that offset > 0 and start offset = 10 do
> >>>>>>>>>>>>>>>>>>>>> not have the same semantics, so from the SQL aspect, we cannot
> >>>>>>>>>>>>>>>>>>>>> implement a “starting offset” hint for a query with such a syntax.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> And the CREATE TABLE LIKE syntax is a DDL, which is just verbose
> >>>>>>>>>>>>>>>>>>>>> for defining such dynamic parameters even if it could do that.
> >>>>>>>>>>>>>>>>>>>>> Shall we force users to define a temporary table for each query
> >>>>>>>>>>>>>>>>>>>>> with dynamic params? I would say it’s an awkward solution.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> "Hints should give "hints" but not affect the actual produced
> >>>>>>>>>>>>>>>>>>>>> result.” You mentioned that multiple times; could you give a
> >>>>>>>>>>>>>>>>>>>>> reason? What’s the problem there if we use the table hints to
> >>>>>>>>>>>>>>>>>>>>> support “start offset”? From my side I see some benefits:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> • It’s very convenient to set up these parameters; the syntax is
> >>>>>>>>>>>>>>>>>>>>> very much like the DDL definition
> >>>>>>>>>>>>>>>>>>>>> • Its scope is very clear: right on the table it is attached to
> >>>>>>>>>>>>>>>>>>>>> • It does not affect the table schema, which means that in order
> >>>>>>>>>>>>>>>>>>>>> to specify the offset, there is no need to define an offset
> >>>>>>>>>>>>>>>>>>>>> column, which is actually weird. An offset should never be a
> >>>>>>>>>>>>>>>>>>>>> column; it’s more like metadata or a start option.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> So in total, FLIP-110 uses the offset more like Hive partition
> >>>>>>>>>>>>>>>>>>>>> pruning. We can do that if we have an offset column, but in most
> >>>>>>>>>>>>>>>>>>>>> cases we do not define one, so there is actually no conflict or
> >>>>>>>>>>>>>>>>>>>>> overlap.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>>>>>> On Mar 10, 2020, 4:28 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> shouldn't FLIP-110[1] solve most of the problems we have around
> >>>>>>>>>>>>>>>>>>>>>> defining table properties more dynamically without manual schema
> >>>>>>>>>>>>>>>>>>>>>> work? Offset definition is also easier with such a syntax. The
> >>>>>>>>>>>>>>>>>>>>>> tables do not have to be defined in the catalog but could be
> >>>>>>>>>>>>>>>>>>>>>> temporary tables that extend the original table.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> In general, we should aim to keep the syntax concise and not
> >>>>>>>>>>>>>>>>>>>>>> provide too many ways of doing the same thing. Hints should give
> >>>>>>>>>>>>>>>>>>>>>> "hints" but not affect the actual produced result.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Some connector properties might also change the plan or schema in
> >>>>>>>>>>>>>>>>>>>>>> the future. E.g. they might also define whether a table source
> >>>>>>>>>>>>>>>>>>>>>> supports certain push-downs (e.g. predicate push-down).
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Dawid is currently working on a draft that might make it possible
> >>>>>>>>>>>>>>>>>>>>>> to expose a Kafka offset via the schema such that `SELECT * FROM
> >>>>>>>>>>>>>>>>>>>>>> Topic WHERE offset > 10` would become possible and could be
> >>>>>>>>>>>>>>>>>>>>>> pushed down. But this is, of course, not planned initially.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On 10.03.20 08:34, Danny Chan wrote:
> >>>>>>>>>>>>>>>>>>>>>>> Thanks Wenlong ~
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> For PROPERTIES Hint Error Handling
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Actually we have no way to figure out whether an erroneous hint
> >>>>>>>>>>>>>>>>>>>>>>> is a PROPERTIES hint. For example, if a user writes a hint like
> >>>>>>>>>>>>>>>>>>>>>>> ‘PROPERTIAS’, we do not know whether it was meant to be a
> >>>>>>>>>>>>>>>>>>>>>>> PROPERTIES hint; all we know is that the hint name was not
> >>>>>>>>>>>>>>>>>>>>>>> registered in Flink.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> If the user writes the hint name correctly (i.e. PROPERTIES), we
> >>>>>>>>>>>>>>>>>>>>>>> can indeed enforce validation of the hint options through the
> >>>>>>>>>>>>>>>>>>>>>>> pluggable HintOptionChecker.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> For PROPERTIES Hint Option Format
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> For a key-value style hint option, the key can be either a simple
> >>>>>>>>>>>>>>>>>>>>>>> identifier or a string literal, which means that it’s compatible
> >>>>>>>>>>>>>>>>>>>>>>> with our DDL syntax. We support simple identifiers because many
> >>>>>>>>>>>>>>>>>>>>>>> other hints do not have complex component keys like the table
> >>>>>>>>>>>>>>>>>>>>>>> properties, and we want to unify the parse block.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>>>>>>>> On Mar 10, 2020, 3:19 PM +0800, wenlong.lwl <wenlong88....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> Hi Danny, thanks for the proposal. +1 for adding table hints; it
> >>>>>>>>>>>>>>>>>>>>>>>> is really a necessary feature for Flink SQL to integrate with a
> >>>>>>>>>>>>>>>>>>>>>>>> catalog.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> For error handling, I think it would be more natural to throw an
> >>>>>>>>>>>>>>>>>>>>>>>> exception when an erroneous table hint is provided, because the
> >>>>>>>>>>>>>>>>>>>>>>>> properties in the hint will be merged and used to find the table
> >>>>>>>>>>>>>>>>>>>>>>>> factory, which would cause an exception when erroneous properties
> >>>>>>>>>>>>>>>>>>>>>>>> are provided, right? On the other hand, unlike other hints, which
> >>>>>>>>>>>>>>>>>>>>>>>> just affect the way the query is executed, the property table
> >>>>>>>>>>>>>>>>>>>>>>>> hint actually affects the result of the query, so we should never
> >>>>>>>>>>>>>>>>>>>>>>>> ignore the given property hints.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> For the format of property hints: currently, in the SQL client,
> >>>>>>>>>>>>>>>>>>>>>>>> we accept properties only in string format in DDL, e.g.
> >>>>>>>>>>>>>>>>>>>>>>>> 'connector.type'='kafka'. I think the format of properties in a
> >>>>>>>>>>>>>>>>>>>>>>>> hint should be the same as the format we defined in DDL. What do
> >>>>>>>>>>>>>>>>>>>>>>>> you think?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Bests,
> >>>>>>>>>>>>>>>>>>>>>>>> Wenlong Lyu
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> To Weike: About the Error Handling
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> To be consistent with other SQL vendors, the default is to log
> >>>>>>>>>>>>>>>>>>>>>>>>> warnings, and if there is any error (invalid hint name or
> >>>>>>>>>>>>>>>>>>>>>>>>> options), the hint is just ignored. I have already addressed
> >>>>>>>>>>>>>>>>>>>>>>>>> this in the wiki.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> To Timo: About the PROPERTIES Table Hint
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> • The properties hint is also optional: a user can pass in an
> >>>>>>>>>>>>>>>>>>>>>>>>> option to override the table properties, but this does not mean
> >>>>>>>>>>>>>>>>>>>>>>>>> it is required.
> >>>>>>>>>>>>>>>>>>>>>>>>> • "They should not include semantics": do the properties belong
> >>>>>>>>>>>>>>>>>>>>>>>>> to the semantics? I don't think so; the plan does not change,
> >>>>>>>>>>>>>>>>>>>>>>>>> right? The result set may be affected, but there are already
> >>>>>>>>>>>>>>>>>>>>>>>>> some hints that do so, for example the MS-SQL MAXRECURSION and
> >>>>>>>>>>>>>>>>>>>>>>>>> SNAPSHOT hints [1]
> >>>>>>>>>>>>>>>>>>>>>>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL
> >>>>>>>>>>>>>>>>>>>>>>>>> standard, compared to the hints way (which is enclosed in
> >>>>>>>>>>>>>>>>>>>>>>>>> comments)
> >>>>>>>>>>>>>>>>>>>>>>>>> • I actually didn't find any vendors that support such a
> >>>>>>>>>>>>>>>>>>>>>>>>> grammar, and there is no way to override table-level properties
> >>>>>>>>>>>>>>>>>>>>>>>>> dynamically. For a normal RDBMS, I think there are no requests
> >>>>>>>>>>>>>>>>>>>>>>>>> for such dynamic parameters because all the tables have the same
> >>>>>>>>>>>>>>>>>>>>>>>>> storage and computation, and they are almost all batch tables.
> >>>>>>>>>>>>>>>>>>>>>>>>> • Flink, however, as a computation engine has many connectors.
> >>>>>>>>>>>>>>>>>>>>>>>>> Especially for a message queue like Kafka, we would have a
> >>>>>>>>>>>>>>>>>>>>>>>>> start_offset that is different each time we start the query.
> >>>>>>>>>>>>>>>>>>>>>>>>> Such parameters cannot be persisted to the catalog because they
> >>>>>>>>>>>>>>>>>>>>>>>>> are not static. This is actually the background for proposing
> >>>>>>>>>>>>>>>>>>>>>>>>> table hints to specify such properties dynamically.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> To Jark and Jingsong: I have removed the query hints part and
> >>>>>>>>>>>>>>>>>>>>>>>>> changed the title.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>>>>>>>>>> On Mar 9, 2020, 5:46 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> thanks for the proposal. I agree with Jark and Jingsong. Planner
> >>>>>>>>>>>>>>>>>>>>>>>>>> hints and table hints are orthogonal topics that should be
> >>>>>>>>>>>>>>>>>>>>>>>>>> discussed separately.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I share Jingsong's opinion that we should not use planner hints
> >>>>>>>>>>>>>>>>>>>>>>>>>> for passing connector properties. Planner hints should be
> >>>>>>>>>>>>>>>>>>>>>>>>>> optional at any time. They should not include semantics but only
> >>>>>>>>>>>>>>>>>>>>>>>>>> affect execution time. Connector properties are an important
> >>>>>>>>>>>>>>>>>>>>>>>>>> part of the query itself.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Have you thought about options such as `SELECT * FROM t(k=v,
> >>>>>>>>>>>>>>>>>>>>>>>>>> k=v)`? How do other vendors deal with this problem?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On 09.03.20 10:37, Jingsong Li wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danny, +1 for table hints, thanks for driving.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I took a look at the FLIP; most of the content talks about
> >>>>>>>>>>>>>>>>>>>>>>>>>>> query hints. That makes it hard to discuss and vote on. So +1
> >>>>>>>>>>>>>>>>>>>>>>>>>>> to splitting it as Jark said.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Another thing is which configuration is suitable to set with
> >>>>>>>>>>>>>>>>>>>>>>>>>>> table hints: "connector.path" and "connector.topic", are they
> >>>>>>>>>>>>>>>>>>>>>>>>>>> really suitable for table hints? It looks weird to me, because
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I think these properties are the core of the table.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Jingsong Lee
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <imj...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Danny for starting the discussion.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> +1 for this feature.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> If we just focus on the table hints and not the query hints in
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> this release, could you split the FLIP into two FLIPs?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> It's hard to vote on only part of a FLIP. You can keep the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> table hints proposal in FLIP-113 and move query hints into
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> another FLIP, so that we can focus on the table hints in this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <kyled...@connect.hku.hk> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is a nice feature, +1.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> One thing I am interested in that is not mentioned in the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal is error handling. It is quite common for users to
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> write inappropriate hints in SQL code; if illegal or "bad"
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> hints are given, would the system simply ignore them or throw
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> exceptions?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks : )
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Weike
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Note:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> we only plan to support table hints in Flink release 1.11,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> so please focus mainly on the table hints part and just
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ignore the planner hints, sorry for that mistake ~
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mar 9, 2020, 4:36 PM +0800, Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, fellows ~
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to propose support for SQL hints in Flink SQL.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We would support the following hints syntax:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */ *
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> join
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> emp.deptno = dept.deptno
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Basically we would support both query hints (after the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SELECT keyword) and table hints (after the referenced table
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> name). For 1.11, we plan to only support table hints, with
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a hint probably named PROPERTIES:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') */
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am looking forward to your comments.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> You can access the FLIP here:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Danny Chan
>
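
The "map of hintable properties" idea from Kurt's points #1 and #2 in the quoted thread can be sketched roughly as below. This is only an illustration under stated assumptions: `HintValidator`, `mergeHints`, and the `hintableKeys` parameter are hypothetical names that do not exist in Flink, and the topic value "orders" is made up; the property keys are the ones mentioned in the thread.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Flink's API. */
final class HintValidator {

    /**
     * Returns the DDL properties overridden by the hinted ones.
     * Any hinted key that the connector has not declared as hintable
     * is rejected with an exception instead of being silently ignored.
     */
    static Map<String, String> mergeHints(
            Map<String, String> ddlProperties,
            Map<String, String> hints,
            Set<String> hintableKeys) {
        // Copy first so the catalog's properties are never mutated.
        Map<String, String> merged = new HashMap<>(ddlProperties);
        for (Map.Entry<String, String> hint : hints.entrySet()) {
            if (!hintableKeys.contains(hint.getKey())) {
                throw new IllegalArgumentException(
                        "Property '" + hint.getKey() + "' cannot be set via a hint");
            }
            merged.put(hint.getKey(), hint.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> ddl = new HashMap<>();
        ddl.put("connector.topic", "orders"); // illustrative value

        // Assumption: the connector declares only the start timestamp as hintable.
        Set<String> hintable = new HashSet<>();
        hintable.add("connector.startup-timestamp-millis");

        Map<String, String> hints = new HashMap<>();
        hints.put("connector.startup-timestamp-millis", "1578538374471");
        System.out.println(mergeHints(ddl, hints, hintable));

        // Hinting the topic is rejected, matching the concern raised in the thread.
        Map<String, String> badHints = new HashMap<>();
        badHints.put("connector.topic", "other");
        try {
            mergeHints(ddl, badHints, hintable);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Throwing on a non-hintable key, rather than logging and ignoring it, follows the preference several participants express in the thread; a log-and-ignore variant would replace the `throw` with a warning and `continue`.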
