Thanks for all your discussions, Timo, Jark Wu, Lin Li and Fabian.
I have drafted a new design doc in [1], reorganized from the MVP doc [2] and
the initial doc [3], both proposed by Shuyi Chen.
The main difference is that I extended the CREATE TABLE DDL to support complex
SQL types (array, map and struct).
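As a rough illustration only (the exact type syntax is still an open point in
the doc, so the table name, types and properties below are made up), such a
table could look like:

  CREATE TABLE user_events (
    user_id    BIGINT,
    tags       ARRAY<VARCHAR>,
    properties MAP<VARCHAR, VARCHAR>,
    address    STRUCT<city VARCHAR, zip INT>
  ) WITH (
    connector.type = 'kafka'   -- placeholder option, not a finalized property name
  );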
> ... partitioned table support for better connectivity to Hive and Kafka. I'll
> add partitioned table syntax (compatible with Hive) into the DDL draft doc
> later [...].
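For reference, a Hive-style partitioned declaration might look roughly like the
following; this is only a sketch, since the partitioning syntax for the draft
is not settled and the connector property is a placeholder:

  CREATE TABLE page_views (
    user_id BIGINT,
    url     VARCHAR
  )
  PARTITIONED BY (dt VARCHAR)
  WITH (
    connector.type = 'hive'   -- placeholder option
  );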
... matching the Flink schema, for instance in data types. Even if so, a
perfect match is not required. For instance, the external schema file may
evolve while the table schema in Flink stays unchanged. A responsible reader
should be able to scan the file based on the file schema and return the data
based on the table schema.
> ... session, people should be able to list all source tables and sink tables
> to know upfront whether they can use an INSERT INTO here or not.
c") ?
> > >>>>>>>> @Xuefu: Yes, you are right that an external schema might not
> > >> excatly
> > >>>>>>>> match but this is true for both directions:
> > >>>>>>>> table schema "derives" format schema and format schema "derives"
> > >>
> "If an existing field is Long/Timestamp, we can just use it as rowtime":
> yes, but we need to mark a field as such an attribute. How does the syntax
> for marking it look? And what about timestamps that are nested in the
> schema?
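One possible shape of such marking, based on the WATERMARK clause sketched in
the draft grammar quoted later in this thread (purely illustrative, nothing
here is final):

  CREATE TABLE clicks (
    user_id BIGINT,
    ts      TIMESTAMP,
    -- marks ts as the rowtime attribute, with a 5 second offset
    WATERMARK wm FOR ts AS withOffset(ts, 5000)
  );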
Other aspects:
7. Hive compatibility. Since Flink SQL will soon be able to operate on Hive
metadata and data, it's an add-on benefit if we can be compatible with Hive
syntax/semantics while following the ANSI standard. At least we should be as
close as possible. The Hive DDL can be found at
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
> 4a. Timestamps are in the schema twice.
> If an existing field is Long/Timestamp, we can just use it as rowtime, not
> defined twice. If it is not a Long/Timestamp, we can derive one from it,
> which is exactly the same as "replacing the existing column" at runtime.
> 4b. How can we write out a timestamp into the message header?
> That's a good point ... write to the Kafka message header. What do you
> think?
>
> 4c. Separate all time attribute concerns into a special clause next to the
> regular schema?
> Separating the watermark into a special clause ...
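For the 4a case, a computed column could derive the rowtime when the raw field
is not a Long/Timestamp; a rough sketch combining the draft's
computedColumnDefinition and WATERMARK clauses (the to_timestamp function name
and the exact computed-column syntax are assumptions, not part of the
proposal):

  CREATE TABLE clicks (
    user_id BIGINT,
    ts_str  VARCHAR,
    -- computed column deriving the event time from the raw string field
    ts AS to_timestamp(ts_str),
    WATERMARK wm FOR ts AS withOffset(ts, 5000)
  );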
> ... because it keeps it standard compliant. BTW, this feature is not in the
> MVP; we can discuss it in more depth in the future when we need it.
>
> 5. Schema declaration:
> I like the proposal to omit the schema if we can get the schema from an
> external system.
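To illustrate the idea of omitting the column list, a declaration that derives
its schema entirely from an external registry might shrink to something like
the following (a hypothetical shape with made-up property names, not a
proposed syntax):

  CREATE TABLE user_events WITH (
    connector.type = 'kafka',                  -- placeholder properties
    format.type = 'avro',
    format.schema-registry.url = 'http://...'  -- schema derived from the registry
  );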
> [1]:
> https://docs.microsoft.com/en-us/sql/relational-databases/partitions/create-partitioned-tables-and-indexes?view=sql-server-2017
>
> Best,
> Jark
>
> On Thu, 6 Dec 2018 at 12:09, Zhang, Xuefu wrote:
>
> > Hi Timo/Shuyi/Lin,
> ... sink, or both, it doesn't seem necessary to use these keywords to
> enforce permissions.
> 5. It might be okay if the schema declaration is always needed. While there
> might be some duplication sometimes, it's not always true. For example, the
> external schema ...
Thanks,
Xuefu

--
Sender: Lin Li
Sent at: 2018 Dec 6 (Thu) 10:49
Recipient: dev
Subject: Re: [DISCUSS] Flink SQL DDL Design

Hi Timo and Shuyi,
thanks for your ...
> ... derive from other fields). The "AS" keyword defines the watermark
> strategy, such as BOUNDED WITH OFFSET (covers almost all the requirements)
> and ASCENDING. When the expected rowtime field does ...
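To make the two strategies concrete, under the syntax described above they
might be written roughly like this (illustrative only; the exact clause shape
was still under discussion in the thread):

  -- bounded out-of-orderness: events may arrive up to 5 seconds late
  WATERMARK wm FOR ts AS BOUNDED WITH OFFSET 5000

  -- strictly ascending rowtime: no out-of-order events expected
  WATERMARK wm FOR ts AS ASCENDING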
>   ...
>   [ PERIOD FOR SYSTEM_TIME ]
>   [ WATERMARK watermarkName FOR rowTimeColumn AS
>     withOffset(rowTimeColumn, offset) ] )
>   [ WITH ( tableOption [ ...
>
>   ...
>     | [ VARCHAR ]
>     | [ BOOLEAN ]
>     | [ TINYINT ]
>     | [ SMALLINT ]
>     | [ INT ]
>     ...
>     | [ DATE ]
>     | [ TIME ]
>     | [ TIMESTAMP ]
>     | [ VARBINARY ]
>   }
>
> computedColumnDefinition ::=
>   ...
>
> ... ( columnName [, columnName]* )
>
> tableIndex ::=
>   [ UNIQUE ] INDEX indexName
>   (columnName [, columnName]* )
>
> rowTimeColumn ::=
>   columnName
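Putting the grammar fragments above together, a statement under this draft
might look like the following (a sketch only; the table option key and the
withOffset helper are taken from the draft and are not finalized):

  CREATE TABLE orders (
    order_id INT,
    user_id  INT,
    ts       TIMESTAMP,
    UNIQUE INDEX idx_user (user_id),
    WATERMARK wm FOR ts AS withOffset(ts, 10000)
  ) WITH (
    connector.type = 'kafka'   -- placeholder tableOption
  );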
> CREATE VIEW
>
> CREATE VIEW viewName
>   [ ( columnName [, columnName]* ) ]
> AS queryStatement;
>
> CREATE FUNCTION
>
> CREATE FUNCTION ...
> className ::=
>   fully qualified name
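For illustration, concrete instances of the CREATE VIEW and CREATE FUNCTION
statements above might read as follows (the names are made up, and whether the
class name is quoted is one of the open details):

  CREATE VIEW daily_clicks (dt, cnt) AS
    SELECT dt, COUNT(*) FROM clicks GROUP BY dt;

  CREATE FUNCTION parse_ts AS 'com.example.udf.ParseTimestamp';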
> Shuyi Chen wrote on Wed, Nov 28, 2018 at 3:28 AM:
>
> > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design
> > doc ...
+1 Sounds great!

--
Sender: Shuyi Chen
Sent at: 2018 Nov 29 (Thu) 06:56
Recipient: dev
Subject: Re: [DISCUSS] Flink SQL DDL Design

Thanks a lot, Shaoxuan, Jark and Lin. We should definitely collaborate here;
we have also our own DDL ...
> ... to generic key-value pairs, so that it will make integration with Hive
> DDL (or others, e.g. Beam DDL) easier.
>
> I'll run a final pass over the design doc and finalize the design ...
> ... in the next few days. And we can start creating tasks and collaborate on
> the implementation. Thanks a lot for all the comments and inputs.
>
> Cheers!
> Shuyi
>
> On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu <xuef...@alibaba-inc.c...
> ... that DDL can actually proceed without being blocked by the connector
> API. We can leave the unknowns out while defining the basic syntax.
>
> @Shuyi
>
> As commented in the doc, I think we can probably stick with simple syntax
> with general properties ...
> ... other stuff for the last 2 weeks, but we are definitely interested in
> moving this forward. I think once the unified connector API design [1] is
> done, we can finalize the DDL design as well and start creating concrete
> subtasks ...
> ... finalize the proposal and then we can divide the tasks for better
> collaboration.
>
> Please let me know if there are any questions or suggestions.
>
> Thanks,
> Xuefu
> --
> Sender: Timo Walther
> Sent at: 2018 Nov 27 (Tue) 16:21
> Recipient: dev
> Subject: Re: [DISCUSS] Flink SQL DDL Design
>
> Thanks for offering your help here, Xuefu. It would be great to move these
> efforts forward. I agree that ...
> We have some dedicated resources and would like to move this forward. We
> can collaborate.
>
> Thanks,
>
> Xuefu
>
> --
> Sender: wenlong.lwl
> Date: 2018-11-05 11:15:35
> Recipient:
> Subject: Re: [DISCUSS] Flink SQL DDL Design
Hi Wenlong, thanks a lot for the comments.
1) I agree we can infer the table type from the queries if the Flink job is
static. However, for SQL Client cases, the query is ad hoc, dynamic, and not
known beforehand. In such a case, we might want to enforce the table open
mode at startup time, so users ...
Hi, Shuyi, thanks for the proposal.
I have two concerns about the table DDL:
1. How about removing the source/sink mark from the DDL? It is not necessary,
because the framework can determine whether the referred table is a source or
a sink according to the context of the query using the table. It will be
more ...
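In other words, with a single unmarked declaration the same table can serve
either role depending on where it appears in a query; a small illustration
(table and column names are made up):

  CREATE TABLE events (user_id INT, url VARCHAR);

  -- 'events' acts as a sink here ...
  INSERT INTO events SELECT user_id, url FROM raw_events;

  -- ... and as a source here
  SELECT url, COUNT(*) FROM events GROUP BY url;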
+1, Thanks for the proposal.
I guess this is a long-awaited change. This can vastly increase the
functionalities of the SQL Client as it will be possible to use complex
extensions like for example those provided by Apache Bahir[1].
Best Regards,
Dom.
[1]
https://github.com/apache/bahir-flink
+1. Thanks for putting the proposal together, Shuyi.
DDL has been brought up a couple of times previously [1, 2]. Utilizing DDL
will definitely be a great extension to the current Flink SQL to
systematically support some of the previously brought up features, such as
[3]. And it will also be beneficial ...
Thanks Shuyi!
I left some comments there. I think the design of SQL DDL and Flink-Hive
integration/External catalog enhancements will work closely with each
other. Hope we are well aligned on the directions of the two designs, and I
look forward to working with you guys on both!
Bowen
On Thu, N...
Hi everyone,
SQL DDL support has been a long-time ask from the community. Current Flink
SQL supports only DML (e.g. SELECT and INSERT statements). In its current
form, Flink SQL users still need to define/create table sources and sinks
programmatically in Java/Scala. Also, in the SQL Client, without DDL
support ...