Ok, let's go with option 1. This doesn't prevent up from adding option 2
latter if necessary.

I've opened a ticket: https://issues.apache.org/jira/browse/FLINK-18735

On Mon, Jul 27, 2020 at 12:05 AM Konstantin Knauf <kna...@apache.org> wrote:

> Hi everyone,
>
> I have the feeling that #1 suffices for the vast majority of use cases.
>
> Cheers,
>
> Konstantin
>
> On Mon, Jul 27, 2020 at 6:36 AM Jark Wu <imj...@gmail.com> wrote:
>
> > +1 to option#1.
> >
> > I think it makes sense to enhance the datagen connector.
> > In this case, I think we can support the default TIMESTAMP generation
> > strategy as "sequence" with an optional start point.
> > This strategy can be changed to "constant", "random", or others.
> > This would be really helpful and cool if we can support this, and that's
> > why I prefer #1 than #2:
> >
> > CREATE TEMPORARY TABLE Orders WITH (
> >     'connector' = 'datagen'
> > ) LIKE Orders (EXCLUDING ALL)
> >
> > Regarding #2, if we want to extend the LIKE clause, maybe we can add an
> > "OVERWRITING COLUMNS" like option.
> > But unless we have other strong use cases for this, otherwise, I think
> this
> > makes things complicated (the current like options are already puzzling).
> > Maybe @Dawid Wysakowicz <dwysakow...@apache.org>  has more thoughts on
> > this.
> >
> >
> > Best,
> > Jark
> >
> > On Mon, 27 Jul 2020 at 10:50, godfrey he <godfre...@gmail.com> wrote:
> >
> > > Hi Seth,
> > > Thanks for bringing up this topic.
> > >
> > > I think the second approach is a more generic solution.
> > > Other connectors can also benefit from this.
> > > We also keep the flexibility for generating random timestamps for some
> > > scenarios.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Seth Wiesman <sjwies...@gmail.com> 于2020年7月24日周五 下午11:30写道:
> > >
> > > > Hi everyone,
> > > >
> > > > Currently, the data gen table source only supports a subset of Flink
> > SQL
> > > > types. One missing type in particular is TIMESTAMP(3). The reason, I
> > > > suspect, it was not added originally is that it doesn't really make
> > sense
> > > > to have random timestamps. What you really want is for them to be
> > > > ascending. In the use cases of data generation, users typically don't
> > > care
> > > > about late data. The workaround proposed in the docs is to create
> your
> > > > event time attribute using a computed column.
> > > >
> > > > CREATE TABLE t (
> > > >     ts AS LOCALTIMESTAMP
> > > > ) WITH (
> > > >     'connector' = 'datagen'
> > > > )
> > > >
> > > > The problem is that this does not play well with the LIKE clause.
> Many
> > > > users do not create datagen backed tables from scratch but using the
> > LIKE
> > > > clause to shadow a physical table in their catalog - such as Kafka.
> > > >
> > > > The problem is the LIKE clause does not allow redefining columns so
> > there
> > > > is no way to do this for a table with an event time attribute. The
> > below
> > > > will fail.
> > > >
> > > > CREATE TABLE Orders (
> > > >     order_id   BIGINT,
> > > >     order_time TIMESTAMP(3)
> > > >     quantity   INT,
> > > >     cost       AS price * quantity,
> > > >     WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND,
> > > >     PRIMARY KEY (order_id) NOT ENFORCED
> > > > ) WITH (
> > > >     'connector' = 'kafka',
> > > >     'topic' = 'orders',
> > > >     'properties.bootstrap.servers' = 'localhost:9092',
> > > >     'properties.group.id' = 'orderGroup',
> > > >     'format' = 'csv'
> > > > )
> > > >
> > > > CREATE TEMPORARY TABLE Orders WITH (
> > > >     'connector' = 'datagen'
> > > > ) LIKE Orders (EXCLUDING ALL)
> > > >
> > > >
> > > > I see two solutions to this and would like to hear what people think.
> > > >
> > > > 1) Support TIMESTAMP in datagen tables but always supply strictly
> > > ascending
> > > > timestamps. The above would now "just work". This semantic makes
> sense
> > > > given the way event time attributes are used in streaming
> applications
> > > and
> > > > we can clearly document the behavior.
> > > >
> > > > 2) Relax the constraints of the LIKE clause to allow overriding
> > physical
> > > > columns with computed columns. This would make it clearer to the user
> > > what
> > > > is happening but would require substantially higher development
> effort
> > > and
> > > > I don't know if this feature would add value beyond this one use
> case.
> > In
> > > > practice, this would allow the following.
> > > >
> > > > Please let me know what you think.
> > > > CREATE TEMPORARY TABLE Orders (
> > > >     order_time AS LOCALTIMESTAMP
> > > > ) WITH (
> > > >      'connector' = 'datagen'
> > > > ) LIKE Orders (EXCLUDING ALL)
> > > >
> > > > Seth
> > > >
> > >
> >
>
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>

Reply via email to