Hi everyone,

I have the feeling that #1 suffices for the vast majority of use cases.

Cheers,

Konstantin

On Mon, Jul 27, 2020 at 6:36 AM Jark Wu <imj...@gmail.com> wrote:

> +1 to option#1.
>
> I think it makes sense to enhance the datagen connector.
> In this case, I think we can support the default TIMESTAMP generation
> strategy as "sequence" with an optional start point.
> This strategy can be changed to "constant", "random", or others.
> It would be really helpful if we could support this, which is why I
> prefer #1 over #2:
>
> CREATE TEMPORARY TABLE Orders WITH (
>     'connector' = 'datagen'
> ) LIKE Orders (EXCLUDING ALL)
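>
> A rough sketch of how that might look, assuming the existing
> 'fields.#.kind' datagen option were extended to TIMESTAMP columns. The
> option values below are hypothetical and not an agreed design:
>
> CREATE TEMPORARY TABLE Orders WITH (
>     'connector' = 'datagen',
>     -- hypothetical: 'sequence' would be the default kind for TIMESTAMP
>     'fields.order_time.kind' = 'sequence',
>     -- hypothetical: optional start point for the ascending sequence
>     'fields.order_time.start' = '2020-07-27 00:00:00'
> ) LIKE Orders (EXCLUDING ALL)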
>
> Regarding #2, if we want to extend the LIKE clause, maybe we can add an
> option like "OVERWRITING COLUMNS".
> But unless we have other strong use cases for it, I think this makes
> things complicated (the current LIKE options are already puzzling).
> Maybe @Dawid Wysakowicz <dwysakow...@apache.org> has more thoughts on
> this.
>
>
> Best,
> Jark
>
> On Mon, 27 Jul 2020 at 10:50, godfrey he <godfre...@gmail.com> wrote:
>
> > Hi Seth,
> > Thanks for bringing up this topic.
> >
> > I think the second approach is a more generic solution.
> > Other connectors can also benefit from this.
> > We also keep the flexibility for generating random timestamps for some
> > scenarios.
> >
> > Best,
> > Godfrey
> >
> > Seth Wiesman <sjwies...@gmail.com> wrote on Fri, Jul 24, 2020 at 11:30 PM:
> >
> > > Hi everyone,
> > >
> > > > Currently, the datagen table source only supports a subset of Flink
> > > > SQL types. One missing type in particular is TIMESTAMP(3). I suspect
> > > > it was not added originally because random timestamps don't really
> > > > make sense. What you really want is for them to be ascending. In data
> > > > generation use cases, users typically don't care about late data. The
> > > > workaround proposed in the docs is to create your event time attribute
> > > > using a computed column.
> > >
> > > CREATE TABLE t (
> > >     ts AS LOCALTIMESTAMP
> > > ) WITH (
> > >     'connector' = 'datagen'
> > > )
> > >
> > > > The problem is that this does not play well with the LIKE clause. Many
> > > > users do not create datagen-backed tables from scratch but use the
> > > > LIKE clause to shadow a physical table in their catalog, such as a
> > > > Kafka table.
> > > >
> > > > The LIKE clause does not allow redefining columns, so there is no way
> > > > to do this for a table with an event time attribute. The example below
> > > > will fail.
> > >
> > > CREATE TABLE Orders (
> > >     order_id   BIGINT,
> > > >     order_time TIMESTAMP(3),
> > >     quantity   INT,
> > >     cost       AS price * quantity,
> > >     WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND,
> > >     PRIMARY KEY (order_id) NOT ENFORCED
> > > ) WITH (
> > >     'connector' = 'kafka',
> > >     'topic' = 'orders',
> > >     'properties.bootstrap.servers' = 'localhost:9092',
> > >     'properties.group.id' = 'orderGroup',
> > >     'format' = 'csv'
> > > )
> > >
> > > CREATE TEMPORARY TABLE Orders WITH (
> > >     'connector' = 'datagen'
> > > ) LIKE Orders (EXCLUDING ALL)
> > >
> > >
> > > I see two solutions to this and would like to hear what people think.
> > >
> > > > 1) Support TIMESTAMP in datagen tables but always supply strictly
> > > > ascending timestamps. The above would now "just work". These semantics
> > > > make sense given the way event time attributes are used in streaming
> > > > applications, and we can clearly document the behavior.
> > > >
> > > > 2) Relax the constraints of the LIKE clause to allow overriding
> > > > physical columns with computed columns. This would make it clearer to
> > > > the user what is happening but would require substantially higher
> > > > development effort, and I don't know if this feature would add value
> > > > beyond this one use case. In practice, this would allow the following.
> > >
> > > > CREATE TEMPORARY TABLE Orders (
> > > >     order_time AS LOCALTIMESTAMP
> > > > ) WITH (
> > > >     'connector' = 'datagen'
> > > > ) LIKE Orders (EXCLUDING ALL)
> > > >
> > > > Please let me know what you think.
> > > >
> > > Seth
> > >
> >
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk
