Ok, let's go with option 1. This doesn't prevent us from adding option 2 later if necessary.
I've opened a ticket: https://issues.apache.org/jira/browse/FLINK-18735

On Mon, Jul 27, 2020 at 12:05 AM Konstantin Knauf <kna...@apache.org> wrote:

> Hi everyone,
>
> I have the feeling that #1 suffices for the vast majority of use cases.
>
> Cheers,
>
> Konstantin
>
> On Mon, Jul 27, 2020 at 6:36 AM Jark Wu <imj...@gmail.com> wrote:
>
> > +1 to option #1.
> >
> > I think it makes sense to enhance the datagen connector.
> > In this case, I think we can support the default TIMESTAMP generation
> > strategy as "sequence" with an optional start point.
> > This strategy can be changed to "constant", "random", or others.
> > It would be really helpful and cool if we could support this, and
> > that's why I prefer #1 to #2:
> >
> > CREATE TEMPORARY TABLE Orders WITH (
> >   'connector' = 'datagen'
> > ) LIKE Orders (EXCLUDING ALL)
> >
> > Regarding #2, if we want to extend the LIKE clause, maybe we can add
> > an option like "OVERWRITING COLUMNS".
> > But unless we have other strong use cases for this, I think it makes
> > things complicated (the current LIKE options are already puzzling).
> > Maybe @Dawid Wysakowicz <dwysakow...@apache.org> has more thoughts on
> > this.
> >
> > Best,
> > Jark
> >
> > On Mon, 27 Jul 2020 at 10:50, godfrey he <godfre...@gmail.com> wrote:
> >
> > > Hi Seth,
> > > Thanks for bringing up this topic.
> > >
> > > I think the second approach is a more generic solution.
> > > Other connectors can also benefit from this.
> > > We also keep the flexibility of generating random timestamps for
> > > some scenarios.
> > >
> > > Best,
> > > Godfrey
> > >
> > > On Fri, Jul 24, 2020 at 11:30 PM Seth Wiesman <sjwies...@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Currently, the datagen table source only supports a subset of
> > > > Flink SQL types. One missing type in particular is TIMESTAMP(3).
> > > > I suspect the reason it was not added originally is that it
> > > > doesn't really make sense to have random timestamps. What you
> > > > really want is for them to be ascending. In the use cases of data
> > > > generation, users typically don't care about late data. The
> > > > workaround proposed in the docs is to create your event time
> > > > attribute using a computed column.
> > > >
> > > > CREATE TABLE t (
> > > >   ts AS LOCALTIMESTAMP
> > > > ) WITH (
> > > >   'connector' = 'datagen'
> > > > )
> > > >
> > > > The problem is that this does not play well with the LIKE clause.
> > > > Many users do not create datagen-backed tables from scratch but
> > > > use the LIKE clause to shadow a physical table in their catalog,
> > > > such as a Kafka table.
> > > >
> > > > The LIKE clause does not allow redefining columns, so there is no
> > > > way to do this for a table with an event time attribute. The
> > > > statements below will fail.
> > > >
> > > > CREATE TABLE Orders (
> > > >   order_id BIGINT,
> > > >   order_time TIMESTAMP(3),
> > > >   quantity INT,
> > > >   cost AS price * quantity,
> > > >   WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND,
> > > >   PRIMARY KEY (order_id) NOT ENFORCED
> > > > ) WITH (
> > > >   'connector' = 'kafka',
> > > >   'topic' = 'orders',
> > > >   'properties.bootstrap.servers' = 'localhost:9092',
> > > >   'properties.group.id' = 'orderGroup',
> > > >   'format' = 'csv'
> > > > )
> > > >
> > > > CREATE TEMPORARY TABLE Orders WITH (
> > > >   'connector' = 'datagen'
> > > > ) LIKE Orders (EXCLUDING ALL)
> > > >
> > > > I see two solutions to this and would like to hear what people
> > > > think.
> > > >
> > > > 1) Support TIMESTAMP in datagen tables but always supply strictly
> > > > ascending timestamps. The above would now "just work". This
> > > > semantic makes sense given the way event time attributes are used
> > > > in streaming applications, and we can clearly document the
> > > > behavior.
> > > >
> > > > 2) Relax the constraints of the LIKE clause to allow overriding
> > > > physical columns with computed columns. This would make it
> > > > clearer to the user what is happening but would require
> > > > substantially higher development effort, and I don't know if this
> > > > feature would add value beyond this one use case. In practice,
> > > > this would allow the following.
> > > >
> > > > CREATE TEMPORARY TABLE Orders (
> > > >   order_time AS LOCALTIMESTAMP
> > > > ) WITH (
> > > >   'connector' = 'datagen'
> > > > ) LIKE Orders (EXCLUDING ALL)
> > > >
> > > > Please let me know what you think.
> > > >
> > > > Seth
> > > >
> > >
> >
>
> --
> Konstantin Knauf
> https://twitter.com/snntrable
> https://github.com/knaufk