Hi everyone, I have the feeling that #1 suffices for the vast majority of use cases.
Cheers, Konstantin On Mon, Jul 27, 2020 at 6:36 AM Jark Wu <imj...@gmail.com> wrote: > +1 to option#1. > > I think it makes sense to enhance the datagen connector. > In this case, I think we can support the default TIMESTAMP generation > strategy as "sequence" with an optional start point. > This strategy can be changed to "constant", "random", or others. > This would be really helpful and cool if we can support this, and that's > why I prefer #1 than #2: > > CREATE TEMPORARY TABLE Orders WITH ( > 'connector' = 'datagen' > ) LIKE Orders (EXCLUDING ALL) > > Regarding #2, if we want to extend the LIKE clause, maybe we can add an > "OVERWRITING COLUMNS" like option. > But unless we have other strong use cases for this, otherwise, I think this > makes things complicated (the current like options are already puzzling). > Maybe @Dawid Wysakowicz <dwysakow...@apache.org> has more thoughts on > this. > > > Best, > Jark > > On Mon, 27 Jul 2020 at 10:50, godfrey he <godfre...@gmail.com> wrote: > > > Hi Seth, > > Thanks for bringing up this topic. > > > > I think the second approach is a more generic solution. > > Other connectors can also benefit from this. > > We also keep the flexibility for generating random timestamps for some > > scenarios. > > > > Best, > > Godfrey > > > > Seth Wiesman <sjwies...@gmail.com> 于2020年7月24日周五 下午11:30写道: > > > > > Hi everyone, > > > > > > Currently, the data gen table source only supports a subset of Flink > SQL > > > types. One missing type in particular is TIMESTAMP(3). The reason, I > > > suspect, it was not added originally is that it doesn't really make > sense > > > to have random timestamps. What you really want is for them to be > > > ascending. In the use cases of data generation, users typically don't > > care > > > about late data. The workaround proposed in the docs is to create your > > > event time attribute using a computed column. > > > > > > CREATE TABLE t ( > > > ts AS LOCALTIMESTAMP > > > ) WITH ( > > > 'connector' = 'datagen' > > > ) > > > > > > The problem is that this does not play well with the LIKE clause. Many > > > users do not create datagen backed tables from scratch but using the > LIKE > > > clause to shadow a physical table in their catalog - such as Kafka. > > > > > > The problem is the LIKE clause does not allow redefining columns so > there > > > is no way to do this for a table with an event time attribute. The > below > > > will fail. > > > > > > CREATE TABLE Orders ( > > > order_id BIGINT, > > > order_time TIMESTAMP(3) > > > quantity INT, > > > cost AS price * quantity, > > > WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND, > > > PRIMARY KEY (order_id) NOT ENFORCED > > > ) WITH ( > > > 'connector' = 'kafka', > > > 'topic' = 'orders', > > > 'properties.bootstrap.servers' = 'localhost:9092', > > > 'properties.group.id' = 'orderGroup', > > > 'format' = 'csv' > > > ) > > > > > > CREATE TEMPORARY TABLE Orders WITH ( > > > 'connector' = 'datagen' > > > ) LIKE Orders (EXCLUDING ALL) > > > > > > > > > I see two solutions to this and would like to hear what people think. > > > > > > 1) Support TIMESTAMP in datagen tables but always supply strictly > > ascending > > > timestamps. The above would now "just work". This semantic makes sense > > > given the way event time attributes are used in streaming applications > > and > > > we can clearly document the behavior. > > > > > > 2) Relax the constraints of the LIKE clause to allow overriding > physical > > > columns with computed columns. This would make it clearer to the user > > what > > > is happening but would require substantially higher development effort > > and > > > I don't know if this feature would add value beyond this one use case. > In > > > practice, this would allow the following. > > > > > > Please let me know what you think. > > > CREATE TEMPORARY TABLE Orders ( > > > order_time AS LOCALTIMESTAMP > > > ) WITH ( > > > 'connector' = 'datagen' > > > ) LIKE Orders (EXCLUDING ALL) > > > > > > Seth > > > > > > -- Konstantin Knauf https://twitter.com/snntrable https://github.com/knaufk