Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-05-05 Thread Jingsong Li
Thanks Konstantin for your Faker link. It looks very interesting and very real. We can add this generator to datagen source. Best, Jingsong Lee On Fri, May 1, 2020 at 1:00 AM Konstantin Knauf wrote: > Hi Jark, > > my gut feeling is 1), because of its consistency with other connectors > (does no

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-04-30 Thread Konstantin Knauf
Hi Jark, my gut feeling is 1), because of its consistency with other connectors (it does not add two secret keywords), although it is more verbose. Best, Konstantin On Thu, Apr 30, 2020 at 5:01 PM Jark Wu wrote: > Hi Konstantin, > > Thanks for the link to Java Faker. It's an interesting project

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-04-30 Thread Jark Wu
Hi Konstantin, Thanks for the link to Java Faker. It's an interesting project and could benefit a comprehensive datagen source. What would the discarding and printing sinks look like in your view? 1) manually create a table with a `blackhole` or `print` connector, e.g. CREATE TABLE my_sink ( a I
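
Option 1) in the message above is cut off in this listing. As a rough sketch of what manually declared sinks could look like, the following uses the `print` and `blackhole` connector identifiers named in the message. All table and column names are hypothetical, and the `datagen` source is only there to make the example self-contained; the exact DDL proposed in the thread may have differed.

```sql
-- Sketch only: table names and columns are hypothetical.
-- A source that produces random rows, so the INSERTs below have input.
CREATE TABLE my_source (
  a INT,
  b STRING
) WITH (
  'connector' = 'datagen'
);

-- Printing sink: writes each received row to standard output.
CREATE TABLE print_sink (
  a INT,
  b STRING
) WITH (
  'connector' = 'print'
);

-- Discarding sink: accepts every row and drops it.
CREATE TABLE blackhole_sink (
  a INT,
  b STRING
) WITH (
  'connector' = 'blackhole'
);

INSERT INTO print_sink SELECT a, b FROM my_source;
INSERT INTO blackhole_sink SELECT a, b FROM my_source;
```

This form is more verbose than predefined sink tables, but each sink is declared like any other connector table, which is the consistency argument made in the replies.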

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-04-30 Thread Konstantin Knauf
Hi everyone, sorry for reviving this thread at this point in time. Generally, I think, this is a very valuable effort. Have we considered only providing a very basic data generator (+ discarding and printing sink tables) in Apache Flink and moving a more comprehensive data generating table source

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-24 Thread Jingsong Li
Hi all, I created https://issues.apache.org/jira/browse/FLINK-16743 for follow-up discussion. FYI. Best, Jingsong Lee On Tue, Mar 24, 2020 at 2:20 PM Bowen Li wrote: > I agree with Jingsong that sink schema inference and system tables can be > considered later. I wouldn’t recommend to tackle t

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-23 Thread Bowen Li
I agree with Jingsong that sink schema inference and system tables can be considered later. I wouldn’t recommend tackling them for the sake of simplifying user experience to the extreme. Providing the above handy source and sink implementations already offers users a ton of immediate value. On Mo

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-23 Thread Jingsong Li
Hi Benchao, > do you think we need to add more columns with various types? I didn't list all types, but we should support primitive types, varchar, Decimal, Timestamp, etc. This can be done incrementally. Hi Benchao, Jark, About console and blackhole, yes, they can have no schema, the schema

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-23 Thread Jark Wu
Hi Jingsong, Regarding (2) and (3), I was thinking of skipping the manual DDL work, so users can use them directly: # this will log results to `.out` files INSERT INTO console SELECT ... # this will drop all received records INSERT INTO blackhole SELECT ... Here `console` and `blackhole` are system

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-23 Thread Benchao Li
Hi Jingsong, Thanks for bringing this up. Generally, it's a very good proposal. About the datagen source, do you think we need to add more columns with various types? About the print sink, do we need to specify the schema? On Mon, Mar 23, 2020 at 1:51 PM Jingsong Li wrote: > Thanks Bowen, Jark and Dian for your feedb

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-22 Thread Jingsong Li
Thanks Bowen, Jark and Dian for your feedback and suggestions. I have reorganized the proposal based on your suggestions and tried to spell out the DDLs: 1. datagen source: - easy startup/test for streaming job - performance testing DDL: CREATE TABLE user ( id BIGINT, age INT, description STRING ) WITH ( 'conne
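
The DDL in the message above is cut off in this listing. As a rough illustration, a datagen table with the same three columns could be declared as below. The table name `users_gen` and all option values are made up for illustration. The option keys follow the datagen connector as it was later documented in Flink, so they are an assumption here; the keys proposed in the original message may have differed.

```sql
-- Sketch only: the table name and all values are illustrative.
-- USER is a reserved keyword in Flink SQL, so a different table name is used.
CREATE TABLE users_gen (
  id BIGINT,
  age INT,
  description STRING
) WITH (
  'connector' = 'datagen',
  -- throttle generation so the output is easy to follow in the SQL client
  'rows-per-second' = '5',
  -- id counts up from 1 to 1000
  'fields.id.kind' = 'sequence',
  'fields.id.start' = '1',
  'fields.id.end' = '1000',
  -- age is drawn at random between the two bounds
  'fields.age.kind' = 'random',
  'fields.age.min' = '18',
  'fields.age.max' = '90',
  -- description is a random string of this length
  'fields.description.length' = '20'
);

SELECT id, age, description FROM users_gen;
```

With a sequence field such as `id`, the source should finish once the sequence reaches its end value; with only random fields it would keep producing rows, which suits the performance-testing use case listed in the message.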

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-22 Thread Dian Fu
Thanks Jingsong for bringing up this discussion. +1 to this proposal. Bowen's proposal makes a lot of sense to me. This is also a painful problem for PyFlink users. Currently there is no built-in, easy-to-use table source/sink, and users have to write a lot of code to try out PyFlin

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-21 Thread Jark Wu
+1 to Bowen's proposal. I have also seen many requests for such built-in connectors. I will leave some of my thoughts here: > 1. datagen source (random source) I think we can merge the functionality of sequence-source into random source to allow users to customize their data values. Flink can generate rand

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-20 Thread Bowen Li
+1. I would suggest taking this a step further and looking at what users really need to test/try/play with the Table API and Flink SQL. Besides this one, here are some more sources and sinks that I have developed or used previously to facilitate building Flink table/SQL pipelines. 1. random input data s

[DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-19 Thread Jingsong Li
Hi all, I have heard some users complain that tables are difficult to test. Now, with the SQL client, users are increasingly inclined to test with it rather than write a program. The most common example is the Kafka source. If users need to test their SQL output and checkpointing, they need to: - 1.Launch a Kafka standa