Hi,

Thank you for bringing this up. I am +1 for this feature.

One important point I would like to raise: an improperly designed FLIP-27 connector can impose severe memory pressure on the JobManager, especially when the source tables yield an enormous number of splits, e.g. when there are billions of records to read. Frankly speaking, we have been haunted by this problem for a long time when using the Flink CDC Connectors to read large tables.

Therefore, to keep the JobManager from hitting frequent OOM faults, JdbcSourceEnumerator should avoid holding too many JdbcSourceSplits in its unassigned list. It would be even better if all splits were computed on the fly.

Best,
Weike

-----Original Message-----
From: Lijie Wang <wangdachui9...@gmail.com>
Sent: July 1, 2022, 10:25 AM
To: dev@flink.apache.org
Subject: Re: Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to FLIP-27

Hi Roc,

Thanks for driving the discussion. Could you describe in detail what the JdbcSourceSplit represents? Something looks wrong with the comment on JdbcSourceSplit in the FLIP (it is described as "A {@link SourceSplit} that represents a file, or a region of a file....").

Best,
Lijie

Roc Marshal <flin...@126.com> wrote on Thu, Jun 30, 2022 at 21:41:

> Hi, Boto.
> Thanks for your reply.
>
> +1 from me on defining a watermark strategy in the 'streaming' & table source. I'm not sure if FLIP-202[1] is suitable for a separate discussion, but I think your proposal is very helpful to the new source. It would be great if the new source could be compatible with this abstraction.
>
> In addition, do we need to support such a special bounded scenario abstraction?
> The number of JdbcSourceSplits is fixed, but the time needed to generate all of them is not known in advance in a user-defined implementation. Once the condition that ends the split generation process is met, no further JdbcSourceSplits are generated.
> After all JdbcSourceSplit processing is completed, the reader will be notified by JdbcSourceSplitEnumerator that there are no more JdbcSourceSplits.
>
> - [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector
>
> Best regards,
> Roc Marshal
>
> On 2022/06/30 09:02:23 João Boto wrote:
> > Hi,
> >
> > On the source we could improve the JdbcParameterValuesProvider so that it can be defined as a query (or queries) or something more dynamic.
> > Most of the time, if your job is dynamic or has some condition to be met (based on data in a table), you have to create a connection and get that info from the database.
> >
> > If we are going to create/allow a "streaming" JDBC source, we should be able to define a watermark and get new data from the table using that watermark.
> >
> > For the sink (though it could apply to the source too), it would be great to be able to set your own implementation of the connection type. For example, if you are connecting to ClickHouse, you could set an implementation based on "BalancedClickhouseDataSource" (this[1] implementation has an example), or set an extended version of an implementation for debugging purposes.
> >
> > Regards
> >
> > [1] https://github.com/apache/flink/pull/20097/files#diff-8b36e3403381dc14c748aeb5de0b4ceb7d7daec39594b1eacff1694b5266419d
> >
> > On 2022/06/27 13:09:51 Roc Marshal wrote:
> > > Hi, all,
> > >
> > > I would like to open a discussion on porting the JDBC Source to the new Source API (FLIP-27[1]).
> > >
> > > Martijn Visser, Jing Ge and I had a preliminary discussion on the JIRA FLINK-25420[2] and planned to start the discussion about the source part first.
> > >
> > > Please let me know:
> > >
> > > - The issues you encountered with the old JDBC source;
> > > - The new features or design you want;
> > > - More suggestions from other dimensions...
> > >
> > > You can find more details in FLIP-239[3].
> > >
> > > Looking forward to your feedback.
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > >
> > > [2] https://issues.apache.org/jira/browse/FLINK-25420
> > >
> > > [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> > >
> > > Best regards,
> > >
> > > Roc Marshal
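
To illustrate the suggestion at the top of this message that splits be computed on the fly rather than held in the unassigned list, here is a minimal sketch in plain Java. It is not the FLIP-239 design and it does not use the Flink API: RangeSplit and LazyRangeSplitGenerator are hypothetical names, and the sketch assumes the table can be partitioned by a numeric key column into fixed-size ranges. The generator keeps only a cursor, so its memory footprint does not grow with the number of splits.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for JdbcSourceSplit: one half-open key range [start, end). */
final class RangeSplit {
    final long start;
    final long end;

    RangeSplit(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /** Illustrative only: a predicate a reader might append to its query. */
    String toPredicate(String column) {
        return column + " >= " + start + " AND " + column + " < " + end;
    }
}

/**
 * Hypothetical generator that produces splits one at a time.
 * Its only mutable state is a cursor, so nothing proportional to the
 * total number of splits is ever held in memory.
 */
final class LazyRangeSplitGenerator implements Iterator<RangeSplit> {
    private final long maxExclusive;
    private final long step;
    private long nextStart; // the only value that would need checkpointing

    LazyRangeSplitGenerator(long minInclusive, long maxExclusive, long step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.nextStart = minInclusive;
        this.maxExclusive = maxExclusive;
        this.step = step;
    }

    @Override
    public boolean hasNext() {
        return nextStart < maxExclusive;
    }

    @Override
    public RangeSplit next() {
        if (!hasNext()) {
            throw new NoSuchElementException("all splits have been generated");
        }
        long start = nextStart;
        // Clamp the last split to the upper bound.
        long end = (maxExclusive - start <= step) ? maxExclusive : start + step;
        nextStart = end;
        return new RangeSplit(start, end);
    }

    /** Cursor position; restoring it is enough to resume generation. */
    long checkpointState() {
        return nextStart;
    }
}
```

An enumerator built along these lines would call next() only when a reader asks for work, and would only need to remember the cursor plus any splits handed back after a reader failure. Whether a fixed-step numeric range fits a given table is an assumption of this sketch; other partitioning schemes would need a different cursor.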