Hi, Weike.

Thank you for your reply.

As you said, storing too many splits in the SourceEnumerator will increase the load on the JM. What do you think about introducing a capacity limit on the total number of splits held in the SourceEnumerator, together with a reject or callback mechanism for the case where the timely generation strategy produces too many splits? I am looking forward to a better solution.
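To make the idea concrete, here is a rough sketch of a capacity-limited split buffer that pulls splits from a lazy generator only when there is room and rejects splits handed back while it is full. Everything in it (the BoundedSplitBuffer class, its method names, the use of a plain Iterator as the split generator) is a hypothetical illustration and not part of Flink or of the FLIP-239 proposal.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Optional;
import java.util.stream.IntStream;

/**
 * Hypothetical helper (not a Flink API): keeps at most `capacity`
 * unassigned splits in memory and generates more only on demand.
 */
final class BoundedSplitBuffer<T> {
    private final int capacity;
    private final Iterator<T> generator; // produces splits on the fly
    private final Deque<T> unassigned = new ArrayDeque<>();

    BoundedSplitBuffer(int capacity, Iterator<T> generator) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.generator = generator;
    }

    /** Returns the next split, refilling lazily; empty once everything is consumed. */
    Optional<T> poll() {
        refill();
        return Optional.ofNullable(unassigned.poll());
    }

    /**
     * Takes a split back (for example after a reader failure).
     * Returns false when the buffer is full, so the caller can retry later.
     */
    boolean offerBack(T split) {
        if (unassigned.size() >= capacity) {
            return false;
        }
        unassigned.addFirst(split);
        return true;
    }

    /** True once the generator is drained and nothing is buffered. */
    boolean isExhausted() {
        return unassigned.isEmpty() && !generator.hasNext();
    }

    int size() {
        return unassigned.size();
    }

    /** Pulls from the generator until the buffer is full or the generator ends. */
    private void refill() {
        while (unassigned.size() < capacity && generator.hasNext()) {
            unassigned.addLast(generator.next());
        }
    }

    public static void main(String[] args) {
        // Five placeholder splits generated lazily; at most two are held at a time.
        Iterator<String> lazySplits =
                IntStream.range(0, 5).mapToObj(i -> "split-" + i).iterator();
        BoundedSplitBuffer<String> buffer = new BoundedSplitBuffer<>(2, lazySplits);
        while (!buffer.isExhausted()) {
            String split = buffer.poll().orElseThrow(IllegalStateException::new);
            System.out.println(split + " (still buffered: " + buffer.size() + ")");
        }
    }
}
```

With this shape, memory use on the JM side is bounded by the capacity rather than by the table size. A real enumerator would presumably also have to include the generator position in its checkpointed state, which this sketch leaves out.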
Best regards,
Roc Marshal

On 2022/07/01 07:58:22 Dong Weike wrote:
> Hi,
>
> Thank you for bringing this up, and I am +1 for this feature.
>
> IMO, one important thing that I would like to mention is that an
> improperly designed FLIP-27 connector could impose very severe memory
> pressure on the JobManager, especially when there is an enormous number of
> splits for the source tables, e.g. when there are billions of records to
> read. Frankly speaking, we have been haunted by this problem for a long
> time when using the Flink CDC Connectors to read large tables.
>
> Therefore, in order to prevent the JobManager from experiencing frequent
> OOM faults, JdbcSourceEnumerator should avoid saving too many
> JdbcSourceSplits in the unassigned list. It would be better if all the
> splits were computed on the fly.
>
> Best,
> Weike
>
> -----Original Message-----
> From: Lijie Wang <wa...@gmail.com>
> Sent: July 1, 2022, 10:25 AM
> To: dev@flink.apache.org
> Subject: Re: Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to FLIP-27
>
> Hi Roc,
>
> Thanks for driving the discussion.
>
> Could you describe in detail what the JdbcSourceSplit represents? It looks
> like something is wrong with the comment on JdbcSourceSplit in the FLIP
> (it is described as "A {@link SourceSplit} that represents a file, or a
> region of a file....").
>
> Best,
> Lijie
>
>
> Roc Marshal <fl...@126.com> wrote on Thu, Jun 30, 2022 at 21:41:
>
> > Hi, Boto.
> > Thanks for your reply.
> >
> > +1 from me on defining a watermark strategy for the "streaming" & table
> > source. I'm not sure whether FLIP-202[1] is suitable for a separate
> > discussion, but I think your proposal is very helpful for the new
> > source. It would be great if the new source could be compatible with
> > this abstraction.
> >
> > In addition, do we need to support an abstraction for the following
> > special bounded scenario?
> > The number of JdbcSourceSplits is fixed, but the time needed to generate
> > all of them is not known in advance in a user-defined implementation.
> > Once the end condition of the split-generation process is met, no more
> > JdbcSourceSplits will be generated.
> > After all JdbcSourceSplits have been processed, the
> > JdbcSourceSplitEnumerator will notify the reader that there are no more
> > JdbcSourceSplits.
> >
> > - [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector
> >
> > Best regards,
> > Roc Marshal
> >
> > On 2022/06/30 09:02:23 João Boto wrote:
> > > Hi,
> > >
> > > On the source, we could improve the JdbcParameterValuesProvider so that
> > > it can be defined as one or more queries, or something more dynamic.
> > > Most of the time, if your job is dynamic or has some condition to be
> > > met (based on data in a table), you have to create a connection and get
> > > that info from the database.
> > >
> > > If we are going to create/allow a "streaming" JDBC source, we should be
> > > able to define a watermark and get new data from the table using that
> > > watermark.
> > >
> > > For the sink (but it could apply to the source as well), it would be
> > > great to be able to set your own implementation of the connection type.
> > > For example, if you are connecting to ClickHouse, you could set an
> > > implementation based on "BalancedClickhouseDataSource" (there is an
> > > example in this[1] implementation), or set an extended version of an
> > > implementation for debugging purposes.
> > >
> > > Regards
> > >
> > > [1]
> > > https://github.com/apache/flink/pull/20097/files#diff-8b36e3403381dc14c748aeb5de0b4ceb7d7daec39594b1eacff1694b5266419d
> > >
> > > On 2022/06/27 13:09:51 Roc Marshal wrote:
> > > > Hi, all,
> > > >
> > > > I would like to open a discussion on porting the JDBC Source to the
> > > > new Source API (FLIP-27[1]).
> > > >
> > > > Martijn Visser, Jing Ge and I had a preliminary discussion on the
> > > > JIRA FLINK-25420[2] and planned to start the discussion about the
> > > > source part first.
> > > >
> > > > Please let me know:
> > > >
> > > > - The issues you have encountered with the old JDBC source;
> > > > - The new features or design you want;
> > > > - More suggestions from other dimensions...
> > > >
> > > > You can find more details in FLIP-239[3].
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > > >
> > > > [2] https://issues.apache.org/jira/browse/FLINK-25420
> > > >
> > > > [3]
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> > > >
> > > > Best regards,
> > > >
> > > > Roc Marshal