It would be great if João could help finalize the PR. In case he doesn't have the permission to update the FLIP, please reach out to me so we can get that sorted.

On Wed, 10 Aug 2022 at 04:41, Roc Marshal <flin...@126.com> wrote:

> Hi, Martijn, Boto.
>
> So far I have completed the design of the source and the skeleton design of
> the sink. I think the current FLIP is still missing part of the sink design.
> @Boto, would you like to complete the sink part directly on the FLIP page?
> Looking forward to your reply.
>
> On 2022/08/01 13:42:48 Martijn Visser wrote:
> > Hi,
> >
> > There is currently already a PR submitted to port the JDBC interface to
> > the new interfaces. Can we make sure that this FLIP is being finalized, so
> > that you and other maintainers can work on getting the PRs correct and
> > eventually merged in?
> >
> > Best regards,
> >
> > Martijn
> >
> > On Mon, 4 Jul 2022 at 16:38, Martijn Visser <martijnvis...@apache.org>
> > wrote:
> >
> > > Hi Roc,
> > >
> > > Thanks for the FLIP and opening the discussion. I have a couple of
> > > initial questions/remarks:
> > >
> > > * The FLIP contains information for both Source and Sink, but nothing
> > >   explicitly on the Lookup functionality. I'm assuming we also want to
> > >   have that implementation covered while porting this to the new
> > >   interfaces.
> > > * The FLIP mentions porting to both the new Source and the new Sink API,
> > >   but the FLIP only contains detailed information on the Source. Are you
> > >   planning to add that to the FLIP before casting a vote? Because the
> > >   discussion should definitely be resolved for both the Source and the
> > >   Sink.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Sat, 2 Jul 2022 at 06:35, Roc Marshal <fl...@126.com> wrote:
> > >
> > >> Hi, Weike.
> > >>
> > >> Thank you for your reply.
> > >> As you said, too many splits stored in the SourceEnumerator will
> > >> increase the load on the JM.
> > >> What do you think about introducing a split capacity in the
> > >> SourceEnumerator to limit the total number of splits, together with a
> > >> reject or callback mechanism in the timely generation strategy for the
> > >> case where there are too many splits?
> > >> Looking forward to a better solution.
> > >>
> > >> Best regards,
> > >> Roc Marshal
> > >>
> > >> On 2022/07/01 07:58:22 Dong Weike wrote:
> > >> > Hi,
> > >> >
> > >> > Thank you for bringing this up, and I am +1 for this feature.
> > >> >
> > >> > IMO, one important thing that I would like to mention is that an
> > >> > improperly designed FLIP-27 connector could impose very severe memory
> > >> > pressure on the JobManager, especially when there is an enormous
> > >> > number of splits for the source tables, e.g. when there are billions
> > >> > of records to read. Frankly speaking, we have been haunted by this
> > >> > problem for a long time when using the Flink CDC Connectors to read
> > >> > large tables.
> > >> >
> > >> > Therefore, in order to prevent the JobManager from experiencing
> > >> > frequent OOM faults, JdbcSourceEnumerator should avoid saving too many
> > >> > JdbcSourceSplits in the unassigned list. And it would be better if all
> > >> > the splits were computed on the fly.
> > >> >
> > >> > Best,
> > >> > Weike
> > >> >
> > >> > -----Original Message-----
> > >> > From: Lijie Wang <wa...@gmail.com>
> > >> > Sent: 1 July 2022, 10:25 AM
> > >> > To: dev@flink.apache.org
> > >> > Subject: Re: Re: [DISCUSS] FLIP-239: Port JDBC Connector Source to
> > >> > FLIP-27
> > >> >
> > >> > Hi Roc,
> > >> >
> > >> > Thanks for driving the discussion.
> > >> >
> > >> > Could you describe in detail what the JdbcSourceSplit represents? It
> > >> > looks like something is wrong with the comments of JdbcSourceSplit in
> > >> > the FLIP (it is described as "A {@link SourceSplit} that represents a
> > >> > file, or a region of a file....").
> > >> >
> > >> > Best,
> > >> > Lijie
> > >> >
> > >> > Roc Marshal <fl...@126.com> wrote on Thu, 30 Jun 2022 at 21:41:
> > >> >
> > >> > > Hi, Boto.
> > >> > > Thanks for your reply.
> > >> > >
> > >> > > +1 from me on defining a watermark strategy for the 'streaming' and
> > >> > > table source. I'm not sure if FLIP-202[1] is suitable for a separate
> > >> > > discussion, but I think your proposal is very helpful to the new
> > >> > > source. It would be great if the new source could be compatible with
> > >> > > this abstraction.
> > >> > >
> > >> > > In addition, do we need to support the following special bounded
> > >> > > scenario as an abstraction?
> > >> > > The number of JdbcSourceSplits is fixed, but the time needed to
> > >> > > generate all of them is not known in the user-defined
> > >> > > implementation. Once the end condition of the split-generation
> > >> > > process is met, no more JdbcSourceSplits are generated. After all
> > >> > > JdbcSourceSplits have been processed, the JdbcSourceSplitEnumerator
> > >> > > notifies the reader that there are no more JdbcSourceSplits.
> > >> > >
> > >> > > - [1]
> > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-202%3A+Introduce+ClickHouse+Connector
> > >> > >
> > >> > > Best regards,
> > >> > > Roc Marshal
> > >> > >
> > >> > > On 2022/06/30 09:02:23 João Boto wrote:
> > >> > > > Hi,
> > >> > > >
> > >> > > > On the source we could improve the JdbcParameterValuesProvider so
> > >> > > > that it can be defined as a query (or queries) or something more
> > >> > > > dynamic.
> > >> > > > Most of the time, if your job is dynamic or has some condition to
> > >> > > > be met (based on data in a table), you have to create a connection
> > >> > > > and get that info from the database.
> > >> > > >
> > >> > > > If we are going to create/allow a "streaming" JDBC source, we
> > >> > > > should be able to define a watermark and get new data from the
> > >> > > > table using that watermark.
> > >> > > >
> > >> > > > For the sink (but it could also apply to the source) it would be
> > >> > > > great to be able to set your own implementation of the connection
> > >> > > > type. For example, if you are connecting to ClickHouse, you could
> > >> > > > set an implementation based on "BalancedClickhouseDataSource" (this
> > >> > > > implementation[1] has an example), or set an extended version of an
> > >> > > > implementation for debugging purposes.
> > >> > > >
> > >> > > > Regards
> > >> > > >
> > >> > > > [1]
> > >> > > > https://github.com/apache/flink/pull/20097/files#diff-8b36e3403381dc14c748aeb5de0b4ceb7d7daec39594b1eacff1694b5266419d
> > >> > > >
> > >> > > > On 2022/06/27 13:09:51 Roc Marshal wrote:
> > >> > > > > Hi, all,
> > >> > > > >
> > >> > > > > I would like to open a discussion on porting the JDBC Source to
> > >> > > > > the new Source API (FLIP-27[1]).
> > >> > > > >
> > >> > > > > Martijn Visser, Jing Ge and I had a preliminary discussion on
> > >> > > > > the JIRA FLINK-25420[2] and planned to start the discussion
> > >> > > > > about the source part first.
> > >> > > > >
> > >> > > > > Please let me know:
> > >> > > > >
> > >> > > > > - The issues you encountered with the old JDBC source;
> > >> > > > > - The new features or design you want;
> > >> > > > > - More suggestions from other dimensions...
> > >> > > > >
> > >> > > > > You can find more details in FLIP-239[3].
> > >> > > > >
> > >> > > > > Looking forward to your feedback.
> > >> > > > >
> > >> > > > > [1]
> > >> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > >> > > > >
> > >> > > > > [2] https://issues.apache.org/jira/browse/FLINK-25420
> > >> > > > >
> > >> > > > > [3]
> > >> > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=217386271
> > >> > > > >
> > >> > > > > Best regards,
> > >> > > > >
> > >> > > > > Roc Marshal
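
The two ideas raised in the thread, capping the number of unassigned splits held by the enumerator and computing splits on the fly, can be sketched in plain Java. This is only an illustration under stated assumptions, not the FLIP-239 design and not Flink API: the class `BoundedSplitBuffer`, the nested `Split` type, and the key-range splitting scheme are all hypothetical names introduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/**
 * Hypothetical sketch (not Flink API): holds at most `capacity` pending splits
 * in memory and computes further splits lazily when a reader requests one.
 */
public class BoundedSplitBuffer {

    /** Hypothetical split: a half-open key range [start, end) of a table. */
    public static final class Split {
        public final long start;
        public final long end;

        Split(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "Split[" + start + ", " + end + ")";
        }
    }

    private final Deque<Split> pending = new ArrayDeque<>();
    private final int capacity;
    private final long maxKey;
    private final long step;
    private long nextStart;

    public BoundedSplitBuffer(long minKey, long maxKey, long step, int capacity) {
        if (step <= 0 || capacity <= 0) {
            throw new IllegalArgumentException("step and capacity must be positive");
        }
        this.nextStart = minKey;
        this.maxKey = maxKey;
        this.step = step;
        this.capacity = capacity;
    }

    /** Generates splits until the buffer is full or the key range is exhausted. */
    private void refill() {
        while (pending.size() < capacity && nextStart < maxKey) {
            long end = Math.min(nextStart + step, maxKey);
            pending.addLast(new Split(nextStart, end));
            nextStart = end;
        }
    }

    /** Returns the next split, or empty once every split has been handed out. */
    public Optional<Split> nextSplit() {
        refill();
        return Optional.ofNullable(pending.pollFirst());
    }

    /** Number of splits currently held in memory; never exceeds `capacity`. */
    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        BoundedSplitBuffer buffer = new BoundedSplitBuffer(0, 1_000, 100, 3);
        Optional<Split> split;
        while ((split = buffer.nextSplit()).isPresent()) {
            System.out.println(split.get() + " pending=" + buffer.pendingCount());
        }
    }
}
```

In this sketch the memory held for unassigned splits is bounded by `capacity` regardless of table size, and an empty `Optional` plays the role of the "no more splits" notification mentioned above. A real enumerator would also need to checkpoint `nextStart` and the pending splits, which is left out here.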