[ https://issues.apache.org/jira/browse/FLINK-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316320#comment-15316320 ]
Flavio Pompermaier commented on FLINK-3777:
-------------------------------------------

Hi Stephan,
we decided to introduce the open/close-IF functions so that, where necessary, the initialization and destruction of an IF can be handled properly. I asked for this feature while implementing an efficient JDBC connector for a huge table (11 billion rows), where creating and destroying JDBC connections became a very expensive part of the job because a new connection was created millions of times. In the former implementation, without those methods, I had to write a custom IF with a connection pool to overcome this problem. Since Flink is supposed to be a tool for big data, I thought this was a must-have feature rather than a corner case.

BTW, I think you're referring to https://issues.apache.org/jira/browse/FLINK-4024. I looked quickly at that code and I'm not fully convinced that the main problem is the introduction of those 2 new methods. First of all, the FileSourceFunction does not actually seem to be related to files at all; it's something more generic. Second, as I stated at the very beginning of this thread, open() and close() actually refer to splits and not to the IF (openSplit and closeSplit would make the code more readable). Third, a proper call to open/close-IF not only improves the readability of the code but also forces a developer to detect possible bad usage of an IF.

Summarizing, IMHO it's better to fix FLINK-4024 than to revert this whole PR, which enhances the flexibility of Flink when dealing with real big data.
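To make the cost difference concrete, here is a minimal, self-contained sketch of the two lifecycles. It does not use Flink itself: SplitReader, FakeConnection, PerSplitReader, PerFormatReader and connectionsFor are all hypothetical names made up for illustration, and only the openInputFormat()/closeInputFormat() method names follow the pair proposed in this issue. The driver loop is my assumption of how a runtime would call the hooks, not a copy of Flink's code.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: hypothetical stand-ins, not Flink's actual classes. */
final class LifecycleSketch {

    /** Stands in for an expensive resource such as a JDBC connection. */
    static final class FakeConnection {
        static int created = 0;          // how many connections were ever built
        FakeConnection() { created++; }
        void close() { /* would release the real resource here */ }
    }

    /** Per-split hooks plus the per-format hooks proposed in the issue. */
    interface SplitReader {
        default void openInputFormat() {}   // once, before the first split
        void open(String split);            // once per split
        void close();                       // once per split
        default void closeInputFormat() {}  // once, after the last split
    }

    /** Without format-level hooks: a new connection for every split. */
    static final class PerSplitReader implements SplitReader {
        private FakeConnection conn;
        @Override public void open(String split) { conn = new FakeConnection(); }
        @Override public void close() { conn.close(); conn = null; }
    }

    /** With format-level hooks: one connection reused across all splits. */
    static final class PerFormatReader implements SplitReader {
        private FakeConnection conn;
        @Override public void openInputFormat() { conn = new FakeConnection(); }
        @Override public void open(String split) { /* reuse conn for this split */ }
        @Override public void close() { /* keep conn open for the next split */ }
        @Override public void closeInputFormat() { conn.close(); conn = null; }
    }

    /** Assumed driver loop; returns how many connections the reader created. */
    static int connectionsFor(SplitReader reader, List<String> splits) {
        int before = FakeConnection.created;
        reader.openInputFormat();
        try {
            for (String split : splits) {
                reader.open(split);
                try {
                    // records of this split would be read here
                } finally {
                    reader.close();
                }
            }
        } finally {
            reader.closeInputFormat();
        }
        return FakeConnection.created - before;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("split-0", "split-1", "split-2");
        System.out.println(connectionsFor(new PerSplitReader(), splits));
        System.out.println(connectionsFor(new PerFormatReader(), splits));
    }
}
```

In this sketch the per-split reader builds one connection per split, while the per-format reader builds a single one regardless of the number of splits, which is the saving I was after with the JDBC connector.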
> Add open and close methods to manage IF lifecycle
> -------------------------------------------------
>
>                 Key: FLINK-3777
>                 URL: https://issues.apache.org/jira/browse/FLINK-3777
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.1
>            Reporter: Flavio Pompermaier
>            Assignee: Flavio Pompermaier
>              Labels: inputformat, lifecycle
>
> At the moment the opening and closing of an inputFormat are not managed,
> although open() could be (improperly IMHO) simulated by configure().
> This limits the possibility to reuse expensive resources (like database
> connections) and manage their release.
> Probably the best option would be to add 2 methods (i.e. openInputformat()
> and closeInputFormat() ) to RichInputFormat*
> * NOTE: the best option from a "semantic" point of view would be to rename
> the current open() and close() to openSplit() and closeSplit() respectively
> while using open() and close() methods for the IF lifecycle management, but
> this would cause a backward compatibility issue...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)