An InputFormat object processes several InputSplits, so open() is called repeatedly on the same object. I suggest creating the connection in the first open() call and reusing it in all subsequent open() calls.
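A minimal Java sketch of that idea follows. It assumes a hypothetical `ReusingSplitReader` class and a `Supplier` standing in for `establishConnection()`; neither is Flink API. `open()` connects only on its first call, the per-split `close()` leaves the connection alone, and a hypothetical `closeAll()` hook releases it once all splits are done.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Flink's JDBCInputFormat): opened once per split,
 * but the connection is created on the first open() and reused afterwards.
 */
public class ReusingSplitReader<C extends AutoCloseable> {

    private final Supplier<C> connectionFactory; // hypothetical stand-in for establishConnection()
    private C connection;                        // shared by all splits this object processes
    private String currentSplit;

    public ReusingSplitReader(Supplier<C> connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    /** Called once per split; connects lazily, only on the first call. */
    public void open(String split) {
        if (connection == null) {
            connection = connectionFactory.get();
        }
        currentSplit = split;
        // ... run this split's query on `connection` here ...
    }

    /** Called after each split; drops per-split state but keeps the connection open. */
    public void close() {
        currentSplit = null;
    }

    /** Hypothetical end-of-life hook: closes the shared connection once all splits are read. */
    public void closeAll() throws Exception {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }

    public C connection() {
        return connection;
    }

    public static void main(String[] args) throws Exception {
        int[] created = {0};
        ReusingSplitReader<AutoCloseable> reader = new ReusingSplitReader<>(() -> {
            created[0]++;
            return () -> { }; // dummy connection whose close() does nothing
        });
        for (String split : Arrays.asList("split-0", "split-1", "split-2")) {
            reader.open(split);
            reader.close();
        }
        reader.closeAll();
        System.out.println("connections created: " + created[0]); // prints "connections created: 1"
    }
}
```

With three splits the factory runs once, which is the point of the suggestion: no pool is needed when one object handles many splits sequentially.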
So no pool at all ;-)

2016-04-14 17:59 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
> I didn't understand what you mean by "it should also be possible to reuse
> the same connection of an InputFormat across InputSplits, i.e., calls of
> the open() method".
> At the moment, the open() method calls establishConnection, so a new
> connection is created for each split.
> If I understood correctly, you're suggesting creating a pool in the
> InputFormat and simply calling pool.borrow() in open() rather than
> establishConnection?
>
> On 14 Apr 2016 17:28, "Chesnay Schepler" <ches...@apache.org> wrote:
>
> > On 14.04.2016 17:22, Fabian Hueske wrote:
> >
> >> Hi Flavio,
> >>
> >> those are good questions.
> >>
> >> 1) Replacing null values with default values and simply forwarding
> >> records is very dangerous, in my opinion.
> >> I see two alternatives: A) We use a data type that tolerates null
> >> values. This could be a POJO that the user has to provide, or Row. The
> >> drawback of Row is that it is untyped and not easy to handle. B) We use
> >> Tuple and add an additional field that holds an Integer which serves as
> >> a bitset to mark null fields. This would be a pretty low-level API,
> >> though. I am leaning towards the user-provided POJO option.
> >>
> > I would also lean towards the POJO option.
> >
> >>
> >> 2) The JDBCInputFormat is located in a dedicated Maven module. I think
> >> we can add a dependency to that module. However, it should also be
> >> possible to reuse the same connection of an InputFormat across
> >> InputSplits, i.e., calls of the open() method. Wouldn't that be
> >> sufficient?
> >>
> > This is the right approach, IMO.
> >
> >> Best, Fabian
> >>
> >> 2016-04-14 16:59 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
> >>
> >>> Hi guys,
> >>>
> >>> I'm integrating Chesnay's comments into my PR, but there are a couple
> >>> of things that I'd like to discuss with the core developers.
> >>>
> >>> 1. About the JDBC type mapping (the addValue() method at [1]): at the
> >>>    moment, if I find a null value for a Double, JDBC's getDouble
> >>>    returns 0.0. Is that really the correct behaviour? Wouldn't it be
> >>>    better to use a POJO, or the Table API's Row, which can handle
> >>>    null values? Moreover, the mapping between SQL types and Java types
> >>>    varies a lot between JDBC implementations. Wouldn't it be better to
> >>>    derive the mapping from the Java type returned by
> >>>    resultSet.getObject() rather than from the ResultSetMetaData types?
> >>> 2. I'd like to handle connections very efficiently, because we have a
> >>>    use case with billions of records and thus millions of splits, and
> >>>    establishing a new connection each time could be expensive. Would
> >>>    it be a problem to add the Apache Commons Pool dependency to the
> >>>    JDBC batch connector in order to reuse the created connections?
> >>>
> >>> [1] https://github.com/fpompermaier/flink/blob/FLINK-3750/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java
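To make Fabian's option B concrete, here is a minimal Java sketch of the null bitset. It assumes a hypothetical `NullMask` helper (not Flink API) and raw column values already read with `ResultSet.getObject()`, which returns `null` for SQL NULL (unlike `getDouble()`, which returns 0.0 and needs a `wasNull()` check). Bit i of the int is set when field i was NULL; an int tracks at most 32 fields.

```java
/**
 * Hypothetical sketch of option B: an extra int field whose bit i is set
 * when column i of the source row was SQL NULL. Supports up to 32 columns.
 */
public final class NullMask {

    private NullMask() { }

    /** Builds the mask from one row's raw values (Java null = SQL NULL). */
    public static int of(Object[] values) {
        if (values.length > Integer.SIZE) {
            throw new IllegalArgumentException("an int mask can track at most 32 fields");
        }
        int mask = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                mask |= 1 << i; // mark column i as NULL
            }
        }
        return mask;
    }

    /** True if column i was NULL in the source row. */
    public static boolean isNull(int mask, int i) {
        return (mask & (1 << i)) != 0;
    }

    public static void main(String[] args) {
        // A row whose second column (index 1) was NULL in the database.
        Object[] row = {42, null, "abc"};
        int mask = of(row);
        System.out.println(mask);            // prints 2 (binary 010)
        System.out.println(isNull(mask, 1)); // prints true
        System.out.println(isNull(mask, 0)); // prints false
    }
}
```

The sketch also shows why Fabian calls this low-level: every consumer has to decode the mask by hand, whereas a user-provided POJO with boxed fields such as `Double` can carry `null` directly.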