An InputFormat object processes several InputSplits, so open() is
called repeatedly on the same object.
I suggest creating the connection in the first open() call and reusing it in
all subsequent open() calls.

So no pool at all ;-)
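
A minimal sketch of that idea, assuming a simplified stand-in class rather
than the real JDBCInputFormat: the class name, the Supplier-based connection
factory and the shutdown() hook are hypothetical, and the per-split query is
left out. It only shows the lazy creation and reuse of the connection across
open() calls.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Supplier;

/** Hypothetical stand-in for an InputFormat; not the actual JDBCInputFormat. */
final class ConnectionReusingFormat {

    // Assumption: the factory wraps something like DriverManager.getConnection(...)
    // and converts its checked SQLException into an unchecked one.
    private final Supplier<Connection> connectionFactory;

    // Created lazily on the first open() and kept across splits.
    // Not thread-safe: one instance is assumed to be used by one thread.
    private Connection connection;

    public ConnectionReusingFormat(Supplier<Connection> connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    /** Called once per split; reuses the connection if it is still usable. */
    public void open(Object split) {
        try {
            if (connection == null || connection.isClosed()) {
                connection = connectionFactory.get();
            }
        } catch (SQLException e) {
            throw new IllegalStateException("Could not check the connection", e);
        }
        // The statement for this split would be prepared on `connection` here.
    }

    /** Called once per split; releases per-split resources, keeps the connection. */
    public void close() {
        // Statement and ResultSet of the current split would be closed here.
    }

    /** Hypothetical teardown hook, called once when the format is done. */
    public void shutdown() {
        if (connection == null) {
            return;
        }
        try {
            connection.close();
        } catch (SQLException e) {
            throw new IllegalStateException("Could not close the connection", e);
        } finally {
            connection = null;
        }
    }
}
```

The isClosed() check is there so that a connection closed in the meantime is
replaced on the next open() instead of failing the split.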

2016-04-14 17:59 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:

> I didn't understand what you mean by "it should also be possible to reuse
> the same connection of an InputFormat across InputSplits, i.e., calls of
> the open() method".
> At the moment, in the open method there's a call to establishConnection,
> thus, a new connection is created for each split.
> If I understood correctly, you're suggesting that I create a pool in the
> InputFormat and simply call pool.borrow() in open() rather than
> establishConnection?
>
> On 14 Apr 2016 17:28, "Chesnay Schepler" <ches...@apache.org> wrote:
>
> > On 14.04.2016 17:22, Fabian Hueske wrote:
> >
> >> Hi Flavio,
> >>
> >> those are good questions.
> >>
> >> 1) Replacing null values by default values and simply forwarding records
> >> is very dangerous, in my opinion.
> >> I see two alternatives: A) We use a data type that tolerates null values.
> >> This could be a POJO that the user has to provide or Row. The drawback of
> >> Row is that it is untyped and not easy to handle. B) We use Tuple and add
> >> an additional field that holds an Integer which serves as a bitset to mark
> >> null fields. This would be a pretty low level API though. I am leaning
> >> towards the user-provided POJO option.
> >>
> > i would also lean towards the POJO option.
> >
> >>
> >> 2) The JDBCInputFormat is located in a dedicated Maven module. I think we
> >> can add a dependency to that module. However, it should also be possible
> >> to reuse the same connection of an InputFormat across InputSplits, i.e.,
> >> calls of the open() method. Wouldn't that be sufficient?
> >>
> > this is the right approach imo.
> >
> >> Best, Fabian
> >>
> >> 2016-04-14 16:59 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
> >>
> >>> Hi guys,
> >>>
> >>> I'm integrating Chesnay's comments into my PR, but there are a couple of
> >>> things that I'd like to discuss with the core developers.
> >>>
> >>>
> >>>     1. About the JDBC type mapping (addValue() method at [1]): at the
> >>>     moment, if I find a null value for a Double, the getDouble of JDBC
> >>>     returns 0.0. Is that really the correct behaviour? Wouldn't it be
> >>>     better to use a POJO or the Row of datatable, which can handle null
> >>>     values? Moreover, the mapping between SQL types and Java types
> >>>     varies a lot between JDBC implementations. Wouldn't it be better to
> >>>     rely on the Java type returned by resultSet.getObject() to get such
> >>>     a mapping rather than using the ResultSetMetaData types?
> >>>     2. I'd like to handle connections very efficiently because we have a
> >>>     use case with billions of records and thus millions of splits, and
> >>>     establishing a new connection each time could be expensive. Would it
> >>>     be a problem to add the Apache pool dependency to the JDBC batch
> >>>     connector in order to reuse the created connections?
> >>>
> >>>
> >>> [1]
> >>> https://github.com/fpompermaier/flink/blob/FLINK-3750/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java
> >>>
> >>>
> >
>
