Re: FLINK-3750 (JDBCInputFormat)

Chesnay Schepler Thu, 14 Apr 2016 09:08:43 -0700

no.

if (connection == null) {
    establishConnection();
}


done. same connection for all splits.
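
A minimal sketch of how that check could sit inside open(), assuming one format instance handles several splits in sequence. ReusingInputFormat, Handle, releaseConnection() and the counter are hypothetical stand-ins for illustration, not the actual JDBCInputFormat or java.sql.Connection; where the connection finally gets closed is left as an assumption.

```java
// Sketch only: all names here are hypothetical stand-ins,
// not the real JDBCInputFormat or java.sql.Connection.
class ReusingInputFormat {

    /** Stand-in for a JDBC connection. */
    static final class Handle {}

    private Handle connection;          // kept across splits
    int connectionsEstablished = 0;     // only here to make the reuse visible

    private void establishConnection() {
        connection = new Handle();
        connectionsEstablished++;
    }

    /** Called once per input split. */
    void open(int splitId) {
        if (connection == null) {
            establishConnection();      // happens for the first split only
        }
        // ... prepare and run the query for this split on `connection` ...
    }

    /** Assumed hook that runs once after the last split has been read. */
    void releaseConnection() {
        connection = null;
    }
}
```

With this shape, calling open() for three splits in a row establishes a single connection. The per-split close() would then have to release only per-split resources (statement, result set) and leave the connection alone.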

On 14.04.2016 17:59, Flavio Pompermaier wrote:

I didn't understand what you mean by "it should also be possible to reuse
the same connection of an InputFormat across InputSplits, i.e., calls of
the open() method".
At the moment the open() method calls establishConnection(), so a new
connection is created for each split.
If I understood correctly, you're suggesting to create a pool in the
InputFormat and simply call pool.borrow() in open() rather than
establishConnection()?

On 14 Apr 2016 17:28, "Chesnay Schepler" <[email protected]> wrote:

On 14.04.2016 17:22, Fabian Hueske wrote:

Hi Flavio,

those are good questions.

1) Replacing null values by default values and simply forwarding records
is very dangerous, in my opinion.
I see two alternatives: A) we use a data type that tolerates null values.
This could be a POJO that the user has to provide or Row. The drawback of
Row is that it is untyped and not easy to handle. B) We use Tuple and add
an additional field that holds an Integer which serves as a bitset to mark
null fields. This would be a pretty low level API though. I am leaning
towards the user-provided POJO option.
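
To make option B concrete, here is a rough sketch of an int used as a bitset of null markers. NullMask and its methods are hypothetical helpers written for this illustration, not part of Flink; the int would travel as the extra Tuple field, which limits a record to 32 fields.

```java
// Sketch only: NullMask is a hypothetical helper, not a Flink class.
final class NullMask {

    private NullMask() {}

    /** Builds a bitset in which bit i is set when fields[i] is null. */
    static int of(Object... fields) {
        if (fields.length > Integer.SIZE) {
            throw new IllegalArgumentException("an int can mark at most 32 fields");
        }
        int mask = 0;
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null) {
                mask |= 1 << i;
            }
        }
        return mask;
    }

    /** Tells whether the field at the given position was null. */
    static boolean isNull(int mask, int index) {
        return (mask & (1 << index)) != 0;
    }
}
```

Every consumer of the Tuple would have to check the mask before trusting a field, which is the low-level feel mentioned above.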

i would also lean towards the POJO option.

2) The JDBCInputFormat is located in a dedicated Maven module. I think we
can add a dependency to that module. However, it should also be possible
to reuse the same connection of an InputFormat across InputSplits, i.e.,
calls of the open() method. Wouldn't that be sufficient?

this is the right approach imo.

Best, Fabian

2016-04-14 16:59 GMT+02:00 Flavio Pompermaier <[email protected]>:

Hi guys,

I'm integrating Chesnay's comments into my PR, but there are a couple of
things that I'd like to discuss with the core developers.


     1. About the JDBC type mapping (addValue() method at [1]): at the
     moment, if I find a null value for a Double, the getDouble() of JDBC
     returns 0.0. Is it really the correct behaviour? Wouldn't it be better
     to use a POJO or the Row of datatable that can handle null values?
     Moreover, the mapping between SQL types and Java types varies a lot
     from one JDBC implementation to another. Wouldn't it be better to rely
     on the Java type coming from resultSet.getObject() to get such a
     mapping rather than using the ResultSetMetaData types?
     2. I'd like to handle connections very efficiently because we have a
     use case with billions of records and thus millions of splits, and
     establishing a new connection each time could be expensive. Would it
     be a problem to add the Apache pool dependency to the JDBC batch
     connector in order to reuse the created connections?
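
For reference on point 1, a small sketch of how a null Double can be told apart from a real 0.0 with plain JDBC: ResultSet.getDouble() returns 0 for SQL NULL, and ResultSet.wasNull() reports whether the last column read was NULL. NullableReads is a hypothetical helper made up for this example, not code from the PR.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: NullableReads is a hypothetical helper, not part of the PR.
final class NullableReads {

    private NullableReads() {}

    /** Returns null for SQL NULL instead of the 0.0 that getDouble() reports. */
    static Double readDouble(ResultSet rs, int column) throws SQLException {
        double value = rs.getDouble(column);
        if (rs.wasNull()) {     // refers to the column that was just read
            return null;
        }
        return value;
    }
}
```

The result only fits in a type that tolerates null, such as a POJO field of type Double; a primitive double field would lose the distinction again.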


[1] https://github.com/fpompermaier/flink/blob/FLINK-3750/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java
