On 14.04.2016 17:22, Fabian Hueske wrote:
Hi Flavio,
those are good questions.
1) Replacing null values with default values and simply forwarding records is
very dangerous, in my opinion.
I see two alternatives: A) We use a data type that tolerates null values.
This could be a POJO that the user has to provide, or Row. The drawback of
Row is that it is untyped and not easy to handle. B) We use Tuple and add
an additional Integer field that serves as a bitset to mark
null fields. This would be a pretty low-level API, though. I am leaning
towards the user-provided POJO option.
I would also lean towards the POJO option.
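For illustration, a minimal sketch of what a user-provided POJO could look like (class and field names are hypothetical):

    // Hypothetical user-provided POJO: boxed field types tolerate null,
    // so a SQL NULL can be mapped to a Java null instead of a default value.
    public class PersonRecord {
        public Integer id;     // Integer instead of int: can hold null
        public Double score;   // Double instead of double: can hold null
        public String name;

        public PersonRecord() {}  // Flink POJOs need a public no-arg constructor
    }

The input format could then fill such a POJO via resultSet.getObject(...), so that NULL columns stay null.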
2) The JDBCInputFormat is located in a dedicated Maven module. I think we
can add a dependency to that module. However, it should also be possible to
reuse the same connection of an InputFormat across InputSplits, i.e., calls
of the open() method. Wouldn't that be sufficient?
This is the right approach, imo.
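For illustration, a rough sketch of reusing the connection across open() calls (field and helper names are assumptions, not the current code):

    // Sketch: keep one JDBC connection per task and reuse it for all input splits;
    // only the per-split statement/result set is recreated in open().
    private transient Connection dbConn;
    private transient Statement statement;
    private transient ResultSet resultSet;

    @Override
    public void open(InputSplit split) throws IOException {
        try {
            if (dbConn == null || dbConn.isClosed()) {
                dbConn = DriverManager.getConnection(dbURL, username, password);
            }
            statement = dbConn.createStatement();
            resultSet = statement.executeQuery(queryForSplit(split)); // hypothetical helper
        } catch (SQLException e) {
            throw new IOException("Could not open input split", e);
        }
    }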
Best, Fabian
2016-04-14 16:59 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
Hi guys,
I'm integrating Chesnay's comments into my PR, but there are a couple of
things that I'd like to discuss with the core developers.
1. About the JDBC type mapping (addValue() method at [1]): at the moment,
if I find a null value for a Double, JDBC's getDouble() returns 0.0. Is
that really the correct behaviour? Wouldn't it be better to use a POJO or
the Row of the Table API, which can handle null? Moreover, the mapping
between SQL types and Java types varies a lot across individual JDBC
implementations. Wouldn't it be better to rely on the Java type returned by
resultSet.getObject() for that mapping, rather than on the
ResultSetMetaData types?
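For illustration, the behaviour in question (plain JDBC, hypothetical column name):

    // getDouble() maps SQL NULL to 0.0, indistinguishable from a real 0.0
    double d = resultSet.getDouble("price");
    boolean wasNull = resultSet.wasNull();      // only way to detect the NULL afterwards

    // getObject() preserves the null and lets the driver choose the Java type
    Object o = resultSet.getObject("price");    // null for SQL NULL, e.g. a Double otherwise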
2. I'd like to handle connections very efficiently because we have a use
case with billions of records and thus millions of splits, and establishing
a new connection each time could be expensive. Would it be a problem to add
the Apache Commons Pool dependency to the JDBC batch connector in order to
reuse the created connections?
[1]
https://github.com/fpompermaier/flink/blob/FLINK-3750/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java
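For illustration of point 2, a rough sketch of what pooling could look like if Apache Commons DBCP (which builds on Commons Pool) were added as a dependency; the field names and pool size are assumptions:

    import java.sql.Connection;
    import org.apache.commons.dbcp2.BasicDataSource;

    // Hypothetical sketch: a pooled DataSource created once per task, so each
    // open() call borrows a connection instead of opening a new one.
    BasicDataSource dataSource = new BasicDataSource();
    dataSource.setDriverClassName(drivername);  // e.g. "org.apache.derby.jdbc.EmbeddedDriver"
    dataSource.setUrl(dbURL);
    dataSource.setUsername(username);
    dataSource.setPassword(password);
    dataSource.setMaxTotal(8);                  // cap on pooled connections (assumed value)

    // in open(): borrow from the pool; Connection.close() returns it to the pool
    Connection conn = dataSource.getConnection();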