Hi Flavio,

Sorry for not replying earlier.
I think there is definitely a need to improve the JdbcInputFormat.
All your points with respect to the current JdbcInputFormat are valid, and
fixing them would be a big improvement and a highly welcome contribution, IMO.
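
To make the parallelism point a bit more concrete, here is a minimal sketch
of how a numeric key range could be cut into one sub-range per parallel
reader. This is not the code from the gist and not an existing Flink API: the
class name RangeSplits, the idea of a numeric key column, and the
"WHERE id BETWEEN ? AND ?" query shape are all assumptions made purely for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Flink): splits an inclusive key range into sub-ranges. */
final class RangeSplits {

    /** One inclusive [lower, upper] range, e.g. bound to "WHERE id BETWEEN ? AND ?". */
    static final class Range {
        final long lower;
        final long upper;

        Range(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public String toString() {
            return "[" + lower + ", " + upper + "]";
        }
    }

    /**
     * Splits [min, max] into at most numSplits contiguous, non-overlapping ranges
     * whose sizes differ by at most one key. Assumes max - min + 1 fits in a long.
     */
    static List<Range> split(long min, long max, int numSplits) {
        if (min > max) {
            throw new IllegalArgumentException("min must be <= max");
        }
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        long total = max - min + 1;
        long n = Math.min(numSplits, total); // never create empty ranges
        long base = total / n;
        long remainder = total % n;

        List<Range> ranges = new ArrayList<>();
        long lower = min;
        for (long i = 0; i < n; i++) {
            // The first 'remainder' ranges take one extra key each.
            long size = base + (i < remainder ? 1 : 0);
            long upper = lower + size - 1;
            ranges.add(new Range(lower, upper));
            lower = upper + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: keys 1..10 distributed over 3 parallel readers.
        for (Range r : split(1, 10, 3)) {
            System.out.println(r);
        }
    }
}
```

Each parallel instance would then run the same query with its own bounds.
Even splits by key only balance the load if the keys are roughly uniformly
distributed, so a real implementation would probably need something smarter.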

I am not so sure about adding a flink-sqoop module to Flink.
How much better/faster would flink-sqoop be compared to Apache Sqoop? With
YARN it is easy to use two frameworks side by side.
Maybe you can share a few details about your use case / environment and why
flink-sqoop would be a good addition.

Best, Fabian


2016-04-15 10:03 GMT+02:00 Stefano Bortoli <s.bort...@gmail.com>:

> Hi Flavio,
>
> I think this can be very handy when you have to run Sqoop-like jobs but
> need to run the process with few resources. As for Cascading, Flink could
> do the heavy lifting and make the scan of large relational databases more
> robust. Of course, to make it work in the real world, the JDBC input format
> must be improved. Besides parallelism, null values, and the related input
> splits, we need to find a way to properly map the Java types to the
> database types. Probably a wrapper POJO implementing a cast/transformation
> policy, passed as a parameter of the InputFormat, could do the job. Another
> thing we need to take care of is the management of connections, which can
> be very costly if the database is particularly large.
>
>
> saluti,
> Stefano
>
>
>
> 2016-04-13 12:45 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
> > Hi to all,
> > we've recently migrated our sqoop[1] import process to a Flink job, using
> > an improved version of the Flink JDBC Input Format[2] that is able to
> > exploit the parallelism of the cluster (the current Flink version
> > implements NonParallelInput).
> >
> > We still need to improve the mapping of SQL types to Java ones (in the
> > addValue method, IMHO), but this could be the basis for a flink-sqoop
> > module that will incrementally cover the Sqoop functionality when
> > requested.
> > Do you think that such a module could be of interest for Flink or not?
> >
> > [1] https://sqoop.apache.org/
> > [2] https://gist.github.com/fpompermaier/bcd704abc93b25b6744ac76ac17ed351
> >
> > Best,
> > Flavio
> >
>
