Hi Flavio,
I think this can be very handy when you have to run Sqoop-like jobs but need to keep the resource footprint small. As with Cascading, Flink could do the heavy lifting and make scanning large relational databases more robust. Of course, to make it work in the real world the JDBC InputFormat has to be improved. Besides parallelism, null values and the related input splits, we need a way to properly map the database types to Java types. Probably a wrapper POJO implementing a cast/transformation policy, passed as a parameter of the InputFormat, could do the job. Another thing we need to take care of is connection management, which can become very costly if the database is particularly large.
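Just to make the type-mapping idea concrete, something along these lines could work (only a sketch, all names here are invented and not existing Flink API):

import java.io.Serializable;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.flink.api.java.tuple.Tuple2;

/**
 * Hypothetical policy passed to the JDBC InputFormat: it decides how each
 * column of the current ResultSet row is converted to a Java value, so the
 * SQL-to-Java mapping and null handling stay under the user's control
 * instead of being hard-coded in addValue().
 */
interface RowCastPolicy<T> extends Serializable {
    T mapRow(ResultSet rs) throws SQLException;
}

/** Example policy for a table like (id INTEGER, name VARCHAR). */
class IdNamePolicy implements RowCastPolicy<Tuple2<Integer, String>> {
    @Override
    public Tuple2<Integer, String> mapRow(ResultSet rs) throws SQLException {
        // getObject keeps SQL NULL as a Java null instead of a primitive default
        Integer id = (Integer) rs.getObject("id");
        String name = rs.getString("name");
        return new Tuple2<>(id, name);
    }
}

The InputFormat would then just call policy.mapRow(resultSet) in nextRecord() instead of switching on the JDBC type itself.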
saluti,
Stefano

2016-04-13 12:45 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Hi to all,
> we've recently migrated our sqoop[1] import process to a Flink job, using
> an improved version of the Flink JDBC Input Format[2] that is able to
> exploit the parallelism of the cluster (the current Flink version
> implements NonParallelInput).
>
> Still need to improve the mapping part of sql types to java ones (in the
> addValue method IMHO) but this could be the basis for a flink-sqoop module
> that will incrementally cover the sqoop functionalities when requested.
> Do you think that such a module could be of interest for Flink or not?
>
> [1] https://sqoop.apache.org/
> [2] https://gist.github.com/fpompermaier/bcd704abc93b25b6744ac76ac17ed351
>
> Best,
> Flavio
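For what it's worth, I imagine the parallel scan in [2] boils down to partitioning the key range into input splits and binding each range as query parameters, roughly like this (purely illustrative, all names invented, I haven't looked at the gist in detail):

import java.io.Serializable;

/** Illustrative only: one input split = one range of a numeric key column. */
class IdRangeSplit implements Serializable {
    final long minId;
    final long maxId;

    IdRangeSplit(long minId, long maxId) {
        this.minId = minId;
        this.maxId = maxId;
    }
}

class IdRangeSplitter {
    /**
     * Partition [minId, maxId] into `parallelism` contiguous ranges; each
     * parallel instance would then run
     *   SELECT ... WHERE id BETWEEN ? AND ?
     * with its own bounds, spreading the scan across the cluster.
     */
    static IdRangeSplit[] compute(long minId, long maxId, int parallelism) {
        IdRangeSplit[] splits = new IdRangeSplit[parallelism];
        long span = (maxId - minId + parallelism) / parallelism; // ceil((maxId - minId + 1) / parallelism)
        for (int i = 0; i < parallelism; i++) {
            long lo = minId + i * span;
            long hi = Math.min(lo + span - 1, maxId);
            splits[i] = new IdRangeSplit(lo, hi);
        }
        return splits;
    }
}

Each parallel instance opening its own connection for its own split is also where the connection-management cost I mentioned above comes in.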