Hi Flavio,
I think this can be very handy when you have to run Sqoop-like jobs but need to keep the resource footprint small. As with Cascading, Flink could do the heavy lifting and make scanning large relational databases more robust. Of course, to make it work in the real world the JDBC InputFormat has to be improved. Besides parallelism, null values and the related input splits, we need a way to properly map the database types to Java types. Probably a wrapper POJO implementing a cast/transformation policy, passed as a parameter of the InputFormat, could do the job. Another thing we need to take care of is connection management, which can become very costly if the database is particularly large.
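Just to make the type-mapping idea concrete, something along these lines could work (only a sketch, all names here are invented and not existing Flink API):

import java.io.Serializable;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.flink.api.java.tuple.Tuple2;

/**
 * Hypothetical policy passed to the JDBC InputFormat: it decides how each
 * column of the current ResultSet row is converted to a Java value, so the
 * SQL-to-Java mapping and null handling stay under the user's control
 * instead of being hard-coded in addValue().
 */
interface RowCastPolicy<T> extends Serializable {
    T mapRow(ResultSet rs) throws SQLException;
}

/** Example policy for a table like (id INTEGER, name VARCHAR). */
class IdNamePolicy implements RowCastPolicy<Tuple2<Integer, String>> {
    @Override
    public Tuple2<Integer, String> mapRow(ResultSet rs) throws SQLException {
        // getObject keeps SQL NULL as a Java null instead of a primitive default
        Integer id = (Integer) rs.getObject("id");
        String name = rs.getString("name");
        return new Tuple2<>(id, name);
    }
}

The InputFormat would then just call policy.mapRow(resultSet) in nextRecord() instead of switching on the JDBC type itself.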
saluti,
Stefano

2016-04-13 12:45 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Hi to all,
> we've recently migrated our sqoop[1] import process to a Flink job, using
> an improved version of the Flink JDBC Input Format[2] that is able to
> exploit the parallelism of the cluster (the current Flink version
> implements NonParallelInput).
>
> Still need to improve the mapping part of sql types to java ones (in the
> addValue method IMHO) but this could be the basis for a flink-sqoop module
> that will incrementally cover the sqoop functionalities when requested.
> Do you think that such a module could be of interest for Flink or not?
>
> [1] https://sqoop.apache.org/
> [2] https://gist.github.com/fpompermaier/bcd704abc93b25b6744ac76ac17ed351
>
> Best,
> Flavio
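For what it's worth, I imagine the parallel scan in [2] boils down to partitioning the key range into input splits and binding each range as query parameters, roughly like this (purely illustrative, all names invented, I haven't looked at the gist in detail):

import java.io.Serializable;

/** Illustrative only: one input split = one range of a numeric key column. */
class IdRangeSplit implements Serializable {
    final long minId;
    final long maxId;

    IdRangeSplit(long minId, long maxId) {
        this.minId = minId;
        this.maxId = maxId;
    }
}

class IdRangeSplitter {
    /**
     * Partition [minId, maxId] into `parallelism` contiguous ranges; each
     * parallel instance would then run
     *   SELECT ... WHERE id BETWEEN ? AND ?
     * with its own bounds, spreading the scan across the cluster.
     */
    static IdRangeSplit[] compute(long minId, long maxId, int parallelism) {
        IdRangeSplit[] splits = new IdRangeSplit[parallelism];
        long span = (maxId - minId + parallelism) / parallelism; // ceil((maxId - minId + 1) / parallelism)
        for (int i = 0; i < parallelism; i++) {
            long lo = minId + i * span;
            long hi = Math.min(lo + span - 1, maxId);
            splits[i] = new IdRangeSplit(lo, hi);
        }
        return splits;
    }
}

Each parallel instance opening its own connection for its own split is also where the connection-management cost I mentioned above comes in.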