Hi Flavio, sorry for not replying earlier. I think there is definitely a need to improve the JdbcInputFormat. All your points with respect to the current JdbcInputFormat are valid, and fixing them would be a big improvement and a highly welcome contribution, IMO.
I am not so sure about adding a flink-sqoop module to Flink. How much better/faster would flink-sqoop be compared to Apache Sqoop? With YARN it is easy to use two frameworks side by side. Maybe you can share a few details about your use case / environment and why flink-sqoop would be a good addition.

Best, Fabian

2016-04-15 10:03 GMT+02:00 Stefano Bortoli <s.bort...@gmail.com>:

> Hi Flavio,
>
> I think this can be very handy when you have to run Sqoop-like jobs but
> need to run the process with few resources. As for Cascading, Flink could
> do the heavy lifting and make the scan of large relational databases more
> robust. Of course, to make it work in the real world, the JDBC Input Format
> must be improved. Besides parallelism, null values, and the related input
> splits, we need to find a way to properly map the Java types to the
> database types. Probably a wrapper POJO implementing a cast/transformation
> policy, passed as a parameter of the InputFormat, could do. Another thing we
> need to take care of is the management of connections, which can be very
> costly if the database is particularly large.
>
> Regards,
> Stefano
>
> 2016-04-13 12:45 GMT+02:00 Flavio Pompermaier <pomperma...@okkam.it>:
>
> > Hi to all,
> > we've recently migrated our sqoop[1] import process to a Flink job, using
> > an improved version of the Flink JDBC Input Format[2] that is able to
> > exploit the parallelism of the cluster (the current Flink version
> > implements NonParallelInput).
> >
> > We still need to improve the mapping of SQL types to Java ones (in the
> > addValue method IMHO), but this could be the basis for a flink-sqoop
> > module that will incrementally cover the sqoop functionalities when
> > requested.
> > Do you think that such a module could be of interest for Flink or not?
> >
> > [1] https://sqoop.apache.org/
> > [2] https://gist.github.com/fpompermaier/bcd704abc93b25b6744ac76ac17ed351
> >
> > Best,
> > Flavio
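
For reference, here is a minimal sketch of the parallel-read idea discussed in this thread: dividing a numeric key range into contiguous chunks so that each parallel task could scan its own slice of a table. It is not taken from the gist in [2] or from Flink itself. The class `RangeSplits`, its `Range` type, and the `split` method are hypothetical names, and it assumes the table has a dense numeric key whose minimum and maximum are already known.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Flink or Sqoop): divides an inclusive
 * numeric key range into at most numSplits contiguous, non-overlapping ranges.
 */
public class RangeSplits {

    /** Inclusive [lower, upper] bounds of one split. */
    public static final class Range {
        public final long lower;
        public final long upper;

        Range(long lower, long upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    /**
     * Assumes min and max are the smallest and largest key values and that
     * (max - min + 1) does not overflow a long.
     */
    public static List<Range> split(long min, long max, int numSplits) {
        if (numSplits < 1 || min > max) {
            throw new IllegalArgumentException("need numSplits >= 1 and min <= max");
        }
        long total = max - min + 1;
        long base = total / numSplits;        // keys every split gets
        long remainder = total % numSplits;   // first 'remainder' splits get one extra key

        List<Range> ranges = new ArrayList<>();
        long lower = min;
        for (int i = 0; i < numSplits; i++) {
            long size = base + (i < remainder ? 1 : 0);
            if (size == 0) {
                break; // more splits requested than keys available
            }
            long upper = lower + size - 1;
            ranges.add(new Range(lower, upper));
            lower = upper + 1;
        }
        return ranges;
    }
}
```

Each resulting range could then be bound to a parameterized query such as `SELECT * FROM some_table WHERE id BETWEEN ? AND ?` (table and column names are placeholders), with one query per parallel task. For example, `RangeSplits.split(1, 10, 3)` yields the ranges [1, 4], [5, 7], and [8, 10]. Null handling and the SQL-to-Java type mapping mentioned above are deliberately left out of this sketch.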