Hi Flavio,

I think the easiest solution is to read the CSV file with the CsvInputFormat and use a subsequent MapPartition to batch 1000 rows together. In each partition, the last batch may be incomplete. However, I don't yet see how you could feed these batches into the JdbcInputFormat, since it does not accept a DataSet as input. Instead, you could create a rich function that contains the logic of the JdbcInputFormat and queries the database directly with the batches produced by the MapPartition operator.
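To make that concrete, here is a rough, untested sketch of the DataSet approach. The CSV path, JDBC URL, table and column names are placeholders, and I used a RichFlatMapFunction rather than a plain map because each batch query returns multiple rows:

import org.apache.flink.api.common.functions.MapPartitionFunction;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class BatchedJdbcLookup {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // One id per CSV line (assumed layout).
    DataSet<Tuple1<String>> ids = env.readCsvFile("file:///path/to/ids.csv").types(String.class);

    // Collect ids into batches of 1000; the last batch per partition may be incomplete.
    DataSet<List<String>> batches = ids.mapPartition(
        new MapPartitionFunction<Tuple1<String>, List<String>>() {
          @Override
          public void mapPartition(Iterable<Tuple1<String>> values, Collector<List<String>> out) {
            List<String> batch = new ArrayList<>(1000);
            for (Tuple1<String> v : values) {
              batch.add(v.f0);
              if (batch.size() == 1000) {
                out.collect(batch);
                batch = new ArrayList<>(1000);
              }
            }
            if (!batch.isEmpty()) {
              out.collect(batch); // trailing incomplete batch
            }
          }
        });

    // One query per batch; the flat map emits one record per fetched row.
    batches.flatMap(new RichFlatMapFunction<List<String>, String>() {
      private transient Connection conn;

      @Override
      public void open(Configuration parameters) throws Exception {
        // Placeholder connection settings.
        conn = DriverManager.getConnection("jdbc:...", "user", "password");
      }

      @Override
      public void flatMap(List<String> batch, Collector<String> out) throws Exception {
        // Build a parameterized IN clause sized to the batch.
        StringBuilder sql = new StringBuilder("SELECT id, payload FROM my_table WHERE id IN (");
        for (int i = 0; i < batch.size(); i++) {
          sql.append(i == 0 ? "?" : ", ?");
        }
        sql.append(")");
        try (PreparedStatement stmt = conn.prepareStatement(sql.toString())) {
          for (int i = 0; i < batch.size(); i++) {
            stmt.setString(i + 1, batch.get(i));
          }
          try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
              out.collect(rs.getString(1) + "," + rs.getString(2));
            }
          }
        }
      }

      @Override
      public void close() throws Exception {
        if (conn != null) {
          conn.close();
        }
      }
    }).writeAsText("file:///path/to/output");

    env.execute("Batched JDBC lookup");
  }
}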
If you want to use the DataStream API, you can use a tumbling count window to group IDs together and query the external database in a subsequent map operator (see the sketch after the quoted message below).

Hope this helps,
Fabian

2016-11-28 18:32 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Hi to all,
>
> I have a use case where I have to read a huge CSV containing ids to fetch
> from a table in a db.
> The JDBC input format can handle parameterized queries, so I was thinking
> to fetch data using 1000 ids at a time. What is the easiest way to divide a
> dataset into slices of 1000 ids each (in order to create parameters for my
> JDBC input format)? Is that possible?
> Or maybe there's an easier solution using the streaming APIs?
>
> Best,
> Flavio
>
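For completeness, here is a rough, untested sketch of the count-window variant, again with placeholder paths and names. Two caveats: a non-keyed count window runs with parallelism 1, and a trailing batch with fewer than 1000 ids will not fire with a pure count trigger. The database lookup on the batches would be the same kind of rich function as in the first sketch.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

import java.util.ArrayList;
import java.util.List;

public class StreamingIdBatching {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // One id per line (assumed CSV layout).
    DataStream<String> ids = env.readTextFile("file:///path/to/ids.csv");

    // Tumbling count window of 1000 elements (non-keyed, so this step has parallelism 1).
    DataStream<List<String>> batches = ids
        .countWindowAll(1000)
        .apply(new AllWindowFunction<String, List<String>, GlobalWindow>() {
          @Override
          public void apply(GlobalWindow window, Iterable<String> values, Collector<List<String>> out) {
            List<String> batch = new ArrayList<>(1000);
            for (String id : values) {
              batch.add(id);
            }
            out.collect(batch);
          }
        });

    // The batches can then be looked up in the database, e.g. with a rich
    // function like the JDBC flatMap in the DataSet sketch above.
    batches.print();

    env.execute("Streaming id batching");
  }
}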