Hi Flavio,

I think the easiest solution is to read the CSV file with the
CsvInputFormat and use a subsequent MapPartition operator to batch 1000 rows
together. Note that the last batch of each partition may be incomplete.
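A rough sketch of this first step, assuming the CSV has a single Long id
column and using a placeholder file path (adjust the schema and path to your
data):

import org.apache.flink.api.common.functions.MapPartitionFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.util.Collector;

import java.util.ArrayList;
import java.util.List;

public class BatchIds {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Read the id column from the CSV (readCsvFile uses the CsvInputFormat internally).
    DataSet<Tuple1<Long>> ids = env.readCsvFile("/path/to/ids.csv")
        .types(Long.class);

    // Group the ids into batches of at most 1000 per partition.
    DataSet<List<Long>> batches = ids.mapPartition(
        new MapPartitionFunction<Tuple1<Long>, List<Long>>() {
          @Override
          public void mapPartition(Iterable<Tuple1<Long>> values, Collector<List<Long>> out) {
            List<Long> batch = new ArrayList<>(1000);
            for (Tuple1<Long> value : values) {
              batch.add(value.f0);
              if (batch.size() == 1000) {
                out.collect(batch);
                batch = new ArrayList<>(1000);
              }
            }
            if (!batch.isEmpty()) {
              out.collect(batch); // the last batch of a partition may be incomplete
            }
          }
        });

    // Continue with the database lookup (see below), add a sink, then call env.execute().
  }
}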
However, I don't see how you could feed these batches into the
JdbcInputFormat, which does not accept a DataSet as input. Instead, you could
implement a rich map (or flatMap) function that contains the query logic of
the JdbcInputFormat and directly queries the database with the batches
produced by the MapPartition operator.
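For example, the lookup could be a RichFlatMapFunction (a flatMap rather than
a map, so that a single batch can emit many result rows). This is only a
sketch: the JDBC URL, credentials, table and column names (my_table, id,
data), and the class name JdbcBatchLookup are placeholders you'd have to
adapt.

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;

// Looks up each batch of ids in the database with a single IN (...) query.
public class JdbcBatchLookup extends RichFlatMapFunction<List<Long>, Tuple2<Long, String>> {

  private transient Connection conn;

  @Override
  public void open(Configuration parameters) throws Exception {
    // Connection details are placeholders.
    conn = DriverManager.getConnection("jdbc:...", "user", "password");
  }

  @Override
  public void flatMap(List<Long> batch, Collector<Tuple2<Long, String>> out) throws Exception {
    // Build an IN clause with one placeholder per id; the statement text varies with batch size.
    StringBuilder sql = new StringBuilder("SELECT id, data FROM my_table WHERE id IN (");
    for (int i = 0; i < batch.size(); i++) {
      sql.append(i == 0 ? "?" : ", ?");
    }
    sql.append(")");

    try (PreparedStatement stmt = conn.prepareStatement(sql.toString())) {
      for (int i = 0; i < batch.size(); i++) {
        stmt.setLong(i + 1, batch.get(i));
      }
      try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
          out.collect(Tuple2.of(rs.getLong("id"), rs.getString("data")));
        }
      }
    }
  }

  @Override
  public void close() throws Exception {
    if (conn != null) {
      conn.close();
    }
  }
}

You would apply it to the result of the MapPartition step with
batches.flatMap(new JdbcBatchLookup()), add a sink, and execute the job.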

If you want to use the DataStream API instead, you can use a tumbling count
window to group 1000 IDs together and query the external database in a
subsequent map (or flatMap) operator.
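A minimal sketch of that variant, again with placeholder paths and reusing
the hypothetical JdbcBatchLookup from above. One caveat: a count window only
fires when it is full, so a trailing batch with fewer than 1000 ids would not
be emitted.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

import java.util.ArrayList;
import java.util.List;

public class StreamingIdLookup {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Read the ids as a stream of text lines (path is a placeholder).
    DataStream<Long> ids = env.readTextFile("/path/to/ids.csv")
        .map(new MapFunction<String, Long>() {
          @Override
          public Long map(String line) {
            return Long.parseLong(line.trim());
          }
        });

    // Collect 1000 ids per tumbling count window (non-keyed, so this part runs with parallelism 1).
    DataStream<List<Long>> batches = ids
        .countWindowAll(1000)
        .apply(new AllWindowFunction<Long, List<Long>, GlobalWindow>() {
          @Override
          public void apply(GlobalWindow window, Iterable<Long> values, Collector<List<Long>> out) {
            List<Long> batch = new ArrayList<>();
            for (Long id : values) {
              batch.add(id);
            }
            out.collect(batch);
          }
        });

    // Query the database per batch, e.g. with the JdbcBatchLookup sketched above.
    DataStream<Tuple2<Long, String>> rows = batches.flatMap(new JdbcBatchLookup());

    rows.print();
    env.execute();
  }
}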

Hope this helps,
Fabian


2016-11-28 18:32 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> Hi to all,
>
> I have a use case where I have to read a huge CSV containing ids to fetch
> from a table in a DB.
> The JDBC input format can handle parameterized queries, so I was thinking
> of fetching the data 1000 ids at a time. What is the easiest way to divide a
> dataset into slices of 1000 ids each (in order to create parameters for my
> JDBC input format)? Is that possible?
> Or maybe there's an easier solution using the streaming APIs?
>
> Best,
> Flavio
>
