Hi, I'm trying to implement some machine learning algorithms on top of Flink, such as a variational Bayes learning algorithm. Instead of working at the level of individual data elements (i.e. using map transformations), it would be far more efficient to work at the level of batches of elements (i.e. I receive a batch of elements and produce some output).
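Concretely, the kind of grouping I mean is something like this (a minimal plain-Java sketch, not Flink API; the `Batcher` class and `batches` method are hypothetical names just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: buffer incoming elements and emit a batch
// each time `batchSize` elements have accumulated.
public class Batcher {
    public static List<List<Integer>> batches(List<Integer> elements, int batchSize) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int e : elements) {
            current.add(e);
            if (current.size() == batchSize) {
                out.add(current);          // batch is full: emit it
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            out.add(current);              // trailing partial batch
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(batches(List.of(1, 2, 3, 4, 5), 2));
        // prints [[1, 2], [3, 4], [5]]
    }
}
```

The algorithm would then run once per emitted batch instead of once per element.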
I could implement this using the "mapPartition" function, but then I cannot control the size of each partition, can I? Is there any way to transform a stream (or DataSet) of elements into a stream (or DataSet) of batches of equal size? Thanks for your support, Andres