Hi, I'm trying to implement some machine learning algorithms on top of Flink, such as a variational Bayes learning algorithm. Instead of working at the level of individual data elements (i.e. using map transformations), it would be far more efficient to work at the level of batches of elements (i.e. I receive a batch of elements and produce some output).
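Concretely, the kind of grouping I mean is something like this (a minimal plain-Java sketch, not Flink API; the `Batcher` class and `batches` method are hypothetical names just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: buffer incoming elements and emit a batch
// each time `batchSize` elements have accumulated.
public class Batcher {
    public static List<List<Integer>> batches(List<Integer> elements, int batchSize) {
        List<List<Integer>> out = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int e : elements) {
            current.add(e);
            if (current.size() == batchSize) {
                out.add(current);          // batch is full: emit it
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            out.add(current);              // trailing partial batch
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(batches(List.of(1, 2, 3, 4, 5), 2));
        // prints [[1, 2], [3, 4], [5]]
    }
}
```

The algorithm would then run once per emitted batch instead of once per element.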
I could implement this using the "mapPartition" function, but then I cannot control the size of each partition, can I? Is there any way to transform a stream (or DataSet) of elements into a stream (or DataSet) of batches of equal size? Thanks for your support, Andres