Hi Jay! You can use the "fromCollection()" or "fromElements()" methods to create a DataSet or DataStream from a Java/Scala collection. That ships the data into the cluster and lets you run parallel transformations on the elements.
Make sure you set the parallelism of the operation that you want to run in parallel. Here is a code sample:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<MyType> data = env.fromElements(myArray);
    data.map(new TransactionMapper()).setParallelism(80); // makes sure you have 80 mappers

Stephan

On Sun, Jul 12, 2015 at 3:04 PM, jay vyas <jayunit100.apa...@gmail.com> wrote:

> Hi Flink,
>
> I'm happy to announce that I've done a small bit of initial hacking on
> bigpetstore-flink, in order to represent what we do in Spark in Flink.
>
> TL;DR: the main question is at the bottom!
>
> Currently, I want to generate transactions for a list of customers. The
> generation of transactions is a parallel process, and the customers are
> generated beforehand.
>
> In Hadoop, we can create an InputFormat with custom splits if we want to
> split a data set up; otherwise, we can break it into files.
>
> In Spark, there is a convenient "parallelize" which we can run on a list,
> capture the resulting RDD, and run a parallelized transform.
>
> In Flink, I have an array of "customers" and I want to parallelize our
> transaction generator for each customer. How would I do that?
>
> --
> jay vyas
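
For the specific use case in the quoted question, a fuller sketch along the same lines might look like the following (DataSet API as of Flink 0.9; Customer, Transaction, CustomerGenerator, and TransactionGenerator.forCustomer() are hypothetical stand-ins for the BigPetStore generator classes, not actual project code):

    import java.util.List;
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class TransactionJob {

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Customers are generated beforehand on the client, then parallelized.
            // CustomerGenerator is a hypothetical helper.
            List<Customer> customers = CustomerGenerator.generate(1000);
            DataSet<Customer> customerSet = env.fromCollection(customers);

            // Each parallel instance emits zero or more transactions per customer.
            DataSet<Transaction> transactions = customerSet
                .flatMap(new FlatMapFunction<Customer, Transaction>() {
                    @Override
                    public void flatMap(Customer customer, Collector<Transaction> out) {
                        // hypothetical helper: returns a List<Transaction> for one customer
                        for (Transaction t : TransactionGenerator.forCustomer(customer)) {
                            out.collect(t);
                        }
                    }
                })
                .setParallelism(80); // run 80 parallel generator instances

            transactions.print();
        }
    }

Note that fromCollection() creates a non-parallel source, but the flatMap that follows runs with the parallelism you set on it, so the customers are redistributed across the 80 generator instances.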