Re: Parallelizing ExecutionConfig.fromCollection

2016-04-25 Thread Greg Hogan
Hi Till, I appreciate the detailed explanation. My specific case has been with the graph generators. I think it is possible to implement some random sources using SplittableIterator rather than building a Collection, so it might be best to rework the graph generator API to better fit the Flink mod
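
To illustrate the idea of a random source that is split instead of materialized as a Collection, here is a minimal sketch in plain Java. It is not code from this thread and does not use Flink itself: the class RandomEdgeSource, its fields, and its split method are hypothetical names, loosely modelled on the split(numPartitions) contract of Flink's SplittableIterator. The assumption is that each element can be derived from a seed plus its global index, so the generated data does not depend on how many partitions are used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.SplittableRandom;

/**
 * Hypothetical stand-in for a splittable random graph source (not a Flink class).
 * Produces random edges {source, target} for the index range [start, end).
 */
class RandomEdgeSource implements Iterator<long[]> {
    private final long vertexCount; // vertex ids are drawn from [0, vertexCount)
    private final long seed;        // base seed shared by all partitions
    private final long end;         // exclusive upper bound of this partition
    private long next;              // global index of the next edge to emit

    RandomEdgeSource(long vertexCount, long seed, long start, long end) {
        this.vertexCount = vertexCount;
        this.seed = seed;
        this.next = start;
        this.end = end;
    }

    @Override
    public boolean hasNext() {
        return next < end;
    }

    @Override
    public long[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Seed from the global index so the edge at a given index is the
        // same no matter which partition generates it.
        SplittableRandom rnd = new SplittableRandom(seed + next);
        next++;
        return new long[] {rnd.nextLong(vertexCount), rnd.nextLong(vertexCount)};
    }

    /** Splits the remaining range into contiguous, near-equal partitions. */
    List<RandomEdgeSource> split(int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        long total = end - next;
        long base = total / numPartitions;
        long extra = total % numPartitions; // first 'extra' partitions get one more
        List<RandomEdgeSource> parts = new ArrayList<>(numPartitions);
        long from = next;
        for (int i = 0; i < numPartitions; i++) {
            long size = base + (i < extra ? 1 : 0);
            parts.add(new RandomEdgeSource(vertexCount, seed, from, from + size));
            from += size;
        }
        return parts;
    }
}
```

Only the four constructor arguments would need to be shipped to each parallel task, not the generated edges, which is the point of avoiding a serialized Collection. In an actual Flink job the equivalent would presumably extend SplittableIterator and be passed to fromParallelCollection; that wiring is left out here.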

Re: Parallelizing ExecutionConfig.fromCollection

2016-04-25 Thread Till Rohrmann
Hi Greg, I don't think we have discussed a parallelized collection input format yet. Thanks for bringing this up. I think it should be possible to implement a generic parallel collection input format. However, I have two questions here: 1. Is it really a problem for users that

Parallelizing ExecutionConfig.fromCollection

2016-04-25 Thread Greg Hogan
Hi, CollectionInputFormat currently enforces a parallelism of 1 by implementing NonParallelInput and serializing the entire Collection. If my understanding is correct, this serialized InputFormat is often the reason a new job exceeds the akka message size limit. As an alternative the Collectio