On 24 Jun 2015, at 16:22, Aaron Jackson <ajack...@pobox.com> wrote:

> Thanks. My setup is actually 3 task managers x 4 slots. I played with the
> parallelism and found that at low values, the error did not occur. I can
> only conclude that there is some form of data shuffling occurring that is
> sensitive to the data source.

Yes, seems a little odd to me as well.

> OOC, did you load the file into HDFS or use it from a local file system
> (e.g. file:///tmp/data.csv)? My results so far show that HDFS does not
> appear to be sensitive to this issue.
>
> I updated the example to include my configuration and slaves, but for
> brevity, I'll include the configurable bits here:
>
> jobmanager.rpc.address: host01
> jobmanager.rpc.port: 6123
> jobmanager.heap.mb: 512
> taskmanager.heap.mb: 2048
> taskmanager.numberOfTaskSlots: 4
> parallelization.degree.default: 1
> jobmanager.web.port: 8081
> webclient.port: 8080
> taskmanager.network.numberOfBuffers: 8192
> taskmanager.tmp.dirs: /datassd/flink/tmp
>
> And the slaves ...
>
> host01
> host02
> host03
>
> I did notice an extra empty line at the end of the slaves file. And while I
> highly doubt it makes ANY difference, I'm still going to re-run with it
> removed.
>
> Thanks for looking into it.
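[Editor's note: the trailing empty line mentioned above is easy to check for before restarting the cluster. A minimal sketch follows; the helper name `clean_slaves` is purely illustrative and not part of Flink.]

```python
# Sketch: flag and strip blank lines in a "slaves" file.
# The function name and return shape are illustrative only.

def clean_slaves(text):
    """Return the non-empty host lines and whether any blank lines were present."""
    lines = text.splitlines()
    hosts = [line.strip() for line in lines if line.strip()]
    had_blanks = len(hosts) != len(lines)
    return hosts, had_blanks

# The slaves file from the thread, with the extra trailing empty line:
hosts, had_blanks = clean_slaves("host01\nhost02\nhost03\n\n")
print(hosts)        # → ['host01', 'host02', 'host03']
print(had_blanks)   # → True
```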
Thank you for being so helpful. I've tried it with the local filesystem.

On 23 Jun 2015, at 07:11, Aaron Jackson <ajack...@pobox.com> wrote:

> I have 12 task managers across 3 machines - so it's a small setup.

Sorry for my misunderstanding. I've now tried it with both 12 task managers and 3. What's odd is that the stack trace shows it is trying to connect to "localhost" for the remote channel, although localhost is not configured anywhere. Let me think about that. ;)

– Ufuk