That was it. host3 was showing localhost. Looking a little further, I found it was missing an entry in /etc/hosts.
Thanks for looking into this.

Aaron

On Wed, Jun 24, 2015 at 2:13 PM, Stephan Ewen <se...@apache.org> wrote:

> Aaron,
>
> Can you check how the TaskManagers register at the JobManager? When you
> look at the 'TaskManagers' section in the JobManager's web interface (at
> port 8081), what does it show as the TaskManager host names?
>
> Does it list "host1", "host2", "host3"...?
>
> Thanks,
> Stephan
>
> On 24.06.2015 at 20:31, "Ufuk Celebi" <u...@apache.org> wrote:
>
>> On 24 Jun 2015, at 16:22, Aaron Jackson <ajack...@pobox.com> wrote:
>>
>> > Thanks. My setup is actually 3 task managers x 4 slots. I played with
>> > the parallelism and found that at low values, the error did not occur. I
>> > can only conclude that there is some form of data shuffling occurring
>> > that is sensitive to the data source. Yes, it seems a little odd to me
>> > as well. OOC, did you load the file into HDFS or read it from a local
>> > file system (e.g. file:///tmp/data.csv)? My results so far show that
>> > HDFS does not appear to be sensitive to this issue.
>> >
>> > I updated the example to include my configuration and slaves, but for
>> > brevity, I'll include the configurable bits here:
>> >
>> > jobmanager.rpc.address: host01
>> > jobmanager.rpc.port: 6123
>> > jobmanager.heap.mb: 512
>> > taskmanager.heap.mb: 2048
>> > taskmanager.numberOfTaskSlots: 4
>> > parallelization.degree.default: 1
>> > jobmanager.web.port: 8081
>> > webclient.port: 8080
>> > taskmanager.network.numberOfBuffers: 8192
>> > taskmanager.tmp.dirs: /datassd/flink/tmp
>> >
>> > And the slaves ...
>> >
>> > host01
>> > host02
>> > host03
>> >
>> > I did notice an extra empty line at the end of the slaves file. And
>> > while I highly doubt it makes ANY difference, I'm still going to re-run
>> > with it removed.
>> >
>> > Thanks for looking into it.
>>
>> Thank you for being so helpful. I've tried it with the local filesystem.
>>
>> On 23 Jun 2015, at 07:11, Aaron Jackson <ajack...@pobox.com> wrote:
>>
>> > I have 12 task managers across 3 machines - so it's a small setup.
>>
>> Sorry for my misunderstanding. I've tried it with both 12 task managers
>> and 3 as well now. What's odd is that the stack trace shows that it is
>> trying to connect to "localhost" for the remote channel although localhost
>> is not configured anywhere. Let me think about that. ;)
>>
>> – Ufuk
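For anyone hitting the same symptom (a TaskManager showing up as localhost because its host had no proper /etc/hosts entry), here is a minimal sketch of a pre-flight check. It assumes you can read each machine's /etc/hosts as text and have the list of worker names from the slaves file. It reports workers that are missing from the file or that map to a loopback address. The function names `parse_hosts` and `check_workers` and the sample data are hypothetical; the script only inspects the file and does not perform DNS lookups or reproduce Flink's own address selection.

```python
import ipaddress


def parse_hosts(text):
    """Map each name in /etc/hosts-style text to the IP of its first entry."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for each name
    return mapping


def check_workers(hosts_text, workers):
    """Return {worker: problem} for workers that are missing or map to loopback."""
    mapping = parse_hosts(hosts_text)
    problems = {}
    for worker in workers:
        worker = worker.strip()
        if not worker:
            continue  # ignore blank lines, e.g. a trailing empty line in the slaves file
        ip = mapping.get(worker)
        if ip is None:
            problems[worker] = "missing"
            continue
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            problems[worker] = "unparseable address " + ip
            continue
        if addr.is_loopback:
            problems[worker] = "loopback (" + ip + ")"
    return problems


if __name__ == "__main__":
    # Hypothetical /etc/hosts contents in which host03 points at a loopback address.
    hosts_file = """\
127.0.0.1   localhost
127.0.1.1   host03
10.0.0.1    host01
10.0.0.2    host02  # worker
"""
    slaves = ["host01", "host02", "host03", ""]
    print(check_workers(hosts_file, slaves))
```

Since each machine resolves names from its own copy of /etc/hosts, the check would need to be run against the file on every host in the slaves list, not just on the JobManager.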