Re: Connecting the channel failed: Connection refused

2015-06-25 Thread Stephan Ewen
That makes perfect sense, thanks! On 25.06.2015 at 21:39, "Aaron Jackson" wrote: > So the JobManager was running on host1. This also explains why I didn't > see the problem until I had asked for a sizeable degree of parallelism > since it probably never assigned a task to host3. > > Thanks for you

Re: Connecting the channel failed: Connection refused

2015-06-25 Thread Aaron Jackson
So the JobManager was running on host1. This also explains why I didn't see the problem until I had asked for a sizeable degree of parallelism since it probably never assigned a task to host3. Thanks for your help On Thu, Jun 25, 2015 at 3:34 AM, Stephan Ewen wrote: > Nice! > > TaskManagers ne

Re: Connecting the channel failed: Connection refused

2015-06-25 Thread Stephan Ewen
Nice! TaskManagers need to announce where they listen for connections. We do not yet block "localhost" as an acceptable address, so as not to prohibit local test setups. There are some routines that try to select an interface that can communicate with the outside world. Is host3 running on the same m
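To illustrate the idea of preferring an address that other machines can reach while still tolerating "localhost" for local test setups, here is a minimal sketch. It is not Flink's actual selection routine, which this thread does not describe in detail; the function name pick_announce_address and the filtering rules are assumptions made for illustration.

```python
import ipaddress
from typing import Iterable, Optional


def pick_announce_address(candidates: Iterable[str]) -> Optional[str]:
    """Pick an address to announce to other hosts (hypothetical helper).

    Prefers the first candidate that is neither loopback, link-local,
    nor unspecified. Falls back to the first loopback address so that
    single-machine test setups still work. Returns None if nothing fits.
    """
    loopback_fallback = None
    for text in candidates:
        ip = ipaddress.ip_address(text)
        if ip.is_unspecified or ip.is_link_local:
            # 0.0.0.0 or 169.254.x.x: not useful as an announced address.
            continue
        if ip.is_loopback:
            # Remember it, but keep looking for a routable address.
            if loopback_fallback is None:
                loopback_fallback = text
            continue
        return text
    return loopback_fallback
```

With candidates such as ["127.0.0.1", "192.168.1.13"] the sketch returns the second address; with only a loopback candidate it returns that loopback address, which matches the situation where a worker announces itself as localhost.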

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Aaron Jackson
That was it. host3 was showing localhost - looked a little further and it was missing an entry in /etc/hosts. Thanks for looking into this. Aaron On Wed, Jun 24, 2015 at 2:13 PM, Stephan Ewen wrote: > Aaron, > > Can you check how the TaskManagers register at the JobManager? When you > look at
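A quick way to spot this kind of misconfiguration on a worker is to check what the machine's own hostname resolves to. The following is a minimal sketch, not part of Flink; the helper names are made up for illustration, and it only covers the case where the name resolves to a loopback address or does not resolve at all.

```python
import ipaddress
import socket
from typing import Tuple


def is_loopback(address: str) -> bool:
    """Return True if the IP address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def resolve_own_hostname() -> Tuple[str, str]:
    """Resolve this machine's hostname via the system resolver
    (which consults /etc/hosts and DNS, depending on configuration)."""
    name = socket.gethostname()
    return name, socket.gethostbyname(name)


if __name__ == "__main__":
    try:
        name, address = resolve_own_hostname()
    except OSError as exc:
        # No usable entry for the hostname at all.
        print(f"hostname does not resolve: {exc}")
    else:
        if is_loopback(address):
            print(f"{name} resolves to loopback address {address}")
        else:
            print(f"{name} resolves to {address}")
```

Running this on each host shows whether the hostname maps to an address other machines can use; a loopback result suggests checking /etc/hosts as described above.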

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Stephan Ewen
Aaron, Can you check how the TaskManagers register at the JobManager? When you look at the 'TaskManagers' section in the JobManager's web interface (at port 8081), what does it list as the TaskManager host names? Does it list "host1", "host2", "host3"...? Thanks, Stephan Am 24.06.2015 20:31 schr

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Ufuk Celebi
On 24 Jun 2015, at 16:22, Aaron Jackson wrote: > Thanks. My setup is actually 3 task managers x 4 slots. I played with the > parallelism and found that at low values, the error did not occur. I can > only conclude that there is some form of data shuffling that is occurring > that is sensiti

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Aaron Jackson
Thanks. My setup is actually 3 task managers x 4 slots. I played with the parallelism and found that at low values, the error did not occur. I can only conclude that there is some form of data shuffling that is occurring that is sensitive to the data source. Yes, seems a little odd to me as wel

Re: Connecting the channel failed: Connection refused

2015-06-24 Thread Ufuk Celebi
Hey Aaron, thanks for preparing the example. I've checked it out and tried it with a similar setup (12 task managers with 1 slot each, running the job with a parallelism of 12). I couldn't reproduce the problem. What have you configured in the "slaves" file? I think Flink does not allow you to

Re: Connecting the channel failed: Connection refused

2015-06-23 Thread Aaron Jackson
Yes, the task manager continues running. I have put together a test app to demonstrate the problem and in doing so noticed some oddities. The problem manifests itself on a simple join (I originally believed it was the distinct; I was wrong). - When the source is generated via fromCollection()

Re: Connecting the channel failed: Connection refused

2015-06-22 Thread Ufuk Celebi
Hey Aaron, thanks for reporting the issue. You are right that the Exception is thrown during a shuffle. The receiver initiates a TCP connection to receive all the data for the join. A failing connect usually means that the respective TaskManager is not running. Can you check whether all expe
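Whether a receiver could open such a TCP connection can be tested from another machine with a few lines of code. This is a minimal sketch of a generic reachability check, not a Flink tool; the function name can_connect is made up, and the host and port to probe are assumptions you would take from your own TaskManager logs or configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    A refused connection, an unresolvable host name, or a timeout all
    count as failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling can_connect with the host name a TaskManager announced and its data port, from each of the other workers, shows which pairs of machines cannot reach each other. A False result for an address such as localhost points at an announced address that is only valid on the machine itself.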