That makes perfect sense, thanks!
On 25.06.2015 21:39, "Aaron Jackson" <ajack...@pobox.com> wrote:

> So the JobManager was running on host1.  This also explains why I didn't
> see the problem until I had asked for a sizeable degree of parallelism
> since it probably never assigned a task to host3.
>
> Thanks for your help
>
> On Thu, Jun 25, 2015 at 3:34 AM, Stephan Ewen <se...@apache.org> wrote:
>
>> Nice!
>>
>> TaskManagers need to announce where they listen for connections.
>>
>> We do not yet block "localhost" as an acceptable address, so as not to
>> break local test setups.
>>
>> There are some routines that try to select an interface that can
>> communicate with the outside world.
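>>
>> As a side note: if a worker's own hostname resolves to a loopback address,
>> that selection can end up announcing "localhost". Depending on the Flink
>> version you may also be able to pin the announced hostname per worker in
>> flink-conf.yaml; the option name below is only a sketch, so please verify
>> it against the configuration docs for your release:
>>
>> taskmanager.hostname: host03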
>>
>> Is host3 running on the same machine as the JobManager? Or did you
>> experience a long delay until TaskManager 3 was registered?
>>
>> Thanks for helping us debug this,
>> Stephan
>>
>>
>> On Wed, Jun 24, 2015 at 11:58 PM, Aaron Jackson <ajack...@pobox.com>
>> wrote:
>>
>>> That was it.  host3 was showing localhost. I looked a little further, and
>>> it was missing an entry in /etc/hosts.
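>>>
>>> In case anyone else runs into this: the host's own name has to resolve to
>>> its real, non-loopback address on every node. A working entry looks
>>> roughly like the line below (hostname and IP here are placeholders):
>>>
>>> 192.168.1.13   host03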
>>>
>>> Thanks for looking into this.
>>>
>>> Aaron
>>>
>>> On Wed, Jun 24, 2015 at 2:13 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> Aaron,
>>>>
>>>> Can you check how the TaskManagers register at the JobManager? When you
>>>> look at the 'TaskManagers' section in the JobManager's web interface (at
>>>> port 8081), what does it list as the TaskManager host names?
>>>>
>>>> Does it list "host1", "host2", "host3"...?
>>>>
>>>> Thanks,
>>>> Stephan
>>>> On 24.06.2015 20:31, "Ufuk Celebi" <u...@apache.org> wrote:
>>>>
>>>>> On 24 Jun 2015, at 16:22, Aaron Jackson <ajack...@pobox.com> wrote:
>>>>>
>>>>> > Thanks.  My setup is actually 3 task managers x 4 slots.  I played
>>>>> > with the parallelism and found that at low values, the error did not
>>>>> > occur.  I can only conclude that some form of data shuffling is
>>>>> > occurring that is sensitive to the data source.  Yes, it seems a little
>>>>> > odd to me as well.  Out of curiosity, did you load the file into HDFS
>>>>> > or use it from a local file system (e.g. file:///tmp/data.csv)?  My
>>>>> > results have shown that so far, HDFS does not appear to be sensitive to
>>>>> > this issue.
>>>>> >
>>>>> > I updated the example to include my configuration and slaves, but
>>>>> for brevity, I'll include the configurable bits here:
>>>>> >
>>>>> > jobmanager.rpc.address: host01
>>>>> > jobmanager.rpc.port: 6123
>>>>> > jobmanager.heap.mb: 512
>>>>> > taskmanager.heap.mb: 2048
>>>>> > taskmanager.numberOfTaskSlots: 4
>>>>> > parallelization.degree.default: 1
>>>>> > jobmanager.web.port: 8081
>>>>> > webclient.port: 8080
>>>>> > taskmanager.network.numberOfBuffers: 8192
>>>>> > taskmanager.tmp.dirs: /datassd/flink/tmp
>>>>> >
>>>>> > And the slaves ...
>>>>> >
>>>>> > host01
>>>>> > host02
>>>>> > host03
>>>>> >
>>>>> > I did notice an extra empty line at the end of the slaves file.  And
>>>>> > while I highly doubt it makes ANY difference, I'm still going to re-run
>>>>> > with it removed.
>>>>> >
>>>>> > Thanks for looking into it.
>>>>>
>>>>> Thank you for being so helpful. I've tried it with the local
>>>>> filesystem.
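>>>>>
>>>>> In case it helps to compare the two setups, this is roughly what the two
>>>>> variants look like in the DataSet API (paths and the HDFS address are
>>>>> placeholders, adjust to your cluster):
>>>>>
>>>>>     import org.apache.flink.api.java.DataSet;
>>>>>     import org.apache.flink.api.java.ExecutionEnvironment;
>>>>>
>>>>>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>>>>     // local file system: every worker reads the path from its own disk,
>>>>>     // so the file has to exist on each node
>>>>>     DataSet<String> local = env.readTextFile("file:///tmp/data.csv");
>>>>>     // HDFS: workers read their splits through the distributed file system
>>>>>     DataSet<String> fromHdfs = env.readTextFile("hdfs://namenode:9000/tmp/data.csv");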
>>>>>
>>>>> On 23 Jun 2015, at 07:11, Aaron Jackson <ajack...@pobox.com> wrote:
>>>>>
>>>>> > I have 12 task managers across 3 machines - so it's a small setup.
>>>>>
>>>>> Sorry for my misunderstanding. I've now tried it with both 12 task
>>>>> managers and 3. What's odd is that the stack trace shows that it is
>>>>> trying to connect to "localhost" for the remote channel, although
>>>>> localhost is not configured anywhere. Let me think about that. ;)
>>>>>
>>>>> – Ufuk
>>>>>
>>>>>
>>>
>>
>
