Hi,
I changed my config to a parallelism of 9 with 3 slots on each machine, and 
now it does distribute properly.
The other day I was told I could have many more slots than CPUs available and 
the system would distribute the load properly between the hosts with available 
CPU time, but that doesn't seem to be the case here: with 9 slots on each host, 
the distribute partitioning still sends all 9 tasks to the 9 slots on the same 
machine, even though it only has 3 CPUs available and the other 6 CPUs on the 
other machines were not running anything.
The idea of having more slots than CPUs, to me, is that you can spin up more 
jobs and they'll be balanced as data comes in... but with few jobs it seems odd 
that the tasks are all assigned to the same host just because it has slots 
available.
Anyway, that fixed the issue for me for now... looking forward to learning 
more about Flink.
Date: Thu, 12 Mar 2015 20:00:32 +0100
Subject: RE: Flink logs only written to one host
From: fhue...@gmail.com
To: user@flink.apache.org

Can you check the JM log file for how many slots are available?
Slots are configured per TM. If you configure 9 slots and 3 TMs you end up with 
27 slots, 9 on each TM.
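
To make the slot arithmetic concrete, here is a small Java sketch. It is a toy
model, not Flink's scheduler: the first-fit placement rule and the names
SlotModel, totalSlots, and firstFit are assumptions made for illustration,
chosen only because first-fit would reproduce what was observed in this thread
(9 tasks landing on one TM when each TM offers 9 slots).

```java
import java.util.Arrays;

/** Toy model of slot placement. Hypothetical; NOT Flink's actual scheduler. */
final class SlotModel {

    /** Slots are configured per TaskManager, so the cluster total is a product. */
    static int totalSlots(int taskManagers, int slotsPerTaskManager) {
        return taskManagers * slotsPerTaskManager;
    }

    /**
     * Assumed first-fit placement: fill every free slot on one TaskManager
     * before moving on to the next. Returns the task count per TaskManager.
     */
    static int[] firstFit(int taskManagers, int slotsPerTaskManager, int tasks) {
        if (tasks > totalSlots(taskManagers, slotsPerTaskManager)) {
            throw new IllegalArgumentException("not enough slots for " + tasks + " tasks");
        }
        int[] tasksPerTm = new int[taskManagers];
        int remaining = tasks;
        for (int tm = 0; tm < taskManagers && remaining > 0; tm++) {
            int placed = Math.min(slotsPerTaskManager, remaining);
            tasksPerTm[tm] = placed;
            remaining -= placed;
        }
        return tasksPerTm;
    }

    public static void main(String[] args) {
        System.out.println(totalSlots(3, 9));                    // 27
        System.out.println(Arrays.toString(firstFit(3, 9, 9)));  // [9, 0, 0]
        System.out.println(Arrays.toString(firstFit(3, 3, 9)));  // [3, 3, 3]
    }
}
```

With 3 slots per TM the cluster has exactly 9 slots, so a parallelism of 9 has
to use all three hosts, whichever placement rule the scheduler applies.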
On Mar 12, 2015 7:55 PM, "Emmanuel" <ele...@msn.com> wrote:



It actually appears that the slots used are all on the same host. My guess is 
that this is because I am using the default partitioning method (forward, which 
defaults to the same host).
However, I have now tried .shuffle() and .distribute() without any luck:
I have a DataStream<String> text = env.socketTextStream(inHostName, inPort);
this is the one socket input stream. Adding text.distribute().map(...) does not 
seem to distribute the .map() process to the other hosts. Is this the correct 
way to use .distribute() on a stream input?
Thanks,
Emmanuel
From: ele...@msn.com
To: user@flink.apache.org
Subject: Flink logs only written to one host
Date: Thu, 12 Mar 2015 17:30:28 +0000




Hello,
I'm using a 3-node (3 VMs) cluster with 3 CPUs each and a parallelism of 9. I 
usually see taskmanager.out logs generated on only one of the 3 nodes when I 
use System.out.println() to print debug info in my main processing 
function.
Is this expected? Or am I just doing something wrong? I stream from a socket 
with socketTextStream; I understand that this job runs in a single process, and 
I see that in the UI (using one slot only), but the computation task runs on 9 
slots. That task includes the System.out.println() statement, yet the output 
only shows up in one host's .out log folder. The host is not always the same, 
so I have to tail all logs on all hosts, but I'm surprised by this behavior. 
Am I just missing something? Are 'print' statements to stdout aggregated on one 
host somehow? If so, how is this controlled? Why would that host change?
I would love to understand what is going on, and whether the 9 slots might 
somehow all be running on a single host, which would defeat the purpose.
Thanks for the insight,
Emmanuel
