Re: Parameters to Control Intra-node Parallelism

2016-07-14 Thread Saliya Ekanayake
Thank you, Ovidiu. On Wed, Jul 13, 2016 at 3:34 PM, Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr> wrote: > Hi, > > I would pay attention to the memory settings such that > heap+off-heap+network buffers can be served from your node’s RAM for both > TMs. > Also, there is some correlation

Re: Parameters to Control Intra-node Parallelism

2016-07-13 Thread Ovidiu-Cristian MARCU
Hi, I would pay attention to the memory settings such that heap+off-heap+network buffers can be served from your node’s RAM for both TMs. Also, there is some correlation between the number of buffers, parallelism and your workflow’s operators. The suggestion to be used for the numberOfBuffers d
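
The memory advice above amounts to a simple budget check: the heap, off-heap, and network-buffer memory of every TaskManager on a node must fit in that node's RAM. A minimal sketch follows, assuming a 32 KiB network buffer size and using purely illustrative sizes. The names `TaskManagerMemory` and `fits_in_node` are hypothetical helpers, not part of Flink.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class TaskManagerMemory:
    """Rough memory footprint of one TaskManager (illustrative model only)."""
    heap_bytes: int
    off_heap_bytes: int
    network_buffers: int
    buffer_size_bytes: int = 32 * 1024  # assumption: 32 KiB per network buffer

    def total_bytes(self) -> int:
        # heap + off-heap + (number of network buffers * size of one buffer)
        return (self.heap_bytes
                + self.off_heap_bytes
                + self.network_buffers * self.buffer_size_bytes)


def fits_in_node(tm: TaskManagerMemory, tms_per_node: int,
                 node_ram_bytes: int, os_reserve_bytes: int = 0) -> bool:
    """True if all TaskManagers plus an OS reserve fit in the node's RAM."""
    return tms_per_node * tm.total_bytes() + os_reserve_bytes <= node_ram_bytes


if __name__ == "__main__":
    # Hypothetical sizing: 40 GiB heap, 10 GiB off-heap, 32768 buffers per TM.
    tm = TaskManagerMemory(40 * GIB, 10 * GIB, 32768)
    print(fits_in_node(tm, tms_per_node=2, node_ram_bytes=128 * GIB,
                       os_reserve_bytes=8 * GIB))
```

Leaving an explicit reserve for the operating system and other processes is a design choice in this sketch; the JVM itself also uses memory beyond the configured heap, so the real footprint is likely higher than this estimate.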

Re: Parameters to Control Intra-node Parallelism

2016-07-12 Thread Saliya Ekanayake
Hi Ovidiu, Checking /var/log/messages based on Greg's response revealed that the TMs were killed for running out of memory. Here's the node architecture. Each node has 128GB of RAM. I was trying to run 2 TMs per node, binding each to 12 cores (or 1 socket). The total number of nodes was 16. I finally, manag

Re: Parameters to Control Intra-node Parallelism

2016-07-12 Thread Ovidiu-Cristian MARCU
Hi, Can you post your configuration parameters (exclude default settings) and cluster description? Best, Ovidiu > On 11 Jul 2016, at 17:49, Saliya Ekanayake wrote: > > Thank you Greg, I'll check if this was the cause for my TMs to disappear. > > On Mon, Jul 11, 2016 at 11:34 AM, Greg Hogan <

Re: Parameters to Control Intra-node Parallelism

2016-07-11 Thread Saliya Ekanayake
Thank you Greg, I'll check if this was the cause for my TMs to disappear. On Mon, Jul 11, 2016 at 11:34 AM, Greg Hogan wrote: > The OOM killer doesn't give warning so you'll need to call dmesg or look > in /var/log/messages or similar. The following reports that Debian flavors > may use /var/log

Re: Parameters to Control Intra-node Parallelism

2016-07-11 Thread Greg Hogan
The OOM killer doesn't give warning so you'll need to call dmesg or look in /var/log/messages or similar. The following reports that Debian flavors may use /var/log/syslog. http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer On Sun, Jul 10, 2016 at 11:55
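
Since the OOM killer leaves its only trace in the kernel log, the check described above can be automated by scanning log lines for kill records. The sketch below assumes the kernel writes lines containing `Killed process <pid> (<name>)`; the exact wording varies across kernel versions, so the pattern may need adjusting. `find_oom_kills` is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Assumed kernel message shape: "... Killed process 4242 (java) ..."
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM-kill record found in the lines."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


if __name__ == "__main__":
    import sys
    # Usage: python find_oom.py /var/log/messages
    # (on Debian flavors the file may be /var/log/syslog instead)
    with open(sys.argv[1], errors="replace") as log:
        for pid, name in find_oom_kills(log):
            print(f"OOM killer terminated pid {pid} ({name})")
```

The same function also works on the output of `dmesg` if that output is saved to a file first. Reading the system log usually requires elevated privileges.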

Re: Parameters to Control Intra-node Parallelism

2016-07-10 Thread Saliya Ekanayake
Greg, where did you see the OOM log as shown in this mail thread? In my case, neither the TaskManagers nor the JobManager reports an error like this. On Sun, Jul 10, 2016 at 8:45 PM, Greg Hogan wrote: > These symptoms sound similar to what I was experiencing in the following > thread. Flink can have

Re: Parameters to Control Intra-node Parallelism

2016-07-10 Thread Greg Hogan
These symptoms sound similar to what I was experiencing in the following thread. Flink can have some unexpected memory usage which can result in an OOM kill by the kernel, and this becomes more pronounced as the cluster size grows. https://www.mail-archive.com/dev@flink.apache.org/msg06346.html

Re: Parameters to Control Intra-node Parallelism

2016-07-08 Thread Saliya Ekanayake
I checked, but JVMs didn't crash. No puppet or other services like that. One thing I found is that things work OK when I have a smaller number of slaves. For example, here I was trying to run on 16 nodes giving 2 TMs each. Then I reduced it to 4 nodes each with 2 TMs, which worked. On Fri, Jul

Re: Parameters to Control Intra-node Parallelism

2016-07-08 Thread Robert Metzger
Hi, from the TaskManager logs, I cannot see anything suspicious. It's a bit weird that the TaskManager logs just end, without any shutdown messages. Usually the TMs log some shutdown messages when they are stopping. Also, if they were still running, I would expect some error messages from akka ab

Re: Parameters to Control Intra-node Parallelism

2016-07-07 Thread Saliya Ekanayake
I see two logs (attached), but there's only 1 TaskManager process. Also, the Web console says it can find only 1 TM. However, I see this part in JM log, which shows there was a second TM at one point, but it was unregistered. Any thoughts? -- - Registered TaskManager at j-

Re: Parameters to Control Intra-node Parallelism

2016-07-07 Thread Ufuk Celebi
No, that should suffice. Can you check whether there are any task manager logs for the second TM on that machine (taskmanager-X-j-011.log where X is the TM number)? If yes, the task manager process does start up and there is another problem. If not, the task managers do not even seem to start. – Ufuk

Re: Parameters to Control Intra-node Parallelism

2016-07-06 Thread Saliya Ekanayake
I tried to run more than one task manager per node by duplicating the slave IPs. At startup it says for example, [INFO] 1 instance(s) of taskmanager are already running on j-011. Starting taskmanager daemon on host j-011. but I only see 1 task manager process running. Is there anything else I ne

Re: Parameters to Control Intra-node Parallelism

2016-07-03 Thread Saliya Ekanayake
Thank you! On Sun, Jul 3, 2016 at 11:28 AM, Ufuk Celebi wrote: > Yes, exactly. > > On Sat, Jul 2, 2016 at 6:28 PM, Saliya Ekanayake > wrote: > > Thank you, yes, it can be done externally, if not supported within Flink. > > > > So the way to spawn multiple task managers would be to list the same

Re: Parameters to Control Intra-node Parallelism

2016-07-03 Thread Ufuk Celebi
Yes, exactly. On Sat, Jul 2, 2016 at 6:28 PM, Saliya Ekanayake wrote: > Thank you, yes, it can be done externally, if not supported within Flink. > > So the way to spawn multiple task managers would be to list the same slave > machines N times as necessary in the slaves file? > > On Sat, Jul 2, 2
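
The approach confirmed above, listing each machine N times in the slaves file to get N task managers per node, can be sketched as a small generator. This is only an illustration: `slaves_file_lines` is a hypothetical helper, and the second host name is made up.

```python
from typing import Iterable, List


def slaves_file_lines(hosts: Iterable[str], tms_per_host: int) -> List[str]:
    """Repeat each host once per TaskManager that should start on it."""
    if tms_per_host < 1:
        raise ValueError("tms_per_host must be at least 1")
    return [host for host in hosts for _ in range(tms_per_host)]


if __name__ == "__main__":
    # j-011 appears in the thread; j-012 is a hypothetical second host.
    print("\n".join(slaves_file_lines(["j-011", "j-012"], tms_per_host=2)))
```

The printed lines can be written to the slaves file before starting the cluster. Memory settings are per task manager, so doubling the entries also doubles the memory each node must provide.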

Re: Parameters to Control Intra-node Parallelism

2016-07-02 Thread Saliya Ekanayake
Thank you, yes, it can be done externally, if not supported within Flink. So the way to spawn multiple task managers would be to list the same slave machines N times as necessary in the slaves file? On Sat, Jul 2, 2016 at 11:22 AM, Ufuk Celebi wrote: > No, not inside of Flink. That sounds like

Re: Parameters to Control Intra-node Parallelism

2016-07-02 Thread Ufuk Celebi
No, not inside of Flink. That sounds like something the OS or resource manager should handle. On Sat, Jul 2, 2016 at 5:12 PM, Saliya Ekanayake wrote: > That's great, so is there support to pin task managers to sockets as well? > > On Sat, Jul 2, 2016 at 11:08 AM, Ufuk Celebi wrote: >> >> Re
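
One way to handle pinning at the OS level, as suggested above, is to wrap the process launch in `numactl`. The sketch below only builds the command line. It assumes that each socket corresponds to one NUMA node and that the task manager launch command can be wrapped this way, neither of which the thread confirms. `build_pinned_command` is a hypothetical helper.

```python
from typing import List, Sequence


def build_pinned_command(numa_node: int, command: Sequence[str]) -> List[str]:
    """Prefix a command so its CPUs and memory are bound to one NUMA node."""
    if numa_node < 0:
        raise ValueError("numa_node must be non-negative")
    return [
        "numactl",
        f"--cpunodebind={numa_node}",  # run only on CPUs of this node
        f"--membind={numa_node}",      # allocate memory only from this node
        *command,
    ]


if __name__ == "__main__":
    # Placeholder command; the real task manager launch command would go here.
    print(" ".join(build_pinned_command(0, ["java", "-version"])))
```

The resulting list can be passed to `subprocess.run` on a machine where `numactl` is installed. Binding memory as well as CPUs keeps allocations local to the socket, at the cost of failing allocations once that node's memory is exhausted.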

Re: Parameters to Control Intra-node Parallelism

2016-07-02 Thread Saliya Ekanayake
That's great, so is there support to pin task managers to sockets as well? On Sat, Jul 2, 2016 at 11:08 AM, Ufuk Celebi wrote: > Regarding 2) if you don't manually configure something else, that > should happen always. > > Yes, you can run more than one task manager per node depending on the > p

Re: Parameters to Control Intra-node Parallelism

2016-07-02 Thread Ufuk Celebi
Regarding 2) if you don't manually configure something else, that should happen always. Yes, you can run more than one task manager per node depending on the process isolation you want. Within a task manager, there are multiple threads for each slot. For example, if you have 2 task managers with 2

Re: Parameters to Control Intra-node Parallelism

2016-07-01 Thread Saliya Ekanayake
Hi Ufuk, Looking at the document you sent, it seems only 1 task manager per node exists, and within that you have multiple slots. Is it possible to run more than 1 task manager per node? Also, within a task manager, is the parallelism done through threads or processes? Thank you, Saliya On Thu, Jun

Re: Parameters to Control Intra-node Parallelism

2016-06-30 Thread Saliya Ekanayake
Thank you, I'll check these. In 2.) you said they are likely to exchange through memory. Is there a case where they wouldn't? On Thu, Jun 30, 2016 at 5:03 AM, Ufuk Celebi wrote: > On Thu, Jun 30, 2016 at 1:44 AM, Saliya Ekanayake > wrote: > > 1. What parameters are available to control paralleli

Re: Parameters to Control Intra-node Parallelism

2016-06-30 Thread Ufuk Celebi
On Thu, Jun 30, 2016 at 1:44 AM, Saliya Ekanayake wrote: > 1. What parameters are available to control parallelism within a node? Task Manager processing slots: https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#configuring-taskmanager-processing-slots > 2. Does Flink

Parameters to Control Intra-node Parallelism

2016-06-29 Thread Saliya Ekanayake
Hi, We are trying to scale some of our scientific applications written in Flink. A few questions on tuning Flink performance. 1. What parameters are available to control parallelism within a node? 2. Does Flink support shared memory-based messaging within a node (without doing TCP calls)? 3. Is t