Hi Michele, I'm happy that you got it to run the way you want.
I guess services such as the HDFS NameNode and YARN's ResourceManager are running on the master. I don't know what you are doing on the cluster, but I suspect it is for experimentation only. As long as you are not maintaining a huge HDFS installation in the cluster, you don't need a fancy machine for the master.

The EMR documentation [1] says: "The master node does not have large computational requirements. For most clusters of 50 or fewer nodes, consider using a m1.small for Hadoop 1 clusters and m1.large for Hadoop 2 clusters. For clusters of more than 50 nodes, consider using an m1.large for Hadoop 1 clusters and m1.xlarge for Hadoop 2 clusters."

The m1.large machines [2] have 7.5 GB of memory and 2 cores.

[1] http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-instances.html
[2] http://aws.amazon.com/ec2/previous-generation/

On Mon, Jul 27, 2015 at 5:19 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:

> OK thanks Robert you have been very clear now! :)
>
> just one question, more related to emr than to flink, if i cannot run
> anything on the EMR master, then is it useful to allocate a big machine (8
> core, 30GB) on it? I thought it was the jm but it is not
>
> On 27 Jul 2015, at 14:56, Robert Metzger <rmetz...@apache.org> wrote:
>
> Hi Michele,
>
> no in an EMR configuration with 1 master and 5 core I have 5 active
> node in the resource manager…sounds strange to me: ganglia shows 6 nodes
> and 1 is always offload
>
> Okay, so there are only 5 machines available to deploy containers to.
> The JobManager/ApplicationMaster will also occupy one container.
> I guess in EMR they are not running a NodeManager on the master node, so
> you can not deploy anything there via YARN.
>
> now i am a little lost because I thought I was running 5 node for 5 tm
> and the 6th (master one) as jm but it seems like I have to use the 5 core
> as both tm and jm
>
> Flink on YARN can only deploy containers on machines which have a YARN
> NodeManager running. The JM runs in such a container.
>
> btw which is a good parameter for number of buffer?
>
> see here for some explanation of what they are used for:
> http://www.slideshare.net/robertmetzger1/apache-flink-hands-on/37
> I would double them until your job runs (as a first approach ;) )
>
> I have been able to run 5 tm with -jm 2048 and -tm 20992 and 8 slots
> each but in flink dashboard it says “Flink Managed Memory 10506mb” with an
> exclamation mark saying it is much smaller than the physical memory
> (30105mb)…that’s true but i cannot run the cluster with more than 20992
>
> I answered that question two weeks ago on this list (in the example for
> 10GB of memory):
>
> Regarding the memory you are able to use in the end:
>> Initially, you request 10240MB.
>> From that, we subtract a 25% safety margin so that YARN does not
>> kill the JVM.
>> 10240*0.75 = 7680 MB.
>> So Flink's TaskManager will see 7680 MB when starting up.
>> Flink's Memory manager is only using 70% of the available heap space for
>> managed memory:
>> 7680*0.7 = 5376 MB.
>> The safety margin for YARN is very conservative. As Till already said,
>> you can set a different value for the "yarn.heap-cutoff-ratio" (try
>> 0.15) and see if your job still runs.
>
> On Mon, Jul 27, 2015 at 11:29 AM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>
>> Hi Fabian, thanks for your reply
>> so flink is using about 50% of memory for itself right?
>>
>> anyway now I am running an EMR with 1 master and 5 core all of them are
>> m3.2xlarge with 8 cores and 30GB of memory
>>
>> I would like to run flink on yarn with 40 slots on 5 tm with the
>> maximum available resources, what i do is
>>
>> change in conf-yaml.xml numberofSlots to 8 and default parallelism to 40
>> run yarn with the command
>> ./yarn-session.sh -n 5 -jm 2048 -tm 23040 (23040 is the maximum allowed
>> out of 30GB I don’t know why)
>>
>> I get an error something like "failed allocating memory after 4/5
>> container available memory 20992"
>> I suspect that it is not using the master of the cluster for allocating
>> the jm but using one of the core right? in fact 20992 is exactly 23040-2048
>>
>> then i run it with 20992
>> ./yarn-session.sh -n 5 -jm 2048 -tm 20992
>> it succeeds in running 5tm with 40 slots, but when I run a program I
>> always get
>>
>> Caused by: java.io.IOException: Insufficient number of network
>> buffers: required 40, but only 14 available. The total number of network
>> buffers is currently set to 4096. You can increase this number by setting
>> the configuration key 'taskmanager.network.numberOfBuffers'.
>>
>> I changed the number of buffers as robert said from 2048 to 4096: one of my
>> programs runs but the second still has the same problem
>>
>> Thanks for help
>> Best,
>> michele
>>
>> On 27 Jul 2015, at 11:19, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>> Hi Michele,
>>
>> the 10506 MB refer to the size of Flink's managed memory whereas the
>> 20992 MB refer to the total amount of TM memory. At start-up, the TM
>> allocates a fraction of the JVM memory as byte arrays and manages this
>> portion by itself. The remaining memory is used as regular JVM heap for TM
>> and user code.
>>
>> The purpose of the warning is to tell the user that the memory
>> configuration might not be optimal.
>> However, this depends of course on the
>> setup environment and should probably be rephrased to make this more clear.
>>
>> Cheers, Fabian
>>
>> 2015-07-27 11:07 GMT+02:00 Michele Bertoni <michele1.bert...@mail.polimi.it>:
>>
>>> I have been able to run 5 tm with -jm 2048 and -tm 20992 and 8 slots
>>> each but in flink dashboard it says “Flink Managed Memory 10506mb” with an
>>> exclamation mark saying it is much smaller than the physical memory
>>> (30105mb)…that’s true but i cannot run the cluster with more than 20992
>>>
>>> thanks
>>>
>>> On 27 Jul 2015, at 11:02, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>>
>>> Hi Robert,
>>> thanks for answering, today I have been able to try again: no in an EMR
>>> configuration with 1 master and 5 core I have 5 active node in the resource
>>> manager…sounds strange to me: ganglia shows 6 nodes and 1 is always offload
>>>
>>> the total amount of memory is 112.5GB that is actually 22.5 for each
>>> of the 5
>>>
>>> now i am a little lost because I thought I was running 5 node for 5 tm
>>> and the 6th (master one) as jm but it seems like I have to use the 5 core
>>> as both tm and jm
>>>
>>> btw which is a good parameter for number of buffer?
>>>
>>> thanks,
>>> Best
>>> michele
>>>
>>> On 24 Jul 2015, at 16:38, Robert Metzger <rmetz...@apache.org> wrote:
>>>
>>> Hi Michele,
>>>
>>> configuring a YARN cluster to allocate all available resources as well
>>> as possible is sometimes tricky, that is true.
>>> We are aware of these problems and there are actually the following two
>>> JIRAs for this:
>>> https://issues.apache.org/jira/browse/FLINK-937 (Change the YARN Client
>>> to allocate all cluster resources, if no argument given) --> I think the
>>> consensus on the issue was to give users an option to allocate everything (so
>>> don't do it by default)
>>> https://issues.apache.org/jira/browse/FLINK-1288 (YARN
>>> ApplicationMaster sometimes fails to allocate the specified number of
>>> workers)
>>>
>>> How many NodeManagers is YARN reporting in the ResourceManager UI?
>>> (in "Active Nodes" column) (I suspect 6?)
>>> How much memory per NodeManager is YARN reporting? (You can see this in
>>> the "Nodes" page of the RM)
>>>
>>> > I would like to run 5 nodes with 8 slots each, is it correct?
>>>
>>> Yes.
>>>
>>> > Then i reduced memories, everything started but i get a runtime
>>> > error of missing buffer
>>>
>>> What exactly is the exception?
>>> I guess you have to give the system a few more network buffers using the
>>> taskmanager.network.numberOfBuffers config parameter.
>>>
>>> > Can someone help me step-by-step in a good configuration for such
>>> > cluster? I think the documentation is really missing details
>>>
>>> When starting Flink on YARN, there are usually some WARN log messages
>>> in the beginning when the system detects that specified containers will not
>>> fit in the cluster.
>>> Also, in the ResourceManager UI, you can see the status of the
>>> scheduler. This often helps to understand what's going on, resource-wise.
>>>
>>> On Fri, Jul 24, 2015 at 3:58 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote:
>>>
>>>> Hi everybody, i need a help on how to configure a yarn cluster
>>>> I tried a lot of conf but none of them was correct
>>>>
>>>> We have a cluster on amazon emr let's say 1manager+5worker all of them
>>>> are m3.2xlarge then 8 core each and 30 GB of RAM each
>>>>
>>>> What is a good configuration for such cluster?
>>>>
>>>> I would like to run 5 nodes with 8 slots each, is it correct?
>>>>
>>>> Now the problems: by now i run all tests mistakenly using 40 task
>>>> managers each with 2048MB and 1 slot (at least it was working)
>>>>
>>>> Today i found the error and i tried run 5 task manager and setting a
>>>> default slot in conf-yaml of 8, giving a task manager memory of 23040 (-tm
>>>> 23040) that is the limit allowed by yarn but i am getting errors: one TM is
>>>> not running because there is no available memory. it seems like the jm is
>>>> not using memory from the master but from the nodes (in fact yarn says TM
>>>> number 5 is missing 2048 that is the memory for the jm)
>>>>
>>>> Then i reduced memories, everything started but i get a runtime error
>>>> of missing buffer
>>>>
>>>> Can someone help me step-by-step in a good configuration for such
>>>> cluster? I think the documentation is really missing details
>>>>
>>>> Thanks a lot
>>>> Best
>>>> Michele
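
For reference, the memory figures quoted in this thread can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration and are not Flink or YARN APIs. The 25% cutoff and the 70% managed-memory fraction are the values Robert quotes in the thread and may differ in other Flink versions. The 23040 MB per-node limit is the value Michele reports (112.5 GB across 5 NodeManagers).

```python
# Sketch of the memory arithmetic discussed in this thread.
# All function names are hypothetical helpers, not Flink or YARN APIs.
# The default ratios are the ones quoted in the thread.


def taskmanager_heap_mb(container_mb, cutoff_ratio=0.25):
    """Memory the TaskManager JVM sees after the YARN safety margin."""
    return int(round(container_mb * (1.0 - cutoff_ratio)))


def managed_memory_mb(heap_mb, managed_fraction=0.7):
    """Part of the heap that Flink's memory manager takes for itself."""
    return int(round(heap_mb * managed_fraction))


def max_tm_container_mb(node_limit_mb, jm_mb):
    """Largest -tm value that still leaves room for the JobManager
    container on the same node as one TaskManager."""
    return node_limit_mb - jm_mb


if __name__ == "__main__":
    # Robert's 10 GB example: 10240 -> 7680 -> 5376.
    heap = taskmanager_heap_mb(10240)
    print("TaskManager heap (MB):", heap)
    print("Managed memory (MB):", managed_memory_mb(heap))

    # Michele's cluster: 23040 MB per NodeManager, -jm 2048.
    print("Max -tm (MB):", max_tm_container_mb(23040, 2048))
```

Passing cutoff_ratio=0.15 to taskmanager_heap_mb shows how much extra heap the suggested "yarn.heap-cutoff-ratio" change would free up. Note that the dashboard value Michele reports (10506 MB) does not follow from these two multiplications alone, so other deductions are presumably involved.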