Hi Michele,
> no, in an EMR configuration with 1 master and 5 core nodes I have 5 active
> nodes in the resource manager… sounds strange to me: ganglia shows 6 nodes
> and 1 always has no load

Okay, so there are only 5 machines available to deploy containers to. The
JobManager/ApplicationMaster will also occupy one container. I guess in EMR
they are not running a NodeManager on the master node, so you cannot deploy
anything there via YARN.

> now I am a little lost because I thought I was running 5 nodes for 5 TMs
> and the 6th (the master) as JM, but it seems like I have to use the 5 core
> nodes as both TM and JM

Flink on YARN can only deploy containers on machines which have a YARN
NodeManager running. The JM runs in such a container.

> btw, which is a good parameter for the number of buffers?

See here for some explanation of what they are used for:
http://www.slideshare.net/robertmetzger1/apache-flink-hands-on/37
I would double them until your job runs (as a first approach ;) )

> I have been able to run 5 TMs with -jm 2048 and -tm 20992 and 8 slots each,
> but the Flink dashboard says “Flink Managed Memory 10506mb” with an
> exclamation mark saying it is much smaller than the physical memory
> (30105mb)… that’s true, but I cannot run the cluster with more than 20992

I answered that question two weeks ago on this list (the example there was
for 10GB of memory). Regarding the memory you are able to use in the end:

> Initially, you request 10240MB.
> From that, we add a 25% safety margin to avoid that YARN is going to kill
> the JVM:
> 10240*0.75 = 7680 MB.
> So Flink's TaskManager will see 7680 MB when starting up.
> Flink's memory manager is only using 70% of the available heap space for
> managed memory:
> 7680*0.7 = 5376 MB.
> The safety margin for YARN is very conservative. As Till already said,
> you can set a different value for the "yarn.heap-cutoff-ratio" (try 0.15)
> and see if your job still runs.
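Applying the same arithmetic to your -tm 20992 setting (just a
back-of-the-envelope sketch, assuming the default 25% cutoff and the 70%
managed-memory fraction described above):

  20992 * 0.75 = 15744 MB   (heap the TaskManager JVM is started with)
  15744 * 0.7  ≈ 11021 MB   (upper bound for Flink's managed memory)

That is in the same ballpark as the 10506 MB the dashboard shows. The exact
figure is a bit lower, presumably because it is derived from the heap size
the JVM actually reports at runtime rather than from the requested container
size. So the warning is expected on YARN and nothing to worry about.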
On Mon, Jul 27, 2015 at 11:29 AM, Michele Bertoni
<michele1.bert...@mail.polimi.it> wrote:

> Hi Fabian, thanks for your reply
> so Flink is using about 50% of the memory for itself, right?
>
> anyway, now I am running an EMR cluster with 1 master and 5 core nodes;
> all of them are m3.2xlarge with 8 cores and 30GB of memory
>
> I would like to run Flink on YARN with 40 slots on 5 TMs with the maximum
> available resources. What I do is:
>
> change the number of slots to 8 and the default parallelism to 40 in
> flink-conf.yaml, then run YARN with the command
> ./yarn-session.sh -n 5 -jm 2048 -tm 23040 (23040 is the maximum allowed
> out of 30GB, I don’t know why)
>
> I get an error, something like "failed allocating memory after 4/5
> containers, available memory 20992"
> I suspect that it is not using the master of the cluster for allocating
> the JM but one of the core nodes, right? In fact 20992 is exactly
> 23040-2048.
>
> Then I run it with 20992:
> ./yarn-session.sh -n 5 -jm 2048 -tm 20992
> It succeeds in running 5 TMs with 40 slots, but when I run a program I
> always get
>
> Caused by: java.io.IOException: Insufficient number of network buffers:
> required 40, but only 14 available. The total number of network buffers is
> currently set to 4096. You can increase this number by setting the
> configuration key 'taskmanager.network.numberOfBuffers’.
>
> I changed the buffer number, as Robert said, from 2048 to 4096; one of my
> programs runs but the second one still has the same problem.
>
> Thanks for the help
> Best,
> michele
>
>
> On 27 Jul 2015, at 11:19, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Michele,
>
> the 10506 MB refer to the size of Flink's managed memory, whereas the
> 20992 MB refer to the total amount of TM memory. At start-up, the TM
> allocates a fraction of the JVM memory as byte arrays and manages this
> portion by itself. The remaining memory is used as regular JVM heap for TM
> and user code.
>
> The purpose of the warning is to tell the user that the memory
> configuration might not be optimal. However, this depends of course on the
> setup environment and should probably be rephrased to make this clearer.
>
> Cheers, Fabian
>
> 2015-07-27 11:07 GMT+02:00 Michele Bertoni
> <michele1.bert...@mail.polimi.it>:
>
>> I have been able to run 5 TMs with -jm 2048 and -tm 20992 and 8 slots
>> each, but the Flink dashboard says “Flink Managed Memory 10506mb” with an
>> exclamation mark saying it is much smaller than the physical memory
>> (30105mb)… that’s true, but I cannot run the cluster with more than 20992
>>
>> thanks
>>
>>
>> On 27 Jul 2015, at 11:02, Michele Bertoni
>> <michele1.bert...@mail.polimi.it> wrote:
>>
>> Hi Robert,
>> thanks for answering, today I have been able to try again: no, in an EMR
>> configuration with 1 master and 5 core nodes I have 5 active nodes in the
>> resource manager… sounds strange to me: ganglia shows 6 nodes and 1 always
>> has no load
>>
>> the total amount of memory is 112.5GB, which is actually 22.5 for each of
>> the 5
>>
>> now I am a little lost because I thought I was running 5 nodes for 5 TMs
>> and the 6th (the master) as JM, but it seems like I have to use the 5 core
>> nodes as both TM and JM
>>
>> btw, which is a good parameter for the number of buffers?
>>
>> thanks,
>> Best
>> michele
>>
>>
>> On 24 Jul 2015, at 16:38, Robert Metzger <rmetz...@apache.org> wrote:
>>
>> Hi Michele,
>>
>> configuring a YARN cluster to allocate all available resources as well
>> as possible is sometimes tricky, that is true.
>> We are aware of these problems and there are actually the following two
>> JIRAs for this:
>> https://issues.apache.org/jira/browse/FLINK-937 (Change the YARN Client
>> to allocate all cluster resources, if no argument given) --> I think the
>> consensus on the issue was to give users an option to allocate everything
>> (so don't do it by default)
>> https://issues.apache.org/jira/browse/FLINK-1288 (YARN ApplicationMaster
>> sometimes fails to allocate the specified number of workers)
>>
>> How many NodeManagers is YARN reporting in the ResourceManager UI? (in
>> the "Active Nodes" column) (I suspect 6?)
>> How much memory per NodeManager is YARN reporting? (You can see this on
>> the "Nodes" page of the RM)
>>
>> > I would like to run 5 nodes with 8 slots each, is it correct?
>>
>> Yes.
>>
>> > Then i reduced memories, everything started but i get a runtime error
>> > of missing buffer
>>
>> What exactly is the exception?
>> I guess you have to give the system a few more network buffers using the
>> taskmanager.network.numberOfBuffers config parameter.
>> > Can someone help me step-by-step with a good configuration for such a
>> > cluster? I think the documentation is really missing details
>>
>> When starting Flink on YARN, there are usually some WARN log messages in
>> the beginning when the system detects that the specified containers will
>> not fit into the cluster.
>> Also, in the ResourceManager UI, you can see the status of the scheduler.
>> This often helps to understand what's going on, resource-wise.
>>
>>
>> On Fri, Jul 24, 2015 at 3:58 PM, Michele Bertoni
>> <michele1.bert...@mail.polimi.it> wrote:
>>
>>> Hi everybody, I need help on how to configure a YARN cluster.
>>> I tried a lot of configurations but none of them was correct.
>>>
>>> We have a cluster on Amazon EMR, let's say 1 manager + 5 workers; all of
>>> them are m3.2xlarge, so 8 cores and 30 GB of RAM each.
>>>
>>> What is a good configuration for such a cluster?
>>>
>>> I would like to run 5 nodes with 8 slots each, is it correct?
>>>
>>> Now the problems: until now I mistakenly ran all tests using 40 task
>>> managers, each with 2048MB and 1 slot (at least it was working).
>>>
>>> Today I found the error and I tried to run 5 task managers, setting a
>>> default of 8 slots in flink-conf.yaml and giving a task manager memory of
>>> 23040 (-tm 23040), which is the limit allowed by YARN, but I am getting
>>> errors: one TM is not running because there is no available memory. It
>>> seems like the JM is not using memory from the master but from the nodes
>>> (in fact YARN says TM number 5 is missing 2048, which is the memory for
>>> the JM).
>>>
>>> Then I reduced the memories; everything started, but I get a runtime
>>> error of missing buffers.
>>>
>>> Can someone help me step-by-step with a good configuration for such a
>>> cluster? I think the documentation is really missing details.
>>>
>>> Thanks a lot
>>> Best
>>> Michele
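P.S. To make the "step-by-step" part a bit more concrete: based on the
numbers from this thread, I would start from something along these lines in
flink-conf.yaml (a sketch, not a tested setup; please double-check the key
names against the configuration documentation of your Flink version):

  taskmanager.numberOfTaskSlots: 8
  parallelism.default: 40
  taskmanager.network.numberOfBuffers: 8192
  yarn.heap-cutoff-ratio: 0.15

and then start the session as you already do:

  ./yarn-session.sh -n 5 -jm 2048 -tm 20992

With the default buffer size of 32KB, 8192 network buffers cost only about
256 MB, so doubling them again is cheap if a job still complains. The lower
heap-cutoff-ratio reclaims part of the memory that is currently reserved as
the YARN safety margin; keep an eye on whether YARN starts killing
containers after changing it.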