So for anyone who is interested, here are some code references for getting started with Flink on Slurm.
I added basic start and stop scripts for Flink on Slurm in my fork: https://github.com/robert-schmidtke/flink/tree/flink-slurm/flink-dist/src/main/flink-bin/bin And I also created an example of how to configure and run it: https://github.com/robert-schmidtke/flink-slurm/blob/master/flink-slurm-example.sh I'm not sure I will add much more effort because it works for my setup right now. However if there's a wider interest I can add a bit more documentation and insight. Robert On Thu, Oct 1, 2015 at 11:51 AM, Robert Metzger <rmetz...@apache.org> wrote: > Feel free to contribute a documentation to Flink on how to run Flink on > SLURM. > > On Thu, Oct 1, 2015 at 11:45 AM, Robert Schmidtke <ro.schmid...@gmail.com> > wrote: > >> I see, thanks for the info. I only have access to my cluster via SLURM >> and we don't have ssh between our nodes which is why I haven't really >> considered the Standalone mode. A colleague has set up YARN on SLURM and it >> was just the easiest to use. I briefly looked into the Flink Standalone >> mode but dropped it because I thought YARN would be possible after all. It >> seems I'm going to have a deeper look into starting the master and slaves >> with SLURM's srun instead of ssh (I guess a slight modification of >> start-cluster.sh should do the job). >> >> On Thu, Oct 1, 2015 at 11:30 AM, Robert Metzger <rmetz...@apache.org> >> wrote: >> >>> Hi, >>> there is currently no option for forcing certain containers onto >>> specific machines. >>> For running the JM (or any other YARN container) on the AM host, you >>> first need to have a NodeManager running on the host with the RM. Maybe >>> YARN is smart enough to schedule the small JM container onto that machine. >>> >>> I don't know your exact setup, but maybe it would make sense for you to >>> run Flink in the standalone cluster mode instead with YARN. It seems that >>> you have a very good idea how and where you want to run the Flink services >>> in your cluster. YARN is designed to be an abstraction between the cluster >>> and the application, that's why its a bit difficult to schedule the >>> containers to specific machines. >>> >>> Robert >>> >>> >>> >>> On Thu, Oct 1, 2015 at 11:24 AM, Robert Schmidtke < >>> ro.schmid...@gmail.com> wrote: >>> >>>> Hi Robert, >>>> >>>> I had a job failure yesterday with what I believe is the setup I have >>>> described above. However when trying to reproduce now, the behavior is the >>>> same: Flink waiting for resources to become available. So no hard error. >>>> >>>> Ok, the looping makes sense then. I haven't thought about shared >>>> setups. I'm still figuring out how all parameters play together, i.e. -yn, >>>> -yjm, -ytm and the memory limits in yarn-site.xml. This will need some >>>> testing and I'll come back with a proper description once I think I know >>>> what's going on. >>>> >>>> When running Flink on YARN, is it easily possible to place the Flink JM >>>> where the YARN Resource Manager sits, and all the TMs with the remaining >>>> Node Managers? >>>> >>>> Robert >>>> >>>> On Thu, Oct 1, 2015 at 10:53 AM, Robert Metzger <rmetz...@apache.org> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> It is interesting to note that when I set both >>>>> yarn.nodemanager.resource.memory-mb >>>>>> and yarn.scheduler.maximum-allocation-mb to 56G I get a proper error >>>>>> when requesting 56G and 1M, but when setting >>>>>> yarn.nodemanager.resource.memory-mb >>>>>> to 56G and yarn.scheduler.maximum-allocation-mb to 54G I don't get >>>>>> an error but the aforementioned endless loop. >>>>> >>>>> >>>>> is it a "hard error" (failing) you're getting or just "WARN" log >>>>> messages. I'm asking because I've added some code some time ago to do some >>>>> checks before deploying Flink on YARN. These checks will print WARN log >>>>> messages if the requested YARN session/job does not fit onto the cluster. >>>>> This "endless loop" exists because in many production environments >>>>> Flink can just wait for resources to become available, for example when >>>>> other containers are finishing. >>>>> >>>>> >>>>> Robert >>>>> >>>>> On Wed, Sep 30, 2015 at 6:33 PM, Robert Schmidtke < >>>>> ro.schmid...@gmail.com> wrote: >>>>> >>>>>> Hi Robert, >>>>>> >>>>>> thanks for your reply. It got me digging into my setup and I >>>>>> discovered that one TM was scheduled next to the JM. When specifying -yn >>>>>> 7 >>>>>> the documentation suggests that this is the number of TMs (of which I >>>>>> wanted 7), and I thought an additional container would be used for the JM >>>>>> (my YARN cluster has 8 containers). Anyway with this setup the memory >>>>>> added >>>>>> up to 56G and 1M (40G per TM and 16G 1M for the JM), but I set a hard >>>>>> maximum of 56G in my yarn-site.xml which is why the request could not be >>>>>> fulfilled. It is interesting to note that when I set >>>>>> both yarn.nodemanager.resource.memory-mb >>>>>> and yarn.scheduler.maximum-allocation-mb to 56G I get a proper error when >>>>>> requesting 56G and 1M, but when setting >>>>>> yarn.nodemanager.resource.memory-mb >>>>>> to 56G and yarn.scheduler.maximum-allocation-mb to 54G I don't get an >>>>>> error >>>>>> but the aforementioned endless loop. Note I >>>>>> have yarn.nodemanager.vmem-check-enabled set to false. This is probably a >>>>>> YARN issue then / my bad configuration. >>>>>> >>>>>> I'm in a rush now (to get to the Flink meetup) and thus will check >>>>>> the documentation later to see how to deploy the TMs and JM on separate >>>>>> machines each, since that is not what's happening at the moment, but this >>>>>> is what I'd like to have. Thanks again and see you in an hour. >>>>>> >>>>>> Cheers >>>>>> Robert >>>>>> >>>>>> On Wed, Sep 30, 2015 at 5:19 PM, Robert Metzger <rmetz...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> Hi Robert, >>>>>>> >>>>>>> the problem here is that YARN's scheduler (there are different >>>>>>> schedulers in YARN: FIFO, CapacityScheduler, ...) is not giving Flink's >>>>>>> ApplicationMaster/JobManager all the containers it is requesting. By >>>>>>> increasing the size of the AM/JM container, there is probably no memory >>>>>>> left to fit the last TaskManager container. >>>>>>> I also experienced this issue, when I wanted to run a Flink job on >>>>>>> YARN and the containers were fitting theoretically, but YARN was not >>>>>>> giving >>>>>>> me all the containers I requested. >>>>>>> Back then, I asked on the yarn-dev list [1] (there were also some >>>>>>> off-list emails) but we could not resolve the issue. >>>>>>> >>>>>>> Can you check the resource manager logs? Maybe there is a log >>>>>>> message which explains why the container request of Flink's AM is not >>>>>>> fulfilled. >>>>>>> >>>>>>> >>>>>>> [1] >>>>>>> http://search-hadoop.com/m/AsBtCilK5r1pKLjf1&subj=Re+QUESTION+Allocating+a+full+YARN+cluster >>>>>>> >>>>>>> On Wed, Sep 30, 2015 at 5:02 PM, Robert Schmidtke < >>>>>>> ro.schmid...@gmail.com> wrote: >>>>>>> >>>>>>>> It's me again. This is a strange issue, I hope I managed to find >>>>>>>> the right keywords. I got 8 machines, 1 for the JM, the other 7 are TMs >>>>>>>> with 64G of memory each. >>>>>>>> >>>>>>>> When running my job like so: >>>>>>>> >>>>>>>> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn >>>>>>>> 7 ..... >>>>>>>> >>>>>>>> The job completes without any problems. When running it like so: >>>>>>>> >>>>>>>> $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16385 -ytm 40960 -yn >>>>>>>> 7 ..... >>>>>>>> >>>>>>>> (note the one more M of memory for the JM), the execution stalls, >>>>>>>> continuously reporting: >>>>>>>> >>>>>>>> ..... >>>>>>>> TaskManager status (6/7) >>>>>>>> TaskManager status (6/7) >>>>>>>> TaskManager status (6/7) >>>>>>>> ..... >>>>>>>> >>>>>>>> I did some poking around, but I couldn't find any direct >>>>>>>> correlation with the code. >>>>>>>> >>>>>>>> The JM log says: >>>>>>>> >>>>>>>> ..... >>>>>>>> 16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ >>>>>>>> - JVM Options: >>>>>>>> 16:49:01,893 INFO org.apache.flink.yarn.ApplicationMaster$ >>>>>>>> - -Xmx12289M >>>>>>>> ..... >>>>>>>> >>>>>>>> but then continues to report >>>>>>>> >>>>>>>> ..... >>>>>>>> 16:52:59,311 INFO >>>>>>>> org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The >>>>>>>> user >>>>>>>> requested 7 containers, 6 running. 1 containers missing >>>>>>>> 16:52:59,831 INFO >>>>>>>> org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The >>>>>>>> user >>>>>>>> requested 7 containers, 6 running. 1 containers missing >>>>>>>> 16:53:00,351 INFO >>>>>>>> org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - The >>>>>>>> user >>>>>>>> requested 7 containers, 6 running. 1 containers missing >>>>>>>> ..... >>>>>>>> >>>>>>>> forever until I cancel the job. >>>>>>>> >>>>>>>> If you have any ideas I'm happy to try them out. Thanks in advance >>>>>>>> for any hints! Cheers. >>>>>>>> >>>>>>>> Robert >>>>>>>> -- >>>>>>>> My GPG Key ID: 336E2680 >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> My GPG Key ID: 336E2680 >>>>>> >>>>> >>>>> >>>> >>>> >>>> -- >>>> My GPG Key ID: 336E2680 >>>> >>> >>> >> >> >> -- >> My GPG Key ID: 336E2680 >> > > -- My GPG Key ID: 336E2680