So for anyone who is interested, here are some code references for getting
started with Flink on Slurm.
I added basic start and stop scripts for Flink on Slurm in my fork:
https://github.com/robert-schmidtke/flink/tree/flink-slurm/flink-dist/src/main/flink-bin/bin
And I also created an example [...]
Feel free to contribute documentation to Flink on how to run Flink on
SLURM.
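For anyone curious what such a script can look like, here is a minimal
sketch of the pattern (the node-selection logic and the exact
jobmanager.sh/taskmanager.sh invocations are my assumptions, not the
literal contents of the fork; the script arguments differ between Flink
versions):

    #!/bin/bash
    #SBATCH --nodes=8
    #SBATCH --time=01:00:00
    # Sketch: JobManager on the first allocated node, TaskManagers on the
    # rest. Daemonized starts may be reaped when the srun step exits, so a
    # foreground mode may be needed depending on the Flink version.

    FLINK_HOME=/path/to/flink   # hypothetical install location

    NODES=($(scontrol show hostnames "$SLURM_JOB_NODELIST"))

    srun --nodes=1 --nodelist="${NODES[0]}" \
        "$FLINK_HOME/bin/jobmanager.sh" start cluster &
    for NODE in "${NODES[@]:1}"; do
        srun --nodes=1 --nodelist="$NODE" \
            "$FLINK_HOME/bin/taskmanager.sh" start &
    done
    wait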
On Thu, Oct 1, 2015 at 11:45 AM, Robert Schmidtke wrote:

> I see, thanks for the info. I only have access to my cluster via SLURM and
> we don't have ssh between our nodes which is why I haven't really
> considered the Standalone mode. [...]
I see, thanks for the info. I only have access to my cluster via SLURM and
we don't have ssh between our nodes which is why I haven't really
considered the Standalone mode. A colleague has set up YARN on SLURM and it
was just the easiest to use. I briefly looked into the Flink Standalone
mode but [...]
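For context: the standalone start-cluster.sh launches the TaskManagers by
SSHing into every host listed in conf/slaves, which is exactly what an
ssh-less cluster cannot do. Roughly the pattern (a simplification, not the
literal script):

    # simplified view of what start-cluster.sh does for the workers
    while read HOST; do
        ssh "$HOST" "$FLINK_HOME/bin/taskmanager.sh" start &
    done < "$FLINK_HOME/conf/slaves"
    wait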
Hi,
there is currently no option for forcing certain containers onto specific
machines.
For running the JM (or any other YARN container) on the AM host, you first
need to have a NodeManager running on the host with the RM. Maybe YARN is
smart enough to schedule the small JM container onto that machine. [...]
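A quick way to verify which hosts actually run a NodeManager is the YARN
CLI (a standard command; the output columns vary by Hadoop version):

    # lists the live NodeManagers with their host names; the RM host
    # should appear here if it also runs a NodeManager
    yarn node -list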
Hi Robert,
I had a job failure yesterday with what I believe is the setup I have
described above. However, when trying to reproduce it now, the behavior is
the same: Flink waits for resources to become available. So no hard error.
Ok, the looping makes sense then. I haven't thought about shared set [...]
Hi,
It is interesting to note that when I set both
yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb
to 56G, I get a proper error when requesting 56G and 1M, but when setting
yarn.nodemanager.resource.memory-mb to 56G and
yarn.scheduler.maximum-allocation-mb to 54G [...]
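For anyone reproducing this: both properties live in yarn-site.xml and take
values in MB, so the 56G/54G combination above would look roughly like this
(a config sketch, values converted at 1G = 1024 MB):

    <!-- total memory a single NodeManager may hand out to containers -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>57344</value> <!-- 56G -->
    </property>
    <!-- upper bound on any single container request -->
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>55296</value> <!-- 54G -->
    </property>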
Hi Robert,
thanks for your reply. It got me digging into my setup and I discovered
that one TM was scheduled next to the JM. When specifying -yn 7, the
documentation suggests that this is the number of TMs (of which I wanted
7), and I thought an additional container would be used for the JM (my
YARN [...])
Hi Robert,
the problem here is that YARN's scheduler (there are different schedulers
in YARN: FIFO, CapacityScheduler, ...) is not giving Flink's
ApplicationMaster/JobManager all the containers it is requesting. By
increasing the size of the AM/JM container, there is probably no memory
left to fit all of the requested TaskManager containers. [...]
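To make the fitting problem concrete, here is the arithmetic with the
numbers from this thread (assuming each NodeManager offers roughly 56G to
YARN, as configured in the message above):

    per node available to YARN:  ~56G
    one TM container (-ytm):     40960 MB = 40G
    the JM container (-yjm):     16384 MB = 16G
    JM + TM on the same node:    40G + 16G = 56G

So the JM and a TM only just fit together on one node; if the JM container
grows, or a node offers less than 56G, one of the 7 requested TMs no longer
fits anywhere and Flink keeps waiting for resources.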
I should say I'm running the current Flink master branch.
On Wed, Sep 30, 2015 at 5:02 PM, Robert Schmidtke wrote:

> It's me again. This is a strange issue, I hope I managed to find the right
> keywords. I got 8 machines, 1 for the JM, the other 7 are TMs with 64G of
> memory each.
>
> When running my job like so: [...]
It's me again. This is a strange issue, I hope I managed to find the right
keywords. I got 8 machines, 1 for the JM, the other 7 are TMs with 64G of
memory each.
When running my job like so:
$FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn 7 .
The job completes without any [...]
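For readers unfamiliar with the YARN flags, here is the same command
annotated (flag meanings per the Flink YARN docs of that era; the jar path
is a hypothetical stand-in for the one elided above):

    # -m yarn-cluster  start a per-job YARN cluster instead of using a
    #                  running session
    # -yjm 16384       JobManager container memory in MB (16G)
    # -ytm 40960       memory per TaskManager container in MB (40G)
    # -yn 7            number of TaskManager containers to request
    $FLINK_HOME/bin/flink run -m yarn-cluster -yjm 16384 -ytm 40960 -yn 7 \
        /path/to/job.jar   # hypothetical path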