Renjie:

The log didn't go through. Consider logging a JIRA and attaching the log there.

Thanks

-------- Original message --------
From: Renjie Liu <liurenjie2...@gmail.com>
Date: 3/23/18 1:38 AM (GMT-08:00)
To: dev@flink.apache.org
Subject: Re: Flip 6 mesos support

Hi, Till:

Attached is my log. I'm also looking into this; could you please assign this bug to me? I'm also trying to contribute to flink.

On Fri, Mar 23, 2018 at 4:11 PM Till Rohrmann <trohrm...@apache.org> wrote:

Hi Renjie,

could you share the logs with us? This sounds like a bug we should fix.

Cheers,
Till

On Fri, Mar 23, 2018 at 4:42 AM, Renjie Liu <liurenjie2...@gmail.com> wrote:

> Hi, Till:
>
> Has anybody succeeded in deploying flip 6 mode on mesos?
>
> I'm testing flip 6 using the master branch and I just can't run jobs. The
> following are my configurations:
>
> jobmanager.rpc.address: qt9ss.prod.mediav.com
> jobmanager.rpc.port: 6123
> jobmanager.heap.mb: 1024
> taskmanager.heap.mb: 1024
> taskmanager.numberOfTaskSlots: 5
> parallelism.default: 1
> web.port: 8081
> mesos.master: zk://dk71ss.jx.shbt2.qihoo.net:2191,dk72ss.jx.shbt2.qihoo.net:2191,dk5ss.jx.shbt2.qihoo.net:2191/mesos
> mesos.resourcemanager.tasks.container.type: docker
> mesos.resourcemanager.tasks.container.image.name: dk1ss.prod.mediav.com:5000/adq/flink:1.6.0-SNAPSHOT
> mesos.resourcemanager.framework.user: mediav
> mesos.resourcemanager.tasks.cpus: 5
> mesos.resourcemanager.tasks.mem: 10240
> mesos.resourcemanager.framework.name: Flink
> mesos.failover-timeout: 60
>
> From the mesos side, I can see that when I submit a job, the flink master
> requests a container with 5 cores. But the job submission still fails with the
> following error:
>
> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
> Could not allocate all requires slots within timeout of 300000 ms. Slots
> required: 1, slots allocated: 0
>
> My job only requires 1 slot, but the job manager keeps reporting that no slots
> are available.
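
To make the settings quoted above easier to check, here is a minimal Python sketch that parses flat `key: value` lines like those in the message and pulls out the values that bear on slot allocation. The helper names `parse_flink_conf` and `slot_summary` are hypothetical and not part of Flink, the sample text is a subset of the configuration posted above, and the sketch assumes each key is separated from its value by the first colon followed by a space. It only summarizes the settings; it does not diagnose the NoResourceAvailableException.

```python
# Subset of the configuration quoted in the message above.
SAMPLE_CONF = """\
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 1024
taskmanager.numberOfTaskSlots: 5
parallelism.default: 1
mesos.master: zk://dk71ss.jx.shbt2.qihoo.net:2191,dk72ss.jx.shbt2.qihoo.net:2191,dk5ss.jx.shbt2.qihoo.net:2191/mesos
mesos.resourcemanager.tasks.cpus: 5
mesos.resourcemanager.tasks.mem: 10240
"""


def parse_flink_conf(text):
    """Parse flat 'key: value' lines into a dict (hypothetical helper).

    Splits on the first ': ' only, so values that contain colons
    (such as the zk:// URL) are kept whole. Blank lines and lines
    without a ': ' separator are skipped.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(": ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def slot_summary(conf):
    """Collect the settings that relate to slots and container size."""
    slots = int(conf["taskmanager.numberOfTaskSlots"])
    cpus = float(conf["mesos.resourcemanager.tasks.cpus"])
    return {
        "slots_per_task_manager": slots,
        "cpus_per_container": cpus,
        "cpus_per_slot": cpus / slots,
        "container_mem_mb": int(conf["mesos.resourcemanager.tasks.mem"]),
        "default_parallelism": int(conf["parallelism.default"]),
    }


if __name__ == "__main__":
    for name, value in slot_summary(parse_flink_conf(SAMPLE_CONF)).items():
        print(f"{name}: {value}")
```

With the values from the message, each requested container would offer 5 slots at 1 CPU per slot, which is more than the single slot the job asks for.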
>
> On Wed, Mar 21, 2018 at 10:42 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
> > The resources consumed by the JobMaster can be specified by
> > `jobmanager.heap.mb`.
> >
> > Cheers,
> > Till
> >
> > On Wed, Mar 21, 2018 at 3:20 PM, Renjie Liu <liurenjie2...@gmail.com> wrote:
> >
> > > Hi, Till:
> > >
> > > In fact, I want to ask about the resources consumed by the job manager.
> > >
> > > On Wed, Mar 21, 2018 at 8:17 PM, Till Rohrmann <trohrm...@apache.org> wrote:
> > >
> > > > As many as the application needs to run. If you start a job with
> > > > parallelism 10 then it will ask for 10 slots (assuming slot sharing).
> > > >
> > > > On Wed, Mar 21, 2018 at 12:04 PM, Renjie Liu <liurenjie2...@gmail.com> wrote:
> > > >
> > > > > So how many slots may a job manager consume?
> > > > >
> > > > > On Wed, Mar 21, 2018 at 6:50 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > >
> > > > > > At the moment this is not possible. In order to do this, you will have to
> > > > > > use the per-job mode and run each job on a dedicated Flink cluster.
> > > > > >
> > > > > > On Wed, Mar 21, 2018 at 11:33 AM, Renjie Liu <liurenjie2...@gmail.com> wrote:
> > > > > >
> > > > > > > For example, we have 2 jobs.
> > > > > > > For job 1, I want to start the job manager with 1 CPU and 100M memory.
> > > > > > > Job 1 needs 10 slots, and I want to deploy these 10 slots in 2 task
> > > > > > > managers, each with 5 cores and 1G memory.
> > > > > > >
> > > > > > > For job 2, I want to start the job manager with 2 CPU and 200M memory.
> > > > > > > Job 2 needs 100 slots, and I want to deploy these 100 slots in 10 task
> > > > > > > managers, each with 10 cores and 2G memory.
> > > > > > >
> > > > > > > Is this possible?
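
The slot arithmetic discussed in the messages above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes default slot sharing, under which a job needs as many slots as its highest operator parallelism, and assumes every task manager offers the same number of slots. It covers only the counting, not how a cluster would actually be sized per job.

```python
def slots_required(operator_parallelisms):
    """Slots a job asks for under default slot sharing.

    Assumption: all operators share one slot sharing group, so the
    job needs as many slots as its highest operator parallelism.
    """
    return max(operator_parallelisms)


def task_managers_required(slots, slots_per_task_manager):
    """Smallest number of task managers that provides `slots` slots."""
    if slots_per_task_manager <= 0:
        raise ValueError("slots_per_task_manager must be positive")
    # Integer ceiling division, avoids floating point rounding.
    return (slots + slots_per_task_manager - 1) // slots_per_task_manager


if __name__ == "__main__":
    # Job 1 from the thread: 10 slots on task managers with 5 slots each.
    print(task_managers_required(slots_required([10]), 5))
    # Job 2 from the thread: 100 slots on task managers with 10 slots each.
    print(task_managers_required(slots_required([100]), 10))
```

For the two jobs described above this gives 2 and 10 task managers respectively, matching the layout asked for in the question.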
> > > > > > >
> > > > > > > On Wed, Mar 21, 2018 at 6:19 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi Renjie,
> > > > > > > >
> > > > > > > > what exactly do you mean by specifying different JM and TM resources
> > > > > > > > for different jobs?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Wed, Mar 21, 2018 at 10:55 AM, Renjie Liu <liurenjie2...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi, Till:
> > > > > > > > >
> > > > > > > > > How can I specify job manager and task manager resources for
> > > > > > > > > different jobs in session mode?
> > > > > > > > >
> > > > > > > > > On Sun, Mar 18, 2018 at 1:10 AM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > Hi Shuyi,
> > > > > > > > > >
> > > > > > > > > > best if you look at the other e2e tests in the flink-end-to-end-tests
> > > > > > > > > > module. For example the Kafka e2e test under
> > > > > > > > > > flink/flink-end-to-end-tests/test-scripts/test_streaming_kafka010.sh.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Fri, Mar 16, 2018 at 10:20 PM, Shuyi Chen <suez1...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Till,
> > > > > > > > > > >
> > > > > > > > > > > For FLINK-8562, the test is passing now because it's not really
> > > > > > > > > > > checking the right thing.
> > > > > > > > > > >
> > > > > > > > > > > Yes, I can help with the Kerberos integration ticket.
> > > > > > > > > > >
> > > > > > > > > > > Is there an example of how the e2e test should be structured and
> > > > > > > > > > > invoked?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Shuyi
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Mar 16, 2018 at 6:51 AM, Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Shuyi,
> > > > > > > > > > > >
> > > > > > > > > > > > thanks for working on FLINK-8562. Once this issue is fixed, it will
> > > > > > > > > > > > automatically be executed on the Flip-6 components. In fact it is
> > > > > > > > > > > > already being executed on Flip-6.
> > > > > > > > > > > >
> > > > > > > > > > > > But what you could help the community with is setting up an automated
> > > > > > > > > > > > end-to-end test for the Kerberos integration if you want:
> > > > > > > > > > > > https://issues.apache.org/jira/browse/FLINK-8981.
> > > > > > > > > > > >
> > > > > > > > > > > > The Flink community is currently working on automating more and more
> > > > > > > > > > > > tests in order to facilitate faster releases and improve the test
> > > > > > > > > > > > coverage. You can find more about this effort here:
> > > > > > > > > > > > https://issues.apache.org/jira/browse/FLINK-8970.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Till
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 15, 2018 at 8:45 PM, Shuyi Chen <suez1...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Till,
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is Shuyi :) Thanks a lot.
> > > > > > > > > > > > > In FLINK-8562, I already sent a PR to resolve the issue; your help
> > > > > > > > > > > > > taking a look would be great.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please let me know how I can help test the Kerberos authentication;
> > > > > > > > > > > > > I am decently familiar with the Kerberos and YARN security part in
> > > > > > > > > > > > > Flink.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As a starting point, I'd suggest adding an integration test similar
> > > > > > > > > > > > > to YARNSessionFIFOSecuredITCase for flip6.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Shuyi
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 15, 2018 at 5:44 AM, Till Rohrmann <trohrm...@apache.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Renjie,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > thanks for the pointer with the YARNSessionFIFOSecuredITCase. You're
> > > > > > > > > > > > > > right that we should fix this test. There is FLINK-8562 which seems
> > > > > > > > > > > > > > to address the problem. Will take a look.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Additionally, we want to test Kerberos authentication explicitly as
> > > > > > > > > > > > > > part of the release testing for Flink 1.5.
> > > > > > > > > > > > > > I will shortly send around a mail where I will lay out the ongoing
> > > > > > > > > > > > > > testing efforts and where more is needed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Till
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 15, 2018 at 7:37 AM, Renjie Liu <liurenjie2...@gmail.com>