Just had a quick chat with Ufuk. The issue is that in 1.x the Yarn properties file
is loaded regardless of whether "-m yarn-cluster" is specified on the command line.
This loads the dynamic properties from the Yarn properties file and applies the
configuration of the running (session) cluster to the to-be-created cluster.
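A minimal sketch of the interim workaround discussed in this thread, assuming the
properties file sits at its default location /tmp/.yarn-properties-<user> (the path
Arnaud reports below); the job jar and main class are placeholders:

    # Remove the stale session properties file so that "-m yarn-cluster"
    # starts its own cluster instead of picking up the running session:
    rm -f /tmp/.yarn-properties-$(whoami)
    ./bin/flink run -m yarn-cluster -yn 2 --class my.MainClass my-job.jar

Note Arnaud's caveat below: once the file is gone, streaming jobs can no longer be
submitted to the persistent session this way.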
Will be fixed in 1.1 and probably backported to 1.0.4.

On Wed, Jun 15, 2016 at 6:05 PM, Maximilian Michels <m...@apache.org> wrote:
> Hi Arnaud,
>
> One issue per thread please. That makes things a lot easier for us :)
>
> Something positive first: We are reworking the resuming of existing
> Flink Yarn applications. It'll be much easier to resume a cluster
> simply by using the Yarn ID or by re-discovering the Yarn session using
> the properties file.
>
> The dynamic properties are a shortcut to modifying the Flink
> configuration of the cluster _only_ upon startup. Afterwards, they are
> already set at the containers. We might change this for the 1.1.0
> release. It should work if you put "yarn.properties-file.location:
> /custom/location" in your flink-conf.yaml before you execute
> "./bin/flink".
>
> Cheers,
> Max
>
> On Wed, Jun 15, 2016 at 3:14 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>> Ooopsss....
>> My mistake, snapshot/restore does work in a local env; I had a weird
>> configuration issue!
>>
>> But I still have the property file path issue :)
>>
>> -----Original Message-----
>> From: LINZ, Arnaud
>> Sent: Wednesday, June 15, 2016 2:35 PM
>> To: 'user@flink.apache.org' <user@flink.apache.org>
>> Subject: RE: Yarn batch not working with standalone yarn job manager once a
>> persistent, HA job manager is launched ?
>>
>> Hi,
>>
>> I haven't had the time to investigate the bad configuration file path issue
>> yet (if you have any idea why yarn.properties-file.location is ignored, you
>> are welcome), but I'm facing another HA problem.
>>
>> I'm trying to make my custom streaming sources HA-compliant by implementing
>> snapshotState() & restoreState(). I would like to test that mechanism in my
>> JUnit tests, because it can be complex, but I was unable to simulate a
>> "recover" on a local Flink environment: snapshotState() is never triggered,
>> and throwing an exception inside the execution chain does not lead to
>> recovery but ends the execution, despite the
>> streamExecEnv.enableCheckpointing(interval) call.
>>
>> Is there a way to locally test this mechanism (other than poorly simulating
>> it by explicitly calling snapshot & restore in an overridden source)?
>>
>> Thanks,
>> Arnaud
>>
>> -----Original Message-----
>> From: LINZ, Arnaud
>> Sent: Monday, June 6, 2016 5:53 PM
>> To: user@flink.apache.org
>> Subject: RE: Yarn batch not working with standalone yarn job manager once a
>> persistent, HA job manager is launched ?
>>
>> I've deleted the '/tmp/.yarn-properties-user' file created for the
>> persistent container, and the batches do go into their own (correct)
>> container. However, that's not a workable workaround, as I'm no longer
>> able to submit streaming apps to the persistent container that way :) So
>> it's really a problem of Flink finding the right properties file.
>>
>> I've added -yD yarn.properties-file.location=/tmp/flink/batch to the
>> batch command line (also configured in the JVM_ARGS var), with no change
>> in behaviour. Note that I do have a standalone yarn container created,
>> but the job is submitted to the other one.
>>
>> Thanks,
>> Arnaud
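Max's flink-conf.yaml suggestion above, spelled out as a sketch (FLINK_CONF_DIR
stands in for the client's conf directory, and /tmp/flink/batch is Arnaud's chosen
location); note that the first message of this thread explains why 1.0 may still
pick up the session's file regardless:

    # Point the client at a separate Yarn properties file location before
    # invoking ./bin/flink, so batch submissions do not reuse the session's file:
    echo "yarn.properties-file.location: /tmp/flink/batch" >> "$FLINK_CONF_DIR/flink-conf.yaml"
    ./bin/flink run -m yarn-cluster --class my.MainClass my-job.jar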
>> -----Original Message-----
>> From: Ufuk Celebi [mailto:u...@apache.org]
>> Sent: Monday, June 6, 2016 4:01 PM
>> To: user@flink.apache.org
>> Subject: Re: Yarn batch not working with standalone yarn job manager once a
>> persistent, HA job manager is launched ?
>>
>> Thanks for the clarification. I think it might be related to the YARN
>> properties file, which is still being used for the batch jobs. Can you try
>> to delete it between submissions as a temporary workaround to check whether
>> it's related?
>>
>> – Ufuk
>>
>> On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>>> Hi,
>>>
>>> The zookeeper path is only for my persistent container, and I do use a
>>> different one for each of my persistent containers.
>>>
>>> The -Drecovery.mode=standalone was passed inside the JVM_ARGS
>>> ("${JVM_ARGS} -Drecovery.mode=standalone
>>> -Dyarn.properties-file.location=/tmp/flink/batch")
>>>
>>> I've tried using -yD recovery.mode=standalone on the flink command line
>>> too, but it does not solve the problem; it still uses the pre-existing
>>> container.
>>>
>>> Complete command line:
>>> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu
>>> batch1 -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s
>>> -yD recovery.mode=standalone --class
>>> com.bouygtel.kubera.main.segstage.MainGeoSegStage
>>> /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar
>>> -j /usr/users/datcrypt/alinz/KBR/GOS/log
>>> -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
>>>
>>> JVM_ARGS =
>>> -Drecovery.mode=standalone
>>> -Dyarn.properties-file.location=/tmp/flink/batch
>>>
>>> Arnaud
>>>
>>> -----Original Message-----
>>> From: Ufuk Celebi [mailto:u...@apache.org]
>>> Sent: Monday, June 6, 2016 2:37 PM
>>> To: user@flink.apache.org
>>> Subject: Re: Yarn batch not working with standalone yarn job manager once
>>> a persistent, HA job manager is launched ?
>>>
>>> Hey Arnaud,
>>>
>>> The cause of this is probably that both jobs use the same ZooKeeper root
>>> path, in which case all task managers connect to the same leading job
>>> manager.
>>>
>>> I think you forgot to add the y in -Drecovery.mode=standalone for the
>>> batch jobs, e.g.
>>>
>>> -yDrecovery.mode=standalone
>>>
>>> Can you try this?
>>>
>>> – Ufuk
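Ufuk's -yD point above, illustrated as a sketch (option values are taken from the
thread; the jar name is a placeholder): properties passed with -D in JVM_ARGS only
configure the client-side JVM, while settings meant for the to-be-created Yarn
cluster travel as dynamic properties on the flink command line via -yD:

    # Client-side only; the Yarn cluster never sees this:
    export JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone"
    # Dynamic property handed to the newly created Yarn cluster:
    flink run -m yarn-cluster -yDrecovery.mode=standalone my-job.jar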
>>> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>>>> Hi,
>>>>
>>>> I use Flink 1.0.0. I have a persistent yarn container set (a
>>>> persistent Flink job manager) that I use for streaming jobs, and I
>>>> use the "yarn-cluster" mode to launch my batches.
>>>>
>>>> I've just switched "HA" mode on for my streaming persistent job
>>>> manager and it seems to work; however, my batches are no longer
>>>> working because they now execute inside the persistent container
>>>> (and fail because it lacks slots) and not in a separate standalone
>>>> job manager.
>>>>
>>>> My batch launch options:
>>>>
>>>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm
>>>> $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD
>>>> yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD akka.ask.timeout=300s"
>>>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone
>>>> -Dyarn.properties-file.location=/tmp/flink/batch"
>>>>
>>>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA
>>>> $JAR_SUPP $listArgs $ACTION
>>>>
>>>> My persistent cluster launch options:
>>>>
>>>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10
>>>> -Drecovery.mode=zookeeper
>>>> -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS}
>>>> -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH}
>>>> -Dstate.backend=filesystem
>>>> -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints
>>>> -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
>>>>
>>>> $FLINK_DIR/yarn-session.sh
>>>> -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO
>>>> $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS -tm
>>>> $FLINK_MEMORY -qu $FLINK_QUEUE -nm ${GANESH_TYPE_PF}_KuberaFlink
>>>>
>>>> I've switched back to the FLINK_HA_OPTIONS="" way of launching the
>>>> container for now, but I lose HA.
>>>>
>>>> Is it a (un)known bug, or am I missing a magic option?
>>>>
>>>> Best regards,
>>>> Arnaud
>>>>
>>>> ________________________________
>>>>
>>>> The integrity of this message cannot be guaranteed on the Internet.
>>>> The company that sent this message cannot therefore be held liable
>>>> for its content or attachments. Any unauthorized use or dissemination
>>>> is prohibited. If you are not the intended recipient of this message,
>>>> please delete it and notify the sender.