Hi,

I haven't had time to investigate the configuration-file-path issue yet (if you have any idea why yarn.properties-file.location is ignored, you are welcome to share it), but I'm facing another HA problem.
I'm trying to make my custom streaming sources HA-compliant by implementing snapshotState() & restoreState(). I would like to test that mechanism in my JUnit tests, because it can be complex, but I have been unable to simulate a recovery in a local Flink environment: snapshotState() is never triggered, and throwing an exception inside the execution chain does not lead to recovery but ends the execution, despite the streamExecEnv.enableCheckpointing(timeout) call. Is there a way to test this mechanism locally (other than poorly simulating it by explicitly calling snapshot & restore on an overridden source)?
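For reference, here is roughly the kind of self-contained test I would like to get working. It is only a sketch of what I have in mind, not validated code: the class names, the element counts, and in particular the setNumberOfExecutionRetries() call (I am assuming that without retries the job simply fails instead of recovering) are my own guesses.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class LocalRecoveryTestSketch {

    // Hypothetical checkpointed source: emits increasing longs and snapshots its offset.
    public static class CountingSource implements SourceFunction<Long>, Checkpointed<Long> {
        private volatile boolean running = true;
        private long offset = 0;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            while (running && offset < 1000) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(offset);
                    offset++;
                }
                Thread.sleep(5);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        @Override
        public Long snapshotState(long checkpointId, long checkpointTimestamp) {
            return offset; // should be called on every checkpoint
        }

        @Override
        public void restoreState(Long state) {
            offset = state; // should be called after the job restarts from a failure
        }
    }

    // Fails exactly once to force a restart; the static flag works because the
    // local environment runs everything in a single JVM.
    public static class FailOnceMapper implements MapFunction<Long, Long> {
        private static volatile boolean failed = false;

        @Override
        public Long map(Long value) throws Exception {
            if (!failed && value == 500) {
                failed = true;
                throw new RuntimeException("simulated failure to trigger recovery");
            }
            return value;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(1);

        // Checkpoint frequently so snapshotState() has a chance to fire during a short test.
        env.enableCheckpointing(100);

        // Assumption on my side: without execution retries the job simply fails
        // instead of restarting, which would explain why my exception ends the run.
        env.setNumberOfExecutionRetries(3);

        env.addSource(new CountingSource())
           .map(new FailOnceMapper())
           .print();

        env.execute("local recovery sketch");
    }
}

If that assumption about execution retries is wrong, or if there is a recommended harness for unit-testing checkpoint/restore, any pointer is welcome.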
Thanks,
Arnaud

-----Original Message-----
From: LINZ, Arnaud
Sent: Monday, June 6, 2016 17:53
To: user@flink.apache.org
Subject: RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched?

I've deleted the '/tmp/.yarn-properties-user' file created for the persistent container, and the batches do go into their own container. However, that's not a workable workaround, as I'm no longer able to submit streaming apps to the persistent container that way :) So it's really a problem of Flink finding the right property file.

I've added -yD yarn.properties-file.location=/tmp/flink/batch to the batch command line (also configured in the JVM_ARGS variable), with no change in behaviour. Note that I do have a standalone yarn container created, but the job is submitted to the other one.

Thanks,
Arnaud

-----Original Message-----
From: Ufuk Celebi [mailto:u...@apache.org]
Sent: Monday, June 6, 2016 16:01
To: user@flink.apache.org
Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched?

Thanks for clarification. I think it might be related to the YARN properties file, which is still being used for the batch jobs. Can you try to delete it between submissions as a temporary workaround to check whether it's related?

– Ufuk

On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> Hi,
>
> The zookeeper path is only for my persistent container, and I do use a
> different one for all my persistent containers.
>
> The -Drecovery.mode=standalone was passed inside the JVM_ARGS
> ("${JVM_ARGS} -Drecovery.mode=standalone
> -Dyarn.properties-file.location=/tmp/flink/batch").
>
> I've tried using -yD recovery.mode=standalone on the flink command line too,
> but it does not solve the problem; it still uses the pre-existing container.
>
> Complete line =
> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu batch1
> -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s
> -yD recovery.mode=standalone --class
> com.bouygtel.kubera.main.segstage.MainGeoSegStage
> /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar
> -j /usr/users/datcrypt/alinz/KBR/GOS/log
> -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
>
> JVM_ARGS =
> -Drecovery.mode=standalone
> -Dyarn.properties-file.location=/tmp/flink/batch
>
> Arnaud
>
> -----Original Message-----
> From: Ufuk Celebi [mailto:u...@apache.org]
> Sent: Monday, June 6, 2016 14:37
> To: user@flink.apache.org
> Subject: Re: Yarn batch not working with standalone yarn job manager once a
> persistent, HA job manager is launched?
>
> Hey Arnaud,
>
> The cause of this is probably that both jobs use the same ZooKeeper root
> path, in which case all task managers connect to the same leading job manager.
>
> I think you forgot to add the y in the -Drecovery.mode=standalone for the
> batch jobs, e.g.
>
> -yDrecovery.mode=standalone
>
> Can you try this?
>
> – Ufuk
>
> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>> Hi,
>>
>> I use Flink 1.0.0. I have a persistent yarn container set (a persistent
>> flink job manager) that I use for streaming jobs, and I use the
>> "yarn-cluster" mode to launch my batches.
>>
>> I've just switched "HA" mode on for my streaming persistent job manager
>> and it seems to work; however, my batches are no longer working because
>> they now execute themselves inside the persistent container (and fail
>> because it lacks slots) instead of in a separate standalone job manager.
>>
>> My batch launch options:
>>
>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm
>> $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD
>> yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD akka.ask.timeout=300s"
>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone
>> -Dyarn.properties-file.location=/tmp/flink/batch"
>>
>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA
>> $JAR_SUPP $listArgs $ACTION
>>
>> My persistent cluster launch options:
>>
>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10
>> -Drecovery.mode=zookeeper
>> -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS}
>> -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH}
>> -Dstate.backend=filesystem
>> -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints
>> -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
>>
>> $FLINK_DIR/yarn-session.sh -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO
>> $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS -tm
>> $FLINK_MEMORY -qu $FLINK_QUEUE -nm ${GANESH_TYPE_PF}_KuberaFlink
>>
>> I've switched back to the FLINK_HA_OPTIONS="" way of launching the
>> container for now, but I lack HA.
>>
>> Is it an (un)known bug or am I missing a magic option?
>>
>> Best regards,
>> Arnaud
>>
>> ________________________________
>>
>> The integrity of this message cannot be guaranteed on the Internet.
>> The company that sent this message cannot therefore be held liable
>> for its content nor attachments. Any unauthorized use or
>> dissemination is prohibited. If you are not the intended recipient of
>> this message, then please delete it and notify the sender.