Hi Arnaud,

at the moment the environment variable is the only way to specify a different config directory for the CLIFrontend. But it totally makes sense to introduce a --configDir parameter for the flink shell script. I'll open an issue for this.
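In the meantime, exporting the variable before invoking the CLI should do the trick. A minimal sketch (the conf directory path is just a placeholder; it has to contain the flink-conf.yaml you want the client to use):

  # point the client at a different configuration directory, then submit as usual
  export FLINK_CONF_DIR=/path/to/custom/conf
  ./bin/flink run <your-job.jar> <args>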
Cheers,
Till

On Thu, Jun 16, 2016 at 5:36 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> Okay, is there a way to specify the flink-conf.yaml to use on the
> ./bin/flink command line? I see no such option. I guess I have to set
> FLINK_CONF_DIR before the call?
>
> -----Original Message-----
> From: Maximilian Michels [mailto:m...@apache.org]
> Sent: Wednesday, June 15, 2016 18:06
> To: user@flink.apache.org
> Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
>
> Hi Arnaud,
>
> One issue per thread please. That makes things a lot easier for us :)
>
> Something positive first: We are reworking the resuming of existing Flink
> Yarn applications. It'll be much easier to resume a cluster using simply
> the Yarn ID or re-discovering the Yarn session using the properties file.
>
> The dynamic properties are a shortcut to modifying the Flink configuration
> of the cluster _only_ upon startup. Afterwards, they are already set at the
> containers. We might change this for the 1.1.0 release. It should work if
> you put "yarn.properties-file.location: /custom/location" in your
> flink-conf.yaml before you execute "./bin/flink".
>
> Cheers,
> Max
>
> On Wed, Jun 15, 2016 at 3:14 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> > Ooopsss.... My mistake, snapshot/restore does work in a local env; I've
> > had a weird configuration issue!
> >
> > But I still have the property file path issue :)
> >
> > -----Original Message-----
> > From: LINZ, Arnaud
> > Sent: Wednesday, June 15, 2016 14:35
> > To: 'user@flink.apache.org' <user@flink.apache.org>
> > Subject: RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
> >
> > Hi,
> >
> > I haven't had the time to investigate the bad configuration file path
> > issue yet (if you have any idea why yarn.properties-file.location is
> > ignored, you are welcome), but I'm facing another HA problem.
> >
> > I'm trying to make my custom streaming sources HA compliant by
> > implementing snapshotState() & restoreState(). I would like to test that
> > mechanism in my JUnit tests, because it can be complex, but I was unable
> > to simulate a "recover" on a local Flink environment: snapshotState() is
> > never triggered, and throwing an exception inside the execution chain
> > does not lead to recovery but ends the execution, despite the
> > streamExecEnv.enableCheckpointing(timeout) call.
> >
> > Is there a way to locally test this mechanism (other than poorly
> > simulating it by explicitly calling snapshot & restore in an overridden
> > source)?
> >
> > Thanks,
> > Arnaud
> >
> > -----Original Message-----
> > From: LINZ, Arnaud
> > Sent: Monday, June 6, 2016 17:53
> > To: user@flink.apache.org
> > Subject: RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
> >
> > I've deleted the '/tmp/.yarn-properties-user' file created for the
> > persistent container, and the batches do go into their own container.
> > However, that's not a workable workaround, as I'm no longer able to
> > submit streaming apps to the persistent container that way :) So it's
> > really a problem of Flink finding the right properties file.
> >
> > I've added -yD yarn.properties-file.location=/tmp/flink/batch to the
> > batch command line (also configured in the JVM_ARGS var), with no change
> > of behaviour. Note that I do have a standalone yarn container created,
> > but the job is submitted to the other one.
> >
> > Thanks,
> > Arnaud
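For reference, a condensed sketch of the two approaches discussed above for pointing a submission at a separate YARN properties file (paths are placeholders, and the exact behaviour may differ between releases):

  # (a) in the flink-conf.yaml read by the client, as Max suggests:
  #       yarn.properties-file.location: /tmp/flink/batch
  # (b) as a dynamic property on the batch command line, as Arnaud tried:
  ./bin/flink run -m yarn-cluster -yD yarn.properties-file.location=/tmp/flink/batch <other options>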
> > -----Original Message-----
> > From: Ufuk Celebi [mailto:u...@apache.org]
> > Sent: Monday, June 6, 2016 16:01
> > To: user@flink.apache.org
> > Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
> >
> > Thanks for the clarification. I think it might be related to the YARN
> > properties file, which is still being used for the batch jobs. Can you
> > try to delete it between submissions as a temporary workaround to check
> > whether it's related?
> >
> > – Ufuk
> >
> > On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> >> Hi,
> >>
> >> The ZooKeeper path is only for my persistent container, and I do use a
> >> different one for all my persistent containers.
> >>
> >> The -Drecovery.mode=standalone was passed inside the JVM_ARGS
> >> ("${JVM_ARGS} -Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch").
> >>
> >> I've tried using -yD recovery.mode=standalone on the flink command line
> >> too, but it does not solve the problem; it still uses the pre-existing
> >> container.
> >>
> >> Complete line =
> >> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu batch1 -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s -yD recovery.mode=standalone --class com.bouygtel.kubera.main.segstage.MainGeoSegStage /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar -j /usr/users/datcrypt/alinz/KBR/GOS/log -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
> >>
> >> JVM_ARGS =
> >> -Drecovery.mode=standalone
> >> -Dyarn.properties-file.location=/tmp/flink/batch
> >>
> >> Arnaud
> >>
> >> -----Original Message-----
> >> From: Ufuk Celebi [mailto:u...@apache.org]
> >> Sent: Monday, June 6, 2016 14:37
> >> To: user@flink.apache.org
> >> Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
> >>
> >> Hey Arnaud,
> >>
> >> The cause of this is probably that both jobs use the same ZooKeeper
> >> root path, in which case all task managers connect to the same leading
> >> job manager.
> >>
> >> I think you forgot to add the y in the -Drecovery.mode=standalone for
> >> the batch jobs, e.g.
> >>
> >> -yDrecovery.mode=standalone
> >>
> >> Can you try this?
> >>
> >> – Ufuk
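For example, moving those settings from JVM_ARGS onto the submission itself would look roughly like this (a sketch assembled from the options already shown in this thread; the exact -yD syntax may need adjusting):

  # pass recovery mode and the properties-file location as YARN dynamic properties
  $FLINK_DIR/flink run -m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yDrecovery.mode=standalone -yDyarn.properties-file.location=/tmp/flink/batch --class $MAIN_CLASS_KUBERA $JAR_SUPP $listArgs $ACTION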
> >> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> >>> Hi,
> >>>
> >>> I use Flink 1.0.0. I have a persistent yarn container set (a
> >>> persistent flink job manager) that I use for streaming jobs, and I
> >>> use the "yarn-cluster" mode to launch my batches.
> >>>
> >>> I've just switched "HA" mode on for my streaming persistent job
> >>> manager and it seems to work; however, my batches are no longer
> >>> working because they now execute inside the persistent container
> >>> (and fail because it lacks slots) and not in a separate standalone
> >>> job manager.
> >>>
> >>> My batch launch options:
> >>>
> >>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD akka.ask.timeout=300s"
> >>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch"
> >>>
> >>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA $JAR_SUPP $listArgs $ACTION
> >>>
> >>> My persistent cluster launch options:
> >>>
> >>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10 -Drecovery.mode=zookeeper -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS} -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH} -Dstate.backend=filesystem -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
> >>>
> >>> $FLINK_DIR/yarn-session.sh -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS -tm $FLINK_MEMORY -qu $FLINK_QUEUE -nm ${GANESH_TYPE_PF}_KuberaFlink
> >>>
> >>> I've switched back to the FLINK_HA_OPTIONS="" way of launching the
> >>> container for now, but I lack HA.
> >>>
> >>> Is it an (un)known bug or am I missing a magic option?
> >>>
> >>> Best regards,
> >>> Arnaud