I've created an issue for this here: https://issues.apache.org/jira/browse/FLINK-4095
On Mon, Jun 20, 2016 at 11:09 AM, Maximilian Michels <m...@apache.org> wrote:
> +1 for a CLI parameter for loading the config from a custom location
>
> On Thu, Jun 16, 2016 at 6:01 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>> Hi Arnaud,
>>
>> at the moment the environment variable is the only way to specify a
>> different config directory for the CLIFrontend. But it totally makes sense
>> to introduce a --configDir parameter for the flink shell script. I'll open
>> an issue for this.
>>
>> Cheers,
>> Till
>>
>> On Thu, Jun 16, 2016 at 5:36 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>>>
>>> Okay, is there a way to specify the flink-conf.yaml to use on the
>>> ./bin/flink command line? I see no such option. I guess I have to set
>>> FLINK_CONF_DIR before the call?
>>>
>>> -----Original Message-----
>>> From: Maximilian Michels [mailto:m...@apache.org]
>>> Sent: Wednesday, June 15, 2016 18:06
>>> To: user@flink.apache.org
>>> Subject: Re: Yarn batch not working with standalone yarn job manager once a
>>> persistent, HA job manager is launched?
>>>
>>> Hi Arnaud,
>>>
>>> One issue per thread please. That makes things a lot easier for us :)
>>>
>>> Something positive first: We are reworking the resuming of existing Flink
>>> Yarn applications. It'll be much easier to resume a cluster simply using the
>>> Yarn ID or by re-discovering the Yarn session using the properties file.
>>>
>>> The dynamic properties are a shortcut to modifying the Flink configuration
>>> of the cluster _only_ upon startup. Afterwards, they are already set on the
>>> containers. We might change this for the 1.1.0 release. It should work if
>>> you put "yarn.properties-file.location: /custom/location" in your
>>> flink-conf.yaml before you execute "./bin/flink".
>>>
>>> Cheers,
>>> Max
>>>
>>> On Wed, Jun 15, 2016 at 3:14 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>>> > Ooopsss....
>>> > My mistake, snapshot/restore does work in a local env, I've had a weird
>>> > configuration issue!
>>> >
>>> > But I still have the property file path issue :)
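Till's and Max's replies above amount to giving the batch client its own configuration directory, selected via the environment variable. A minimal sketch, assuming a standard distribution layout under /usr/lib/flink and treating the /tmp/flink/batch-conf path and the recovery.mode line as untested placeholders:

    # Hypothetical batch-only configuration directory, copied from the cluster config
    mkdir -p /tmp/flink/batch-conf
    cp /usr/lib/flink/conf/* /tmp/flink/batch-conf/

    # Keep batch submissions away from the HA session's YARN properties file
    echo "yarn.properties-file.location: /tmp/flink/batch" >> /tmp/flink/batch-conf/flink-conf.yaml
    echo "recovery.mode: standalone" >> /tmp/flink/batch-conf/flink-conf.yaml

    # Until FLINK-4095 adds a --configDir option, the environment variable is the only hook
    export FLINK_CONF_DIR=/tmp/flink/batch-conf
    /usr/lib/flink/bin/flink run -m yarn-cluster ...

Once FLINK-4095 lands, the same thing should be possible with a --configDir flag instead of the environment variable.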
>>> > -----Original Message-----
>>> > From: LINZ, Arnaud
>>> > Sent: Wednesday, June 15, 2016 14:35
>>> > To: 'user@flink.apache.org' <user@flink.apache.org>
>>> > Subject: RE: Yarn batch not working with standalone yarn job manager once a
>>> > persistent, HA job manager is launched?
>>> >
>>> > Hi,
>>> >
>>> > I haven't had the time to investigate the bad configuration file path
>>> > issue yet (if you have any idea why yarn.properties-file.location is
>>> > ignored, you are welcome), but I'm facing another HA problem.
>>> >
>>> > I'm trying to make my custom streaming sources HA compliant by
>>> > implementing snapshotState() & restoreState(). I would like to test that
>>> > mechanism in my JUnit tests, because it can be complex, but I was unable
>>> > to simulate a "recover" on a local flink environment: snapshotState() is
>>> > never triggered, and throwing an exception inside the execution chain does
>>> > not lead to recovery but ends the execution, despite the
>>> > streamExecEnv.enableCheckpointing(timeout) call.
>>> >
>>> > Is there a way to locally test this mechanism (other than poorly
>>> > simulating it by explicitly calling snapshot & restore in an overridden
>>> > source)?
>>> >
>>> > Thanks,
>>> > Arnaud
>>> >
>>> > -----Original Message-----
>>> > From: LINZ, Arnaud
>>> > Sent: Monday, June 6, 2016 17:53
>>> > To: user@flink.apache.org
>>> > Subject: RE: Yarn batch not working with standalone yarn job manager once
>>> > a persistent, HA job manager is launched?
>>> >
>>> > I've deleted the '/tmp/.yarn-properties-user' file created for the
>>> > persistent container, and the batches do go into their own container.
>>> > However, that's not a workable workaround, as I'm no longer able to submit
>>> > streaming apps to the persistent container that way :) So it's really a
>>> > problem of flink finding the right property file.
>>> >
>>> > I've added -yD yarn.properties-file.location=/tmp/flink/batch inside the
>>> > batch command line (also configured in the JVM_ARGS var), with no change
>>> > of behaviour. Note that I do have a standalone yarn container created, but
>>> > the job is submitted in the other one.
>>> >
>>> > Thanks,
>>> > Arnaud
>>> >
>>> > -----Original Message-----
>>> > From: Ufuk Celebi [mailto:u...@apache.org]
>>> > Sent: Monday, June 6, 2016 16:01
>>> > To: user@flink.apache.org
>>> > Subject: Re: Yarn batch not working with standalone yarn job manager once a
>>> > persistent, HA job manager is launched?
>>> >
>>> > Thanks for the clarification. I think it might be related to the YARN
>>> > properties file, which is still being used for the batch jobs. Can you try
>>> > to delete it between submissions as a temporary workaround to check whether
>>> > it's related?
>>> >
>>> > – Ufuk
>>> >
>>> > On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>>> >> Hi,
>>> >>
>>> >> The zookeeper path is only for my persistent container, and I do use a
>>> >> different one for all my persistent containers.
>>> >>
>>> >> The -Drecovery.mode=standalone was passed inside the JVM_ARGS
>>> >> ("${JVM_ARGS} -Drecovery.mode=standalone
>>> >> -Dyarn.properties-file.location=/tmp/flink/batch")
>>> >>
>>> >> I've tried using -yD recovery.mode=standalone on the flink command line
>>> >> too, but it does not solve the problem; it still uses the pre-existing
>>> >> container.
>>> >>
>>> >> Complete line =
>>> >> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu
>>> >> batch1 -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s
>>> >> -yD recovery.mode=standalone --class
>>> >> com.bouygtel.kubera.main.segstage.MainGeoSegStage
>>> >> /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar
>>> >> -j /usr/users/datcrypt/alinz/KBR/GOS/log
>>> >> -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
>>> >>
>>> >> JVM_ARGS =
>>> >> -Drecovery.mode=standalone
>>> >> -Dyarn.properties-file.location=/tmp/flink/batch
>>> >>
>>> >> Arnaud
>>> >>
>>> >> -----Original Message-----
>>> >> From: Ufuk Celebi [mailto:u...@apache.org]
>>> >> Sent: Monday, June 6, 2016 14:37
>>> >> To: user@flink.apache.org
>>> >> Subject: Re: Yarn batch not working with standalone yarn job manager once
>>> >> a persistent, HA job manager is launched?
>>> >>
>>> >> Hey Arnaud,
>>> >>
>>> >> The cause of this is probably that both jobs use the same ZooKeeper
>>> >> root path, in which case all task managers connect to the same leading
>>> >> job manager.
>>> >>
>>> >> I think you forgot to add the y in -Drecovery.mode=standalone
>>> >> for the batch jobs, e.g.
>>> >>
>>> >> -yDrecovery.mode=standalone
>>> >>
>>> >> Can you try this?
>>> >>
>>> >> – Ufuk
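Spelled out against the command line quoted above, Ufuk's suggestion is to pass the override as a YARN dynamic property of the flink run call itself rather than through JVM_ARGS, which presumably only configures the client JVM. A sketch only; as Arnaud reports further up the thread, the YARN properties file lookup can still route the job to the existing session:

    # -D options in JVM_ARGS stay on the client side; -yD hands the setting
    # to the per-job YARN cluster that gets started for the batch
    /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu batch1 -ys 4 \
      -yDrecovery.mode=standalone \
      -yDyarn.properties-file.location=/tmp/flink/batch \
      -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s \
      --class com.bouygtel.kubera.main.segstage.MainGeoSegStage \
      /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar \
      -j /usr/users/datcrypt/alinz/KBR/GOS/log \
      -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg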
>>> >>
>>> >> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>>> >>> Hi,
>>> >>>
>>> >>> I use Flink 1.0.0. I have a persistent yarn container set (a
>>> >>> persistent flink job manager) that I use for streaming jobs, and I
>>> >>> use the “yarn-cluster” mode to launch my batches.
>>> >>>
>>> >>> I’ve just switched “HA” mode on for my streaming persistent job
>>> >>> manager and it seems to work; however, my batches are no longer
>>> >>> working because they now execute inside the persistent container
>>> >>> (and fail because it lacks slots) and not in a separate standalone
>>> >>> job manager.
>>> >>>
>>> >>> My batch launch options:
>>> >>>
>>> >>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm
>>> >>> $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD
>>> >>> yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD
>>> >>> akka.ask.timeout=300s"
>>> >>>
>>> >>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone
>>> >>> -Dyarn.properties-file.location=/tmp/flink/batch"
>>> >>>
>>> >>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA
>>> >>> $JAR_SUPP $listArgs $ACTION
>>> >>>
>>> >>> My persistent cluster launch options:
>>> >>>
>>> >>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10
>>> >>> -Drecovery.mode=zookeeper
>>> >>> -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS}
>>> >>> -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH}
>>> >>> -Dstate.backend=filesystem
>>> >>> -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints
>>> >>> -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
>>> >>>
>>> >>> $FLINK_DIR/yarn-session.sh
>>> >>> -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO
>>> >>> $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS
>>> >>> -tm $FLINK_MEMORY -qu $FLINK_QUEUE -nm
>>> >>> ${GANESH_TYPE_PF}_KuberaFlink
>>> >>>
>>> >>> I’ve switched back to the FLINK_HA_OPTIONS="" way of launching the
>>> >>> container for now, but I lack HA.
>>> >>>
>>> >>> Is it an (un)known bug or am I missing a magic option?
>>> >>>
>>> >>> Best regards,
>>> >>>
>>> >>> Arnaud
>>> >>>
>>> >>> ________________________________
>>> >>>
>>> >>> The integrity of this message cannot be guaranteed on the Internet.
>>> >>> The company that sent this message cannot therefore be held liable
>>> >>> for its content nor attachments. Any unauthorized use or
>>> >>> dissemination is prohibited. If you are not the intended recipient
>>> >>> of this message, then please delete it and notify the sender.
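For completeness, the temporary workaround Ufuk suggests earlier in the thread (and that Arnaud tried by deleting /tmp/.yarn-properties-user) could be scripted roughly as below. The file name pattern with the submitting user appended, and the idea of moving the file aside instead of deleting it, are assumptions on top of what the thread states:

    # Sketch of the workaround discussed above: hide the YARN properties file of
    # the persistent session so the batch starts its own yarn-cluster container.
    PROPS=/tmp/.yarn-properties-$(whoami)
    [ -f "$PROPS" ] && mv "$PROPS" "$PROPS.bak"

    /usr/lib/flink/bin/flink run -m yarn-cluster ...   # batch gets a fresh container

    # Put the file back so later streaming submissions find the HA session again.
    [ -f "$PROPS.bak" ] && mv "$PROPS.bak" "$PROPS"

As Arnaud notes above, hiding the properties file also means streaming jobs cannot find the persistent session while it is hidden.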