Hi Arnaud,

at the moment the environment variable is the only way to specify a different config directory for the CLIFrontend. But it totally makes sense to introduce a --configDir parameter for the flink shell script. I'll open an issue for this.
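In the meantime, exporting the variable before invoking the CLI should do the trick. A minimal sketch (the conf directory path is just a placeholder; it has to contain the flink-conf.yaml you want the client to use):

  # point the client at a different configuration directory, then submit as usual
  export FLINK_CONF_DIR=/path/to/custom/conf
  ./bin/flink run <your-job.jar> <args>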
Cheers,
Till

On Thu, Jun 16, 2016 at 5:36 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> Okay, is there a way to specify the flink-conf.yaml to use on the
> ./bin/flink command line? I see no such option. I guess I have to set
> FLINK_CONF_DIR before the call?
>
> -----Original Message-----
> From: Maximilian Michels [mailto:m...@apache.org]
> Sent: Wednesday, June 15, 2016 18:06
> To: user@flink.apache.org
> Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
>
> Hi Arnaud,
>
> One issue per thread please. That makes things a lot easier for us :)
>
> Something positive first: We are reworking the resuming of existing Flink
> Yarn applications. It'll be much easier to resume a cluster using simply
> the Yarn ID or re-discovering the Yarn session using the properties file.
>
> The dynamic properties are a shortcut to modifying the Flink configuration
> of the cluster _only_ upon startup. Afterwards, they are already set at the
> containers. We might change this for the 1.1.0 release. It should work if
> you put "yarn.properties-file.location: /custom/location" in your
> flink-conf.yaml before you execute "./bin/flink".
>
> Cheers,
> Max
>
> On Wed, Jun 15, 2016 at 3:14 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> > Ooopsss.... My mistake, snapshot/restore does work in a local env; I've
> > had a weird configuration issue!
> >
> > But I still have the property file path issue :)
> >
> > -----Original Message-----
> > From: LINZ, Arnaud
> > Sent: Wednesday, June 15, 2016 14:35
> > To: 'user@flink.apache.org' <user@flink.apache.org>
> > Subject: RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
> >
> > Hi,
> >
> > I haven't had the time to investigate the bad configuration file path
> > issue yet (if you have any idea why yarn.properties-file.location is
> > ignored, you are welcome), but I'm facing another HA problem.
> >
> > I'm trying to make my custom streaming sources HA compliant by
> > implementing snapshotState() & restoreState(). I would like to test that
> > mechanism in my JUnit tests, because it can be complex, but I was unable
> > to simulate a "recover" on a local Flink environment: snapshotState() is
> > never triggered, and throwing an exception inside the execution chain
> > does not lead to recovery but ends the execution, despite the
> > streamExecEnv.enableCheckpointing(timeout) call.
> >
> > Is there a way to locally test this mechanism (other than poorly
> > simulating it by explicitly calling snapshot & restore in an overridden
> > source)?
> >
> > Thanks,
> > Arnaud
> >
> > -----Original Message-----
> > From: LINZ, Arnaud
> > Sent: Monday, June 6, 2016 17:53
> > To: user@flink.apache.org
> > Subject: RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
> >
> > I've deleted the '/tmp/.yarn-properties-user' file created for the
> > persistent container, and the batches do go into their own container.
> > However, that's not a workable workaround, as I'm no longer able to
> > submit streaming apps to the persistent container that way :) So it's
> > really a problem of Flink finding the right properties file.
> >
> > I've added -yD yarn.properties-file.location=/tmp/flink/batch to the
> > batch command line (also configured in the JVM_ARGS var), with no change
> > of behaviour. Note that I do have a standalone yarn container created,
> > but the job is submitted to the other one.
> >
> > Thanks,
> > Arnaud
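For reference, a condensed sketch of the two approaches discussed above for pointing a submission at a separate YARN properties file (paths are placeholders, and the exact behaviour may differ between releases):

  # (a) in the flink-conf.yaml read by the client, as Max suggests:
  #       yarn.properties-file.location: /tmp/flink/batch
  # (b) as a dynamic property on the batch command line, as Arnaud tried:
  ./bin/flink run -m yarn-cluster -yD yarn.properties-file.location=/tmp/flink/batch <other options>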
> > -----Original Message-----
> > From: Ufuk Celebi [mailto:u...@apache.org]
> > Sent: Monday, June 6, 2016 16:01
> > To: user@flink.apache.org
> > Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
> >
> > Thanks for the clarification. I think it might be related to the YARN
> > properties file, which is still being used for the batch jobs. Can you
> > try to delete it between submissions as a temporary workaround to check
> > whether it's related?
> >
> > – Ufuk
> >
> > On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> >> Hi,
> >>
> >> The ZooKeeper path is only for my persistent container, and I do use a
> >> different one for all my persistent containers.
> >>
> >> The -Drecovery.mode=standalone was passed inside the JVM_ARGS
> >> ("${JVM_ARGS} -Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch").
> >>
> >> I've tried using -yD recovery.mode=standalone on the flink command line
> >> too, but it does not solve the problem; it still uses the pre-existing
> >> container.
> >>
> >> Complete line =
> >> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu batch1 -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s -yD recovery.mode=standalone --class com.bouygtel.kubera.main.segstage.MainGeoSegStage /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar -j /usr/users/datcrypt/alinz/KBR/GOS/log -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
> >>
> >> JVM_ARGS =
> >> -Drecovery.mode=standalone
> >> -Dyarn.properties-file.location=/tmp/flink/batch
> >>
> >> Arnaud
> >>
> >> -----Original Message-----
> >> From: Ufuk Celebi [mailto:u...@apache.org]
> >> Sent: Monday, June 6, 2016 14:37
> >> To: user@flink.apache.org
> >> Subject: Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?
> >>
> >> Hey Arnaud,
> >>
> >> The cause of this is probably that both jobs use the same ZooKeeper
> >> root path, in which case all task managers connect to the same leading
> >> job manager.
> >>
> >> I think you forgot to add the y in the -Drecovery.mode=standalone for
> >> the batch jobs, e.g.
> >>
> >> -yDrecovery.mode=standalone
> >>
> >> Can you try this?
> >>
> >> – Ufuk
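For example, moving those settings from JVM_ARGS onto the submission itself would look roughly like this (a sketch assembled from the options already shown in this thread; the exact -yD syntax may need adjusting):

  # pass recovery mode and the properties-file location as YARN dynamic properties
  $FLINK_DIR/flink run -m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yDrecovery.mode=standalone -yDyarn.properties-file.location=/tmp/flink/batch --class $MAIN_CLASS_KUBERA $JAR_SUPP $listArgs $ACTION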
> >> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> >>> Hi,
> >>>
> >>> I use Flink 1.0.0. I have a persistent yarn container set (a
> >>> persistent flink job manager) that I use for streaming jobs, and I
> >>> use the "yarn-cluster" mode to launch my batches.
> >>>
> >>> I've just switched "HA" mode on for my streaming persistent job
> >>> manager and it seems to work; however, my batches are no longer
> >>> working because they now execute inside the persistent container
> >>> (and fail because it lacks slots) and not in a separate standalone
> >>> job manager.
> >>>
> >>> My batch launch options:
> >>>
> >>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD akka.ask.timeout=300s"
> >>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone -Dyarn.properties-file.location=/tmp/flink/batch"
> >>>
> >>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA $JAR_SUPP $listArgs $ACTION
> >>>
> >>> My persistent cluster launch options:
> >>>
> >>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10 -Drecovery.mode=zookeeper -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS} -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH} -Dstate.backend=filesystem -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
> >>>
> >>> $FLINK_DIR/yarn-session.sh -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS -tm $FLINK_MEMORY -qu $FLINK_QUEUE -nm ${GANESH_TYPE_PF}_KuberaFlink
> >>>
> >>> I've switched back to the FLINK_HA_OPTIONS="" way of launching the
> >>> container for now, but I lack HA.
> >>>
> >>> Is it an (un)known bug or am I missing a magic option?
> >>>
> >>> Best regards,
> >>> Arnaud