Just had a quick chat with Ufuk. The issue is that in 1.x the Yarn properties file
is loaded regardless of whether "-m yarn-cluster" is specified on the command line.
This loads the dynamic properties from the Yarn properties file and applies the
configuration of the running (session) cluster to the to-be-created cluster.
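A minimal sketch of the interim workaround discussed in this thread, assuming the
properties file sits at its default location /tmp/.yarn-properties-<user> (the path
Arnaud reports below); the job jar and main class are placeholders:

    # Remove the stale session properties file so that "-m yarn-cluster"
    # starts its own cluster instead of picking up the running session:
    rm -f /tmp/.yarn-properties-$(whoami)
    ./bin/flink run -m yarn-cluster -yn 2 --class my.MainClass my-job.jar

Note Arnaud's caveat below: once the file is gone, streaming jobs can no longer be
submitted to the persistent session this way.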
Will be fixed in 1.1 and probably backported to 1.0.4.

On Wed, Jun 15, 2016 at 6:05 PM, Maximilian Michels <m...@apache.org> wrote:
> Hi Arnaud,
>
> One issue per thread please. That makes things a lot easier for us :)
>
> Something positive first: We are reworking the resuming of existing
> Flink Yarn applications. It'll be much easier to resume a cluster
> simply by using the Yarn ID or by re-discovering the Yarn session using
> the properties file.
>
> The dynamic properties are a shortcut to modifying the Flink
> configuration of the cluster _only_ upon startup. Afterwards, they are
> already set at the containers. We might change this for the 1.1.0
> release. It should work if you put "yarn.properties-file.location:
> /custom/location" in your flink-conf.yaml before you execute
> "./bin/flink".
>
> Cheers,
> Max
>
> On Wed, Jun 15, 2016 at 3:14 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>> Ooopsss....
>> My mistake, snapshot/restore does work in a local env; I had a weird
>> configuration issue!
>>
>> But I still have the property file path issue :)
>>
>> -----Original Message-----
>> From: LINZ, Arnaud
>> Sent: Wednesday, June 15, 2016 2:35 PM
>> To: 'user@flink.apache.org' <user@flink.apache.org>
>> Subject: RE: Yarn batch not working with standalone yarn job manager once a
>> persistent, HA job manager is launched ?
>>
>> Hi,
>>
>> I haven't had the time to investigate the bad configuration file path issue
>> yet (if you have any idea why yarn.properties-file.location is ignored, you
>> are welcome), but I'm facing another HA problem.
>>
>> I'm trying to make my custom streaming sources HA-compliant by implementing
>> snapshotState() & restoreState(). I would like to test that mechanism in my
>> JUnit tests, because it can be complex, but I was unable to simulate a
>> "recover" on a local Flink environment: snapshotState() is never triggered,
>> and throwing an exception inside the execution chain does not lead to
>> recovery but ends the execution, despite the
>> streamExecEnv.enableCheckpointing(interval) call.
>>
>> Is there a way to locally test this mechanism (other than poorly simulating
>> it by explicitly calling snapshot & restore in an overridden source)?
>>
>> Thanks,
>> Arnaud
>>
>> -----Original Message-----
>> From: LINZ, Arnaud
>> Sent: Monday, June 6, 2016 5:53 PM
>> To: user@flink.apache.org
>> Subject: RE: Yarn batch not working with standalone yarn job manager once a
>> persistent, HA job manager is launched ?
>>
>> I've deleted the '/tmp/.yarn-properties-user' file created for the
>> persistent container, and the batches do go into their own (correct)
>> container. However, that's not a workable workaround, as I'm no longer
>> able to submit streaming apps to the persistent container that way :) So
>> it's really a problem of Flink finding the right properties file.
>>
>> I've added -yD yarn.properties-file.location=/tmp/flink/batch to the
>> batch command line (also configured in the JVM_ARGS var), with no change
>> in behaviour. Note that I do have a standalone yarn container created,
>> but the job is submitted to the other one.
>>
>> Thanks,
>> Arnaud
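Max's flink-conf.yaml suggestion above, spelled out as a sketch (FLINK_CONF_DIR
stands in for the client's conf directory, and /tmp/flink/batch is Arnaud's chosen
location); note that the first message of this thread explains why 1.0 may still
pick up the session's file regardless:

    # Point the client at a separate Yarn properties file location before
    # invoking ./bin/flink, so batch submissions do not reuse the session's file:
    echo "yarn.properties-file.location: /tmp/flink/batch" >> "$FLINK_CONF_DIR/flink-conf.yaml"
    ./bin/flink run -m yarn-cluster --class my.MainClass my-job.jar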
>> -----Original Message-----
>> From: Ufuk Celebi [mailto:u...@apache.org]
>> Sent: Monday, June 6, 2016 4:01 PM
>> To: user@flink.apache.org
>> Subject: Re: Yarn batch not working with standalone yarn job manager once a
>> persistent, HA job manager is launched ?
>>
>> Thanks for the clarification. I think it might be related to the YARN
>> properties file, which is still being used for the batch jobs. Can you try
>> to delete it between submissions as a temporary workaround to check whether
>> it's related?
>>
>> – Ufuk
>>
>> On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>>> Hi,
>>>
>>> The zookeeper path is only for my persistent container, and I do use a
>>> different one for each of my persistent containers.
>>>
>>> The -Drecovery.mode=standalone was passed inside the JVM_ARGS
>>> ("${JVM_ARGS} -Drecovery.mode=standalone
>>> -Dyarn.properties-file.location=/tmp/flink/batch")
>>>
>>> I've tried using -yD recovery.mode=standalone on the flink command line
>>> too, but it does not solve the problem; it still uses the pre-existing
>>> container.
>>>
>>> Complete command line:
>>> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu
>>> batch1 -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s
>>> -yD recovery.mode=standalone --class
>>> com.bouygtel.kubera.main.segstage.MainGeoSegStage
>>> /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar
>>> -j /usr/users/datcrypt/alinz/KBR/GOS/log
>>> -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
>>>
>>> JVM_ARGS =
>>> -Drecovery.mode=standalone
>>> -Dyarn.properties-file.location=/tmp/flink/batch
>>>
>>> Arnaud
>>>
>>> -----Original Message-----
>>> From: Ufuk Celebi [mailto:u...@apache.org]
>>> Sent: Monday, June 6, 2016 2:37 PM
>>> To: user@flink.apache.org
>>> Subject: Re: Yarn batch not working with standalone yarn job manager once
>>> a persistent, HA job manager is launched ?
>>>
>>> Hey Arnaud,
>>>
>>> The cause of this is probably that both jobs use the same ZooKeeper root
>>> path, in which case all task managers connect to the same leading job
>>> manager.
>>>
>>> I think you forgot to add the y in -Drecovery.mode=standalone for the
>>> batch jobs, e.g.
>>>
>>> -yDrecovery.mode=standalone
>>>
>>> Can you try this?
>>>
>>> – Ufuk
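Ufuk's -yD point above, illustrated as a sketch (option values are taken from the
thread; the jar name is a placeholder): properties passed with -D in JVM_ARGS only
configure the client-side JVM, while settings meant for the to-be-created Yarn
cluster travel as dynamic properties on the flink command line via -yD:

    # Client-side only; the Yarn cluster never sees this:
    export JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone"
    # Dynamic property handed to the newly created Yarn cluster:
    flink run -m yarn-cluster -yDrecovery.mode=standalone my-job.jar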
>>> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>>>> Hi,
>>>>
>>>> I use Flink 1.0.0. I have a persistent yarn container set (a
>>>> persistent Flink job manager) that I use for streaming jobs, and I
>>>> use the "yarn-cluster" mode to launch my batches.
>>>>
>>>> I've just switched "HA" mode on for my streaming persistent job
>>>> manager and it seems to work; however, my batches are no longer
>>>> working because they now execute inside the persistent container
>>>> (and fail because it lacks slots) and not in a separate standalone
>>>> job manager.
>>>>
>>>> My batch launch options:
>>>>
>>>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm
>>>> $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD
>>>> yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD akka.ask.timeout=300s"
>>>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone
>>>> -Dyarn.properties-file.location=/tmp/flink/batch"
>>>>
>>>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA
>>>> $JAR_SUPP $listArgs $ACTION
>>>>
>>>> My persistent cluster launch options:
>>>>
>>>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10
>>>> -Drecovery.mode=zookeeper
>>>> -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS}
>>>> -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH}
>>>> -Dstate.backend=filesystem
>>>> -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints
>>>> -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
>>>>
>>>> $FLINK_DIR/yarn-session.sh
>>>> -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO
>>>> $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS -tm
>>>> $FLINK_MEMORY -qu $FLINK_QUEUE -nm ${GANESH_TYPE_PF}_KuberaFlink
>>>>
>>>> I've switched back to the FLINK_HA_OPTIONS="" way of launching the
>>>> container for now, but I lose HA.
>>>>
>>>> Is it a (un)known bug, or am I missing a magic option?
>>>>
>>>> Best regards,
>>>> Arnaud
>>>>
>>>> ________________________________
>>>>
>>>> The integrity of this message cannot be guaranteed on the Internet.
>>>> The company that sent this message cannot therefore be held liable
>>>> for its content or attachments. Any unauthorized use or dissemination
>>>> is prohibited. If you are not the intended recipient of this message,
>>>> please delete it and notify the sender.