RE: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

LINZ, Arnaud Thu, 16 Jun 2016 08:37:53 -0700

Okay, is there a way to specify which flink-conf.yaml to use on the ./bin/flink 
command line? I see no such option. I guess I have to set FLINK_CONF_DIR before 
the call?
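
A minimal sketch of that approach, assuming the launcher scripts under bin/ 
read the configuration directory from the FLINK_CONF_DIR environment variable 
when it is set. The directory layout and the helper make_batch_conf_dir are 
hypothetical, and the actual submission is left commented out.

```shell
#!/bin/sh
# Sketch: give batch submissions their own configuration directory.
# Assumption: bin/flink picks up FLINK_CONF_DIR when it is set.

# Hypothetical helper: copy a base conf dir into a per-purpose one.
make_batch_conf_dir() {
    base_dir="$1"    # existing conf directory (e.g. /usr/lib/flink/conf)
    target_dir="$2"  # new directory used only for batch submissions
    mkdir -p "$target_dir"
    # Copy the base configuration if there is one; otherwise start empty.
    if [ -f "$base_dir/flink-conf.yaml" ]; then
        cp "$base_dir/flink-conf.yaml" "$target_dir/flink-conf.yaml"
    else
        : > "$target_dir/flink-conf.yaml"
    fi
}

# Demo with throwaway directories so the sketch runs anywhere.
WORK_DIR="$(mktemp -d)"
mkdir -p "$WORK_DIR/base"
echo "jobmanager.rpc.port: 6123" > "$WORK_DIR/base/flink-conf.yaml"

make_batch_conf_dir "$WORK_DIR/base" "$WORK_DIR/batch"

# Export before the call so that submissions from this shell use it.
export FLINK_CONF_DIR="$WORK_DIR/batch"
echo "FLINK_CONF_DIR=$FLINK_CONF_DIR"

# "$FLINK_DIR/flink" run $CONTAINER_OPTIONS ...   # real submission goes here
```

Exporting the variable in the batch launcher script only keeps the persistent 
session's configuration untouched.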


-----Original Message-----
From: Maximilian Michels [mailto:m...@apache.org]
Sent: Wednesday, 15 June 2016 18:06
To: user@flink.apache.org
Subject: Re: Yarn batch not working with standalone yarn job manager once a 
persistent, HA job manager is launched ?

Hi Arnaud,

One issue per thread please. That makes things a lot easier for us :)

Something positive first: We are reworking the resuming of existing Flink Yarn 
applications. It'll be much easier to resume a cluster, either by simply using 
the Yarn ID or by re-discovering the Yarn session using the properties file.

The dynamic properties are a shortcut for modifying the Flink configuration of 
the cluster _only_ upon startup. Afterwards, they are already set on the 
containers. We might change this for the 1.1.0 release. It should work if you 
put "yarn.properties-file.location: /custom/location" in your flink-conf.yaml 
before you execute "./bin/flink".
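
As a rough sketch of that suggestion, the snippet below writes the key into a 
flink-conf.yaml, replacing any existing entry for it. The file path and the 
helper name set_conf_key are hypothetical; the key and the example value 
/custom/location are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: set "key: value" in a flink-conf.yaml style file,
# replacing an existing line for the same key if there is one.
set_conf_key() {
    conf_file="$1"
    key="$2"
    value="$3"
    tmp_file="$(mktemp)"
    # Keep every line except a previous definition of the key.
    # grep -v exits non-zero when nothing is left, hence "|| true".
    if [ -f "$conf_file" ]; then
        grep -v "^${key}:" "$conf_file" > "$tmp_file" || true
    fi
    echo "${key}: ${value}" >> "$tmp_file"
    mv "$tmp_file" "$conf_file"
}

# Demo on a throwaway file that already contains an older value.
CONF_FILE="$(mktemp -d)/flink-conf.yaml"
echo "yarn.properties-file.location: /tmp" > "$CONF_FILE"

set_conf_key "$CONF_FILE" "yarn.properties-file.location" "/custom/location"
cat "$CONF_FILE"
```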

Cheers,
Max

On Wed, Jun 15, 2016 at 3:14 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> Ooopsss....
> My mistake, snapshot/restore does work in a local env; I had a weird 
> configuration issue!
>
> But I still have the property file path issue :)
>
> -----Original Message-----
> From: LINZ, Arnaud
> Sent: Wednesday, 15 June 2016 14:35
> To: 'user@flink.apache.org' <user@flink.apache.org>
> Subject: RE: Yarn batch not working with standalone yarn job manager once a 
> persistent, HA job manager is launched ?
>
> Hi,
>
> I haven't had the time to investigate the bad configuration file path issue 
> yet (if you have any idea why yarn.properties-file.location is ignored, you 
> are welcome to share it), but I'm facing another HA problem.
>
> I'm trying to make my custom streaming sources HA compliant by implementing 
> snapshotState() & restoreState(). I would like to test that mechanism in my 
> JUnit tests, because it can be complex, but I was unable to simulate a 
> "recovery" in a local Flink environment: snapshotState() is never triggered, 
> and throwing an exception inside the execution chain does not lead to 
> recovery but ends the execution, despite the 
> streamExecEnv.enableCheckpointing(timeout) call.
>
> Is there a way to test this mechanism locally (other than poorly simulating 
> it by explicitly calling snapshot & restore in an overridden source)?
>
> Thanks,
> Arnaud
>
> -----Original Message-----
> From: LINZ, Arnaud
> Sent: Monday, 6 June 2016 17:53
> To: user@flink.apache.org
> Subject: RE: Yarn batch not working with standalone yarn job manager once a 
> persistent, HA job manager is launched ?
>
> I've deleted the '/tmp/.yarn-properties-user' file created for the persistent 
> container, and the batches do go into their own container. However, that's 
> not a workable workaround, as I'm no longer able to submit streaming apps to 
> the persistent container that way :) So it's really a problem of Flink 
> finding the right properties file.
>
> I've added -yD yarn.properties-file.location=/tmp/flink/batch to the batch 
> command line (it is also configured in the JVM_ARGS var), with no change of 
> behaviour. Note that I do have a standalone yarn container created, but the 
> job is submitted to the other one.
>
> Thanks,
> Arnaud
>
> -----Original Message-----
> From: Ufuk Celebi [mailto:u...@apache.org]
> Sent: Monday, 6 June 2016 16:01
> To: user@flink.apache.org
> Subject: Re: Yarn batch not working with standalone yarn job manager once a 
> persistent, HA job manager is launched ?
>
> Thanks for the clarification. I think it might be related to the YARN 
> properties file, which is still being used for the batch jobs. Can you try to 
> delete it between submissions as a temporary workaround to check whether it's 
> related?
>
> – Ufuk
>
> On Mon, Jun 6, 2016 at 3:18 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
>> Hi,
>>
>> The zookeeper path is only for my persistent container, and I do use a 
>> different one for each of my persistent containers.
>>
>> The -Drecovery.mode=standalone was passed inside JVM_ARGS 
>> ("${JVM_ARGS} -Drecovery.mode=standalone 
>> -Dyarn.properties-file.location=/tmp/flink/batch")
>>
>> I've tried using -yD recovery.mode=standalone on the flink command line too, 
>> but it does not solve the problem; it still uses the pre-existing container.
>>
>> Complete line =
>> /usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 8192 -yqu batch1
>> -ys 4 -yD yarn.heap-cutoff-ratio=0.3 -yD akka.ask.timeout=300s
>> -yD recovery.mode=standalone
>> --class com.bouygtel.kubera.main.segstage.MainGeoSegStage
>> /usr/users/datcrypt/alinz/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar
>> -j /usr/users/datcrypt/alinz/KBR/GOS/log
>> -c /usr/users/datcrypt/alinz/KBR/GOS/cfg/KBR_GOS_Config.cfg
>>
>> JVM_ARGS =
>> -Drecovery.mode=standalone
>> -Dyarn.properties-file.location=/tmp/flink/batch
>>
>>
>> Arnaud
>>
>>
>> -----Original Message-----
>> From: Ufuk Celebi [mailto:u...@apache.org]
>> Sent: Monday, 6 June 2016 14:37
>> To: user@flink.apache.org
>> Subject: Re: Yarn batch not working with standalone yarn job manager once a 
>> persistent, HA job manager is launched ?
>>
>> Hey Arnaud,
>>
>> The cause of this is probably that both jobs use the same ZooKeeper root 
>> path, in which case all task managers connect to the same leading job 
>> manager.
>>
>> I think you forgot to add the y in the -Drecovery.mode=standalone for the 
>> batch jobs, e.g.
>>
>> -yDrecovery.mode=standalone
>>
>> Can you try this?
>>
>> – Ufuk
>>
>> On Mon, Jun 6, 2016 at 2:19 PM, LINZ, Arnaud <al...@bouyguestelecom.fr> 
>> wrote:
>>> Hi,
>>>
>>> I use Flink 1.0.0. I have a persistent yarn container set (a 
>>> persistent flink job manager) that I use for streaming jobs, and I 
>>> use the “yarn-cluster” mode to launch my batches.
>>>
>>> I’ve just switched “HA” mode on for my persistent streaming job 
>>> manager and it seems to work; however, my batches no longer work 
>>> because they now execute inside the persistent container (and fail 
>>> because it lacks slots) and not in a separate standalone job manager.
>>>
>>> My batch launch options:
>>>
>>> CONTAINER_OPTIONS="-m yarn-cluster -yn $FLINK_NBCONTAINERS -ytm 
>>> $FLINK_MEMORY -yqu $FLINK_QUEUE -ys $FLINK_NBSLOTS -yD 
>>> yarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO -yD akka.ask.timeout=300s"
>>>
>>> JVM_ARGS="${JVM_ARGS} -Drecovery.mode=standalone 
>>> -Dyarn.properties-file.location=/tmp/flink/batch"
>>>
>>> $FLINK_DIR/flink run $CONTAINER_OPTIONS --class $MAIN_CLASS_KUBERA 
>>> $JAR_SUPP $listArgs $ACTION
>>>
>>> My persistent cluster launch options:
>>>
>>> export FLINK_HA_OPTIONS="-Dyarn.application-attempts=10
>>> -Drecovery.mode=zookeeper
>>> -Drecovery.zookeeper.quorum=${FLINK_HA_ZOOKEEPER_SERVERS}
>>> -Drecovery.zookeeper.path.root=${FLINK_HA_ZOOKEEPER_PATH}
>>> -Dstate.backend=filesystem
>>> -Dstate.backend.fs.checkpointdir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/checkpoints
>>> -Drecovery.zookeeper.storageDir=hdfs:///tmp/${FLINK_HA_ZOOKEEPER_PATH}/recovery/"
>>>
>>> $FLINK_DIR/yarn-session.sh
>>> -Dyarn.heap-cutoff-ratio=$FLINK_HEAP_CUTOFF_RATIO
>>> $FLINK_HA_OPTIONS -st -d -n $FLINK_NBCONTAINERS -s $FLINK_NBSLOTS 
>>> -tm $FLINK_MEMORY -qu $FLINK_QUEUE -nm ${GANESH_TYPE_PF}_KuberaFlink
>>>
>>> I’ve switched back to the FLINK_HA_OPTIONS="" way of launching the 
>>> container for now, but I lack HA.
>>>
>>> Is it a (un)known bug or am I missing a magic option?
>>>
>>> Best regards,
>>>
>>> Arnaud
>>>
