Oops... false alarm.

In fact it does start another container, but that container exits immediately 
because the job is not submitted to it; it still gets submitted to the streaming one.
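
One way to double-check this, assuming the YARN command-line tools are available 
on the client machine, is to list the applications; the short-lived batch 
application should then show up next to the long-running streaming session:

yarn application -list -appStates RUNNING,FINISHED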

Log details: 

Command = 
#  JVM_ARGS =  -DCluster.Parallelisme=150  -Drecovery.mode=standalone
/usr/lib/flink/bin/flink run -m yarn-cluster -yn 48 -ytm 5120 -yqu batch1 -ys 4 
--class com.bouygtel.kubera.main.segstage.MainGeoSegStage 
/home/voyager/KBR/GOS/lib/KUBERA-GEO-SOURCE-0.0.1-SNAPSHOT-allinone.jar  -j 
/home/voyager/KBR/GOS/log -c /home/voyager/KBR/GOS/cfg/KBR_GOS_Config.cfg 

Log = 
Found YARN properties file /tmp/.yarn-properties-voyager
YARN properties set default parallelism to 24
Using JobManager address from YARN properties 
bt1shli3.bpa.bouyguestelecom.fr/172.21.125.28:36700
YARN cluster mode detected. Switching Log4j output to console
11:39:18,192 INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl    
 - Timeline service address: 
http://h1r1dn02.bpa.bouyguestelecom.fr:8188/ws/v1/timeline/
11:39:18,349 INFO  org.apache.hadoop.yarn.client.RMProxy                        
 - Connecting to ResourceManager at 
h1r1nn01.bpa.bouyguestelecom.fr/172.21.125.3:8050
11:39:18,504 INFO  org.apache.flink.client.FlinkYarnSessionCli                  
 - No path for the flink jar passed. Using the location of class 
org.apache.flink.yarn.FlinkYarnClient to locate the jar
11:39:18,513 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 - Using values:
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 -   TaskManager count = 48
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 -   JobManager memory = 1024
11:39:18,515 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 -   TaskManager memory = 5120
11:39:18,641 WARN  org.apache.flink.yarn.FlinkYarnClient                        
 - The JobManager or TaskManager memory is below the smallest possible YARN 
Container size. The value of 'yarn.scheduler.minimum-allocation-mb' is '2048'. 
Please increase the memory size.YARN will allocate the smaller containers but 
the scheduler will account for the minimum-allocation-mb, maybe not all 
instances you requested will start.
11:39:19,102 INFO  org.apache.flink.yarn.Utils                                  
 - Copying from file:/usr/lib/flink/lib/flink-dist_2.11-0.10.0.jar to 
hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/flink-dist_2.11-0.10.0.jar
11:39:19,653 INFO  org.apache.flink.yarn.Utils                                  
 - Copying from /usr/lib/flink/conf/flink-conf.yaml to 
hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/flink-conf.yaml
11:39:19,667 INFO  org.apache.flink.yarn.Utils                                  
 - Copying from file:/usr/lib/flink/conf/logback.xml to 
hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/logback.xml
11:39:19,679 INFO  org.apache.flink.yarn.Utils                                  
 - Copying from file:/usr/lib/flink/conf/log4j.properties to 
hdfs://h1r1nn01.bpa.bouyguestelecom.fr:8020/user/voyager/.flink/application_1449127732314_0046/log4j.properties
11:39:19,698 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 - Submitting application master application_1449127732314_0046
11:39:19,723 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        
 - Submitted application application_1449127732314_0046
11:39:19,723 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 - Waiting for the cluster to be allocated
11:39:19,725 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 - Deploying cluster, current state ACCEPTED
11:39:20,727 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 - Deploying cluster, current state ACCEPTED
11:39:21,728 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 - Deploying cluster, current state ACCEPTED
11:39:22,730 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 - Deploying cluster, current state ACCEPTED
11:39:23,731 INFO  org.apache.flink.yarn.FlinkYarnClient                        
 - YARN application has been deployed successfully.
11:39:23,734 INFO  org.apache.flink.yarn.FlinkYarnCluster                       
 - Start actor system.
11:39:24,192 INFO  org.apache.flink.yarn.FlinkYarnCluster                       
 - Start application client.
YARN cluster started
JobManager web interface address 
http://h1r1nn01.bpa.bouyguestelecom.fr:8088/proxy/application_1449127732314_0046/
Waiting until all TaskManagers have connected
11:39:24,202 INFO  org.apache.flink.yarn.ApplicationClient                      
 - Notification about new leader address 
akka.tcp://flink@172.21.125.16:59907/user/jobmanager with session ID null.
No status updates from the YARN cluster received so far. Waiting ...
11:39:24,206 INFO  org.apache.flink.yarn.ApplicationClient                      
 - Received address of new leader 
akka.tcp://flink@172.21.125.16:59907/user/jobmanager with session ID null.
11:39:24,206 INFO  org.apache.flink.yarn.ApplicationClient                      
 - Disconnect from JobManager null.
11:39:24,210 INFO  org.apache.flink.yarn.ApplicationClient                      
 - Trying to register at JobManager 
akka.tcp://flink@172.21.125.16:59907/user/jobmanager.
11:39:24,377 INFO  org.apache.flink.yarn.ApplicationClient                      
 - Successfully registered at the JobManager 
Actor[akka.tcp://flink@172.21.125.16:59907/user/jobmanager#-801507205]
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (0/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (12/48)
TaskManager status (46/48)
TaskManager status (46/48)
TaskManager status (46/48)
TaskManager status (46/48)
All TaskManagers are connected
Using the parallelism provided by the remote cluster (192). To use another 
parallelism, set it at the ./bin/flink client.
12/03/2015 11:39:55  Job execution switched to status RUNNING.
12/03/2015 11:39:55  CHAIN DataSource (at 
createInput(ExecutionEnvironment.java:508) 
(com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at 
readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1)(1/150) switched to 
SCHEDULED 
12/03/2015 11:39:55  CHAIN DataSource (at 
createInput(ExecutionEnvironment.java:508) 
(com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at 
readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1)(1/150) switched to 
DEPLOYING
=> The job starts

Then it crashes:

org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not 
enough free slots available to run the job. You can decrease the operator 
parallelism or increase the number of slots per TaskManager in the 
configuration. Task to schedule: < Attempt #0 (CHAIN DataSource (at 
createInput(ExecutionEnvironment.java:508) 
(com.bouygtel.kuberasdk.hive.HiveHCatDAO$1)) -> FlatMap (FlatMap at 
readTable(HiveHCatDAO.java:120)) -> Map (Key Extractor 1) (5/150)) @ 
(unassigned) - [SCHEDULED] > with groupID < 7b9e554a93d3ea946d13d239a99bb6ae > 
in sharing group < SlotSharingGroup [0c9285747d113d8dd85962602b674497, 
9f30db9a30430385e1cd9d0f5010ed9e, 36b825566212059be3f888e3bbdf0d96, 
f95ba68c3916346efe497b937393eb49, e73522cce11e699022c285180fd1024d, 
988b776310ef3d8a2a3875227008a30e, 7b9e554a93d3ea946d13d239a99bb6ae, 
08af3a01b9cb49b76e6aedcd57d57788, 3f91660c6ab25f0f77d8e55d54397b01] >. 
Resources available to scheduler: Number of instances=6, total number of 
slots=24, available slots=0

It states that I have only 24 slots on my 48-container cluster!
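
If I put the numbers from the log side by side (assuming I read the exception correctly):

  requested batch cluster    : 48 TaskManagers (-yn 48) x 4 slots (-ys 4) = 192 slots
  existing streaming session : 6 instances x 4 slots = 24 slots, 0 of them available

So the job, with its parallelism of 150, gets scheduled on the 24 busy slots of the 
streaming session instead of on the 192 slots requested for the new cluster.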




-----Original Message-----
From: LINZ, Arnaud 
Sent: Thursday, 3 December 2015 11:26
To: user@flink.apache.org
Subject: RE: HA Mode and standalone containers compatibility ?

Hi,

The batch job does not need to be HA.
I stopped everything, cleaned the temp files, added -Drecovery.mode=standalone, 
and it seems to work now!
Strange, but good enough for me for now.

Thanks,
Arnaud

-----Original Message-----
From: Ufuk Celebi [mailto:u...@apache.org]
Sent: Thursday, 3 December 2015 11:11
To: user@flink.apache.org
Subject: Re: HA Mode and standalone containers compatibility ?

Hey Arnaud,

thanks for reporting this. I think Till’s suggestion will help to debug this 
(checking whether a second YARN application has been started)…

You don’t want to run the batch application in HA mode, correct?

It sounds like the batch job is submitted with the same config keys. Could you 
start the batch job explicitly with -Drecovery.mode=standalone?
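
For example, something along these lines (just a rough sketch; adapt it to however 
you currently pass -D options when submitting the batch job; here I am assuming the 
JVM_ARGS environment variable that the flink launch script picks up):

JVM_ARGS="-Drecovery.mode=standalone" ./bin/flink run -m yarn-cluster -yn <n> -ytm <mem> -ys <slots> --class <your-batch-main-class> <your-jar> <job args>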

If you do want the batch job to be HA as well, you have to configure separate 
Zookeeper root paths:

recovery.zookeeper.path.root: /flink-streaming-1 # for the streaming session

recovery.zookeeper.path.root: /flink-batch # for the batch session
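
For example, something like this (only a sketch; <session options> stands for 
whatever you already pass when starting the streaming session):

./bin/yarn-session.sh <session options> -Drecovery.mode=zookeeper -Drecovery.zookeeper.path.root=/flink-streaming-1

and for the batch jobs, pass recovery.zookeeper.path.root=/flink-batch (together 
with the other HA options) the same way when starting the per-job cluster with 
flink run -m yarn-cluster.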

– Ufuk

> On 03 Dec 2015, at 11:01, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> 
> Yes, it does interfere, I do have additional task managers. My batch 
> application shows up in my streaming cluster's Flink GUI instead of creating 
> its own container with its own GUI, despite the -m yarn-cluster option.
>  
> From: Till Rohrmann [mailto:trohrm...@apache.org]
> Sent: Thursday, 3 December 2015 10:36
> To: user@flink.apache.org
> Subject: Re: HA Mode and standalone containers compatibility ?
>  
> Hi Arnaud,
>  
> as long as you don't have HA activated for your batch jobs, HA shouldn't have 
> an influence on the batch execution. If it interferes, then you should see 
> additional task managers connected to the streaming cluster when you execute 
> the batch job. Could you check that? Furthermore, could you check whether a 
> second YARN application is actually started when you run the batch jobs?
>  
> Cheers,
> Till
>  
> On Thu, Dec 3, 2015 at 9:57 AM, LINZ, Arnaud <al...@bouyguestelecom.fr> wrote:
> Hello,
> 
>  
> 
> I have both streaming applications & batch applications. Since the memory 
> needs are not the same, I was using a long-lived container for my streaming 
> apps and new short-lived containers for hosting each batch execution.
> 
>  
> 
> For that, I submit streaming jobs with "flink run" and batch jobs with 
> "flink run -m yarn-cluster".
> 
>  
> 
> This was working fine until I turned ZooKeeper HA mode on for my streaming 
> applications.
> 
> Even though I don't set it in the YAML Flink configuration file, only with -D 
> options on the yarn_session.sh command line, my batch jobs now try to run in 
> the streaming container and fail because of the lack of resources.
> 
>  
> 
> My HA options are:
> 
> -Dyarn.application-attempts=10 -Drecovery.mode=zookeeper
> -Drecovery.zookeeper.quorum=h1r1en01:2181
> -Drecovery.zookeeper.path.root=/flink  -Dstate.backend=filesystem 
> -Dstate.backend.fs.checkpointdir=hdfs:///tmp/flink/checkpoints
> -Drecovery.zookeeper.storageDir=hdfs:///tmp/flink/recovery/
> 
>  
> 
> Am I missing something?
> 
>  
> 
> Best regards,
> 
> Arnaud
> 
>  
> 
> The integrity of this message cannot be guaranteed on the Internet. The 
> company that sent this message cannot therefore be held liable for its 
> content nor attachments. Any unauthorized use or dissemination is prohibited. 
> If you are not the intended recipient of this message, then please delete it 
> and notify the sender.
