Re: java program Get Stuck at broadcasting

Akhil Das Wed, 20 May 2015 10:37:15 -0700

This looks more like an issue with your HDFS setup. Can you check the
datanode logs? Also try putting a new file into HDFS and see if that works.
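
One common cause of "could only be replicated to 0 nodes" is datanodes that
have run out of disk space, so it may also be worth checking how much usable
space is left on each datanode host. Below is a minimal sketch that uses only
the JDK, not the Hadoop API. The class name, the /tmp default and the 468 MB
threshold are illustrative assumptions; pass the directory configured as
dfs.datanode.data.dir on your own nodes as the first argument.

```java
import java.io.File;

class DiskSpaceCheck {
    /** Usable space, in megabytes, on the volume that holds dir (0 if dir does not exist). */
    static long usableMegabytes(File dir) {
        return dir.getUsableSpace() / (1024L * 1024L);
    }

    /** True when the volume that holds dir has at least requiredMb megabytes usable. */
    static boolean hasRoom(File dir, long requiredMb) {
        return usableMegabytes(dir) >= requiredMb;
    }

    public static void main(String[] args) {
        // Hypothetical defaults: replace /tmp with the real datanode data directory.
        File dir = new File(args.length > 0 ? args[0] : "/tmp");
        long requiredMb = args.length > 1 ? Long.parseLong(args[1]) : 468L;
        System.out.printf("%s: %d MB usable, need %d MB -> %s%n",
                dir, usableMegabytes(dir), requiredMb,
                hasRoom(dir, requiredMb) ? "ok" : "LOW");
    }
}
```

Running it on every slave gives a quick picture of whether the disks are
really full. It does not replace reading the datanode logs.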


Thanks
Best Regards

On Wed, May 20, 2015 at 11:47 AM, allanjie <allanmcgr...@gmail.com> wrote:

> Hi All,
> The variable I need to broadcast is just 468 MB.
>
>
> When broadcasting, it just “stops” here:
>
> *
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.tip.id is
> deprecated. Instead, use mapreduce.task.id
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.id is
> deprecated. Instead, use mapreduce.task.attempt.id
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.is.map is
> deprecated. Instead, use mapreduce.task.ismap
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.task.partition is
> deprecated. Instead, use mapreduce.task.partition
> 15/05/20 11:36:14 INFO Configuration.deprecation: mapred.job.id is
> deprecated. Instead, use mapreduce.job.id
> 15/05/20 11:36:14 INFO mapred.FileInputFormat: Total input paths to process
> : 1
> 15/05/20 11:36:14 INFO spark.SparkContext: Starting job: saveAsTextFile at
> Test1.java:90
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at
> Test1.java:90) with 4 output partitions (allowLocal=false)
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Final stage: Stage
> 0(saveAsTextFile at Test1.java:90)
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Parents of final stage:
> List()
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Missing parents: List()
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Submitting Stage 0
> (MapPartitionsRDD[3] at saveAsTextFile at Test1.java:90), which has no
> missing parents
> 15/05/20 11:36:15 INFO storage.MemoryStore: ensureFreeSpace(129264) called
> with curMem=988453294, maxMem=2061647216
> 15/05/20 11:36:15 INFO storage.MemoryStore: Block broadcast_2 stored as
> values in memory (estimated size 126.2 KB, free 1023.4 MB)
> 15/05/20 11:36:15 INFO storage.MemoryStore: ensureFreeSpace(78190) called
> with curMem=988582558, maxMem=2061647216
> 15/05/20 11:36:15 INFO storage.MemoryStore: Block broadcast_2_piece0 stored
> as bytes in memory (estimated size 76.4 KB, free 1023.3 MB)
> 15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0
> in memory on HadoopV26Master:44855 (size: 76.4 KB, free: 1492.4 MB)
> 15/05/20 11:36:15 INFO storage.BlockManagerMaster: Updated info of block
> broadcast_2_piece0
> 15/05/20 11:36:15 INFO spark.SparkContext: Created broadcast 2 from
> broadcast at DAGScheduler.scala:839
> 15/05/20 11:36:15 INFO scheduler.DAGScheduler: Submitting 4 missing tasks
> from Stage 0 (MapPartitionsRDD[3] at saveAsTextFile at Test1.java:90)
> 15/05/20 11:36:15 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0
> with 4 tasks
> 15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
> 0.0 (TID 0, HadoopV26Slave5, NODE_LOCAL, 1387 bytes)
> 15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 1.0 in stage
> 0.0 (TID 1, HadoopV26Slave3, NODE_LOCAL, 1387 bytes)
> 15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 2.0 in stage
> 0.0 (TID 2, HadoopV26Slave4, NODE_LOCAL, 1387 bytes)
> 15/05/20 11:36:15 INFO scheduler.TaskSetManager: Starting task 3.0 in stage
> 0.0 (TID 3, HadoopV26Slave1, NODE_LOCAL, 1387 bytes)
> 15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0
> in memory on HadoopV26Slave5:45357 (size: 76.4 KB, free: 2.1 GB)
> 15/05/20 11:36:15 INFO storage.BlockManagerInfo: Added broadcast_2_piece0
> in memory on HadoopV26Slave3:57821 (size: 76.4 KB, free: 2.1 GB)
> …….
> 15/05/20 11:36:28 INFO storage.BlockManagerInfo: Added broadcast_1_piece1
> in memory on HadoopV26Slave5:45357 (size: 4.0 MB, free: 1646.3 MB)
> *
>
> It did not go any further while I kept waiting; it had not really stopped,
> it was more like stuck.
>
> I have 6 workers/VMs; each of them has 8 GB of memory and 12 GB of disk
> storage. After a few minutes, the program stopped and showed something like
> this:
>
>
> 15/05/20 11:42:45 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0
> (TID 1, HadoopV26Slave3):
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /user/output/_temporary/0/_temporary/attempt_201505201136_0000_m_000001_1/part-00001
> could only be replicated to 0 nodes instead of minReplication (=1).  There
> are 6 datanode(s) running and no node(s) are excluded in this operation.
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1549)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3200)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:641)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1468)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>         at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
>
> Then I checked the disk volume of each slave, and it seems that almost all
> of the storage has been used up. But the variable I broadcast is just 468 MB.
>
> Originally it was saved in HDFS. In the Java program I read it from HDFS and
> then broadcast that variable.
>
> Can anyone help? Many thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/java-program-Get-Stuck-at-broadcasting-tp22953.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
