Hi, I changed my command to:

./bin/spark-submit --master spark://HDOP-B.AGT:7077 --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py 1000

and it fixed the problem.
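For reference, I think the same settings can be made the default in conf/spark-defaults.conf instead of being passed on every submit. A minimal sketch, assuming the property names from the Spark configuration docs apply to this version:

    # conf/spark-defaults.conf -- sketch mirroring the flags used above
    spark.master           spark://HDOP-B.AGT:7077
    spark.driver.memory    4g
    spark.executor.memory  2g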
I still have a couple of questions:
PROCESS_LOCAL is not YARN execution, right? How should I configure the job to run on YARN?
Should I execute the start-all script on all machines or only on one?
Where are the UI / logs of the Spark execution?

This is the task table I see in the UI (reformatted):

Index  ID   Status   Locality Level  Executor    Launch Time          Duration  GC Time  Ser. Time
152    152  SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:14  0.2 s
0      0    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms
2      2    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms
3      3    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms    1 ms
4      4    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     39 ms    2 ms
5      5    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     39 ms    1 ms
6      6    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     1 ms
7      7    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.9 s
8      8    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:10  0.3 s
9      9    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:10  0.4 s
10     10   SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:10  0.3 s     1 ms
11     11   SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:10  0.3 s

On Wed, Sep 3, 2014 at 12:19 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:

> Hi Andrew,
> What should I do to set the master to yarn? Can you please point me to the command or documentation for how to do it?
>
> I am doing the following: I executed start-all.sh:
>
> [root@HDOP-B sbin]# ./start-all.sh
> starting org.apache.spark.deploy.master.Master, logging to /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-HDOP-B.AGT.out
> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
> localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-HDOP-B.AGT.out
>
> After that I executed the command:
>
> ./bin/spark-submit --master spark://HDOP-B.AGT:7077 examples/src/main/python/pi.py 1000
>
> The result is the following:
>
> /usr/jdk64/jdk1.7.0_45/bin/java
> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
> 14/09/03 12:10:06 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 14/09/03 12:10:06 INFO SecurityManager: Changing view acls to: root
> 14/09/03 12:10:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
> 14/09/03 12:10:07 INFO Slf4jLogger: Slf4jLogger started
> 14/09/03 12:10:07 INFO Remoting: Starting remoting
> 14/09/03 12:10:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@hdop-b.agt:38944]
> 14/09/03 12:10:07 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@hdop-b.agt:38944]
> 14/09/03 12:10:07 INFO SparkEnv: Registering MapOutputTracker
> 14/09/03 12:10:07 INFO SparkEnv: Registering BlockManagerMaster
> 14/09/03 12:10:08 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140903121008-cf09
> 14/09/03 12:10:08 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
> 14/09/03 12:10:08 INFO ConnectionManager: Bound socket to port 45041 with id = ConnectionManagerId(HDOP-B.AGT,45041)
> 14/09/03 12:10:08 INFO BlockManagerMaster: Trying to register BlockManager
> 14/09/03 12:10:08 INFO BlockManagerInfo: Registering block manager HDOP-B.AGT:45041 with 294.9 MB RAM
> 14/09/03 12:10:08 INFO BlockManagerMaster: Registered BlockManager
> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
> 14/09/03 12:10:08 INFO HttpBroadcast: Broadcast server started at http://10.193.1.76:59336
> 14/09/03 12:10:08 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7bf5c3c3-1c02-41e8-9fb0-983e175dd45c
> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
> 14/09/03 12:10:08 INFO SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
> 14/09/03 12:10:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/09/03 12:10:09 INFO Utils: Copying /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to /tmp/spark-4e252376-70cb-4171-bf2c-d804524e816c/pi.py
> 14/09/03 12:10:09 INFO SparkContext: Added file file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py at http://10.193.1.76:45893/files/pi.py with timestamp 1409717409277
> 14/09/03 12:10:09 INFO AppClient$ClientActor: Connecting to master spark://HDOP-B.AGT:7077...
> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140903121009-0000
> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor added: app-20140903121009-0000/0 on worker-20140903120712-HDOP-B.AGT-51161 (HDOP-B.AGT:51161) with 8 cores
> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140903121009-0000/0 on hostPort HDOP-B.AGT:51161 with 8 cores, 512.0 MB RAM
> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor updated: app-20140903121009-0000/0 is now RUNNING
> 14/09/03 12:10:12 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@hdop-b.agt:38143/user/Executor#1295757828] with ID 0
> 14/09/03 12:10:12 INFO BlockManagerInfo: Registering block manager HDOP-B.AGT:38670 with 294.9 MB RAM
>
> Traceback (most recent call last):
>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 271, in parallelize
>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
> : java.lang.OutOfMemoryError: Java heap space
> at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
> at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:744)
>
> What should I do to fix the issue?
>
> Thanks,
> Oleg
>
> On Tue, Sep 2, 2014 at 10:32 PM, Andrew Or <and...@databricks.com> wrote:
>
>> Hi Oleg,
>>
>> If you are running Spark on a YARN cluster, you should set --master to yarn. By default this runs in client mode, which redirects all output of your application to your console. Your submit is failing because it is trying to connect to a standalone master that you probably did not start. I am somewhat puzzled as to how you ran into an OOM from this configuration, however. Does this problem still occur if you set the correct master?
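>>
>> For example, something like this (a rough sketch, not a tested command; on Spark 1.0.x, YARN client mode is usually requested with --master yarn-client, and HADOOP_CONF_DIR must point at your Hadoop client configuration, e.g. /etc/hadoop/conf on HDP):
>>
>>   export HADOOP_CONF_DIR=/etc/hadoop/conf
>>   ./bin/spark-submit --master yarn-client examples/src/main/python/pi.py 1000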
>> -Andrew
>>
>> 2014-09-02 2:42 GMT-07:00 Oleg Ruchovets <oruchov...@gmail.com>:
>>
>>> Hi,
>>> I've installed pyspark on an HDP Hortonworks cluster.
>>> Executing the pi example:
>>>
>>> Command:
>>> spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]# ./bin/spark-submit --master spark://10.193.1.71:7077 examples/src/main/python/pi.py 1000
>>>
>>> Exception:
>>>
>>> 14/09/02 17:34:02 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>> 14/09/02 17:34:02 INFO SecurityManager: Changing view acls to: root
>>> 14/09/02 17:34:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
>>> 14/09/02 17:34:02 INFO Slf4jLogger: Slf4jLogger started
>>> 14/09/02 17:34:02 INFO Remoting: Starting remoting
>>> 14/09/02 17:34:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@hdop-m.agt:41059]
>>> 14/09/02 17:34:03 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@hdop-m.agt:41059]
>>> 14/09/02 17:34:03 INFO SparkEnv: Registering MapOutputTracker
>>> 14/09/02 17:34:03 INFO SparkEnv: Registering BlockManagerMaster
>>> 14/09/02 17:34:03 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140902173403-cda8
>>> 14/09/02 17:34:03 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
>>> 14/09/02 17:34:03 INFO ConnectionManager: Bound socket to port 34931 with id = ConnectionManagerId(HDOP-M.AGT,34931)
>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Trying to register BlockManager
>>> 14/09/02 17:34:03 INFO BlockManagerInfo: Registering block manager HDOP-M.AGT:34931 with 294.9 MB RAM
>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Registered BlockManager
>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>> 14/09/02 17:34:03 INFO HttpBroadcast: Broadcast server started at http://10.193.1.71:54341
>>> 14/09/02 17:34:03 INFO HttpFileServer: HTTP File server directory is /tmp/spark-77c7a7dc-181e-4069-a014-8103a6a6330a
>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>> 14/09/02 17:34:04 INFO SparkUI: Started SparkUI at http://HDOP-M.AGT:4040
>>> 14/09/02 17:34:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 14/09/02 17:34:04 INFO Utils: Copying /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to /tmp/spark-f2e0cc0f-59cb-4f6c-9d48-f16205a40c7e/pi.py
>>> 14/09/02 17:34:04 INFO SparkContext: Added file file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py at http://10.193.1.71:52938/files/pi.py with timestamp 1409650444941
>>> 14/09/02 17:34:05 INFO AppClient$ClientActor: Connecting to master spark://10.193.1.71:7077...
>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:25 INFO AppClient$ClientActor: Connecting to master spark://10.193.1.71:7077...
>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>
>>> Traceback (most recent call last):
>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 271, in parallelize
>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>>> at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>> at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>> at py4j.Gateway.invoke(Gateway.java:259)
>>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>> at java.lang.Thread.run(Thread.java:744)
>>>
>>> Question:
>>> How can I find out the Spark master host and port? Where are they defined?
>>>
>>> Thanks,
>>> Oleg