Hi, I changed my command to:

./bin/spark-submit --master spark://HDOP-B.AGT:7077 --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py 1000

and it fixed the problem.
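For reference, I think the same settings can be made the default in conf/spark-defaults.conf instead of being passed on every submit. A minimal sketch, assuming the property names from the Spark configuration docs apply to this version:

    # conf/spark-defaults.conf -- sketch mirroring the flags used above
    spark.master           spark://HDOP-B.AGT:7077
    spark.driver.memory    4g
    spark.executor.memory  2g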
I still have a couple of questions:
PROCESS_LOCAL is not YARN execution, right? How should I configure the job to run on YARN?
Should I execute the start-all script on all machines or only on one?
Where are the UI / logs of the Spark execution?

This is the task table I see in the UI (reformatted):

Index  ID   Status   Locality Level  Executor    Launch Time          Duration  GC Time  Ser. Time
152    152  SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:14  0.2 s
0      0    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms
2      2    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms
3      3    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms    1 ms
4      4    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     39 ms    2 ms
5      5    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     39 ms    1 ms
6      6    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     1 ms
7      7    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:09  0.9 s
8      8    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:10  0.3 s
9      9    SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:10  0.4 s
10     10   SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:10  0.3 s     1 ms
11     11   SUCCESS  PROCESS_LOCAL   HDOP-B.AGT  2014/09/03 12:35:10  0.3 s

On Wed, Sep 3, 2014 at 12:19 PM, Oleg Ruchovets <oruchov...@gmail.com> wrote:

> Hi Andrew,
> What should I do to set the master to yarn? Can you please point me to the command or documentation for how to do it?
>
> I am doing the following: I executed start-all.sh:
>
> [root@HDOP-B sbin]# ./start-all.sh
> starting org.apache.spark.deploy.master.Master, logging to /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-HDOP-B.AGT.out
> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
> localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-HDOP-B.AGT.out
>
> After that I executed the command:
>
> ./bin/spark-submit --master spark://HDOP-B.AGT:7077 examples/src/main/python/pi.py 1000
>
> The result is the following:
>
> /usr/jdk64/jdk1.7.0_45/bin/java
> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
> 14/09/03 12:10:06 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 14/09/03 12:10:06 INFO SecurityManager: Changing view acls to: root
> 14/09/03 12:10:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
> 14/09/03 12:10:07 INFO Slf4jLogger: Slf4jLogger started
> 14/09/03 12:10:07 INFO Remoting: Starting remoting
> 14/09/03 12:10:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@hdop-b.agt:38944]
> 14/09/03 12:10:07 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@hdop-b.agt:38944]
> 14/09/03 12:10:07 INFO SparkEnv: Registering MapOutputTracker
> 14/09/03 12:10:07 INFO SparkEnv: Registering BlockManagerMaster
> 14/09/03 12:10:08 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140903121008-cf09
> 14/09/03 12:10:08 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
> 14/09/03 12:10:08 INFO ConnectionManager: Bound socket to port 45041 with id = ConnectionManagerId(HDOP-B.AGT,45041)
> 14/09/03 12:10:08 INFO BlockManagerMaster: Trying to register BlockManager
> 14/09/03 12:10:08 INFO BlockManagerInfo: Registering block manager HDOP-B.AGT:45041 with 294.9 MB RAM
> 14/09/03 12:10:08 INFO BlockManagerMaster: Registered BlockManager
> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
> 14/09/03 12:10:08 INFO HttpBroadcast: Broadcast server started at http://10.193.1.76:59336
> 14/09/03 12:10:08 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7bf5c3c3-1c02-41e8-9fb0-983e175dd45c
> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
> 14/09/03 12:10:08 INFO SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
> 14/09/03 12:10:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/09/03 12:10:09 INFO Utils: Copying /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to /tmp/spark-4e252376-70cb-4171-bf2c-d804524e816c/pi.py
> 14/09/03 12:10:09 INFO SparkContext: Added file file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py at http://10.193.1.76:45893/files/pi.py with timestamp 1409717409277
> 14/09/03 12:10:09 INFO AppClient$ClientActor: Connecting to master spark://HDOP-B.AGT:7077...
> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140903121009-0000
> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor added: app-20140903121009-0000/0 on worker-20140903120712-HDOP-B.AGT-51161 (HDOP-B.AGT:51161) with 8 cores
> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140903121009-0000/0 on hostPort HDOP-B.AGT:51161 with 8 cores, 512.0 MB RAM
> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor updated: app-20140903121009-0000/0 is now RUNNING
> 14/09/03 12:10:12 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@hdop-b.agt:38143/user/Executor#1295757828] with ID 0
> 14/09/03 12:10:12 INFO BlockManagerInfo: Registering block manager HDOP-B.AGT:38670 with 294.9 MB RAM
>
> Traceback (most recent call last):
>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 271, in parallelize
>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
> : java.lang.OutOfMemoryError: Java heap space
> at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
> at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:744)
>
> What should I do to fix the issue?
>
> Thanks,
> Oleg
>
> On Tue, Sep 2, 2014 at 10:32 PM, Andrew Or <and...@databricks.com> wrote:
>
>> Hi Oleg,
>>
>> If you are running Spark on a YARN cluster, you should set --master to yarn. By default this runs in client mode, which redirects all output of your application to your console. Your submit is failing because it is trying to connect to a standalone master that you probably did not start. I am somewhat puzzled as to how you ran into an OOM from this configuration, however. Does this problem still occur if you set the correct master?
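>>
>> For example, something like this (a rough sketch, not a tested command; on Spark 1.0.x, YARN client mode is usually requested with --master yarn-client, and HADOOP_CONF_DIR must point at your Hadoop client configuration, e.g. /etc/hadoop/conf on HDP):
>>
>>   export HADOOP_CONF_DIR=/etc/hadoop/conf
>>   ./bin/spark-submit --master yarn-client examples/src/main/python/pi.py 1000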
>> -Andrew
>>
>> 2014-09-02 2:42 GMT-07:00 Oleg Ruchovets <oruchov...@gmail.com>:
>>
>>> Hi,
>>> I've installed pyspark on an HDP Hortonworks cluster.
>>> Executing the pi example:
>>>
>>> Command:
>>> spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]# ./bin/spark-submit --master spark://10.193.1.71:7077 examples/src/main/python/pi.py 1000
>>>
>>> Exception:
>>>
>>> 14/09/02 17:34:02 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>> 14/09/02 17:34:02 INFO SecurityManager: Changing view acls to: root
>>> 14/09/02 17:34:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
>>> 14/09/02 17:34:02 INFO Slf4jLogger: Slf4jLogger started
>>> 14/09/02 17:34:02 INFO Remoting: Starting remoting
>>> 14/09/02 17:34:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sp...@hdop-m.agt:41059]
>>> 14/09/02 17:34:03 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sp...@hdop-m.agt:41059]
>>> 14/09/02 17:34:03 INFO SparkEnv: Registering MapOutputTracker
>>> 14/09/02 17:34:03 INFO SparkEnv: Registering BlockManagerMaster
>>> 14/09/02 17:34:03 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140902173403-cda8
>>> 14/09/02 17:34:03 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
>>> 14/09/02 17:34:03 INFO ConnectionManager: Bound socket to port 34931 with id = ConnectionManagerId(HDOP-M.AGT,34931)
>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Trying to register BlockManager
>>> 14/09/02 17:34:03 INFO BlockManagerInfo: Registering block manager HDOP-M.AGT:34931 with 294.9 MB RAM
>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Registered BlockManager
>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>> 14/09/02 17:34:03 INFO HttpBroadcast: Broadcast server started at http://10.193.1.71:54341
>>> 14/09/02 17:34:03 INFO HttpFileServer: HTTP File server directory is /tmp/spark-77c7a7dc-181e-4069-a014-8103a6a6330a
>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>> 14/09/02 17:34:04 INFO SparkUI: Started SparkUI at http://HDOP-M.AGT:4040
>>> 14/09/02 17:34:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 14/09/02 17:34:04 INFO Utils: Copying /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to /tmp/spark-f2e0cc0f-59cb-4f6c-9d48-f16205a40c7e/pi.py
>>> 14/09/02 17:34:04 INFO SparkContext: Added file file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py at http://10.193.1.71:52938/files/pi.py with timestamp 1409650444941
>>> 14/09/02 17:34:05 INFO AppClient$ClientActor: Connecting to master spark://10.193.1.71:7077...
>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:25 INFO AppClient$ClientActor: Connecting to master spark://10.193.1.71:7077...
>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@10.193.1.71:7077: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>
>>> Traceback (most recent call last):
>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 271, in parallelize
>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>>> at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>> at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>> at py4j.Gateway.invoke(Gateway.java:259)
>>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>> at java.lang.Thread.run(Thread.java:744)
>>>
>>> Question:
>>> How can I find out the Spark master host and port? Where are they defined?
>>>
>>> Thanks,
>>> Oleg