Hi,
I am new to Spark.
I set up two virtual machines, one as a client and one as a standalone-mode
master+worker.
Everything seems to run and connect fine, but when I try to run a simple
script, I get weird errors.
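For reference, the whole job is essentially the sketch below. This is just how I would write it as a standalone script; in the session that follows I type the one-liner into the pyspark shell and use the sc it creates for me, so the explicit SparkContext and the app name are only illustrative:

from pyspark import SparkContext

# Same master URL that I pass on the command line below.
# (In the interactive session, pyspark already provides `sc`.)
sc = SparkContext("spark://192.168.16.109:7077", "simple-count")

# The entire job: distribute two numbers and count them.
print(sc.parallelize([1, 2]).count())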
Here is the traceback; note that my program is just that one-liner:
vagrant@precise32:/usr/local/spark$ MASTER=spark://192.168.16.109:7077 bin/pyspark
Python 2.7.3 (default, Apr 20 2012, 22:44:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
14/03/28 06:45:54 INFO Utils: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/03/28 06:45:54 WARN Utils: Your hostname, precise32 resolves to a
loopback address: 127.0.1.1; using 192.168.16.107 instead (on interface
eth0)
14/03/28 06:45:54 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
14/03/28 06:45:55 INFO Slf4jLogger: Slf4jLogger started
14/03/28 06:45:55 INFO Remoting: Starting remoting
14/03/28 06:45:55 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@192.168.16.107:55440]
14/03/28 06:45:55 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@192.168.16.107:55440]
14/03/28 06:45:55 INFO SparkEnv: Registering BlockManagerMaster
14/03/28 06:45:55 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20140328064555-5a1f
14/03/28 06:45:55 INFO MemoryStore: MemoryStore started with capacity 297.0
MB.
14/03/28 06:45:55 INFO ConnectionManager: Bound socket to port 55114 with id
= ConnectionManagerId(192.168.16.107,55114)
14/03/28 06:45:55 INFO BlockManagerMaster: Trying to register BlockManager
14/03/28 06:45:55 INFO BlockManagerMasterActor$BlockManagerInfo: Registering
block manager 192.168.16.107:55114 with 297.0 MB RAM
14/03/28 06:45:55 INFO BlockManagerMaster: Registered BlockManager
14/03/28 06:45:55 INFO HttpServer: Starting HTTP Server
14/03/28 06:45:55 INFO HttpBroadcast: Broadcast server started at
http://192.168.16.107:58268
14/03/28 06:45:55 INFO SparkEnv: Registering MapOutputTracker
14/03/28 06:45:55 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-2a1f1a0b-f4d9-402a-ac17-a41d9f9aea0c
14/03/28 06:45:55 INFO HttpServer: Starting HTTP Server
14/03/28 06:45:56 INFO SparkUI: Started Spark Web UI at
http://192.168.16.107:4040
14/03/28 06:45:56 INFO AppClient$ClientActor: Connecting to master
spark://192.168.16.109:7077...
14/03/28 06:45:56 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 0.9.0
      /_/
Using Python version 2.7.3 (default, Apr 20 2012 22:44:07)
Spark context available as sc.
>>> 14/03/28 06:45:58 INFO SparkDeploySchedulerBackend: Connected to Spark
cluster with app ID app-20140327234558-
14/03/28 06:47:03 INFO AppClient$ClientActor: Executor added:
app-20140327234558-/0 on worker-20140327234702-192.168.16.109-41619
(192.168.16.109:41619) with 1 cores
14/03/28 06:47:03 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20140327234558-/0 on hostPort 192.168.16.109:41619 with 1 cores,
512.0 MB RAM
14/03/28 06:47:04 INFO AppClient$ClientActor: Executor updated:
app-20140327234558-/0 is now RUNNING
14/03/28 06:47:06 INFO SparkDeploySchedulerBackend: Registered executor:
Actor[akka.tcp://sparkExecutor@192.168.16.109:45642/user/Executor#-154634467]
with ID 0
14/03/28 06:47:07 INFO BlockManagerMasterActor$BlockManagerInfo: Registering
block manager 192.168.16.109:60587 with 297.0 MB RAM
>>>
>>> sc.parallelize([1,2]).count()
14/03/28 06:47:35 INFO SparkContext: Starting job: count at <stdin>:1
14/03/28 06:47:35 INFO DAGScheduler: Got job 0 (count at <stdin>:1) with 2
output partitions (allowLocal=false)
14/03/28 06:47:35 INFO DAGScheduler: Final stage: Stage 0 (count at
<stdin>:1)
14/03/28 06:47:35 INFO DAGScheduler: Parents of final stage: List()
14/03/28 06:47:35 INFO DAGScheduler: Missing parents: List()
14/03/28 06:47:35 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[1] at
count at <stdin>:1), which has no missing parents
14/03/28 06:47:35 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0
(PythonRDD[1] at count at <stdin>:1)
14/03/28 06:47:35 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/03/28 06:47:35 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on
executor 0: 192.168.16.109 (PROCESS_LOCAL)
14/03/28 06:47:35 INFO TaskSetManager: Serialized task 0.0:0 as 2546 bytes
in 4 ms
14/03/28 06:47:37 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on
executor 0: 192.168.16.109 (PROCESS_LOCAL)
14/03/28 06:47:37 INFO TaskSetManager: Serialized task 0.0:1 as 2546 bytes
in 1 ms
14/03/28 06:47:37 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/03/28 06:47:37 WARN TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
File "/usr/local/spark/py