FYI: after 17 min, only 26112/228155 have succeeded.
This seems very slow.

Kind regards,
Andy

From: Andrew Davidson <[email protected]>
Date: Tuesday, November 17, 2015 at 2:22 PM
To: "user @spark" <[email protected]>
Subject: WARN LoadSnappy: Snappy native library not loaded

> I started a Spark POC. I created an EC2 cluster on AWS using spark-ec2. I
> have 3 slaves. In general I am running into trouble even with small
> workloads. I am using IPython notebooks running on my Spark cluster.
> Everything is painfully slow. I am using the standalone cluster manager.
> I noticed that I am getting the following warnings on my driver console.
> Any idea what the problem might be?
>
> 15/11/17 22:01:59 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
> 15/11/17 22:03:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/11/17 22:03:05 WARN LoadSnappy: Snappy native library not loaded
>
> Here is an overview of my POC app. I have a file on HDFS containing about
> 5000 Twitter status strings.
>
> tweetStrings = sc.textFile(dataURL)
>
> jTweets = (tweetStrings.map(lambda x: json.loads(x)).take(10))
>
> This generated the following error: "error occurred while calling
> o78.partitions.: java.lang.OutOfMemoryError: GC overhead limit exceeded"
>
> Any idea what we need to do to improve a new Spark user's out-of-the-box
> experience?
>
> Kind regards
>
> Andy
>
> export PYSPARK_PYTHON=python3.4
> export PYSPARK_DRIVER_PYTHON=python3.4
> export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN"
>
> MASTER_URL=spark://ec2-55-218-207-122.us-west-1.compute.amazonaws.com:7077
>
> numCores=2
> $SPARK_ROOT/bin/pyspark --master $MASTER_URL --total-executor-cores $numCores $*
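
To put the progress figure above in perspective, here is a rough back-of-the-envelope sketch in plain Python. It assumes the completion rate stays constant, which may not hold on a real cluster, and the function name `estimate_remaining_minutes` is made up for illustration; it is not part of Spark.

```python
def estimate_remaining_minutes(done, total, elapsed_minutes):
    """Estimate minutes left, assuming the observed rate stays constant."""
    if done <= 0 or elapsed_minutes <= 0:
        raise ValueError("need some completed work and some elapsed time")
    rate_per_minute = done / elapsed_minutes   # observed throughput so far
    remaining = total - done                   # units of work still pending
    return remaining / rate_per_minute


# Figures taken from the message above: 26112 of 228155 done after 17 minutes.
done, total, elapsed = 26112, 228155, 17
fraction_done = done / total
remaining_minutes = estimate_remaining_minutes(done, total, elapsed)

print("fraction done: {:.1%}".format(fraction_done))
print("estimated minutes remaining: {:.1f}".format(remaining_minutes))
```

At the observed rate, the job would need a bit over two more hours to finish, which is consistent with the impression that something is off for an input of about 5000 lines.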
