FYI

After 17 minutes, only 26112 of 228155 have succeeded.

This seems very slow.
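
As a rough sanity check on "very slow", here is a minimal sketch that extrapolates the remaining time from the numbers above. It assumes the completion rate stays constant, which may well not hold; the function name is just for illustration.

```python
def estimate_remaining_minutes(done, total, elapsed_minutes):
    """Linearly extrapolate the remaining run time from progress so far.

    Assumes a constant completion rate, which is only a rough guess.
    """
    if done <= 0 or elapsed_minutes <= 0:
        raise ValueError("need some completed work and elapsed time")
    rate_per_minute = done / elapsed_minutes
    return (total - done) / rate_per_minute


# Numbers from this message: 26112 of 228155 done after 17 minutes.
remaining = estimate_remaining_minutes(26112, 228155, 17)
print("about {:.1f} more minutes at the current rate".format(remaining))
```

At the current rate that works out to roughly 131.5 more minutes, so well over two hours in total.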

Kind regards

Andy



From:  Andrew Davidson <[email protected]>
Date:  Tuesday, November 17, 2015 at 2:22 PM
To:  "user @spark" <[email protected]>
Subject:  WARN LoadSnappy: Snappy native library not loaded


>I started a Spark POC. I created an EC2 cluster on AWS using spark-ec2. I
>have 3 slaves. In general I am running into trouble even with small
>workloads. I am using IPython notebooks running on my Spark cluster.
>Everything is painfully slow. I am using the standalone cluster manager.
>I noticed that I am getting the following warnings on my driver console.
>Any idea what the problem might be?
>
>
>
>15/11/17 22:01:59 WARN MetricsSystem: Using default name DAGScheduler for
>source because spark.app.id is not set.
>15/11/17 22:03:05 WARN NativeCodeLoader: Unable to load native-hadoop
>library for your platform... using builtin-java classes where applicable
>15/11/17 22:03:05 WARN LoadSnappy: Snappy native library not loaded
>
>
>
>Here is an overview of my POC app. I have a file on HDFS containing about
>5000 Twitter status strings.
>
>tweetStrings = sc.textFile(dataURL)
>
>jTweets = (tweetStrings.map(lambda x: json.loads(x)).take(10))
>
>
>This generated the following error: "error occurred while calling
>o78.partitions.: java.lang.OutOfMemoryError: GC overhead limit exceeded"
>
>Any idea what we need to do to improve new Spark users' out-of-the-box
>experience?
>
>Kind regards
>
>Andy
>
>export PYSPARK_PYTHON=python3.4
>export PYSPARK_DRIVER_PYTHON=python3.4
>export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN"
>
>MASTER_URL=spark://ec2-55-218-207-122.us-west-1.compute.amazonaws.com:7077
>
>
>numCores=2
>$SPARK_ROOT/bin/pyspark --master $MASTER_URL --total-executor-cores
>$numCores $*
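
Regarding the quoted snippet: if the 228155 figure is the task count for the stage, one thing that might be worth checking is how many partitions the input RDD has, since sc.textFile creates at least one partition per input file. The following is only a sketch under that assumption, not a confirmed diagnosis. load_tweets and target_partitions are made-up names, and 6 is an arbitrary value for a 3-slave cluster.

```python
import json


def load_tweets(sc, data_url, target_partitions=6):
    """Load JSON tweets, reporting and capping the number of partitions.

    `sc` is the SparkContext that the pyspark shell or notebook provides.
    `target_partitions` is an illustrative guess, not a tuned value.
    """
    tweet_strings = sc.textFile(data_url)

    # One partition per input file or block is the minimum, so a directory
    # of many small files produces a very large number of tasks.
    num_partitions = tweet_strings.getNumPartitions()
    print("input partitions: {}".format(num_partitions))

    # coalesce() merges partitions without a full shuffle.
    if num_partitions > target_partitions:
        tweet_strings = tweet_strings.coalesce(target_partitions)

    return tweet_strings.map(json.loads)


# In the notebook, where sc and dataURL already exist:
# jTweets = load_tweets(sc, dataURL).take(10)
```

If getNumPartitions() itself fails with the same OutOfMemoryError, that would suggest the problem is in listing the input rather than in the JSON parsing.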



