I forgot to mention: I am using spark-1.5.1-bin-hadoop2.6.

From:  Andrew Davidson <a...@santacruzintegration.com>
Date:  Tuesday, November 17, 2015 at 2:26 PM
To:  "user @spark" <user@spark.apache.org>
Subject:  Re: WARN LoadSnappy: Snappy native library not loaded

> FYI
> 
> After 17 minutes, only 26112/228155 have succeeded
> 
> This seems very slow
> 
> Kind regards
> 
> Andy
> 
> 
> 
> From:  Andrew Davidson <a...@santacruzintegration.com>
> Date:  Tuesday, November 17, 2015 at 2:22 PM
> To:  "user @spark" <user@spark.apache.org>
> Subject:  WARN LoadSnappy: Snappy native library not loaded
> 
> 
>> I started a Spark POC. I created an EC2 cluster on AWS with 3 slaves using
>> the spark-ec2 script. In general I am running into trouble even with small
>> workloads. I am using IPython notebooks running on my Spark cluster, and
>> everything is painfully slow. I am using the standalone cluster manager.
>> I noticed that I am getting the warnings shown further below on my driver
>> console. Any idea what the problem might be?
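>> 
>> For reference, the cluster was launched with the spark-ec2 script that ships
>> with the Spark distribution, roughly like the sketch below; the keypair name,
>> identity file, instance type, and cluster name are placeholders, not my
>> actual values:
>> 
>> # hypothetical launch command: a 3-slave standalone cluster on EC2
>> ./ec2/spark-ec2 -k my-keypair -i my-key.pem -s 3 \
>>     --instance-type=m3.large launch spark-poc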
>> 
>> 
>> 
>> 15/11/17 22:01:59 WARN MetricsSystem: Using default name DAGScheduler for
>> source because spark.app.id is not set.
>> 15/11/17 22:03:05 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 15/11/17 22:03:05 WARN LoadSnappy: Snappy native library not loaded
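>> 
>> As far as I can tell, the last two warnings just mean the Hadoop native
>> libraries (including the Snappy bindings) were not found on
>> java.library.path, so Spark falls back to the builtin-java implementations.
>> Assuming the hadoop CLI is on the PATH of the cluster nodes, one can
>> apparently verify what is loadable with:
>> 
>> # list which Hadoop native components (zlib, snappy, ...) can be loaded
>> hadoop checknative -a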
>> 
>> 
>> 
>> Here is an overview of my POC app. I have a file on HDFS containing about
>> 5000 twitter status strings.
>> 
>> import json  # needed for json.loads below
>> 
>> # one tweet JSON string per line on HDFS
>> tweetStrings = sc.textFile(dataURL)
>> # parse each line and pull the first 10 records back to the driver
>> jTweets = tweetStrings.map(lambda x: json.loads(x)).take(10)
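>> 
>> As an aside, a hedged alternative sketch using the Spark SQL JSON reader
>> (available in 1.5), assuming one JSON object per line and the sqlContext
>> that the pyspark shell provides:
>> 
>> # let Spark SQL parse the JSON instead of calling json.loads per line
>> tweetsDF = sqlContext.read.json(dataURL)
>> tweetsDF.show(10)  # print the first 10 parsed tweets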
>> 
>> 
>> The map()/take() version generated the following error: "error occurred
>> while calling o78.partitions.: java.lang.OutOfMemoryError: GC overhead
>> limit exceeded"
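>> 
>> A GC overhead error on a take(10) over ~5000 lines usually points at heap
>> size rather than data volume. A hedged sketch of raising the driver and
>> executor heaps when launching pyspark (the 4g values are guesses, not
>> recommendations):
>> 
>> # hypothetical example: give the driver and executors more heap
>> $SPARK_ROOT/bin/pyspark --master $MASTER_URL \
>>     --driver-memory 4g --executor-memory 4g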
>> 
>> Any idea what we need to do to improve a new Spark user's out of the box
>> experience?
>> 
>> Kind regards
>> 
>> Andy
>> 
>> # use Python 3.4 for both the workers and the driver
>> export PYSPARK_PYTHON=python3.4
>> export PYSPARK_DRIVER_PYTHON=python3.4
>> # Spark 1.x reads IPYTHON_OPTS to launch the IPython notebook server
>> export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN"
>> 
>> MASTER_URL=spark://ec2-55-218-207-122.us-west-1.compute.amazonaws.com:7077
>> 
>> # cap the app at two cores across the whole cluster
>> numCores=2
>> $SPARK_ROOT/bin/pyspark --master $MASTER_URL \
>>     --total-executor-cores $numCores $*

