On my master:

$ grep native /root/spark/conf/spark-env.sh

SPARK_SUBMIT_LIBRARY_PATH="$SPARK_SUBMIT_LIBRARY_PATH:/root/ephemeral-hdfs/lib/native/"



$ ls /root/ephemeral-hdfs/lib/native/

libhadoop.a       libhadoop.so        libhadooputils.a  libsnappy.so    libsnappy.so.1.1.3  Linux-i386-32
libhadooppipes.a  libhadoop.so.1.0.0  libhdfs.a         libsnappy.so.1  Linux-amd64-64
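
If spark-env.sh is not being picked up by the process that needs it, the same
directory can also be handed to Spark through its library-path properties. A
minimal PySpark sketch for an app that builds its own context, assuming the
spark-ec2 layout above; note that the driver-side property only takes effect
if it is set before the driver JVM starts (e.g. in conf/spark-defaults.conf),
so it is included here for reference only:

from pyspark import SparkConf, SparkContext

NATIVE_DIR = "/root/ephemeral-hdfs/lib/native/"

conf = (SparkConf()
        # executors launched for this app pick the path up directly
        .set("spark.executor.extraLibraryPath", NATIVE_DIR)
        # driver-side equivalent; must be in place before the JVM starts
        .set("spark.driver.extraLibraryPath", NATIVE_DIR))
sc = SparkContext(conf=conf)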


From:  Andrew Davidson <a...@santacruzintegration.com>
Date:  Tuesday, November 17, 2015 at 2:29 PM
To:  "user @spark" <user@spark.apache.org>
Subject:  Re: WARN LoadSnappy: Snappy native library not loaded

> I forgot to mention: I am using spark-1.5.1-bin-hadoop2.6.
> 
> From:  Andrew Davidson <a...@santacruzintegration.com>
> Date:  Tuesday, November 17, 2015 at 2:26 PM
> To:  "user @spark" <user@spark.apache.org>
> Subject:  Re: WARN LoadSnappy: Snappy native library not loaded
> 
>> FYI
>> 
>> After 17 minutes, only 26112/228155 have succeeded.
>> 
>> This seems very slow.
>> 
>> Kind regards
>> 
>> Andy
>> 
>> 
>> 
>> From:  Andrew Davidson <a...@santacruzintegration.com>
>> Date:  Tuesday, November 17, 2015 at 2:22 PM
>> To:  "user @spark" <user@spark.apache.org>
>> Subject:  WARN LoadSnappy: Snappy native library not loaded
>> 
>> 
>>> I started a Spark POC. I created an EC2 cluster on AWS using spark-ec2; I
>>> have 3 slaves. In general I am running into trouble even with small
>>> workloads. I am using IPython notebooks running on my Spark cluster, and
>>> everything is painfully slow. I am using the standalone cluster manager.
>>> I noticed that I am getting the following warnings on my driver console.
>>> Any idea what the problem might be?
>>> 
>>> 
>>> 
>>> 15/11/17 22:01:59 WARN MetricsSystem: Using default name DAGScheduler for
>>> source because spark.app.id is not set.
>>> 15/11/17 22:03:05 WARN NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>> 15/11/17 22:03:05 WARN LoadSnappy: Snappy native library not loaded
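>>> 
>>> A side note: the MetricsSystem line looks cosmetic, and it goes away once
>>> spark.app.id is set. A minimal sketch for a context built by hand rather
>>> than by the shell (the id string is made up):
>>> 
>>> from pyspark import SparkConf, SparkContext
>>> 
>>> conf = (SparkConf()
>>>         .setAppName("twitterPoc")
>>>         # an explicit id silences the MetricsSystem warning
>>>         .set("spark.app.id", "twitterPoc"))
>>> sc = SparkContext(conf=conf)
>>> 
>>> The native-library warnings themselves are the real question.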
>>> 
>>> 
>>> 
>>> Here is an overview of my POC app. I have a file on HDFS containing about
>>> 5000 Twitter status strings.
>>> 
>>> tweetStrings = sc.textFile(dataURL)
>>> 
>>> jTweets = (tweetStrings.map(lambda x: json.loads(x)).take(10))
>>> 
>>> 
>>> This generated the following error: "error occurred while calling
>>> o78.partitions.: java.lang.OutOfMemoryError: GC overhead limit exceeded"
>>> 
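>>> For what it is worth: "GC overhead limit exceeded" while computing
>>> partitions usually means the driver JVM itself is short on heap.
>>> spark.driver.memory is fixed once the JVM starts, so the usual first step
>>> is to relaunch pyspark with something like --driver-memory 4g (the value
>>> is a guess). A small sketch to narrow it down:
>>> 
>>> # count() runs on the executors and ships back only a number; if it
>>> # fails the same way, the problem is driver-side heap rather than the
>>> # volume of results being returned
>>> print(tweetStrings.count())
>>> 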
>>> Any idea what we need to do to improve a new Spark user's out-of-the-box
>>> experience?
>>> 
>>> Kind regards
>>> 
>>> Andy
>>> 
>>> export PYSPARK_PYTHON=python3.4
>>> export PYSPARK_DRIVER_PYTHON=python3.4
>>> export IPYTHON_OPTS="notebook --no-browser --port=7000 --log-level=WARN"
>>> 
>>> MASTER_URL=spark://ec2-55-218-207-122.us-west-1.compute.amazonaws.com:7077
>>> 
>>> 
>>> numCores=2
>>> $SPARK_ROOT/bin/pyspark --master $MASTER_URL --total-executor-cores $numCores $*
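>>> 
>>> For completeness, the same settings can live in the app itself when it is
>>> run as a script with spark-submit instead of the interactive shell. A
>>> minimal sketch (the file name is made up; spark.cores.max is the
>>> standalone-mode counterpart of --total-executor-cores):
>>> 
>>> # tweetsPoc.py -- run with: $SPARK_ROOT/bin/spark-submit tweetsPoc.py
>>> from pyspark import SparkConf, SparkContext
>>> 
>>> conf = (SparkConf()
>>>         .setAppName("tweetsPoc")
>>>         .setMaster("spark://ec2-55-218-207-122.us-west-1.compute.amazonaws.com:7077")
>>>         # cap on total cores across the cluster, like --total-executor-cores
>>>         .set("spark.cores.max", "2"))
>>> sc = SparkContext(conf=conf)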

