I have an HDFS cluster managed with CDH Manager. The version is CDH 5.1 with
the matching GPLEXTRAS parcel. LZO works with Hive and Pig, but I can't make it
work with Spark 1.0.0. I've tried:
* Setting this:
HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS
-Djava.library.path=/opt/cloudera
That does appear to be the case. Thanks!
For posterity, I ran pyspark like this:
$ sudo su yarn
$ pyspark --driver-library-path /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/
>>> p = sc.textFile("/some/file")
>>> p.count()
Everything appears to be working now.
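
In case it helps anyone hitting the same thing, here is a minimal sketch of a pre-flight check before launching pyspark. It only confirms that the native directory exists and contains shared libraries, then prints the same command used above. The helper names (list_shared_libraries, build_pyspark_command) are made up for illustration and are not part of Spark or CDH, and the path is the one from my setup, so yours may differ.

```python
import os

# Path taken from the command above; this is an assumption about your
# parcel layout and may need adjusting.
NATIVE_DIR = "/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native/"


def list_shared_libraries(native_dir):
    """Return sorted file names in native_dir that look like shared libraries.

    Returns an empty list if the directory does not exist.
    """
    if not os.path.isdir(native_dir):
        return []
    return sorted(name for name in os.listdir(native_dir) if ".so" in name)


def build_pyspark_command(native_dir):
    """Build the pyspark invocation shown above as an argument list."""
    return ["pyspark", "--driver-library-path", native_dir]


if __name__ == "__main__":
    libs = list_shared_libraries(NATIVE_DIR)
    if libs:
        print("Found: " + ", ".join(libs))
    else:
        print("No shared libraries found under " + NATIVE_DIR)
    # Print the command rather than running it, so the check has no side effects.
    print(" ".join(build_pyspark_command(NATIVE_DIR)))
```

If the directory listing comes back empty, the GPLEXTRAS parcel is probably not installed or activated on that host, and passing --driver-library-path will not help until it is.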