Nicholas,

Thanks!

How do I make Spark assemble against a local version of Hadoop?

I have 2.4.1 running on a test cluster and I ran "SPARK_HADOOP_VERSION=2.4.1 sbt/sbt assembly", but all it did was pull in the hadoop-2.4.1 dependencies via sbt (which is sufficient for using a 2.4.1 HDFS). I am guessing my local Hadoop libraries/jars are not used. Alternatively, how do I add hadoop-gpl-compression-0.1.0.jar (responsible for the LZO support) to this hand-assembled Spark?

I am running the spark-shell like this:
bin/spark-shell --jars /home/ec2-user/hadoop/lib/hadoop-gpl-compression-0.1.0.jar

and getting this:

scala> val f = sc.newAPIHadoopFile("hdfs://10.10.0.98:54310/data/1gram.lzo",classOf[com.hadoop.mapreduce.LzoTextInputFormat],classOf[org.apache.hadoop.io.LongWritable],classOf[org.apache.hadoop.io.Text])
14/07/13 16:53:01 INFO MemoryStore: ensureFreeSpace(216014) called with curMem=0, maxMem=311387750
14/07/13 16:53:01 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 211.0 KB, free 296.8 MB)
f: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text)] = NewHadoopRDD[0] at newAPIHadoopFile at <console>:12

scala> f.take(1)
14/07/13 16:53:08 INFO FileInputFormat: Total input paths to process : 1
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at com.hadoop.mapreduce.LzoTextInputFormat.listStatus(LzoTextInputFormat.java:67)

This makes me think the LZO jar is not linked against the right Hadoop classes (I am not a Java expert, unfortunately).
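
The error text itself gives a hint: org.apache.hadoop.mapreduce.JobContext is a class in Hadoop 1.x and an interface in Hadoop 2.x, so a jar compiled against the older API can fail this way on a 2.x classpath. Below is a minimal sketch for checking, from the spark-shell, what is actually loaded. describeClass is a hypothetical helper name, and whether hadoop-gpl-compression-0.1.0.jar was built against the older API is an assumption, not something confirmed in this thread.

```scala
import scala.util.Try

// Hypothetical helper: reports whether a class name resolves on the current
// classpath and, if so, whether it is an interface and where it came from.
def describeClass(name: String): String =
  Try(Class.forName(name)).toOption match {
    case None => s"$name: not on classpath"
    case Some(c) =>
      val kind = if (c.isInterface) "interface" else "class"
      // getCodeSource is null for classes loaded by the bootstrap loader.
      val source = Option(c.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("<bootstrap>")
      s"$name: $kind, loaded from $source"
  }

// On a Hadoop 2.x classpath the first line should say "interface".
println(describeClass("org.apache.hadoop.mapreduce.JobContext"))
println(describeClass("com.hadoop.mapreduce.LzoTextInputFormat"))
```

If JobContext is reported as an interface and the LZO input format comes from the 0.1.0 jar, a build of the LZO bindings compiled against Hadoop 2 would be the thing to try next.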

Thanks!
Ognen


On 7/13/14, 10:35 AM, Nicholas Chammas wrote:

If you’re still seeing gibberish, it’s because Spark is not using the LZO libraries properly. In your case, I believe you should be calling newAPIHadoopFile() instead of textFile().

For example:

sc.newAPIHadoopFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
   classOf[com.hadoop.mapreduce.LzoTextInputFormat],
   classOf[org.apache.hadoop.io.LongWritable],
   classOf[org.apache.hadoop.io.Text])

On a side note, here’s a related JIRA issue: SPARK-2394: Make it easier to read LZO-compressed files from EC2 clusters <https://issues.apache.org/jira/browse/SPARK-2394>

Nick



On Sun, Jul 13, 2014 at 10:49 AM, Ognen Duzlevski <ognen.duzlev...@gmail.com> wrote:

    Hello,

    I have been trying to play with the Google ngram dataset provided
    by Amazon in the form of LZO-compressed files.

    I am having trouble understanding what is going on ;). I have
    added the compression jar and native library to the underlying
    Hadoop/HDFS installation and restarted the name node and the
    datanodes. Spark can obviously see the file, but I get gibberish
    on a read. Any ideas?

    See output below:

    14/07/13 14:39:19 INFO SparkContext: Added JAR
    file:/home/ec2-user/hadoop/lib/hadoop-gpl-compression-0.1.0.jar at
    http://10.10.0.100:40100/jars/hadoop-gpl-compression-0.1.0.jar
    with timestamp 1405262359777
    14/07/13 14:39:20 INFO SparkILoop: Created spark context..
    Spark context available as sc.

    scala> val f = sc.textFile("hdfs://10.10.0.98:54310/data/1gram.lzo")
    14/07/13 14:39:34 INFO MemoryStore: ensureFreeSpace(163793) called
    with curMem=0, maxMem=311387750
    14/07/13 14:39:34 INFO MemoryStore: Block broadcast_0 stored as
    values to memory (estimated size 160.0 KB, free 296.8 MB)
    f: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
    <console>:12

    scala> f.take(10)
    14/07/13 14:39:43 INFO SparkContext: Job finished: take at
    <console>:15, took 0.419708348 s
    res0: Array[String] =
    
Array(SEQ?!org.apache.hadoop.io.LongWritable?org.apache.hadoop.io.Text??#com.hadoop.compression.lzo.LzoCodec????���\<N�#^�??d^�k�������\<N�#^�??d^�k��3��??�3???�??????
    
?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????�?????�?�m??��??hx??????????�??�???�??�??�??�??�??�?
    �?, �? �? �?, �??�??�??�??�??�??�??�??�??�??�??�??�??�??�? �? �?
    �? �?
    
�?!�?"�?#�?$�?%�?&�?'�?(�?)�?*�?+�?,�?-�?.�?/�?0�?1�?2�?3�?4�?5�?6�?7�?8�?9�?:�?;�?<�?=�?>�??�?@�?A�?B�?C�?D�?E�?F�?G�?H�?I�?J�?K�?L�?M�?N�?O�?P�?Q�?R�?S�?T�?U�?V�?W�?X�?Y�?Z�?[�?\�?]�?^�?_�?`�?a�?b�?c�?d�?e�?f�?g�?h�?i�?j�?k�?l�?m�?n�?o�?p�?q�?r�?s�?t�?u�?v�?w�?x�?y�?z�?{�?|�?}�?~�?
    �?��?��?��?��?��?��?��?��?��?��?��?��?��?��?��?��?��?...
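
The first bytes of the output above (SEQ, then the LongWritable, Text and LzoCodec class names) look like a Hadoop SequenceFile header rather than plain LZO-compressed text. Below is a minimal sketch of decoding such a header from raw bytes. SeqHeader, parseSeqHeader and lengthPrefixed are hypothetical names, the field order is assumed to follow the version 6 SequenceFile layout, and only single-byte string lengths (under 128) are handled.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.util.Try

// Hypothetical container for the leading fields of a SequenceFile header.
case class SeqHeader(version: Int, keyClass: String, valueClass: String,
                     compressed: Boolean, blockCompressed: Boolean,
                     codec: Option[String])

// Assumed layout: "SEQ", version byte, key class, value class,
// compressed flag, block-compressed flag, codec class (if compressed).
def parseSeqHeader(bytes: Array[Byte]): Option[SeqHeader] = {
  var pos = 3
  def readByte(): Int = { val b = bytes(pos) & 0xff; pos += 1; b }
  // Length-prefixed UTF-8 string; this sketch only handles lengths < 128.
  def readString(): String = {
    val len = readByte()
    require(len < 128, "multi-byte lengths are not handled in this sketch")
    val s = new String(bytes, pos, len, UTF_8)
    pos += len
    s
  }
  if (!bytes.take(3).sameElements("SEQ".getBytes(UTF_8))) None
  else Try {
    val version = readByte()
    val key = readString()
    val value = readString()
    val compressed = readByte() != 0
    val block = readByte() != 0
    val codec = if (compressed) Some(readString()) else None
    SeqHeader(version, key, value, compressed, block, codec)
  }.toOption // truncated or malformed input yields None
}

// Rebuild a header like the one visible in the output above and parse it.
def lengthPrefixed(s: String): Array[Byte] = {
  val b = s.getBytes(UTF_8)
  b.length.toByte +: b
}
val sample = "SEQ".getBytes(UTF_8) ++ Array[Byte](6) ++
  lengthPrefixed("org.apache.hadoop.io.LongWritable") ++
  lengthPrefixed("org.apache.hadoop.io.Text") ++
  Array[Byte](1, 1) ++
  lengthPrefixed("com.hadoop.compression.lzo.LzoCodec")
println(parseSeqHeader(sample))
```

If the file really is a SequenceFile, SparkContext's sequenceFile method may fit better than textFile. That is a guess from the header bytes only.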

    Thanks!
    Ognen


