Re: Problem reading in LZO compressed files

2014-07-14 Thread Ognen Duzlevski
Nicholas, thanks nevertheless! I am going to spend some time to try and figure this out and report back :-) Ognen

Re: Problem reading in LZO compressed files

2014-07-13 Thread Nicholas Chammas
I actually never got this to work, which is part of the reason why I filed that JIRA. Apart from using --jar when starting the shell, I don’t have any more pointers for you. :(
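For reference, that shell invocation looks roughly like the following sketch; the jar path is a placeholder for wherever your hadoop-lzo build lives:

    # hypothetical path to the hadoop-lzo jar
    ./bin/spark-shell --jars /path/to/hadoop-lzo.jar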

Re: Problem reading in LZO compressed files

2014-07-13 Thread Ognen Duzlevski
Nicholas, Thanks! How do I make Spark assemble against a local version of Hadoop? I have 2.4.1 running on a test cluster and I did "SPARK_HADOOP_VERSION=2.4.1 sbt/sbt assembly", but all it did was pull in the hadoop-2.4.1 dependencies via sbt (which is sufficient for using a 2.4.1 HDFS). I am gue…
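A minimal sketch of how the two pieces fit together, based on the sbt build route quoted above (the hadoop-lzo paths are placeholders):

    # Build the Spark assembly against Hadoop 2.4.1, as in the message above.
    SPARK_HADOOP_VERSION=2.4.1 sbt/sbt assembly

    # The assembly only covers Hadoop itself; the hadoop-lzo jar and its
    # native library still have to be supplied to Spark separately.
    ./bin/spark-shell --jars /path/to/hadoop-lzo.jar \
      --driver-library-path /path/to/hadoop-lzo/lib/native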

Re: Problem reading in LZO compressed files

2014-07-13 Thread Nicholas Chammas
If you’re still seeing gibberish, it’s because Spark is not using the LZO libraries properly. In your case, I believe you should be calling newAPIHadoopFile() instead of textFile(). For example: sc.newAPIHadoopFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data", c…
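The call is cut off in the archive; it presumably continues with the input format and key/value classes. A minimal sketch of the full pattern, assuming hadoop-lzo's LzoTextInputFormat is on the classpath:

    import com.hadoop.mapreduce.LzoTextInputFormat
    import org.apache.hadoop.io.{LongWritable, Text}

    // Keys are byte offsets into the file; values are decompressed lines.
    val ngrams = sc.newAPIHadoopFile(
      "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
      classOf[LzoTextInputFormat],
      classOf[LongWritable],
      classOf[Text]
    ).map(_._2.toString)  // Text is not serializable, so convert to String

    ngrams.take(5).foreach(println)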

Problem reading in LZO compressed files

2014-07-13 Thread Ognen Duzlevski
Hello, I have been trying to play with the Google ngram dataset provided by Amazon in the form of LZO compressed files. I am having trouble understanding what is going on ;). I have added the compression jar and native library to the underlying Hadoop/HDFS installation, restarted the name node a…
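The message is truncated here. For contrast with the newAPIHadoopFile() answer above: the naive read below is roughly what produces the gibberish described in this thread, since textFile() only decompresses LZO when Spark can actually see the codec and its native library:

    // Without the LZO codec wired in, this prints raw compressed bytes.
    val raw = sc.textFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data")
    raw.take(2).foreach(println)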