LZO-compressed files

2015-09-03 Thread Bertrand
Hello everybody, I am trying to read LZO-compressed files with PySpark 1.4.1 on EC2. I followed the steps from https://issues.apache.org/jira/browse/SPARK-2394 to read LZO-compressed files, but even after these modifications I still have the following issue: files = sc.newAPIHadoopFile("
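The excerpt above is cut off mid-call. For context, a complete call of that shape looks roughly like the following minimal PySpark sketch. This assumes the hadoop-lzo jar is on the driver and executor classpaths; the S3 path is a placeholder, not the poster's actual dataset.

```python
# Minimal sketch: read LZO-compressed text files with PySpark's
# newAPIHadoopFile. Assumes hadoop-lzo is installed and on the
# classpath; the path below is a placeholder.
from pyspark import SparkContext

sc = SparkContext(appName="lzo-read-sketch")

files = sc.newAPIHadoopFile(
    "s3n://your-bucket/path/",                   # placeholder path
    "com.hadoop.mapreduce.LzoTextInputFormat",   # from hadoop-lzo
    "org.apache.hadoop.io.LongWritable",         # key class
    "org.apache.hadoop.io.Text",                 # value class
)
files.take(5)
```

Without the hadoop-lzo jar on the classpath, this is typically where a `ClassNotFoundException` for `LzoTextInputFormat` surfaces.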

Re: Problem reading in LZO compressed files

2014-07-14 Thread Ognen Duzlevski
op.mapreduce.LzoTextInputFormat], classOf[org.apache.hadoop.io.LongWritable], classOf[org.apache.hadoop.io.Text]) | On a side note, here’s a related JIRA issue: SPARK-2394: Make it easier to read LZO-compressed files from EC2 clusters <https://issues.apache.org/jira/browse/SPARK-2394

Re: Problem reading in LZO compressed files

2014-07-13 Thread Nicholas Chammas
3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data", > classOf[com.hadoop.mapreduce.LzoTextInputFormat], > classOf[org.apache.hadoop.io.LongWritable], > classOf[org.apache.hadoop.io.Text]) > > On a side note, here’s a related JIRA issue: SPAR

Re: Problem reading in LZO compressed files

2014-07-13 Thread Ognen Duzlevski
.hadoop.io.LongWritable], classOf[org.apache.hadoop.io.Text]) | On a side note, here’s a related JIRA issue: SPARK-2394: Make it easier to read LZO-compressed files from EC2 clusters <https://issues.apache.org/jira/browse/SPARK-2394> Nick On Sun, Jul 13, 2014 at 10:49 AM, Ognen Duzlevski

Re: Problem reading in LZO compressed files

2014-07-13 Thread Nicholas Chammas
/data", classOf[com.hadoop.mapreduce.LzoTextInputFormat], classOf[org.apache.hadoop.io.LongWritable], classOf[org.apache.hadoop.io.Text]) On a side note, here’s a related JIRA issue: SPARK-2394: Make it easier to read LZO-compressed files from EC2 clusters <https://issues.apache.org/jira

Problem reading in LZO compressed files

2014-07-13 Thread Ognen Duzlevski
Hello, I have been trying to play with the Google ngram dataset provided by Amazon in the form of LZO-compressed files. I am having trouble understanding what is going on ;). I have added the compression jar and native library to the underlying Hadoop/HDFS installation, restarted the name node
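The jar-and-native-library wiring described above can also be done per-application rather than at the Hadoop installation level. The following is an illustrative configuration sketch only; the jar path, version, and native-library directory are assumptions, not taken from the thread.

```shell
# Illustrative sketch: make hadoop-lzo visible to a Spark application.
# Paths and version are assumptions, not from the thread.
spark-submit \
  --jars /opt/hadoop-lzo/hadoop-lzo-0.4.20.jar \
  --driver-library-path /opt/hadoop-lzo/native \
  --conf spark.executor.extraLibraryPath=/opt/hadoop-lzo/native \
  your_app.py
```

For the codec to be picked up on the Hadoop side, `com.hadoop.compression.lzo.LzoCodec` also needs to appear in the `io.compression.codecs` property of `core-site.xml`.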