Hello everybody,
I am trying to read LZO-compressed files with PySpark 1.4.1 on EC2.
I followed the steps from https://issues.apache.org/jira/browse/SPARK-2394
to read LZO-compressed files, but even after these modifications I still have
the following issue:
files = sc.newAPIHadoopFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
    classOf[com.hadoop.mapreduce.LzoTextInputFormat],
    classOf[org.apache.hadoop.io.LongWritable],
    classOf[org.apache.hadoop.io.Text])
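Incidentally, that snippet is Scala (`classOf[...]` is not valid Python), so pasting it into the PySpark shell will fail with a syntax error. PySpark's `newAPIHadoopFile` takes the input format, key, and value classes as strings instead; a rough equivalent, assuming the hadoop-lzo jar is on both the driver and executor classpaths:

```python
# PySpark sketch: class names are passed as strings, not classOf[...]
files = sc.newAPIHadoopFile(
    "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
    "com.hadoop.mapreduce.LzoTextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text")
```

This returns an RDD of (offset, line) pairs, so `files.map(lambda kv: kv[1])` gives just the text lines.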
On a side note, here’s a related JIRA issue: SPARK-2394: Make it
easier to read LZO-compressed files from EC2 clusters
<https://issues.apache.org/jira/browse/SPARK-2394>
Nick
On Sun, Jul 13, 2014 at 10:49 AM, Ognen Duzlevski wrote:
Hello,
I have been trying to play with the Google ngram dataset provided by
Amazon in the form of LZO-compressed files.
I am having trouble understanding what is going on ;). I have added the
compression jar and native library to the underlying Hadoop/HDFS
installation, restarted the name node
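For reference, dropping in the jar and native library is usually paired with registering the codecs in core-site.xml; a sketch using the property names from the hadoop-lzo project (not tailored to this particular cluster):

```xml
<!-- core-site.xml: register the LZO codecs alongside the defaults -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```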