Ah, indeed, it looks like I need to install this separately, as it is not
part of core Hadoop:
<https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQ?redir=1>
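
For reference, here is a rough sketch of what I think the Scala equivalent
would look like once the hadoop-lzo jar is on the classpath (untested; the
class names come from the twitter/hadoop-lzo project, and note that the
Scala API takes classOf[...] arguments rather than the types themselves):

    import com.hadoop.mapreduce.LzoTextInputFormat
    import org.apache.hadoop.io.{LongWritable, Text}

    // classOf[...] is required: newAPIHadoopFile expects Class objects
    // for the input format, key, and value types.
    val records = sc.newAPIHadoopFile(
      "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
      classOf[LzoTextInputFormat],
      classOf[LongWritable],
      classOf[Text])
    records.count()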

Nick



On Sun, Jul 6, 2014 at 2:22 AM, Gurvinder Singh <gurvinder.si...@uninett.no>
wrote:

> On 07/06/2014 05:19 AM, Nicholas Chammas wrote:
> > On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
> > <gurvinder.si...@uninett.no <mailto:gurvinder.si...@uninett.no>> wrote:
> >
> >     csv = sc.newAPIHadoopFile(opts.input,
> >         "com.hadoop.mapreduce.LzoTextInputFormat",
> >         "org.apache.hadoop.io.LongWritable",
> >         "org.apache.hadoop.io.Text").count()
> >
> > Does anyone know what the rough equivalent of this would be in the Scala
> > API?
> >
> I am not sure; I haven't tested it with Scala. The
> com.hadoop.mapreduce.LzoTextInputFormat class comes from this package:
> https://github.com/twitter/hadoop-lzo
>
> I have installed it from the Cloudera "hadoop-lzo" package, along with
> the liblzo2-2 Debian package, on all of my workers. Make sure you have
> hadoop-lzo.jar in your classpath for Spark.
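>
> For example, something like the following should work (path hypothetical;
> exact flag support depends on your Spark version):
>
>   spark-shell --jars /path/to/hadoop-lzo.jar
>
> or set SPARK_CLASSPATH to point at the jar before launching.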
>
> - Gurvinder
>
> > I am trying the following, but the first import yields an error on my
> > spark-ec2 cluster:
> >
> > import com.hadoop.mapreduce.LzoTextInputFormat
> > import org.apache.hadoop.io.LongWritable
> > import org.apache.hadoop.io.Text
> >
> > sc.newAPIHadoopFile(
> >     "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
> >     LzoTextInputFormat, LongWritable, Text)
> >
> > scala> import com.hadoop.mapreduce.LzoTextInputFormat
> > <console>:12: error: object hadoop is not a member of package com
> >        import com.hadoop.mapreduce.LzoTextInputFormat
> >
> > Nick
> >
>
>
>
