Ah, indeed, it looks like I need to install this separately, as it is not part of core Hadoop: <https://code.google.com/a/apache-extras.org/p/hadoop-gpl-compression/wiki/FAQ?redir=1>
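Once that jar is actually on the classpath, I suspect the Scala equivalent needs classOf[...] references rather than the bare names I tried below, since newAPIHadoopFile takes Class objects in the Scala API. Untested sketch (assumes hadoop-lzo.jar is passed to the shell, e.g. via spark-shell --jars hadoop-lzo.jar):

    import com.hadoop.mapreduce.LzoTextInputFormat
    import org.apache.hadoop.io.{LongWritable, Text}

    // Untested sketch: pass Class objects via classOf[...] instead of
    // bare type names or fully qualified class-name strings.
    // Assumes hadoop-lzo.jar is on the driver and executor classpath.
    val count = sc.newAPIHadoopFile(
      "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
      classOf[LzoTextInputFormat],
      classOf[LongWritable],
      classOf[Text]
    ).count()

I'll report back once I can get past the import error.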
Nick

On Sun, Jul 6, 2014 at 2:22 AM, Gurvinder Singh <gurvinder.si...@uninett.no> wrote:

> On 07/06/2014 05:19 AM, Nicholas Chammas wrote:
> > On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
> > <gurvinder.si...@uninett.no> wrote:
> >
> >     csv = sc.newAPIHadoopFile(opts.input,
> >         "com.hadoop.mapreduce.LzoTextInputFormat",
> >         "org.apache.hadoop.io.LongWritable",
> >         "org.apache.hadoop.io.Text").count()
> >
> > Does anyone know what the rough equivalent of this would be in the
> > Scala API?
> >
> I am not sure; I haven't tested it using Scala. The
> com.hadoop.mapreduce.LzoTextInputFormat class is from this package:
> https://github.com/twitter/hadoop-lzo
>
> I have installed it from the Cloudera "hadoop-lzo" package, along with
> the liblzo2-2 Debian package, on all of my workers. Make sure you have
> hadoop-lzo.jar in your classpath for Spark.
>
> - Gurvinder
>
> > I am trying the following, but the first import yields an error on my
> > spark-ec2 cluster:
> >
> >     import com.hadoop.mapreduce.LzoTextInputFormat
> >     import org.apache.hadoop.io.LongWritable
> >     import org.apache.hadoop.io.Text
> >
> >     sc.newAPIHadoopFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
> >       LzoTextInputFormat, LongWritable, Text)
> >
> >     scala> import com.hadoop.mapreduce.LzoTextInputFormat
> >     <console>:12: error: object hadoop is not a member of package com
> >            import com.hadoop.mapreduce.LzoTextInputFormat
> >
> > Nick