On 07/06/2014 05:19 AM, Nicholas Chammas wrote:
> On Fri, Jul 4, 2014 at 3:33 PM, Gurvinder Singh
> <[email protected] <mailto:[email protected]>> wrote:
>
> csv =
>
> sc.newAPIHadoopFile(opts.input,"com.hadoop.mapreduce.LzoTextInputFormat","org.apache.hadoop.io.LongWritable","org.apache.hadoop.io.Text").count()
>
> Does anyone know what the rough equivalent of this would be in the Scala
> API?
>
I am not sure, I haven't tested it using scala.
com.hadoop.mapreduce.LzoTextInputFormat class is from this package
https://github.com/twitter/hadoop-lzo
I have installed it from clourdera "hadoop-lzo" package with liblzo2-2
debian package on all of my workers. Make sure you have hadoop-lzo.jar
in your class path for spark.
- Gurvinder
> I am trying the following, but the first import yields an error on my
> |spark-ec2| cluster:
>
> |import com.hadoop.mapreduce.LzoTextInputFormat
> import org.apache.hadoop.io.LongWritable
> import org.apache.hadoop.io.Text
>
> sc.newAPIHadoopFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram/data",
> LzoTextInputFormat, LongWritable, Text)
> |
>
> |scala> import com.hadoop.mapreduce.LzoTextInputFormat
> <console>:12: error: object hadoop is not a member of package com
> import com.hadoop.mapreduce.LzoTextInputFormat
> |
>
> Nick
>
>