Hello everybody,

I am trying to read LZO-compressed files with PySpark 1.4.1 on EC2.
I followed the steps from https://issues.apache.org/jira/browse/SPARK-2394 to read LZO-compressed files, but even after these modifications I still have the following issue:

```python
files = sc.newAPIHadoopFile(
    "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram",
    "com.hadoop.mapreduce.LzoTextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text")
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/spark/python/pyspark/context.py", line 574, in newAPIHadoopFile
    jconf, batchSize)
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
: java.lang.ClassNotFoundException: com.hadoop.mapreduce.LzoTextInputFormat
```

Could you please help me read an LZO file with PySpark?

Thank you for your help,

Cheers,
Bertrand

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/LZO-compressed-files-tp24568.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
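Since the `ClassNotFoundException` suggests the hadoop-lzo jar is not visible to Spark, here is a minimal sketch of one common way to put it on both the driver and executor classpaths when launching the PySpark shell. The jar path below is an assumption from my setup and may differ on your AMI:

```shell
# Assumed location of the hadoop-lzo jar -- adjust to your installation.
LZO_JAR=/usr/lib/hadoop/lib/hadoop-lzo.jar

# Put the jar on the driver classpath and on every executor's classpath
# so com.hadoop.mapreduce.LzoTextInputFormat can be resolved on both sides.
pyspark \
  --driver-class-path "$LZO_JAR" \
  --conf spark.executor.extraClassPath="$LZO_JAR"
```

Is this roughly what the SPARK-2394 steps are supposed to achieve, or is something else needed?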