Hello everybody,

I followed the steps from https://issues.apache.org/jira/browse/SPARK-2394
to read LZO-compressed files, but now I cannot even open a file with:

lines = sc.textFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram")


>>> lines.first()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/spark/python/pyspark/rdd.py", line 1295, in first
    rs = self.take(1)
  File "/root/spark/python/pyspark/rdd.py", line 1247, in take
    totalParts = self.getNumPartitions()
  File "/root/spark/python/pyspark/rdd.py", line 355, in getNumPartitions
    return self._jrdd.partitions().size()
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
line 538, in __call__
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line
300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o23.partitions.
: java.lang.RuntimeException: Error in configuring object
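
For reference, the SPARK-2394 approach I followed boils down to something like
the sketch below. This is just my understanding, not a verified recipe: it
assumes the hadoop-lzo jar and its native libraries are installed on every
node, and the input format / key / value class names come from the hadoop-lzo
and Hadoop packages.

# Read LZO-compressed text via the new Hadoop API input format from hadoop-lzo.
files = sc.newAPIHadoopFile(
    "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram",
    "com.hadoop.mapreduce.LzoTextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text")
# The input format yields (byte offset, line) pairs; keep only the text.
lines = files.map(lambda kv: kv[1])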




I also tried sequenceFile:

lines = sc.sequenceFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/spark/python/pyspark/context.py", line 544, in sequenceFile
    keyConverter, valueConverter, minSplits, batchSize)
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
line 538, in __call__
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line
300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.sequenceFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
(TID 3, 172.31.12.23): java.lang.IllegalArgumentException: Unknown codec:
com.hadoop.compression.lzo.LzoCodec
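
From the second stack trace it looks like the executors cannot find the
com.hadoop.compression.lzo.LzoCodec class at all. In case it is relevant, this
is roughly what I believe has to be in place before the context is created;
the jar path below is only a placeholder, not necessarily where hadoop-lzo
actually lives on a spark-ec2 cluster.

from pyspark import SparkConf, SparkContext

# Placeholder path: point this at wherever hadoop-lzo.jar sits on each node.
lzo_jar = "/usr/lib/hadoop/lib/hadoop-lzo.jar"

conf = (SparkConf()
        .set("spark.driver.extraClassPath", lzo_jar)
        .set("spark.executor.extraClassPath", lzo_jar))
sc = SparkContext(conf=conf)

(If the codec also has to be registered under io.compression.codecs in the
Hadoop configuration, I may be missing that step as well.)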





Thanks for your help,

Cheers,

Bertrand



