Hello,

There seems to be very little documentation on the use of newAPIHadoopFile, and even less on using it to open LZO-compressed files. I've hit a wall with some unexpected behavior that I don't know how to interpret.
This is a test program I'm running in an effort to get this working, after finding previous threads on the subject. The job runs on a YARN cluster, and the input is the path of a decidedly non-empty LZO file sitting in HDFS, which I can manually decompress and read as a text file, with a count of roughly 3 million lines. What I don't know how to interpret is that the code runs without complaints and prints 0. I would appreciate some guidance on where to go from here; there are no error messages to point me anywhere, just an empty RDD.

Thanks,
Kevin

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Empty-RDD-after-LzoTextInputFormat-in-newAPIHadoopFile-tp10873.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
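For reference, the call I'm making is along these lines (a minimal sketch, not the exact program; the app name and input path are placeholders, and it assumes the hadoop-lzo jar, which provides com.hadoop.mapreduce.LzoTextInputFormat, is on the executor classpath):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}
import com.hadoop.mapreduce.LzoTextInputFormat

object LzoCountTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LzoCountTest"))

    // Path to the LZO-compressed text file in HDFS (placeholder).
    val input = args(0)

    // LzoTextInputFormat is a mapreduce (new-API) InputFormat,
    // so it goes through newAPIHadoopFile rather than hadoopFile.
    val rdd = sc.newAPIHadoopFile(
      input,
      classOf[LzoTextInputFormat],
      classOf[LongWritable],
      classOf[Text])

    // This prints 0 for me, even though the file is non-empty.
    println(rdd.count())
    sc.stop()
  }
}
```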