As I understand it, it comes down to how Hadoop FileInputFormats work, and to questions of mutability. If you were to read a file from Hadoop via an InputFormat with a simple Java program, the InputFormat's RecordReader creates a single, mutable instance of the Writable key class and a single, mutable instance of the Writable value class, and then reuses those same two instances for every record it reads.
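In Spark terms, this is why the ScalaDoc on SparkContext.sequenceFile warns you to copy records before caching, sorting, or aggregating them. A minimal sketch of the safe pattern (the HDFS path and the IntWritable/Text types here are placeholder assumptions, not anything from the original post):

    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.spark.{SparkConf, SparkContext}

    object WritableReuseDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("writable-reuse").setMaster("local[*]"))
        // The RecordReader hands back the SAME two Writable instances for
        // every record, so caching or collecting the raw RDD ends up with
        // many references to one object.
        val raw = sc.sequenceFile("hdfs:///tmp/example.seq",
          classOf[IntWritable], classOf[Text])
        // Copy into immutable JVM values before anything materializes them.
        val safe = raw.map { case (k, v) => (k.get, v.toString) }
        safe.collect().foreach(println)
        sc.stop()
      }
    }

The map into plain Int/String values is what breaks the aliasing: each record becomes its own immutable object instead of a reference to the reused Writables.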
I'm trying to solve a problem where the history server spams my logs with
EOFExceptions when it tries to read a history file from HDFS that is both
lz4-compressed and incomplete. The actual exception is:
java.io.EOFException: Stream ended prematurely
        at net.jpountz.lz4.LZ4BlockInputStream.
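For what it's worth, the failure is easy to reproduce outside the history server. A minimal sketch using the same lz4-java stream classes (this is not the history server's actual read path, just a self-contained way to trigger the same exception):

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, EOFException}
    import net.jpountz.lz4.{LZ4BlockInputStream, LZ4BlockOutputStream}

    object TruncatedLz4 {
      def main(args: Array[String]): Unit = {
        // Write one lz4 block stream, then chop off the tail to simulate a
        // history file whose writer hasn't finished yet.
        val bytes = new ByteArrayOutputStream()
        val out = new LZ4BlockOutputStream(bytes)
        out.write(Array.fill[Byte](1 << 16)('x'.toByte))
        out.close()
        val truncated = bytes.toByteArray.dropRight(10)

        val in = new LZ4BlockInputStream(new ByteArrayInputStream(truncated))
        val buf = new Array[Byte](8192)
        try {
          while (in.read(buf) != -1) {} // drain; the last block is partial
        } catch {
          case e: EOFException =>
            // The same "Stream ended prematurely" the history server logs.
            println(s"caught: ${e.getMessage}")
        } finally in.close()
      }
    }

Any reader of an in-progress file has to be prepared to treat this EOFException as "still being written" rather than as a hard error.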
So, I actually tried this, and it built without problems, but publishing the
artifacts to Artifactory ended up with some strangeness in the child poms,
where the property wasn’t resolved. That in turn breaks pulling them into
other projects, with errors like: “Could not find
org.apache.spark:spark-parent_2.1
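One workaround that may be worth trying (an assumption on my part, not something the stock Spark build configures) is flatten-maven-plugin, which rewrites the pom that actually gets installed/deployed with properties resolved, so the child poms stop referencing an unresolved placeholder:

    <!-- Goes in the parent pom's <build><plugins> section. -->
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>flatten-maven-plugin</artifactId>
      <version>1.5.0</version>
      <executions>
        <execution>
          <id>flatten</id>
          <phase>process-resources</phase>
          <goals><goal>flatten</goal></goals>
        </execution>
        <execution>
          <id>flatten.clean</id>
          <phase>clean</phase>
          <goals><goal>clean</goal></goals>
        </execution>
      </executions>
    </plugin>

With that bound, mvn install/deploy publishes the flattened poms instead of the raw ones, so whatever was a ${...} property in the source tree arrives at Artifactory already resolved.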
I've got an interesting challenge in building Spark. For various reasons we
do a few different builds of Spark, typically with a few different profile
options (e.g. against different versions of Hadoop, some with/without Hive,
etc.). We mirror the Spark repo internally and have a build server that
builds each variant; see the sketch below for the sort of invocations I mean.
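For concreteness, the variant builds look something like this (profile names are taken from the Spark 2.x build docs; the exact set is an assumption and depends on the branch we mirror):

    # Two variants from the same tree: Hadoop 2.7 with Hive, Hadoop 2.6 without.
    ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -DskipTests clean package
    ./build/mvn -Pyarn -Phadoop-2.6 -DskipTests clean package
    # Or produce a runnable distribution per variant:
    ./dev/make-distribution.sh --name hadoop2.7-hive --tgz \
      -Phadoop-2.7 -Phive -Phive-thriftserver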