Hi Spark Devs,

We're processing a large number of Avro files with Spark and found that the Avro reader lacks the JSON reader's ability to handle malformed or truncated files. Currently, the Avro reader throws an exception whenever it encounters a bad or truncated record in an Avro file, so a single dodgy file causes the entire Spark job to fail.
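For reference, the JSON reader gets this behavior from its `mode` option, which accepts PERMISSIVE (the JSON default), DROPMALFORMED and FAILFAST. The sketch below is a plain-Python model, not Spark code, of how those three modes would treat a stream of records if the Avro reader honored them. `read_records`, `decode_upper` and `MalformedRecordError` are hypothetical names, and the decoder is a toy stand-in for Avro record decoding. The model works per record only; how a reader would resynchronize after a truncated Avro block is left open.

```python
from enum import Enum
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class ParseMode(Enum):
    # Names mirror the values of Spark's JSON `mode` option; semantics are simplified.
    PERMISSIVE = "PERMISSIVE"
    DROPMALFORMED = "DROPMALFORMED"
    FAILFAST = "FAILFAST"


class MalformedRecordError(Exception):
    """Hypothetical error a decoder raises for a bad or truncated record."""


def read_records(
    raw_records: Iterable[bytes],
    decode: Callable[[bytes], T],
    mode: ParseMode = ParseMode.FAILFAST,
) -> Iterator[Optional[T]]:
    """Decode records one by one, handling failures according to `mode`."""
    for raw in raw_records:
        try:
            yield decode(raw)
        except MalformedRecordError:
            if mode is ParseMode.FAILFAST:
                raise  # current Avro behavior: one bad record aborts the read
            if mode is ParseMode.PERMISSIVE:
                # Keep a placeholder row; Spark's JSON reader sets the fields to null.
                yield None
            # DROPMALFORMED: skip the bad record and keep going.


def decode_upper(raw: bytes) -> str:
    # Hypothetical stand-in for an Avro record decoder.
    if raw == b"bad":
        raise MalformedRecordError(raw)
    return raw.decode().upper()


if __name__ == "__main__":
    records = [b"a", b"bad", b"c"]
    print(list(read_records(records, decode_upper, ParseMode.DROPMALFORMED)))  # ['A', 'C']
    print(list(read_records(records, decode_upper, ParseMode.PERMISSIVE)))  # ['A', None, 'C']
```

Keeping FAILFAST as the default in the sketch matches the point that existing users would see no change unless they opt in.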
Ideally, AvroFileFormat would accept a Permissive or DropMalformed ParseMode, as Spark's JSON format does. This would let the Avro reader drop bad records and continue processing the good ones rather than abort the entire job. I've searched through Jira and haven't found any related issues, but it's a relatively straightforward change that brings consistency across the readers. Obviously the default could remain FailFastMode, which is the current effective behavior, so this wouldn't break any existing users.

Is there a reason this behavior doesn't exist, or an obvious workaround that I missed? If not, are there any further details I should consider before adding this capability to Spark's Avro reader? I'm happy to propose a solution and contribute this update if nobody is already working on it.

Thanks,
Tim