Run Spark with Hadoop snapshot

2016-11-18 Thread lminer
I'm trying to figure out how to run Spark with a snapshot of Hadoop 2.8 that I built myself. I'm unclear on the configuration needed to get Spark to work with the snapshot. I'm running Spark on Mesos. Per the Spark documentation, I run spark-submit as follows using the `spark-2.0.2-bin-without-hadoop` build…
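
For a "without Hadoop" Spark distribution, the documented approach is to point Spark at the Hadoop build's classpath before submitting. A minimal sketch, assuming the custom Hadoop 2.8 snapshot lives at a hypothetical `/opt/hadoop-2.8.0-SNAPSHOT`:

```sh
# conf/spark-env.sh -- tell the Hadoop-free Spark build where to find Hadoop's jars.
# /opt/hadoop-2.8.0-SNAPSHOT is a hypothetical install location for the custom snapshot.
export HADOOP_HOME=/opt/hadoop-2.8.0-SNAPSHOT
export SPARK_DIST_CLASSPATH=$("${HADOOP_HOME}/bin/hadoop" classpath)
```

On Mesos the executors also need a Spark package that can see the same Hadoop jars, e.g. by pointing `spark.executor.uri` at a tarball that bundles or references the snapshot build.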

SAXParseException while writing to Parquet on S3

2016-11-04 Thread lminer
I'm trying to read in some JSON, infer a schema, and write it out again as Parquet to S3 (s3a). For some reason, about a third of the way through the writing portion of the run, Spark always errors out with the error included below. I can't find any obvious reasons for the issue: it isn't out of memory…
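
For context, the read/write pattern described is roughly the following. This is a minimal sketch with hypothetical bucket paths; the reported failure occurs partway through the Parquet write stage:

```scala
import org.apache.spark.sql.SparkSession

object JsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-parquet")
      .getOrCreate()

    // Schema is inferred automatically by scanning the JSON input.
    val df = spark.read.json("s3a://my-bucket/input/") // hypothetical input path

    // The reported SAXParseException surfaces during this write.
    df.write.mode("overwrite").parquet("s3a://my-bucket/output/") // hypothetical output path

    spark.stop()
  }
}
```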