Looks like the exception was caused by resolved.get(prefix ++ a) returning None here:

  a => StructField(a.head, resolved.get(prefix ++ a).get, nullable = true)
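For what it's worth, here is a minimal standalone sketch of that failure mode and one way the None could be surfaced more clearly. This is not the actual createSchema() code: the resolved map, prefix and field path below are made-up stand-ins, and the import assumes the Spark 1.3+ org.apache.spark.sql.types package is on the classpath.

  import org.apache.spark.sql.types._

  object NoneGetSketch {
    def main(args: Array[String]): Unit = {
      // Stand-in for the structure createSchema() works with:
      // a map from a resolved field path to its DataType.
      val resolved: Map[Seq[String], DataType] =
        Map(Seq("id") -> LongType, Seq("headers", "Host") -> StringType)

      val prefix: Seq[String] = Nil
      val a: Seq[String] = Seq("headers", "User-Agent") // a path that was never resolved

      // Current style: the bare .get throws
      // java.util.NoSuchElementException: None.get when the key is missing.
      // val field = StructField(a.head, resolved.get(prefix ++ a).get, nullable = true)

      // Defensive alternative: report which field path is missing instead.
      val field = resolved.get(prefix ++ a) match {
        case Some(dataType) => StructField(a.head, dataType, nullable = true)
        case None =>
          sys.error(s"No resolved type for field path ${(prefix ++ a).mkString(".")}")
      }
      println(field)
    }
  }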
There are three occurrences of resolved.get() in createSchema() - the None case should be handled better in those places.

My two cents.

On Wed, May 27, 2015 at 1:46 PM, Michael Stone <mst...@mathom.us> wrote:

> On Wed, May 27, 2015 at 01:13:43PM -0700, Ted Yu wrote:
>
>> Can you tell us a bit more about (schema of) your JSON ?
>>
>
> It's fairly simple, consisting of 22 fields with values that are mostly
> strings or integers, except that some of the fields are objects
> with HTTP header/value pairs. I'd guess it's something in those latter
> fields that is causing the problems. The data is 800M rows that I didn't
> create in the first place, and I'm in the process of making a simpler test
> case. What I was mostly wondering is if there is an obvious mechanism
> that I'm just missing to get jsonRDD to spit out more information about
> which specific rows it's having problems with.
>
>> You can find sample JSON
>> in sql/core/src/test/scala/org/apache/spark/sql/json/TestJsonData.scala
>>
>
> I know jsonRDD works in general; I've used it before without problems.
> It even works on subsets of this data.
>
> Mike Stone
>