This helps immensely. Thanks Michael!

On Fri, Jul 17, 2015 at 4:33 PM, Michael Armbrust <[email protected]> wrote:
> I'll add there is a JIRA to override the default past some threshold of #
> of unique keys: https://issues.apache.org/jira/browse/SPARK-4476
>
> On Fri, Jul 17, 2015 at 1:32 PM, Michael Armbrust <[email protected]>
> wrote:
>
>> The difference between a map and a struct here is that in a struct all
>> possible keys are defined as part of the schema and each can have a
>> different type (and we don't support union types). JSON doesn't have
>> differentiated data structures so we go with the one that gives you more
>> information when doing inference by default. If you pass in a schema to
>> JSON however, you can override this and have a JSON object parsed as a map.
>>
>> On Fri, Jul 17, 2015 at 11:02 AM, Corey Nolet <[email protected]> wrote:
>>
>>> I notice JSON objects are all parsed as Map[String,Any] in Jackson but
>>> for some reason, the "inferSchema" tools in Spark SQL extract the schema
>>> of nested JSON objects as StructTypes.
>>>
>>> This makes it really confusing when trying to rectify the object
>>> hierarchy when I have maps because the Catalyst conversion layer underneath
>>> is expecting a Row or Product and not a Map.
>>>
>>> Why wasn't MapType used here? Is there any significant difference
>>> between the two of these types that would cause me not to use a MapType
>>> when I'm constructing my own schema representing a set of nested
>>> Map[String,_]'s?
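
For anyone finding this thread later, here is a rough sketch of the "pass in a schema" suggestion quoted above. It is only an illustration under some assumptions: the field names `id` and `attrs` and the sample records are made up, and it uses the `SparkSession` entry point from later Spark releases rather than the API current when this thread was written (with Spark 1.4 the analogous call would go through `sqlContext.read.schema(...).json(...)`).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object JsonMapSchemaSketch {
  // Hypothetical schema: the nested JSON object "attrs" is declared as a
  // map<string,string> instead of letting inference turn it into a struct.
  val schema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = true),
    StructField("attrs",
      MapType(StringType, StringType, valueContainsNull = true),
      nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-map-schema-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample records; the keys under "attrs" differ per record.
    val json = Seq(
      """{"id": 1, "attrs": {"a": "x", "b": "y"}}""",
      """{"id": 2, "attrs": {"c": "z"}}"""
    ).toDS()

    // Default inference: "attrs" comes back as a struct whose fields are
    // the union of all keys seen in the data.
    spark.read.json(json).printSchema()

    // Explicit schema: "attrs" is parsed as a map, so each row only
    // carries the keys that were actually present in its record.
    spark.read.schema(schema).json(json).printSchema()

    spark.stop()
  }
}
```

The trade-off matches the explanation in the thread: the struct gives every key its own column and type, while the map keeps the schema fixed as keys vary but forces all values to share one type.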
