I have a very large JSON file, and I want to save time by keeping Spark from
scanning the data to infer the schema. Since I already know the data, I
would prefer to provide the schema myself with
sqlContext.read().schema(mySchema).json(jsonFilePath)
However, the problem is that the JSON data format is somewhat unusual:
[
{
"apiTypeName": "someApi",
"allFieldsAndValues": {
"Field_1": "Value",
"Field_2": "Value",
"Field_3": 779.0,
"Field_4": "Value",
"Field_5": true
}
},
{
"apiTypeName": "someApi",
"allFieldsAndValues": {
"Field_1": "Value",
"Field_2": "Value",
"Field_3": 779.0,
"Field_4": "Value",
"Field_5": true
}
}
]
I can't seem to construct a schema for this kind of data that Spark could
use instead of inferring the schema on its own. Every way I have tried to
build the schema from combinations of StructType, StructField, and arrays,
Spark doesn't pick it up as I intend.
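For concreteness, here is a rough sketch of the shape such a schema would have for one element of the outer array, written in the JSON form that a Spark StructType serializes to. The types are read off the sample above (Field_3 as double, Field_5 as boolean, the rest as string), and "nullable": true throughout is an assumption rather than something known about the real data. It says nothing about how the outer [ ... ] wrapper is handled.

```json
{
  "type": "struct",
  "fields": [
    {"name": "apiTypeName", "type": "string", "nullable": true, "metadata": {}},
    {
      "name": "allFieldsAndValues",
      "type": {
        "type": "struct",
        "fields": [
          {"name": "Field_1", "type": "string", "nullable": true, "metadata": {}},
          {"name": "Field_2", "type": "string", "nullable": true, "metadata": {}},
          {"name": "Field_3", "type": "double", "nullable": true, "metadata": {}},
          {"name": "Field_4", "type": "string", "nullable": true, "metadata": {}},
          {"name": "Field_5", "type": "boolean", "nullable": true, "metadata": {}}
        ]
      },
      "nullable": true,
      "metadata": {}
    }
  ]
}
```

As far as I know, text like this can be turned back into a schema with DataType.fromJson(schemaJson), cast to StructType, and then passed to schema(...) as in the call above. Here schemaJson is a hypothetical variable holding the JSON text.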
Any help is appreciated.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-reading-json-with-pre-defined-schema-tp25353.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.