If you already know the schema, you can pass it to the reader with the
schema method, like this:
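    import org.apache.spark.sql.types._

    val path = "examples/src/main/resources/jsonfile"
    // detailsSchema must be defined beforehand (the type of the nested "details" field).
    val jsonSchema = StructType(
      StructField("id", StringType, true) ::
      StructField("reference", LongType, true) ::
      StructField("details", detailsSchema, true) ::
      StructField("value", StringType, true) :: Nil)
    // With an explicit schema, Spark skips the inference pass over the data.
    val people = sqlContext.read.schema(jsonSchema).json(path)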
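Since detailsSchema is not spelled out above, here is a minimal sketch of how the nested part might be declared. The field names inside "details" ("name" and "count") are purely hypothetical placeholders; substitute whatever your data actually contains.

```scala
import org.apache.spark.sql.types._

// Hypothetical nested fields: the real layout of "details" is not given in this thread.
val detailsSchema = StructType(
  StructField("name", StringType, true) ::
  StructField("count", LongType, true) :: Nil)

val jsonSchema = StructType(
  StructField("id", StringType, true) ::
  StructField("reference", LongType, true) ::
  StructField("details", detailsSchema, true) ::
  StructField("value", StringType, true) :: Nil)

// Print the schema as a tree so it can be checked before starting the real read.
jsonSchema.printTreeString()
```

Building the schema needs only the types package, not a running context, so you can check it in the shell before pointing it at the full data set.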
If you have a small JSON file with the same structure as your data, you can
let Spark infer the schema from that file alone and then reuse it:
    val jsonSchema = sqlContext.read.json("path/to/schema").schema
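If what you want to keep in a file is the schema definition itself rather than sample records, a different approach from the line above is to serialize the StructType to a JSON string and rebuild it later. This is only a sketch: the two-field schema is a shortened stand-in, and how you save and load the string (a local file, HDFS, a config value) is left to you.

```scala
import org.apache.spark.sql.types._

// Shortened stand-in schema for illustration only.
val jsonSchema = StructType(
  StructField("id", StringType, true) ::
  StructField("value", StringType, true) :: Nil)

// Serialize the schema to a JSON string; this string can be stored in a small file.
val schemaJson: String = jsonSchema.json

// Rebuild the StructType from the string without touching the data files.
val restored = DataType.fromJson(schemaJson).asInstanceOf[StructType]
```

The restored value can then be passed to the reader in the same way, i.e. sqlContext.read.schema(restored).json(path).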
Thanks,
Ewan
From: Gavin Yue
Sent: 06 January 2016 07:14
To: user
Subject: How to accelerate reading json file?
I am trying to read JSON files, following the example:
val path = "examples/src/main/resources/jsonfile"
val people = sqlContext.read.json(path)
I have 1 TB of files in that path. It took 1.2 hours just to read them and
infer the schema.
But I already know the schema. Is there a way to shorten this process?
Thanks a lot.