Hi, I hope I understood correctly. Here is a simplified procedure.
Precondition
- The JSON file is written line by line; each line is a single JSON document.
- A root array is also supported, e.g. [{...}, {...}, {...}].

Procedures
- Schema inference (if a user schema is not given):
  1. Read a line (one JSON document).
  2. Read it token by token with the Jackson parser (recursively if needed).
  3. From each value, find (or infer) the appropriate Spark type (e.g. DateType, StringType, DecimalType).
  4. Step 3 returns a StructType for the single underlying JSON document.
  5. Steps 1-4 produce an RDD of per-record types, RDD[DataType].
  6. Aggregate the DataTypes (normally StructTypes) into a single StructType that is compatible across all of them.
  7. Use the aggregated StructType as the schema.
- Parse the JSON data:
  1. Read a line (one JSON document).
  2. Read it token by token with the Jackson parser (recursively if needed).
  3. Convert each value to the given type (from the schema above).
  4. Step 3 returns a Row for the single underlying JSON document.

Rough sketches of both passes follow below, after the quoted message.

Thanks!

2016-04-20 11:07 GMT+09:00 resonance <marco_ro...@live.com>:
> Hi, this is more of a theoretical question and I'm asking it here because I
> have no idea where to find documentation for this stuff.
>
> I am currently working with Spark SQL and am considering using data
> contained within JSON datasets. I am aware of the .jsonFile() method in
> Spark SQL.
>
> What is the general strategy used by Spark SQL .jsonFile() to parse/decode
> a JSON dataset?
>
> (For example, an answer I might be looking for is that the JSON file is
> read into an ETL pipeline and transformed into a predefined data
> structure.)
>
> I am deeply appreciative of any help provided.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-does-jsonFile-work-tp26802.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
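
A rough, hypothetical sketch of the per-record type merge described in steps 5-6 of the schema
inference above. This is not Spark's actual internal code; the mergeType helper and its fallback
rules are made up here purely for illustration.

import org.apache.spark.sql.types._

// Hypothetical helper, for illustration only -- not Spark's internal implementation.
def mergeType(a: DataType, b: DataType): DataType = (a, b) match {
  case (x, y) if x == y => x
  case (NullType, x) => x                     // a missing/null field takes the other side's type
  case (x, NullType) => x
  case (LongType, DoubleType) | (DoubleType, LongType) => DoubleType  // widen numerics
  case (StructType(f1), StructType(f2)) =>    // merge nested records field by field
    val names = (f1.map(_.name) ++ f2.map(_.name)).distinct
    StructType(names.map { n =>
      (f1.find(_.name == n), f2.find(_.name == n)) match {
        case (Some(x), Some(y)) => StructField(n, mergeType(x.dataType, y.dataType), nullable = true)
        case (Some(x), None)    => x.copy(nullable = true)
        case (None, Some(y))    => y.copy(nullable = true)
        case _                  => StructField(n, NullType, nullable = true)
      }
    })
  case _ => StringType                        // conflicting types fall back to string
}

// Given an RDD[DataType] of per-record schemas (step 5), the schemas can be
// folded into one compatible StructType (step 6), for example:
//   val schema = perRecordTypes.treeAggregate[DataType](NullType)(mergeType, mergeType)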
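
And a small usage-level example of what the two passes produce, assuming the Spark 1.x
SQLContext API that the original question refers to and an existing SparkContext named sc;
the inferred column types are my expectation from the widening behavior described above,
not verified output.

// people.json, one JSON document per line:
//   {"name": "Ann", "amount": 1}
//   {"name": "Bob", "amount": 2.5}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// No user schema: the inference pass runs first, and the long 1 and double 2.5
// are expected to widen to a single double column for "amount".
val inferred = sqlContext.jsonFile("people.json")
inferred.printSchema()

// With a user-provided schema, the inference pass is skipped and each value is
// converted straight to the given type while the Rows are built.
import org.apache.spark.sql.types._
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("amount", DecimalType(10, 2), nullable = true)
))
val parsed = sqlContext.read.schema(schema).json("people.json")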