Hi durin, I just tried this example (nice data, by the way!), *with each JSON object on one line*, and it worked fine:
scala> rdd.printSchema()
root
 |-- entities: org.apache.spark.sql.catalyst.types.StructType$@13b6cdef
 |    |-- friends: ArrayType[org.apache.spark.sql.catalyst.types.StructType$@13b6cdef]
 |    |    |-- id: IntegerType
 |    |    |-- indices: ArrayType[IntegerType]
 |    |    |-- name: StringType
 |    |-- weapons: ArrayType[StringType]
 |-- field1: StringType
 |-- id: IntegerType
 |-- lang: StringType
 |-- place: StringType
 |-- read: BooleanType
 |-- user: org.apache.spark.sql.catalyst.types.StructType$@13b6cdef
 |    |-- id: IntegerType
 |    |-- name: StringType
 |    |-- num_heads: IntegerType

On Wed, Jun 25, 2014 at 10:57 AM, durin <[email protected]> wrote:

> I'm using Spark 1.0.0-SNAPSHOT (downloaded and compiled on 2014/06/23).
> I'm trying to execute the following code:
>
> import org.apache.spark.SparkContext._
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val table = sqlContext.jsonFile("hdfs://host:9100/user/myuser/data.json")
> table.printSchema()
>
> data.json looks like this (3 shortened lines shown here):
>
> {"field1":"content","id":12312213,"read":false,"user":{"id":121212,"name":"E. Stark","num_heads":0},"place":"Winterfell","entities":{"weapons":[],"friends":[{"name":"R. Baratheon","id":23234,"indices":[0,16]}]},"lang":"en"}
> {"field1":"content","id":56756765,"read":false,"user":{"id":121212,"name":"E. Stark","num_heads":0},"place":"Winterfell","entities":{"weapons":[],"friends":[{"name":"R. Baratheon","id":23234,"indices":[0,16]}]},"lang":"en"}
> {"field1":"content","id":56765765,"read":false,"user":{"id":121212,"name":"E. Stark","num_heads":0},"place":"Winterfell","entities":{"weapons":[],"friends":[{"name":"R. Baratheon","id":23234,"indices":[0,16]}]},"lang":"en"}
>
> The JSON object in each line is valid according to the JSON validator I use,
> and since jsonFile is defined as
>
> def jsonFile(path: String): SchemaRDD
> Loads a JSON file (one object per line), returning the result as a SchemaRDD.
>
> I would assume this should work.
> However, executing this code returns this error:
>
> 14/06/25 10:05:09 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:11)
> 14/06/25 10:05:09 WARN scheduler.TaskSetManager: Loss was due to com.fasterxml.jackson.databind.JsonMappingException
> com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
>  at [Source: java.io.StringReader@238df2e4; line: 1, column: 1]
>  at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
> ...
>
> Does anyone know where the problem lies?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/jsonFile-function-in-SQLContext-does-not-work-tp8273.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
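For what it's worth, that Jackson message ("No content to map due to end-of-input" at line 1, column 1) usually means the parser was handed an empty string, e.g. a blank or whitespace-only line in the file, or an object wrapped across several lines. A minimal sketch (plain Scala, no Spark needed; the object name `BlankLineCheck` is just for illustration) that scans a file for such lines:

```scala
import scala.io.Source

// Hypothetical helper: report the 1-based numbers of blank or
// whitespace-only lines. jsonFile feeds each line to Jackson on its
// own, so a blank line becomes empty input and triggers
// "No content to map due to end-of-input".
object BlankLineCheck {
  def blankLines(path: String): Seq[Int] = {
    val src = Source.fromFile(path)
    try src.getLines().zipWithIndex.collect {
      case (line, idx) if line.trim.isEmpty => idx + 1
    }.toList
    finally src.close()
  }

  def main(args: Array[String]): Unit =
    blankLines(args(0)).foreach(n => println(s"blank line at $n"))
}
```

Running it against data.json should point you at any offending lines; if it reports nothing, the next thing I'd check is whether each JSON object really sits on a single physical line in HDFS.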
