Hi durin,

I just tried this example (nice data, by the way!), *with each JSON
object on one line*, and it worked fine:

scala> rdd.printSchema()
root
 |-- entities: org.apache.spark.sql.catalyst.types.StructType$@13b6cdef
 |    |-- friends:
ArrayType[org.apache.spark.sql.catalyst.types.StructType$@13b6cdef]
 |    |    |-- id: IntegerType
 |    |    |-- indices: ArrayType[IntegerType]
 |    |    |-- name: StringType
 |    |-- weapons: ArrayType[StringType]
 |-- field1: StringType
 |-- id: IntegerType
 |-- lang: StringType
 |-- place: StringType
 |-- read: BooleanType
 |-- user: org.apache.spark.sql.catalyst.types.StructType$@13b6cdef
 |    |-- id: IntegerType
 |    |-- name: StringType
 |    |-- num_heads: IntegerType
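
If your data.json on HDFS is pretty-printed (one object spread across several lines), jsonFile will try to parse each physical line as a complete JSON document, which is exactly what the "No content to map due to end-of-input" error from Jackson looks like. As a sketch, here is a small standalone helper (plain Scala, not part of the Spark API; the name toOneObjectPerLine is made up) that collapses pretty-printed objects onto single lines by tracking brace depth, ignoring braces and whitespace inside quoted strings:

```scala
// Assumption: a standalone helper, not part of Spark. It rewrites
// pretty-printed JSON into the one-object-per-line form jsonFile expects.
def toOneObjectPerLine(pretty: String): Seq[String] = {
  val out = scala.collection.mutable.ListBuffer[String]()
  val current = new StringBuilder
  var depth = 0        // current {...} nesting level
  var inString = false // inside a quoted JSON string?
  var escaped = false  // previous char was an unescaped backslash?
  for (c <- pretty) {
    c match {
      case '"' if !escaped  => inString = !inString
      case '{' if !inString => depth += 1
      case '}' if !inString => depth -= 1
      case _                =>
    }
    escaped = c == '\\' && !escaped
    // Drop whitespace between tokens, keep it inside string literals.
    if (!(c.isWhitespace && !inString)) current += c
    // A closing brace at depth 0 ends one top-level object.
    if (!inString && depth == 0 && c == '}') {
      out += current.toString
      current.clear()
    }
  }
  out.toList
}
```

You could run your file through something like this and write the result back out before calling jsonFile on it. (A one-record-per-line rewrite with any JSON tool would work just as well; the point is only that each line must be a complete object.)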

On Wed, Jun 25, 2014 at 10:57 AM, durin <[email protected]> wrote:
> I'm using Spark 1.0.0-SNAPSHOT (downloaded and compiled on 2014/06/23).
> I'm trying to execute the following code:
>
>     import org.apache.spark.SparkContext._
>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>     val table =
> sqlContext.jsonFile("hdfs://host:9100/user/myuser/data.json")
>     table.printSchema()
>
> data.json looks like this (3 shortened lines shown here):
>
> {"field1":"content","id":12312213,"read":false,"user":{"id":121212,"name":"E.
> Stark","num_heads":0},"place":"Winterfell","entities":{"weapons":[],"friends":[{"name":"R.
> Baratheon","id":23234,"indices":[0,16]}]},"lang":"en"}
> {"field1":"content","id":56756765,"read":false,"user":{"id":121212,"name":"E.
> Stark","num_heads":0},"place":"Winterfell","entities":{"weapons":[],"friends":[{"name":"R.
> Baratheon","id":23234,"indices":[0,16]}]},"lang":"en"}
> {"field1":"content","id":56765765,"read":false,"user":{"id":121212,"name":"E.
> Stark","num_heads":0},"place":"Winterfell","entities":{"weapons":[],"friends":[{"name":"R.
> Baratheon","id":23234,"indices":[0,16]}]},"lang":"en"}
>
> The JSON object in each line is valid according to the JSON validator I use,
> and as jsonFile is defined as
>
>     def jsonFile(path: String): SchemaRDD
>     Loads a JSON file (one object per line), returning the result as a
> SchemaRDD.
>
> I would assume this should work. However, executing this code returns this
> error:
>
> 14/06/25 10:05:09 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:11)
> 14/06/25 10:05:09 WARN scheduler.TaskSetManager: Loss was due to
> com.fasterxml.jackson.databind.JsonMappingException
> com.fasterxml.jackson.databind.JsonMappingException: No content to map due
> to end-of-input
>  at [Source: java.io.StringReader@238df2e4; line: 1, column: 1]
>         at
> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
>         ...
>
>
> Does anyone know where the problem lies?
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/jsonFile-function-in-SQLContext-does-not-work-tp8273.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
