Thank you so much for the reply. Here is my code:
1. val conf = new SparkConf().setAppName("Simple Application")
2. conf.setMaster("local")
3. val sc = new SparkContext(conf)
4. val sqlContext = new org.apache.spark.sql.SQLContext(sc)
5. import sqlContext.createSchemaRDD
6. val path1 = "./data/people.json"
7. val people = sqlContext.jsonFile(path1)
8. people.registerAsTable("people")
9. var sql="SELECT name FROM people WHERE schools.time>2"
10. val result = sqlContext.sql(sql)
11. result.collect().foreach(println)
The content of people.json is:
{"name":"Michael",
"schools":[{"name":"ABC","time":1994},{"name":"EFG","time":2000}]}
{"name":"Andy", "age":30,"scores":{"eng":98,"phy":89}}
{"name":"Justin", "age":19}
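To make the intent of the query in line 9 concrete, here is a minimal plain-Scala sketch (no Spark involved) of one possible reading of "schools.time>2": return the name of every person who has at least one school whose time exceeds the threshold. The case classes School and Person, the object NestedFilterSketch, and the helper namesWithSchoolAfter are hypothetical names introduced only for illustration; the age and scores fields are left out because the query does not use them.

```scala
// Plain Scala model of the sample records in people.json (illustrative only).
case class School(name: String, time: Int)
case class Person(name: String, schools: Seq[School] = Seq.empty)

object NestedFilterSketch {
  // The three records from people.json; Andy and Justin have no "schools" field.
  val people: Seq[Person] = Seq(
    Person("Michael", Seq(School("ABC", 1994), School("EFG", 2000))),
    Person("Andy"),
    Person("Justin")
  )

  // Assumed semantics: a person matches if ANY nested school has time > threshold.
  def namesWithSchoolAfter(threshold: Int): Seq[String] =
    people.filter(_.schools.exists(_.time > threshold)).map(_.name)

  def main(args: Array[String]): Unit =
    namesWithSchoolAfter(2).foreach(println)
}
```

Under this reading, only Michael's record would match for the sample data, since the other two records have no schools at all.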
What I have tried is:
*1. Use HiveQL:*
I have tried to replace:
line 4 with
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
line 10 with
val result = sqlContext.hql(sql)
(I have recompiled the Spark jar with Hive support), but I seem to get the
same error.
*2. Use [] for the access:*
I have tried to replace:
line 9 with:
var sql="SELECT name FROM people WHERE schools[0].time>2"
but I got this error:
14/07/15 14:37:49 INFO SparkContext: Job finished: reduce at JsonRDD.scala:40, took 0.98412 s
Exception in thread "main" java.lang.RuntimeException: [1.41] failure: ``UNION'' expected but identifier .time found
SELECT name FROM people WHERE schools[0].time>2
                                        ^
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
    at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:69)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:185)
    at SimpleApp$.main(SimpleApp.scala:32)
    at SimpleApp.main(SimpleApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
So it seems that this syntax is not supported.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Query-the-nested-JSON-data-With-Spark-SQL-1-0-1-tp9544p9731.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.