You can do this:

$ sbt/sbt hive/console
scala> jsonRDD(sparkContext.parallelize("""{ "name":"John", "age":53, "locations": [{ "street":"Rodeo Dr", "number":2300 }]}""" :: Nil)).registerTempTable("people")

scala> sql("SELECT name FROM people LATERAL VIEW explode(locations) l AS location WHERE location.number = 2300").collect()
res0: Array[org.apache.spark.sql.Row] = Array([John])

Note that this returns a person once per matching address, so anyone with more than one matching location will show up multiple times (see the DISTINCT variant sketched below the quoted thread).

On Tue, Oct 28, 2014 at 5:52 PM, Corey Nolet <cjno...@gmail.com> wrote:

> So it wouldn't be possible to have a json string like this:
>
> { "name":"John", "age":53, "locations": [{ "street":"Rodeo Dr",
> "number":2300 }]}
>
> And query all people who have a location with number = 2300?
>
>
> On Tue, Oct 28, 2014 at 5:30 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> On Tue, Oct 28, 2014 at 2:19 PM, Corey Nolet <cjno...@gmail.com> wrote:
>>
>>> Is it possible to select if, say, there was an addresses field that had
>>> a json array?
>>>
>> You can get the Nth item with "address".getItem(0). If you want to walk
>> through the whole array, look at LATERAL VIEW EXPLODE in HiveQL.
>>
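Two untested variations on the same session, assuming the people table registered above: DISTINCT drops the duplicate rows produced by the LATERAL VIEW, and indexing the array directly works when you only care about a specific element rather than all of them.

scala> // one row per person, even if several locations match
scala> sql("SELECT DISTINCT name FROM people LATERAL VIEW explode(locations) l AS location WHERE location.number = 2300").collect()

scala> // pull a single element out of the array without exploding it
scala> sql("SELECT name, locations[0].number FROM people").collect()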