Use df.selectExpr to evaluate complex expressions (instead of just column names).
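For example, against the df1 defined in your mail below (an untested sketch; selectExpr parses full SQL expressions, so array indexing resolves):

    // expression syntax: array indexing works inside selectExpr
    df1.selectExpr("f1a[0]").show()

    // equivalent Column-based form, if you prefer to stay in the DSL
    df1.select(df1("f1a").getItem(0)).show()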
On Thu, May 5, 2016 at 11:53 AM, Xinh Huynh <xinh.hu...@gmail.com> wrote:
> Hi,
>
> I am having trouble accessing an array element in JSON data with a
> dataframe. Here is the schema:
>
> val json1 = """{"f1":"1", "f1a":[{"f2":"2"}]}"""
> val rdd1 = sc.parallelize(List(json1))
> val df1 = sqlContext.read.json(rdd1)
> df1.printSchema()
>
> root
>  |-- f1: string (nullable = true)
>  |-- f1a: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- f2: string (nullable = true)
>
> I would expect to be able to select the first element of "f1a" this way:
> df1.select("f1a[0]").show()
>
> org.apache.spark.sql.AnalysisException: cannot resolve 'f1a[0]' given
> input columns f1, f1a;
>
> This is with Spark 1.6.0.
>
> Please help. A follow-up question is: can I access arbitrary levels of
> nested JSON array of struct of array of struct?
>
> Thanks,
> Xinh
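And re the follow-up question: yes, element access and struct-field access chain to arbitrary depth inside one selectExpr expression. Against the schema above (the deeper f2a/f3 fields are hypothetical, only to illustrate the chaining):

    // index into the array, then read the struct field:
    df1.selectExpr("f1a[0].f2").show()

    // deeper levels chain the same way (f2a and f3 are made-up
    // field names, not in your sample data):
    // df1.selectExpr("f1a[0].f2a[0].f3").show()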