I’m trying to create a view on a nested JSON file (converted to a dict) using PySpark 1.4.1.
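For reference, here is roughly how I'm loading the file and registering the table (a minimal sketch; the app name, file path, and variable names are placeholders, and the column names match the SQL below):

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="nestedJsonView")  # placeholder app name
sqlContext = HiveContext(sc)

# Spark infers the schema, mapping nested JSON objects to struct columns
df = sqlContext.read.json("/path/to/myFile.json")  # placeholder path
df.printSchema()  # shows myColA, myColD, and myStruct with its nested fields

df.registerTempTable("myTbl")
```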
The SQL looks like this:

```sql
create view myView as
select myColA, myStruct.ColB, myStruct.nestedColC
from myTbl
where myColD = "some value";
```

The select statement by itself runs fine, but when I try to create the view I get the following error:

```
SQL Error [500051] [HY000]: [Simba][HiveJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, errorCode:0, errorMessage:org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException [Error 10004]: Line 2:7 Invalid table alias or column reference 'myColA': (possible column names are: col)), ...
```

I get a similar error using the HiveContext from PySpark:

```
An error occurred while calling o111.sql.
: org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException [Error 10004]: Line 1:133 Invalid table alias or column reference 'myColD': (possible column names are: col)
	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:349)
	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
	at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
	at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
	at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
	at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
	at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
	at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
	at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
	at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
	at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
	at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
	at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)
```

What am I doing wrong, or is this perhaps a bug?

Thanks,
Dan