Hi, I am still new to Spark, so apologies if similar questions have been asked here before. I am trying to read a Hive table, run a query on it, and save the result into a partitioned Hive table stored as Parquet.
For example, I was able to run the following in Hive:

    INSERT INTO TABLE target_table PARTITION (partition_field)
    SELECT field1, field2, partition_field
    FROM source_table
    DISTRIBUTE BY field1 SORT BY field2

But when I tried running the same statement in spark-sql, it gave me the following error:

    java.lang.RuntimeException: Unsupported language features in query: INSERT INTO TABLE ...

I also tried the following Java code and saw the same error:

    SparkConf sparkConf = new SparkConf().setAppName("Example");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    JavaHiveContext hiveCtx = new JavaHiveContext(ctx);
    JavaSchemaRDD rdd = hiveCtx.sql(
        "INSERT INTO TABLE target_table PARTITION (partition_field) "
        + "SELECT field1, field2, partition_field FROM source_table "
        + "DISTRIBUTE BY field1 SORT BY field2");
    ...
    rdd.count(); // Just to force the query to run

If I take "INSERT INTO TABLE target_table PARTITION (partition_field)" out of the SQL statement and run the rest through hiveCtx.sql(), I do get an RDD back, but the only way I have found to save it is rdd.saveAsParquetFile(target_table_location), and the output is not partitioned correctly.

Any help is much appreciated. Thanks.

Regards,
BH
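P.S. For what it's worth, here is a rough sketch of the workaround I was planning to try next: run only the SELECT (which parses fine for me), register the result as a temporary table, and then do the partitioned insert from that table. I am guessing here that INSERT OVERWRITE might be accepted where INSERT INTO is not, that SET statements are passed through to Hive, and that my Spark version supports dynamic partition inserts at all; none of that is verified, so please correct me if this is a dead end. The table name "tmp_result" is just my own placeholder.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.api.java.JavaSchemaRDD;
    import org.apache.spark.sql.hive.api.java.JavaHiveContext;

    public class PartitionedInsertSketch {
        public static void main(String[] args) {
            SparkConf sparkConf = new SparkConf().setAppName("Example");
            JavaSparkContext ctx = new JavaSparkContext(sparkConf);
            JavaHiveContext hiveCtx = new JavaHiveContext(ctx);

            // Hive needs these settings for dynamic partition inserts;
            // assuming SET statements are forwarded to Hive as-is.
            hiveCtx.sql("SET hive.exec.dynamic.partition=true");
            hiveCtx.sql("SET hive.exec.dynamic.partition.mode=nonstrict");

            // Run only the SELECT, which parses without errors for me,
            // and register the result under a temporary name.
            JavaSchemaRDD result = hiveCtx.sql(
                "SELECT field1, field2, partition_field FROM source_table "
                + "DISTRIBUTE BY field1 SORT BY field2");
            result.registerTempTable("tmp_result");

            // Insert from the temp table; guessing that INSERT OVERWRITE
            // may be accepted where INSERT INTO is rejected.
            hiveCtx.sql("INSERT OVERWRITE TABLE target_table "
                + "PARTITION (partition_field) "
                + "SELECT field1, field2, partition_field FROM tmp_result");
        }
    }

If registerTempTable does not exist in my Spark version, I believe the older registerAsTable is the equivalent, but again, unverified.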