Hi,

I am still new to Spark. Sorry if similar questions have been asked here
before. I am trying to read a Hive table, run a query, and save the result
into a partitioned Hive Parquet table.

For example, I was able to run the following in Hive:
INSERT INTO TABLE target_table PARTITION (partition_field) select field1,
field2, partition_field FROM source_table DISTRIBUTE BY field1 SORT BY
field2

But when I tried running it in spark-sql, it gave me the following error:

java.lang.RuntimeException:
Unsupported language features in query: INSERT INTO TABLE ...

I also tried the following Java code and I saw the same error:

SparkConf sparkConf = new SparkConf().setAppName("Example");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaHiveContext hiveCtx = new JavaHiveContext(ctx);
JavaSchemaRDD rdd = hiveCtx.sql(
    "INSERT INTO TABLE target_table PARTITION (partition_field) " +
    "SELECT field1, field2, partition_field FROM source_table " +
    "DISTRIBUTE BY field1 SORT BY field2");
...
rdd.count(); // just to force the query to run

If I remove "INSERT INTO TABLE target_table PARTITION (partition_field)"
from the SQL statement, hiveCtx.sql() does return an RDD, but the only way
I can find to save it is rdd.saveAsParquetFile(target_table_location), and
that does not produce the Hive partition directory layout.
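The closest workaround I could sketch (untested; it assumes registerTempTable
and saveAsParquetFile are available on JavaSchemaRDD in this Spark version,
and that the partition values are known up front) is to run the SELECT once,
then filter per partition value and save each subset under its own
partition_field=value directory:

```java
// Untested sketch: emulate a partitioned INSERT by writing one Parquet
// directory per partition value, matching Hive's partition_field=<value>
// directory naming convention.
JavaSchemaRDD result = hiveCtx.sql(
    "SELECT field1, field2, partition_field FROM source_table " +
    "DISTRIBUTE BY field1 SORT BY field2");
result.registerTempTable("tmp_result");

// Hypothetical table location and partition values for illustration;
// the values could come from SHOW PARTITIONS or a DISTINCT query.
String targetTableLocation = "hdfs:///warehouse/target_table";
for (String value : partitionValues) {
    JavaSchemaRDD part = hiveCtx.sql(
        "SELECT field1, field2 FROM tmp_result WHERE partition_field = '"
            + value + "'");
    // Hive expects subdirectories named partition_field=<value>
    part.saveAsParquetFile(targetTableLocation + "/partition_field=" + value);
}
// Afterwards the new directories would still need registering in the
// metastore from Hive, e.g.:
//   MSCK REPAIR TABLE target_table;
// (or ALTER TABLE target_table ADD PARTITION ... for each value)
```

But that seems clumsy, and I am not sure the metastore step is right, so I
would rather find a supported way to do the INSERT directly.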

Any help is much appreciated. Thanks.

Regards,
BH
