Hi, I have done this in spark-shell and in Hive itself, so it works.
I am exploring whether I can do the same programmatically. I first tried registering the DataFrame as a temporary table and inserting from it into the Hive table:

sqltext = "INSERT INTO TABLE t3 SELECT * FROM tmp"
sqlContext.sql(sqltext)

but I was getting the following error:

Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

When I switched to HiveContext, it could not see the temporary table. So I decided to save the Spark table as follows:

val a = df.filter(col("Total") > "").map(x => (x.getString(0), x.getString(1),
  x.getString(2).substring(1).replace(",", "").toDouble,
  x.getString(3).substring(1).replace(",", "").toDouble,
  x.getString(4).substring(1).replace(",", "").toDouble))

-- delete the file in HDFS if it already exists
val hadoopConf = new org.apache.hadoop.conf.Configuration()
val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://rhes564:9000"), hadoopConf)
val output = "hdfs://rhes564:9000/user/hduser/t3_parquet"
try { hdfs.delete(new org.apache.hadoop.fs.Path(output), true) } catch { case _: Throwable => }

-- save it as a Parquet file
a.toDF.saveAsParquetFile(output)

-- Hive table t3 is created as a simple textfile. ORC did not work!
HiveContext.sql("LOAD DATA INPATH '/user/hduser/t3_parquet' into table t3")

OK, that works, but it is very cumbersome. I checked the web, but there are conflicting attempts to solve this issue. Please note that this can be done easily with spark-shell, as it has a built-in HiveContext.

Thanks

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
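P.S. A minimal sketch of the single-context approach, assuming Spark 1.x with Hive support on the classpath. The cleaning pipeline and the table names t3 and tmp follow the steps above; the app name and the staging source table are illustrative placeholders. The key point is that the temporary table must be registered on the same HiveContext that runs the INSERT:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.col

object InsertIntoHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("InsertIntoHive"))
    // Create only a HiveContext; temporary tables registered on a plain
    // SQLContext are not visible from a separately created HiveContext.
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // Source DataFrame, built as in the pipeline above
    // ("staging_t3" is a hypothetical source table name)
    val df = hiveContext.table("staging_t3")

    val a = df.filter(col("Total") > "").map(x => (
      x.getString(0),
      x.getString(1),
      x.getString(2).substring(1).replace(",", "").toDouble,
      x.getString(3).substring(1).replace(",", "").toDouble,
      x.getString(4).substring(1).replace(",", "").toDouble))

    // Register the temporary table on the SAME context that can see t3
    a.toDF().registerTempTable("tmp")

    // The INSERT now resolves both tables, so no Parquet round trip
    // through HDFS and no LOAD DATA step is needed
    hiveContext.sql("INSERT INTO TABLE t3 SELECT * FROM tmp")
  }
}

This needs Spark compiled or packaged with Hive support (and hive-site.xml on the classpath) so the HiveContext points at the same metastore as the Hive table.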