Hi,

I have done this in spark-shell and Hive itself, so it works.

I am exploring whether I can do it programmatically. The problem I
encountered was that after registering the DataFrame as a temporary
table, trying to insert from that temporary table into a Hive table
gave the following error:

sqltext = "INSERT INTO TABLE t3 SELECT * FROM tmp"

sqlContext.sql(sqltext)

Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

When I switched to HiveContext, it could not see the temporary table.
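As far as I know, a temporary table is visible only to the SQLContext or HiveContext instance whose catalog it was registered in, so a table registered through a plain SQLContext cannot be seen by a separate HiveContext. A sketch of one way around it (Spark 1.4+ API; the Parquet path here is just a placeholder, and t3 is assumed to have matching columns):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)  // sc is the existing SparkContext

    // Build the DataFrame through the HiveContext so that the temporary
    // table is registered in the HiveContext's catalog, not a plain
    // SQLContext's.
    val df = hiveContext.read.parquet("/user/hduser/some_input")  // hypothetical path
    df.registerTempTable("tmp")

    // Now both tables are visible to the same context
    hiveContext.sql("INSERT INTO TABLE t3 SELECT * FROM tmp")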

So I decided to save the DataFrame as follows:

val a = df.filter(col("Total") > "").map(x =>
  (x.getString(0),
   x.getString(1),
   x.getString(2).substring(1).replace(",", "").toDouble,
   x.getString(3).substring(1).replace(",", "").toDouble,
   x.getString(4).substring(1).replace(",", "").toDouble))

// delete the output directory in HDFS if it already exists
val hadoopConf = new org.apache.hadoop.conf.Configuration()
val hdfs = org.apache.hadoop.fs.FileSystem.get(
  new java.net.URI("hdfs://rhes564:9000"), hadoopConf)
val output = "hdfs://rhes564:9000/user/hduser/t3_parquet"
val outputPath = new org.apache.hadoop.fs.Path(output)
if (hdfs.exists(outputPath)) hdfs.delete(outputPath, true)

// save it as a Parquet file
a.toDF.saveAsParquetFile(output)

// Hive table t3 is created as a simple textfile. ORC did not work!

HiveContext.sql("LOAD DATA INPATH '/user/hduser/t3_parquet' into table t3")
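Note that LOAD DATA INPATH only moves the files into the table's directory; it does not convert formats, so Parquet files loaded into a textfile table will not read back correctly. Declaring t3 as a Parquet table should avoid the mismatch (a sketch; the column names and types below are assumptions based on the map above):

    hiveContext.sql(
      """CREATE TABLE IF NOT EXISTS t3 (
        |  c1 STRING,
        |  c2 STRING,
        |  c3 DOUBLE,
        |  c4 DOUBLE,
        |  c5 DOUBLE
        |) STORED AS PARQUET""".stripMargin)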

OK, that works, but it is very cumbersome.
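A less cumbersome route, if a newer Spark is available (1.4+), might be to skip the intermediate HDFS file entirely and write the DataFrame straight into the Hive table through the same HiveContext:

    import org.apache.spark.sql.SaveMode

    // creates/overwrites a Hive-managed table in one step
    a.toDF.write.mode(SaveMode.Overwrite).saveAsTable("t3")

    // or, if t3 already exists with a compatible schema:
    a.toDF.write.insertInto("t3")

I have not verified this against this exact setup, but it removes both the manual HDFS delete and the LOAD DATA step.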

I checked the web, but there are conflicting attempts to solve this issue.

Please note that this can be done easily with spark-shell, as its
built-in sqlContext is already a HiveContext.

Thanks



Dr Mich Talebzadeh



LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com