Hi,
Thanks for your guidance. I will try it out.
By the way, how do you know that HiveContext.sql (and also
DataFrame.registerTempTable) is only expected to be invoked on the driver
side? Where can I find the documentation?
BR,
Patcharee
On 7 June 2015 16:40, Cheng Lian wrote:
Spark SQL supports Hive dynamic partitioning, so one possible
workaround is to create a Hive table partitioned by zone, z, year, and
month dynamically, and then insert the whole dataset into it directly.
In 1.4, we also provide dynamic partitioning support for non-Hive
environments, and you can do something like this:
df.write.partitionBy("zone", "z", "year", "month").format("parquet").mode("overwrite").saveAsTable("tbl")
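For reference, the Hive-based route mentioned first could look roughly like the sketch below. It is only an illustration under assumptions, not code from the job discussed in this thread: it assumes the target table 4dim already exists and is partitioned by (zone, z, year, month), and that the full dataset, including those four columns, is available in a table called wrf_flat. That table name and the object name are made up. In a dynamic-partition insert the partition columns go last in the SELECT list, in the same order as in the PARTITION clause.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical driver program; every name not taken from the thread is illustrative.
object DynamicPartitionInsert {
  // Data columns from the original query, followed by the partition columns.
  val dataColumns = Seq("date", "hh", "x", "y", "height", "u", "v", "w", "ph", "phb",
    "t", "p", "pb", "qvapor", "qgraup", "qnice", "qnrain", "tke_pbl", "el_pbl")
  val partitionColumns = Seq("zone", "z", "year", "month")

  // No partition values are given, so Hive derives them from the trailing SELECT columns.
  val insertSql: String =
    s"INSERT OVERWRITE TABLE 4dim PARTITION (${partitionColumns.mkString(", ")}) " +
      s"SELECT ${(dataColumns ++ partitionColumns).mkString(", ")} FROM table_4Dim"

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dynamic-partition-insert"))
    val hiveContext = new HiveContext(sc)

    // Enable dynamic partitioning; nonstrict mode lets every partition column be dynamic.
    hiveContext.setConf("hive.exec.dynamic.partition", "true")
    hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

    // Assumption: "wrf_flat" is an existing table holding the whole flattened dataset.
    hiveContext.table("wrf_flat").registerTempTable("table_4Dim")

    // Issued once from the driver; Spark distributes the actual write over the executors.
    hiveContext.sql(insertSql)
    sc.stop()
  }
}
```

The single INSERT statement replaces the per-group inserts, so no HiveContext call is needed inside an RDD closure.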
Cheng
On 6/7/15 9:48 PM, patcharee wrote:
Hi,
How can I work with HiveContext on the executors? If only the
driver can see HiveContext, does that mean I have to collect all the
datasets (which are very large) to the driver and use HiveContext there?
That would overload the driver's memory and fail.
BR,
Patcharee
On 7 June 2015 11:51, Cheng Lian wrote:
Hi,
This is expected behavior. HiveContext.sql (and also
DataFrame.registerTempTable) is only expected to be invoked on the
driver side. However, the closure passed to RDD.foreach is executed
on the executor side, where no viable HiveContext instance exists.
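To illustrate the distinction, here is a minimal sketch. It is not the original job: the Record class, the sample rows, and the object name are invented for illustration. Calls such as toDF, registerTempTable and sql are made in the driver program, but they only describe the computation; the rows themselves stay distributed across the executors, so the dataset does not have to be collected to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical record type; it must be defined outside the method that calls toDF.
case class Record(zone: Int, z: Int, year: Int, month: Int, height: Double)

object DriverSideSql {
  // Made-up sample rows standing in for the real dataset.
  val sample = Seq(
    Record(1, 10, 2015, 6, 1.5),
    Record(1, 10, 2015, 6, 2.5),
    Record(2, 20, 2015, 7, 3.0))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("driver-side-sql"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // These calls are issued from the driver. They only describe distributed
    // work; no rows are pulled into driver memory here.
    val df = sc.parallelize(sample).toDF()
    df.registerTempTable("table_4Dim")
    val perZone = hiveContext.sql(
      "SELECT zone, COUNT(*) AS n FROM table_4Dim GROUP BY zone")

    // Only the small aggregated result is brought back to the driver.
    perZone.collect().foreach(println)

    // By contrast, referencing hiveContext inside rdd.foreach { ... } would run
    // on the executors, where the context is not usable.
    sc.stop()
  }
}
```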
Cheng
On 6/7/15 10:06 AM, patcharee wrote:
Hi,
I am trying to insert data into a partitioned Hive table. The groupByKey
combines the dataset so that each group corresponds to one partition of
the Hive table. After the groupByKey, I convert each Iterable[X] to a
DataFrame by calling toList.toDF() on it. But hiveContext.sql throws a
NullPointerException; see below.
Any suggestions? What could be wrong? Thanks!
val varWHeightFlatRDD =
  varWHeightRDD.flatMap(FlatMapUtilClass().flatKeyFromWrf).groupByKey()
    .foreach(
      x => {
        val zone = x._1._1
        val z = x._1._2
        val year = x._1._3
        val month = x._1._4
        val df_table_4dim = x._2.toList.toDF()
        df_table_4dim.registerTempTable("table_4Dim")
        hiveContext.sql("INSERT OVERWRITE table 4dim partition " +
          "(zone=" + zone + ",z=" + z + ",year=" + year + ",month=" + month + ") " +
          "select date, hh, x, y, height, u, v, w, ph, phb, t, p, " +
          "pb, qvapor, qgraup, qnice, qnrain, tke_pbl, el_pbl from table_4Dim");
      })
java.lang.NullPointerException
    at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:100)
    at no.uni.computing.etl.LoadWrfIntoHiveOptReduce1$$anonfun$7.apply(LoadWrfIntoHiveOptReduce1.scala:113)
    at no.uni.computing.etl.LoadWrfIntoHiveOptReduce1$$anonfun$7.apply(LoadWrfIntoHiveOptReduce1.scala:103)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:798)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:798)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1511)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1511)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org