Hi all,

My understanding of this problem is that SQLConf gets overwritten by the Hive config during the initialization phase, the first time setConf(key: String, value: String) is called, as in the code snippets below, so later calls behave correctly. I'm not sure whether this is right; any pointers are welcome. Thanks.

@transient protected[hive] lazy val hiveconf: HiveConf = {
  setConf(sessionState.getConf.getAllProperties)
  sessionState.getConf
}

protected def runHive(cmd: String, maxRows: Int = 1000): Seq[String] = synchronized {
  try {
    val cmd_trimmed: String = cmd.trim()
    val tokens: Array[String] = cmd_trimmed.split("\\s+")
    val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
    ...
}

protected[sql] def runSqlHive(sql: String): Seq[String] = {
  val maxResults = 100000
  val results = runHive(sql, maxResults)
  // It is very confusing when you only get back some of the results...
  if (results.size == maxResults) sys.error("RESULTS POSSIBLY TRUNCATED")
  results
}

override def setConf(key: String, value: String): Unit = {
  super.setConf(key, value)
  runSqlHive(s"SET $key=$value")
}
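To illustrate the suspected ordering, here is a stripped-down, standalone sketch. This is plain Scala standing in for the Spark code above, and the names conf and defaults are purely illustrative: conf plays the role of SQLConf, defaults the Hive properties that hiveconf's lazy initializer copies back in.

import scala.collection.mutable

object LazyInitDemo extends App {
  val conf = mutable.Map.empty[String, String]
  val defaults = Map("hive.metastore.warehouse.dir" -> "/user/hive/warehouse")

  // Mirrors "lazy val hiveconf": the first use copies defaults into conf.
  lazy val hiveconf: Map[String, String] = { conf ++= defaults; defaults }

  def setConf(key: String, value: String): Unit = {
    conf(key) = value         // like super.setConf
    val forceInit = hiveconf  // like runSqlHive touching hiveconf: the first
                              // touch runs the initializer, overwriting conf
  }

  setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  println(conf("hive.metastore.warehouse.dir")) // /user/hive/warehouse: the set is lost
  setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  println(conf("hive.metastore.warehouse.dir")) // warehouse_test: now it sticks
}

This matches the behavior reported below: the first setConf on a given key is silently reverted, and any later setConf sticks.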
From: madhu phatak
Date: 2015-04-23 02:17
To: Michael Armbrust
CC: Ophir Cohen; Hao Ren; user
Subject: Re: HiveContext setConf seems not stable

Hi,
Calling getConf doesn't solve the issue. Even many Hive-specific queries are broken. It seems that no Hive configurations are getting passed through properly.

Regards,
Madhukara Phatak
http://datamantra.io/

On Wed, Apr 22, 2015 at 2:19 AM, Michael Armbrust <mich...@databricks.com> wrote:

As a workaround, can you call getConf first before any setConf?

On Tue, Apr 21, 2015 at 1:58 AM, Ophir Cohen <oph...@gmail.com> wrote:

I think I have encountered the same problem while trying to turn on Hive output compression. I have the following lines:

def initHiveContext(sc: SparkContext): HiveContext = {
  val hc: HiveContext = new HiveContext(sc)
  hc.setConf("hive.exec.compress.output", "true")
  hc.setConf("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec")
  hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
  logger.info(hc.getConf("hive.exec.compress.output"))
  logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
  logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))
  hc
}

And the log output from calling it twice:

15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK

BTW, it worked on 1.2.1...

On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren <inv...@gmail.com> wrote:

Hi,

JIRA created: https://issues.apache.org/jira/browse/SPARK-6675

Thank you.

On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust <mich...@databricks.com> wrote:

Can you open a JIRA please?

On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren <inv...@gmail.com> wrote:

Hi,

I find that HiveContext.setConf does not work correctly. Here are some code snippets showing the problem:

snippet 1:
----------------------------------------------------------------------------------------------------------------
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Main extends App {

  val conf = new SparkConf()
    .setAppName("context-test")
    .setMaster("local[8]")
  val sc = new SparkContext(conf)
  val hc = new HiveContext(sc)

  hc.setConf("spark.sql.shuffle.partitions", "10")
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")

  hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
  hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
}
----------------------------------------------------------------------------------------------------------------
Results:
(hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
(spark.sql.shuffle.partitions,10)

snippet 2:
----------------------------------------------------------------------------------------------------------------
...
hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
hc.setConf("spark.sql.shuffle.partitions", "10")

hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
...
----------------------------------------------------------------------------------------------------------------
Results:
(hive.metastore.warehouse.dir,/user/hive/warehouse)
(spark.sql.shuffle.partitions,10)

You can see that I just permuted the two setConf calls, and that leads to two different Hive configurations. It seems that HiveContext cannot set a new value for the "hive.metastore.warehouse.dir" key in the first setConf call; another setConf call is needed before a change to "hive.metastore.warehouse.dir" takes effect. For example, setting "hive.metastore.warehouse.dir" twice gives the same result as snippet 1:

snippet 3:
----------------------------------------------------------------------------------------------------------------
...
hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")

hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
...
----------------------------------------------------------------------------------------------------------------
Results:
(hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)

You can reproduce this on the latest branch-1.3 (1.3.1-snapshot, htag = 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33). I have also tested the released 1.3.0 (htag = 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc); it has the same problem.

Please tell me if I am missing something. Any help is highly appreciated.

Hao

--
Hao Ren
{Data, Software} Engineer @ ClaraVista
Paris, France
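For completeness, the two workarounds mentioned in this thread (calling getConf before any setConf, and issuing the same setConf twice) can be combined into an initialization helper. The sketch below is only illustrative, written against the Spark 1.3.x API; the object and method names are made up here, and as Madhukara notes above, the getConf workaround did not help in every case:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Hypothetical helper, not part of Spark.
object HiveContextWorkaround {
  def newHiveContext(sc: SparkContext): HiveContext = {
    val hc = new HiveContext(sc)
    // Workaround 1: read any conf value first, so the lazy Hive-side
    // initialization runs before the settings we actually care about.
    hc.getConf("hive.exec.compress.output", "false")
    // Workaround 2: set the key twice; the second call lands after the
    // initialization pass and therefore sticks.
    hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
    hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
    hc
  }
}

If the analysis at the top of the thread is correct, either workaround alone should be enough, since both simply ensure that hiveconf's lazy initialization has already run before the setConf call that must stick.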