Hi all,

My understanding of this problem is that SQLConf gets overwritten by the Hive config during the initialization phase, the first time setConf(key: String, value: String) is called, as in the code snippets below, so later calls behave correctly. I'm not sure whether this is right; any pointers are welcome. Thanks.

@transient protected[hive] lazy val hiveconf: HiveConf = {
  setConf(sessionState.getConf.getAllProperties)
  sessionState.getConf
}

protected def runHive(cmd: String, maxRows: Int = 1000): Seq[String] = synchronized {
  try {
    val cmd_trimmed: String = cmd.trim()
    val tokens: Array[String] = cmd_trimmed.split("\\s+")
    val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
    val proc: CommandProcessor = HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
    ...
}

protected[sql] def runSqlHive(sql: String): Seq[String] = {
  val maxResults = 100000
  val results = runHive(sql, maxResults)
  // It is very confusing when you only get back some of the results...
  if (results.size == maxResults) sys.error("RESULTS POSSIBLY TRUNCATED")
  results
}

override def setConf(key: String, value: String): Unit = {
  super.setConf(key, value)
  runSqlHive(s"SET $key=$value")
}
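To illustrate the suspected ordering, here is a stripped-down, standalone sketch. This is plain Scala standing in for the Spark code above, and the names conf and defaults are purely illustrative: conf plays the role of SQLConf, defaults the Hive properties that hiveconf's lazy initializer copies back in.

import scala.collection.mutable

object LazyInitDemo extends App {
  val conf = mutable.Map.empty[String, String]
  val defaults = Map("hive.metastore.warehouse.dir" -> "/user/hive/warehouse")

  // Mirrors "lazy val hiveconf": the first use copies defaults into conf.
  lazy val hiveconf: Map[String, String] = { conf ++= defaults; defaults }

  def setConf(key: String, value: String): Unit = {
    conf(key) = value         // like super.setConf
    val forceInit = hiveconf  // like runSqlHive touching hiveconf: the first
                              // touch runs the initializer, overwriting conf
  }

  setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  println(conf("hive.metastore.warehouse.dir")) // /user/hive/warehouse: the set is lost
  setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  println(conf("hive.metastore.warehouse.dir")) // warehouse_test: now it sticks
}

This matches the behavior reported below: the first setConf on a given key is silently reverted, and any later setConf sticks.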
From: madhu phatak
Date: 2015-04-23 02:17
To: Michael Armbrust
CC: Ophir Cohen; Hao Ren; user
Subject: Re: HiveContext setConf seems not stable

Hi,
Calling getConf doesn't solve the issue. Even many Hive-specific queries are broken. It seems that no Hive configurations are getting passed through properly.

Regards,
Madhukara Phatak
http://datamantra.io/

On Wed, Apr 22, 2015 at 2:19 AM, Michael Armbrust <mich...@databricks.com> wrote:

As a workaround, can you call getConf first before any setConf?

On Tue, Apr 21, 2015 at 1:58 AM, Ophir Cohen <oph...@gmail.com> wrote:

I think I have encountered the same problem while trying to turn on Hive output compression. I have the following lines:

def initHiveContext(sc: SparkContext): HiveContext = {
  val hc: HiveContext = new HiveContext(sc)
  hc.setConf("hive.exec.compress.output", "true")
  hc.setConf("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec")
  hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
  logger.info(hc.getConf("hive.exec.compress.output"))
  logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
  logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))
  hc
}

And the log output from calling it twice:

15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK

BTW, it worked on 1.2.1...

On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren <inv...@gmail.com> wrote:

Hi,

JIRA created: https://issues.apache.org/jira/browse/SPARK-6675

Thank you.

On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust <mich...@databricks.com> wrote:

Can you open a JIRA please?

On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren <inv...@gmail.com> wrote:

Hi,

I find that HiveContext.setConf does not work correctly. Here are some code snippets showing the problem:

snippet 1:
----------------------------------------------------------------------------------------------------------------
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Main extends App {

  val conf = new SparkConf()
    .setAppName("context-test")
    .setMaster("local[8]")
  val sc = new SparkContext(conf)
  val hc = new HiveContext(sc)

  hc.setConf("spark.sql.shuffle.partitions", "10")
  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")

  hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
  hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
}
----------------------------------------------------------------------------------------------------------------
Results:
(hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
(spark.sql.shuffle.partitions,10)

snippet 2:
----------------------------------------------------------------------------------------------------------------
...
hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
hc.setConf("spark.sql.shuffle.partitions", "10")

hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
hc.getAllConfs filter(_._1.contains("shuffle.partitions")) foreach println
...
----------------------------------------------------------------------------------------------------------------
Results:
(hive.metastore.warehouse.dir,/user/hive/warehouse)
(spark.sql.shuffle.partitions,10)

You can see that I just permuted the two setConf calls, and that leads to two different Hive configurations. It seems that HiveContext cannot set a new value for the "hive.metastore.warehouse.dir" key in the first setConf call; another setConf call is needed before a change to "hive.metastore.warehouse.dir" takes effect. For example, setting "hive.metastore.warehouse.dir" twice gives the same result as snippet 1:

snippet 3:
----------------------------------------------------------------------------------------------------------------
...
hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")

hc.getAllConfs filter(_._1.contains("warehouse.dir")) foreach println
...
----------------------------------------------------------------------------------------------------------------
Results:
(hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)

You can reproduce this on the latest branch-1.3 (1.3.1-snapshot, htag = 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33). I have also tested the released 1.3.0 (htag = 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc); it has the same problem.

Please tell me if I am missing something. Any help is highly appreciated.

Hao

--
Hao Ren
{Data, Software} Engineer @ ClaraVista
Paris, France
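For completeness, the two workarounds mentioned in this thread (calling getConf before any setConf, and issuing the same setConf twice) can be combined into an initialization helper. The sketch below is only illustrative, written against the Spark 1.3.x API; the object and method names are made up here, and as Madhukara notes above, the getConf workaround did not help in every case:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Hypothetical helper, not part of Spark.
object HiveContextWorkaround {
  def newHiveContext(sc: SparkContext): HiveContext = {
    val hc = new HiveContext(sc)
    // Workaround 1: read any conf value first, so the lazy Hive-side
    // initialization runs before the settings we actually care about.
    hc.getConf("hive.exec.compress.output", "false")
    // Workaround 2: set the key twice; the second call lands after the
    // initialization pass and therefore sticks.
    hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
    hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
    hc
  }
}

If the analysis at the top of the thread is correct, either workaround alone should be enough, since both simply ensure that hiveconf's lazy initialization has already run before the setConf call that must stick.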