As a workaround, can you call getConf first before any setConf?
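(A minimal sketch of that workaround applied to the initHiveContext function quoted below; it assumes, as the suggestion above implies, that a single getConf call is enough to initialize the underlying configuration before the first setConf. The function name and conf keys come from Ophir's snippet; the logging calls are dropped to keep it self-contained.)

----------------------------------------------------------------------------------------------------------------
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

def initHiveContext(sc: SparkContext): HiveContext = {
  val hc: HiveContext = new HiveContext(sc)
  // Workaround (assumption): read a conf value before the first setConf so the
  // configuration is initialized; the two-argument getConf overload is used so
  // the call does not depend on the key already being set.
  hc.getConf("hive.exec.compress.output", "false")
  hc.setConf("hive.exec.compress.output", "true")
  hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
    "org.apache.hadoop.io.compress.SnappyCodec")
  hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
  hc
}
----------------------------------------------------------------------------------------------------------------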
On Tue, Apr 21, 2015 at 1:58 AM, Ophir Cohen <oph...@gmail.com> wrote:

> I think I encounter the same problem: I'm trying to turn on the compression of Hive.
> I have the following lines:
>
> def initHiveContext(sc: SparkContext): HiveContext = {
>   val hc: HiveContext = new HiveContext(sc)
>   hc.setConf("hive.exec.compress.output", "true")
>   hc.setConf("mapreduce.output.fileoutputformat.compress.codec",
>     "org.apache.hadoop.io.compress.SnappyCodec")
>   hc.setConf("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
>
>   logger.info(hc.getConf("hive.exec.compress.output"))
>   logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.codec"))
>   logger.info(hc.getConf("mapreduce.output.fileoutputformat.compress.type"))
>
>   hc
> }
>
> And the log for calling it twice:
>
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: false
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: true
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: org.apache.hadoop.io.compress.SnappyCodec
> 15/04/21 08:37:39 INFO util.SchemaRDDUtils$: BLOCK
>
> BTW, it worked on 1.2.1...
>
> On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren <inv...@gmail.com> wrote:
>
>> Hi,
>>
>> Jira created: https://issues.apache.org/jira/browse/SPARK-6675
>>
>> Thank you.
>>
>> On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> Can you open a JIRA please?
>>>
>>> On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren <inv...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I find HiveContext.setConf does not work correctly. Here are some code
>>>> snippets showing the problem:
>>>>
>>>> snippet 1:
>>>> ----------------------------------------------------------------------------------------------------------------
>>>> import org.apache.spark.sql.hive.HiveContext
>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>
>>>> object Main extends App {
>>>>
>>>>   val conf = new SparkConf()
>>>>     .setAppName("context-test")
>>>>     .setMaster("local[8]")
>>>>   val sc = new SparkContext(conf)
>>>>   val hc = new HiveContext(sc)
>>>>
>>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>>>   hc.getAllConfs filter (_._1.contains("shuffle.partitions")) foreach println
>>>> }
>>>> ----------------------------------------------------------------------------------------------------------------
>>>>
>>>> Results:
>>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>> (spark.sql.shuffle.partitions,10)
>>>>
>>>> snippet 2:
>>>> ----------------------------------------------------------------------------------------------------------------
>>>> ...
>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>   hc.setConf("spark.sql.shuffle.partitions", "10")
>>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>>>   hc.getAllConfs filter (_._1.contains("shuffle.partitions")) foreach println
>>>> ...
>>>> ----------------------------------------------------------------------------------------------------------------
>>>>
>>>> Results:
>>>> (hive.metastore.warehouse.dir,/user/hive/warehouse)
>>>> (spark.sql.shuffle.partitions,10)
>>>>
>>>> You can see that I just permuted the two setConf calls, and that leads to
>>>> two different Hive configurations.
>>>> It seems that HiveContext cannot set a new value on the
>>>> "hive.metastore.warehouse.dir" key when it is the first "setConf" call.
>>>> You need another "setConf" call before changing
>>>> "hive.metastore.warehouse.dir"; for example, set
>>>> "hive.metastore.warehouse.dir" twice, as in snippet 3 below, or set it
>>>> after another key, as in snippet 1.
>>>>
>>>> snippet 3:
>>>> ----------------------------------------------------------------------------------------------------------------
>>>> ...
>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>   hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
>>>>   hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
>>>> ...
>>>> ----------------------------------------------------------------------------------------------------------------
>>>>
>>>> Results:
>>>> (hive.metastore.warehouse.dir,/home/spark/hive/warehouse_test)
>>>>
>>>> You can reproduce this on the latest branch-1.3 (1.3.1-snapshot, htag =
>>>> 7d029cb1eb6f1df1bce1a3f5784fb7ce2f981a33).
>>>>
>>>> I have also tested the released 1.3.0 (htag =
>>>> 4aaf48d46d13129f0f9bdafd771dd80fe568a7dc). It has the same problem.
>>>>
>>>> Please tell me if I am missing something. Any help is highly appreciated.
>>>>
>>>> Hao
>>>>
>>>> --
>>>> Hao Ren
>>>>
>>>> {Data, Software} Engineer @ ClaraVista
>>>>
>>>> Paris, France
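(For the warehouse.dir case in the snippets quoted above, a minimal sketch of the getConf-before-setConf workaround suggested at the top of the thread; it assumes, untested here against the reported versions, that one getConf call initializes the configuration so the very first setConf is no longer lost.)

----------------------------------------------------------------------------------------------------------------
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

object Main extends App {

  val conf = new SparkConf()
    .setAppName("context-test")
    .setMaster("local[8]")
  val sc = new SparkContext(conf)
  val hc = new HiveContext(sc)

  // Workaround (assumption from the thread): read any conf key before the
  // first setConf, so "hive.metastore.warehouse.dir" is not written to a
  // not-yet-initialized configuration.
  hc.getConf("spark.sql.shuffle.partitions", "200")

  hc.setConf("hive.metastore.warehouse.dir", "/home/spark/hive/warehouse_test")
  hc.setConf("spark.sql.shuffle.partitions", "10")

  hc.getAllConfs filter (_._1.contains("warehouse.dir")) foreach println
  hc.getAllConfs filter (_._1.contains("shuffle.partitions")) foreach println
}
----------------------------------------------------------------------------------------------------------------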