Ah, that explains it, many thanks!

On Sat, May 16, 2015 at 7:41 PM, Yana Kadiyska <[email protected]> wrote:
> Oh... the metastore_db location is not controlled by
> hive.metastore.warehouse.dir -- the former is the location of your
> metastore DB, the latter is the physical location of your stored data.
> Check out this SO thread:
> http://stackoverflow.com/questions/13624893/metastore-db-created-wherever-i-run-hive
>
>
> On Sat, May 16, 2015 at 9:07 AM, Tamas Jambor <[email protected]> wrote:
>
>> Gave it another try - it seems that it picks up the variable and prints
>> out the correct value, but still puts the metastore_db folder in the
>> current directory, regardless.
>>
>> On Sat, May 16, 2015 at 1:13 PM, Tamas Jambor <[email protected]> wrote:
>>
>>> Thank you for the reply.
>>>
>>> I have tried your experiment; it seems that it does not print the
>>> settings out in spark-shell (I'm using 1.3, by the way).
>>>
>>> Strangely, I have been experimenting with an SQL connection instead,
>>> which works after all (still, if I go to spark-shell and try to print
>>> out the SQL settings that I put in hive-site.xml, it does not print
>>> them).
>>>
>>>
>>> On Fri, May 15, 2015 at 7:22 PM, Yana Kadiyska <[email protected]>
>>> wrote:
>>>
>>>> My point was more about how to verify that properties are picked up
>>>> from the hive-site.xml file. You don't really need hive.metastore.uris
>>>> if you're not running against an external metastore. I just did an
>>>> experiment with warehouse.dir.
>>>>
>>>> My hive-site.xml looks like this:
>>>>
>>>> <configuration>
>>>>   <property>
>>>>     <name>hive.metastore.warehouse.dir</name>
>>>>     <value>/home/ykadiysk/Github/warehouse_dir</value>
>>>>     <description>location of default database for the warehouse</description>
>>>>   </property>
>>>> </configuration>
>>>>
>>>>
>>>>
>>>> and spark-shell code:
>>>>
>>>> scala> val hc= new org.apache.spark.sql.hive.HiveContext(sc)
>>>> hc: org.apache.spark.sql.hive.HiveContext =
>>>> org.apache.spark.sql.hive.HiveContext@3036c16f
>>>>
>>>> scala> hc.sql("show tables").collect
>>>> 15/05/15 14:12:57 INFO HiveMetaStore: 0: Opening raw store with
>>>> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>> 15/05/15 14:12:57 INFO ObjectStore: ObjectStore, initialize called
>>>> 15/05/15 14:12:57 INFO Persistence: Property datanucleus.cache.level2
>>>> unknown - will be ignored
>>>> 15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in
>>>> CLASSPATH (or one of dependencies)
>>>> 15/05/15 14:12:58 WARN Connection: BoneCP specified but not present in
>>>> CLASSPATH (or one of dependencies)
>>>> 15/05/15 14:13:03 INFO ObjectStore: Setting MetaStore object pin classes
>>>> with
>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>> 15/05/15 14:13:03 INFO ObjectStore: Initialized ObjectStore
>>>> 15/05/15 14:13:04 WARN ObjectStore: Version information not found in
>>>> metastore. hive.metastore.schema.verification is not enabled so recording
>>>> the schema version 0.12.0-protobuf-2.5
>>>> 15/05/15 14:13:05 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
>>>> 15/05/15 14:13:05 INFO audit: ugi=ykadiysk ip=unknown-ip-addr
>>>> cmd=get_tables: db=default pat=.*
>>>> 15/05/15 14:13:05 INFO Datastore: The class
>>>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>>>> "embedded-only" so does not have its own datastore table.
>>>> 15/05/15 14:13:05 INFO Datastore: The class
>>>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>>>> "embedded-only" so does not have its own datastore table.
>>>> res0: Array[org.apache.spark.sql.Row] = Array()
>>>>
>>>> scala> hc.getConf("hive.metastore.warehouse.dir")
>>>> res1: String = /home/ykadiysk/Github/warehouse_dir
>>>>
>>>>
>>>>
>>>> I have not tried an HDFS path, but you should at least be able to verify
>>>> that the variable is being read. It might be that your value is read but
>>>> is otherwise not liked...
>>>>
>>>> On Fri, May 15, 2015 at 2:03 PM, Tamas Jambor <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the reply. I am trying to use it without a Hive setup
>>>>> (spark-standalone), so it prints something like this:
>>>>>
>>>>> hive_ctx.sql("show tables").collect()
>>>>> 15/05/15 17:59:03 INFO HiveMetaStore: 0: Opening raw store with
>>>>> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>> 15/05/15 17:59:03 INFO ObjectStore: ObjectStore, initialize called
>>>>> 15/05/15 17:59:04 INFO Persistence: Property datanucleus.cache.level2
>>>>> unknown - will be ignored
>>>>> 15/05/15 17:59:04 INFO Persistence: Property
>>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>> 15/05/15 17:59:04 WARN Connection: BoneCP specified but not present in
>>>>> CLASSPATH (or one of dependencies)
>>>>> 15/05/15 17:59:05 WARN Connection: BoneCP specified but not present in
>>>>> CLASSPATH (or one of dependencies)
>>>>> 15/05/15 17:59:08 INFO BlockManagerMasterActor: Registering block
>>>>> manager xxxx:42819 with 3.0 GB RAM, BlockManagerId(2, xxx, 42819)
>>>>> 15/05/15 17:59:18 INFO ObjectStore: Setting MetaStore object pin
>>>>> classes with
>>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>> 15/05/15 17:59:18 INFO MetaStoreDirectSql: MySQL check failed,
>>>>> assuming we are not on mysql: Lexical error at line 1, column 5.
>>>>> Encountered: "@" (64), after : "".
>>>>> 15/05/15 17:59:20 INFO Datastore: The class
>>>>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>>>>> "embedded-only" so does not have its own datastore table.
>>>>> 15/05/15 17:59:20 INFO Datastore: The class
>>>>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>>>>> "embedded-only" so does not have its own datastore table.
>>>>> 15/05/15 17:59:28 INFO Datastore: The class
>>>>> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as
>>>>> "embedded-only" so does not have its own datastore table.
>>>>> 15/05/15 17:59:29 INFO Datastore: The class
>>>>> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as
>>>>> "embedded-only" so does not have its own datastore table.
>>>>> 15/05/15 17:59:31 INFO ObjectStore: Initialized ObjectStore
>>>>> 15/05/15 17:59:32 WARN ObjectStore: Version information not found in
>>>>> metastore. hive.metastore.schema.verification is not enabled so recording
>>>>> the schema version 0.13.1aa
>>>>> 15/05/15 17:59:33 WARN MetricsConfig: Cannot locate configuration:
>>>>> tried
>>>>> hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
>>>>> 15/05/15 17:59:33 INFO MetricsSystemImpl: Scheduled snapshot period at
>>>>> 10 second(s).
>>>>> 15/05/15 17:59:33 INFO MetricsSystemImpl: azure-file-system metrics
>>>>> system started
>>>>> 15/05/15 17:59:33 INFO HiveMetaStore: Added admin role in metastore
>>>>> 15/05/15 17:59:34 INFO HiveMetaStore: Added public role in metastore
>>>>> 15/05/15 17:59:34 INFO HiveMetaStore: No user is added in admin role,
>>>>> since config is empty
>>>>> 15/05/15 17:59:35 INFO SessionState: No Tez session required at this
>>>>> point. hive.execution.engine=mr.
>>>>> 15/05/15 17:59:37 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
>>>>> 15/05/15 17:59:37 INFO audit: ugi=testuser ip=unknown-ip-addr
>>>>> cmd=get_tables: db=default pat=.*
>>>>>
>>>>> Not sure what to put in hive.metastore.uris in this case?
>>>>>
>>>>>
>>>>> On Fri, May 15, 2015 at 2:52 PM, Yana Kadiyska <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> This should work. Which version of Spark are you using? Here is what
>>>>>> I do -- make sure hive-site.xml is in the conf directory of the machine
>>>>>> you're running the driver from. Now let's run spark-shell from that
>>>>>> machine:
>>>>>>
>>>>>> scala> val hc= new org.apache.spark.sql.hive.HiveContext(sc)
>>>>>> hc: org.apache.spark.sql.hive.HiveContext =
>>>>>> org.apache.spark.sql.hive.HiveContext@6e9f8f26
>>>>>>
>>>>>> scala> hc.sql("show tables").collect
>>>>>> 15/05/15 09:34:17 INFO metastore: Trying to connect to metastore with
>>>>>> URI thrift://hostname.com:9083 <-- here should be a value
>>>>>> from your hive-site.xml
>>>>>> 15/05/15 09:34:17 INFO metastore: Waiting 1 seconds before next
>>>>>> connection attempt.
>>>>>> 15/05/15 09:34:18 INFO metastore: Connected to metastore.
>>>>>> res0: Array[org.apache.spark.sql.Row] = Array([table1,false],
>>>>>>
>>>>>> scala> hc.getConf("hive.metastore.uris")
>>>>>> res13: String = thrift://hostname.com:9083
>>>>>>
>>>>>> scala> hc.getConf("hive.metastore.warehouse.dir")
>>>>>> res14: String = /user/hive/warehouse
>>>>>>
>>>>>>
>>>>>>
>>>>>> The first log line tells you which metastore it's trying to connect to
>>>>>> -- this should be the string specified under the hive.metastore.uris
>>>>>> property in your hive-site.xml file. I have not mucked with
>>>>>> warehouse.dir too much, but I know that the value of the metastore URI
>>>>>> is in fact picked up from there, as I regularly point to different
>>>>>> systems...
>>>>>>
>>>>>>
>>>>>> On Thu, May 14, 2015 at 6:26 PM, Tamas Jambor <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I have tried putting the hive-site.xml file in the conf/ directory,
>>>>>>> but it seems it is not being picked up from there.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 14, 2015 at 6:50 PM, Michael Armbrust <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> You can configure Spark SQL's Hive interaction by placing a
>>>>>>>> hive-site.xml file in the conf/ directory.
>>>>>>>>
>>>>>>>> On Thu, May 14, 2015 at 10:24 AM, jamborta <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Is it possible to set hive.metastore.warehouse.dir, which is
>>>>>>>>> created internally by Spark, so that it is stored externally
>>>>>>>>> (e.g. S3 on AWS or WASB on Azure)?
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/store-hive-metastore-on-persistent-store-tp22891.html
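
For anyone finding this thread later: a minimal sketch of a hive-site.xml that sets the two locations separately, following the Stack Overflow thread linked at the top. It assumes the default embedded Derby metastore; both paths are placeholders, not values from this thread, and it has not been tested against Spark 1.3.

```xml
<configuration>
  <!-- Where table data files are written; this is the only setting
       changed in the experiments quoted in this thread. -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/path/to/warehouse_dir</value> <!-- placeholder path -->
  </property>

  <!-- Where the embedded Derby metastore database is created.
       Hive's default is jdbc:derby:;databaseName=metastore_db;create=true,
       a relative path, which is why metastore_db shows up in the
       current working directory. -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value> <!-- placeholder path -->
  </property>
</configuration>
```

As far as I know, embedded Derby needs a local filesystem path, so this moves metastore_db but would not put it on S3 or WASB. For that, an external database reached through a JDBC URL (the "SQL connection" mentioned earlier in the thread) is probably the way to go.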
