Hi, Thank you, I have looked up the source code of Hcatalog, it seems every time when I run hcat -e “query”, it called hcatCli, then it make configuration, create and start a session, then dump it after being used. It can’t keep a session or connection and don’t have a Cli. The initialization take all the time. Therefore, I only can use the thrift API to do my job. Thank you for your precious suggestions!
Best regards, Hou > 在 2018年4月24日,下午7:45,Peter Vary <pv...@cloudera.com> 写道: > > Hi Hou, > > Kudu uses the Thrift HMS interface, and written in C. An example could be > found here: > https://github.com/apache/kudu/tree/master/src/kudu/hms > <https://github.com/apache/kudu/tree/master/src/kudu/hms> > > As for parametrizing Hcatalog I have only found this: > https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties > > <https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties> > But have not find anything there which might help you there. > > Peter > >> On Apr 24, 2018, at 10:51 AM, 侯宗田 <zongtian...@icloud.com> wrote: >> >> Hi, Peter: >> I have started a standalone metastore server and it indeed short that part >> of time, it does connection instead of initialization. But I still have some >> questions, >> First, I believe the Hcatalog must be quick because it is a mature product >> and I have not seen others complaining about this problem, is there some >> configuration which controls starting new session or how to keep a session >> connected to the HMS, in the log below it started a new session and >> connected twice. >> Second, I am very interested in using the HMS thrift API, but I could not >> found an example of how to use it in C/C++ to access hive table info. Do you >> know some link about it? >> Really thank you for your time!! >> >> Best regards, >> Hou >> >> $time ./hcat.py -e "use default; show table extended like haha;" >> 18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file >> file:/usr/local/hive/conf/hive-site.xml >> 18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop >> library for your platform... using builtin-java classes where applicable >> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: >> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f >> 18/04/24 15:47:10 INFO session.SessionState: Created local directory: >> /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f >> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: >> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db >> 18/04/24 15:47:10 INFO ql.Driver: Compiling >> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): >> use default >> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with >> URI thrift://localhost:9083 >> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, >> current connections: 1 >> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore. >> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed >> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: >> Schema(fieldSchemas:null, properties:null) >> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling >> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); >> Time taken: 1.591 seconds >> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating >> a lock manager >> 18/04/24 15:47:12 INFO ql.Driver: Executing >> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5): >> use default >> 18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created >> SQLStdHiveAccessController for session context : HiveAuthzSessionContext >> [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI] >> 18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be >> ignored, since hive.security.authorization.manager is set to instance of >> HiveAuthorizerFactory. >> 18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration >> hive.metastore.filter.hook changed from >> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to >> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook >> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, >> current connections: 0 >> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with >> URI thrift://localhost:9083 >> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, >> current connections: 1 >> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore. >> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode >> 18/04/24 15:47:12 INFO ql.Driver: Completed executing >> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5); >> Time taken: 0.119 seconds >> OK >> 18/04/24 15:47:12 INFO ql.Driver: OK >> Time taken: 1.728 seconds >> 18/04/24 15:47:12 INFO ql.Driver: Compiling >> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): >> show table extended like haha >> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed >> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: >> Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from >> deserializer)], properties:null) >> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Initializing operator >> LIST_SINK[0] >> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling >> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); >> Time taken: 0.166 seconds >> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating >> a lock manager >> 18/04/24 15:47:12 INFO ql.Driver: Executing >> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59): >> show table extended like haha >> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode >> 18/04/24 15:47:12 INFO exec.DDLTask: pattern: haha >> 18/04/24 15:47:12 INFO exec.DDLTask: results : 1 >> 18/04/24 15:47:12 INFO ql.Driver: Completed executing >> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59); >> Time taken: 0.187 seconds >> OK >> 18/04/24 15:47:12 INFO ql.Driver: OK >> 18/04/24 15:47:12 INFO Configuration.deprecation: mapred.input.dir is >> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir >> 18/04/24 15:47:12 INFO mapred.FileInputFormat: Total input paths to process >> : 1 >> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Closing operator LIST_SINK[0] >> tableName:haha >> owner:kousouda >> location:hdfs://localhost:8020/user/hive/warehouse/haha >> inputformat:org.apache.hadoop.mapred.TextInputFormat >> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat >> columns:struct columns { i32 id} >> partitioned:false >> partitionColumns: >> totalNumberFiles:2 >> totalFileSize:4 >> maxFileSize:2 >> minFileSize:2 >> lastAccessTime:1524535110334 >> lastUpdateTime:1524535113101 >> >> Time taken: 0.394 seconds >> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: >> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme >> hdfs >> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: >> /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with >> scheme file >> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, >> current connections: 0 >> >> real 0m5.593s >> user 0m8.645s >> sys 0m0.523s >>> 在 2018年4月23日,下午3:57,Peter Vary <pv...@cloudera.com> 写道: >>> >>> Hi, >>> >>> Disclaimer: I am not too familiar with the webhcat yet. >>> From the logs, I see, that: >>> - the first 3 seconds spent on starting a new session, and maybe a driver - >>> this can be reduced, if the session is already there, and the HiveServer2 >>> is started (but do not know if webhcat could use HS2, or reuse sessions) - >>> this delay could be avoided if you use any of the 3 solutions suggested in >>> my last mail. >>> - the next 3 seconds spent on initializing the metastore. This can be >>> reduced if a standalone metastore is started, and the webhcat is configured >>> to access this metastore. >>> >>> Hope this helps, >>> Peter >>> >>>> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zongtian...@icloud.com> wrote: >>>> >>>> Thank you very much for your reply, I am wondering whether I use the >>>> webhcat rightly, I don’t think it is normal to create all the directories >>>> and objects to get a table describ and take 8 seconds. The webhcat should >>>> not be so slow, Or it is because I forget to start some server which can >>>> respond immediately? >>>>> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道: >>>>> >>>>> Hi, >>>>> >>>>> Alexander Kolbasov has a project which might interest you (keeping in >>>>> mind, >>>>> that this is not production ready - more like a proof of concept): >>>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md >>>>> >>>>> Also you can use HMS thrift API directly to access the MetaStore, or if >>>>> you >>>>> can/want write java code, you can use HiveMetastoreClient class to do it >>>>> in >>>>> java. >>>>> >>>>> I am not sure about the performance gains compared to HCat, but currently >>>>> there are no faster interfaces for HMS that I know of. >>>>> >>>>> Regards, >>>>> Peter >>>>> >>>>> >>>>> 侯宗田 <zongtian...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét 2:40): >>>>> >>>>>> Can anyone give me some suggestions? I have been stuck in this problem >>>>>> for >>>>>> several days. Need help!! >>>>>>> 在 2018年4月22日,下午9:38,侯宗田 <zongtian...@icloud.com> 写道: >>>>>>> >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am writing a application which needs the metastore about hive tables. >>>>>> I have used webhcat to get the information about tables and process them. >>>>>> But a simple request takes over eight seconds to respond on localhost. >>>>>> Why >>>>>> is this so slow, and how can I fix it or is there other way I can extract >>>>>> the metadata in C? >>>>>>> >>>>>>> $ time curl -s ' >>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean >>>>>> < >>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean >>>>>>> ' >>>>>>> {"columns": >>>>>>> [{"name":"id","type":"int"}], >>>>>>> "database":"default", >>>>>>> "table":"haha"} >>>>>>> >>>>>>> real 0m8.400s >>>>>>> user 0m0.053s >>>>>>> sys 0m0.019s >>>>>>> it seems to run a hcat.py, and it create a bunch of things then clear >>>>>> them, it takes very long time, does anyone have some ideas about it?? Any >>>>>> suggestions will be very appreciated! >>>>>>> >>>>>>> $hcat.py -e "use default; desc haha; " >>>>>>> SLF4J: Class path contains multiple SLF4J bindings. >>>>>>> SLF4J: Found binding in >>>>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>> SLF4J: Found binding in >>>>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings < >>>>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation. >>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>>>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file >>>>>> file:/usr/local/hive/conf/hive-site.xml >>>>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load >>>>>> native-hadoop library for your platform... using builtin-java classes >>>>>> where >>>>>> applicable >>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: >>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668 >>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory: >>>>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668 >>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory: >>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db >>>>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling >>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): >>>>>> use default >>>>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store >>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore >>>>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize >>>>>> called >>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property >>>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored >>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property >>>>>> datanucleus.cache.level2 unknown - will be ignored >>>>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object >>>>>> pin classes with >>>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" >>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL, >>>>>> underlying DB is MYSQL >>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in >>>>>> metastore >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in >>>>>> metastore >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in >>>>>> admin role, since config is empty >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions >>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda >>>>>> ip=unknown-ip-addr cmd=get_all_functions >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default >>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda >>>>>> ip=unknown-ip-addr cmd=get_database: default >>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed >>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema: >>>>>> Schema(fieldSchemas:null, properties:null) >>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling >>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62); >>>>>> Time taken: 3.936 seconds >>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not >>>>>> creating a lock manager >>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing >>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62): >>>>>> use default >>>>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created >>>>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext >>>>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI] >>>>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will >>>>>> be ignored, since hive.security.authorization.manager is set to instance >>>>>> of >>>>>> HiveAuthorizerFactory. >>>>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration >>>>>> hive.metastore.filter.hook changed from >>>>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to >>>>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread >>>>>> local RawStore... >>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda >>>>>> ip=unknown-ip-addr cmd=Cleaning up thread local RawStore... >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up >>>>>> thread local RawStore >>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda >>>>>> ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore >>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial >>>>>> mode >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default >>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda >>>>>> ip=unknown-ip-addr cmd=get_database: default >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store >>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore >>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize >>>>>> called >>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL, >>>>>> underlying DB is MYSQL >>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore >>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default >>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda >>>>>> ip=unknown-ip-addr cmd=get_database: default >>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing >>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62); >>>>>> Time taken: 0.202 seconds >>>>>>> OK >>>>>> >>>>>> >>>> >>> >> >