Hi,

Thank you. I have looked through the HCatalog source code. It seems that every 
time I run hcat -e "query", it calls HCatCli, which builds the configuration, 
creates and starts a session, and then discards the session after use. It cannot 
keep a session or connection open, and it has no interactive CLI. Initialization 
accounts for nearly all of the time. Therefore, I can only use the Thrift API to 
do my job. Thank you for your valuable suggestions!
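
As a rough check of where the time goes, the sketch below splits the wall-clock 
time of the hcat run quoted further down (the 2018-04-24 log) into the time Hive 
reported for the two statements and everything else. The name startup_overhead 
and the embedded sample are only for illustration; the numbers are copied from 
that log.

```python
import re

# Lines copied from the hcat output quoted later in this thread (2018-04-24 run).
OUTPUT = """\
Time taken: 1.728 seconds
Time taken: 0.394 seconds
real 0m5.593s
"""


def startup_overhead(text):
    """Return (wall_clock, statement_time, overhead) in seconds."""
    # Statement-level timings are printed at the start of a line.
    statements = [
        float(value)
        for value in re.findall(r"^Time taken: ([\d.]+) seconds", text, re.M)
    ]
    # `time` prints the wall-clock duration as e.g. "real 0m5.593s".
    minutes, seconds = re.search(r"^real\s+(\d+)m([\d.]+)s", text, re.M).groups()
    wall = int(minutes) * 60 + float(seconds)
    total = sum(statements)
    return wall, total, wall - total


wall, total, overhead = startup_overhead(OUTPUT)
print(f"wall {wall:.3f}s, statements {total:.3f}s, everything else {overhead:.3f}s")
```

For that run, about 3.5 of the 5.6 seconds fall outside the two statements, 
which is consistent with startup and teardown dominating each hcat call.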

Best regards,
Hou
> On Apr 24, 2018, at 7:45 PM, Peter Vary <pv...@cloudera.com> wrote:
> 
> Hi Hou,
> 
> Kudu uses the Thrift HMS interface and is written in C++. An example can be 
> found here:
> https://github.com/apache/kudu/tree/master/src/kudu/hms 
> 
> As for parametrizing HCatalog, I have only found this:
> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties
> But I have not found anything there that might help you.
> 
> Peter
> 
>> On Apr 24, 2018, at 10:51 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>> 
>> Hi Peter,
>> I have started a standalone metastore server, and it does indeed shorten that 
>> part of the time: the client now connects instead of initializing the 
>> metastore. But I still have some questions.
>> First, I would expect HCatalog to be fast, because it is a mature product 
>> and I have not seen others complain about this problem. Is there a 
>> configuration option that controls whether a new session is started, or a way 
>> to keep a session connected to the HMS? In the log below it started a new 
>> session and connected twice.
>> Second, I am very interested in using the HMS Thrift API, but I could not 
>> find an example of how to use it from C/C++ to access Hive table info. Do you 
>> know of any links about it?
>> Thank you very much for your time!
>> 
>> Best regards,
>> Hou
>> 
>> $ time ./hcat.py -e "use default; show table extended like haha;"
>> 18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file 
>> file:/usr/local/hive/conf/hive-site.xml
>> 18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
>> library for your platform... using builtin-java classes where applicable
>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: 
>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>> 18/04/24 15:47:10 INFO session.SessionState: Created local directory: 
>> /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory: 
>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db
>> 18/04/24 15:47:10 INFO ql.Driver: Compiling 
>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5):
>>  use default
>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with 
>> URI thrift://localhost:9083
>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, 
>> current connections: 1
>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: 
>> Schema(fieldSchemas:null, properties:null)
>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling 
>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5);
>>  Time taken: 1.591 seconds
>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating 
>> a lock manager
>> 18/04/24 15:47:12 INFO ql.Driver: Executing 
>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5):
>>  use default
>> 18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created 
>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext 
>> [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI]
>> 18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be 
>> ignored, since hive.security.authorization.manager is set to instance of 
>> HiveAuthorizerFactory.
>> 18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration 
>> hive.metastore.filter.hook changed from 
>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to 
>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, 
>> current connections: 0
>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with 
>> URI thrift://localhost:9083
>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore, 
>> current connections: 1
>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing 
>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5);
>>  Time taken: 0.119 seconds
>> OK
>> 18/04/24 15:47:12 INFO ql.Driver: OK
>> Time taken: 1.728 seconds
>> 18/04/24 15:47:12 INFO ql.Driver: Compiling 
>> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59):
>>  show table extended like haha
>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema: 
>> Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from 
>> deserializer)], properties:null)
>> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Initializing operator 
>> LIST_SINK[0]
>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling 
>> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59);
>>  Time taken: 0.166 seconds
>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not creating 
>> a lock manager
>> 18/04/24 15:47:12 INFO ql.Driver: Executing 
>> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59):
>>  show table extended like haha
>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>> 18/04/24 15:47:12 INFO exec.DDLTask: pattern: haha
>> 18/04/24 15:47:12 INFO exec.DDLTask: results : 1
>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing 
>> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59);
>>  Time taken: 0.187 seconds
>> OK
>> 18/04/24 15:47:12 INFO ql.Driver: OK
>> 18/04/24 15:47:12 INFO Configuration.deprecation: mapred.input.dir is 
>> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>> 18/04/24 15:47:12 INFO mapred.FileInputFormat: Total input paths to process 
>> : 1
>> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Closing operator LIST_SINK[0]
>> tableName:haha
>> owner:kousouda
>> location:hdfs://localhost:8020/user/hive/warehouse/haha
>> inputformat:org.apache.hadoop.mapred.TextInputFormat
>> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>> columns:struct columns { i32 id}
>> partitioned:false
>> partitionColumns:
>> totalNumberFiles:2
>> totalFileSize:4
>> maxFileSize:2
>> minFileSize:2
>> lastAccessTime:1524535110334
>> lastUpdateTime:1524535113101
>> 
>> Time taken: 0.394 seconds
>> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: 
>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme 
>> hdfs
>> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory: 
>> /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with 
>> scheme file
>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore, 
>> current connections: 0 
>> 
>> real 0m5.593s
>> user 0m8.645s
>> sys  0m0.523s
>>> On Apr 23, 2018, at 3:57 PM, Peter Vary <pv...@cloudera.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Disclaimer: I am not too familiar with WebHCat yet.
>>> From the logs, I see that:
>>> - The first 3 seconds are spent on starting a new session, and maybe a 
>>> driver. This can be reduced if the session already exists and HiveServer2 
>>> is running (but I do not know whether WebHCat can use HS2 or reuse 
>>> sessions). This delay can be avoided if you use any of the 3 solutions 
>>> suggested in my last mail.
>>> - The next 3 seconds are spent on initializing the metastore. This can be 
>>> reduced if a standalone metastore is started and WebHCat is configured 
>>> to access this metastore.
>>> 
>>> Hope this helps,
>>> Peter
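
The breakdown above can be reproduced from the log timestamps. This is a minimal 
sketch, assuming the yy/MM/dd HH:mm:ss prefix seen in the hcat log at the bottom 
of this thread. The name phase_gaps is illustrative, and the sample lines are a 
shortened selection of entries from that log (2018-04-21 run).

```python
from datetime import datetime

# A selection of entries from the hcat log quoted at the bottom of this thread.
LOG = """\
18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
18/04/21 16:38:20 INFO ql.Driver: Completed executing
"""


def phase_gaps(log_text):
    """Return (seconds_since_previous_entry, message) per timestamped line."""
    gaps = []
    previous = None
    for line in log_text.splitlines():
        try:
            # The first 17 characters hold the timestamp, e.g. "18/04/21 16:38:13".
            stamp = datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # wrapped continuation lines carry no timestamp
        delta = 0 if previous is None else int((stamp - previous).total_seconds())
        gaps.append((delta, line[18:]))
        previous = stamp
    return gaps


for seconds, message in phase_gaps(LOG):
    print(f"+{seconds}s  {message}")
```

On these entries the gaps are 3 seconds up to session creation, 1 second until 
the raw store is opened, and 3 seconds until the ObjectStore is initialized.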
>>> 
>>>> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>>>> 
>>>> Thank you very much for your reply. I am wondering whether I am using 
>>>> WebHCat correctly. I don't think it is normal to create all these 
>>>> directories and objects just to get a table description, and to take 8 
>>>> seconds doing it. WebHCat should not be this slow. Or is it because I 
>>>> forgot to start some server that could respond immediately?
>>>>> On Apr 23, 2018, at 3:06 PM, Peter Vary <pv...@cloudera.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Alexander Kolbasov has a project which might interest you (keeping in mind
>>>>> that this is not production ready - more like a proof of concept):
>>>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>>>>> 
>>>>> Also, you can use the HMS Thrift API directly to access the metastore, or,
>>>>> if you can and want to write Java code, you can use the HiveMetaStoreClient
>>>>> class to do it in Java.
>>>>> 
>>>>> I am not sure about the performance gains compared to HCat, but currently
>>>>> there are no faster interfaces for HMS that I know of.
>>>>> 
>>>>> Regards,
>>>>> Peter
>>>>> 
>>>>> 
>>>>> 侯宗田 <zongtian...@icloud.com> wrote (on Mon, Apr 23, 2018, 2:40):
>>>>> 
>>>>>> Can anyone give me some suggestions? I have been stuck on this problem for
>>>>>> several days. I need help!
>>>>>>> On Apr 22, 2018, at 9:38 PM, 侯宗田 <zongtian...@icloud.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I am writing an application which needs metastore information about Hive
>>>>>>> tables. I have used WebHCat to get the information about the tables and
>>>>>>> process it. But a simple request takes over eight seconds to respond on
>>>>>>> localhost. Why is this so slow, and how can I fix it? Or is there another
>>>>>>> way I can extract the metadata in C?
>>>>>>> 
>>>>>>> $ time curl -s 'http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean'
>>>>>>> {"columns":
>>>>>>> [{"name":"id","type":"int"}],
>>>>>>> "database":"default",
>>>>>>> "table":"haha"}
>>>>>>> 
>>>>>>> real    0m8.400s
>>>>>>> user    0m0.053s
>>>>>>> sys     0m0.019s
>>>>>>> It seems to run hcat.py, which creates a bunch of things and then cleans
>>>>>>> them up. It takes a very long time. Does anyone have any ideas about it?
>>>>>>> Any suggestions would be much appreciated!
>>>>>>> 
>>>>>>> $hcat.py -e "use default; desc haha; "
>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>> SLF4J: Found binding in
>>>>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>> SLF4J: Found binding in
>>>>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>>>>> file:/usr/local/hive/conf/hive-site.xml
>>>>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>>>>> native-hadoop library for your platform... using builtin-java classes 
>>>>>> where
>>>>>> applicable
>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>>>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>>>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>>> use default
>>>>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>>> called
>>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>>> datanucleus.cache.level2 unknown - will be ignored
>>>>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>>>>> pin classes with
>>>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>>> underlying DB is MYSQL
>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>>>>>> metastore
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>>>>>> metastore
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>>>>>> admin role, since config is empty
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=get_all_functions
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>>>>>> Schema(fieldSchemas:null, properties:null)
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>>> Time taken: 3.936 seconds
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>>>>>> creating a lock manager
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>>> use default
>>>>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>>>>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>>>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>>>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>>>>>> be ignored, since hive.security.authorization.manager is set to instance 
>>>>>> of
>>>>>> HiveAuthorizerFactory.
>>>>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>>>>>> hive.metastore.filter.hook changed from
>>>>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>>>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>>>>>> local RawStore...
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=Cleaning up thread local RawStore...
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>>>>>> thread local RawStore
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=Done cleaning up thread local RawStore
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>>>>>> mode
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>>> called
>>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>>> underlying DB is MYSQL
>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>> ip=unknown-ip-addr      cmd=get_database: default
>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>>> Time taken: 0.202 seconds
>>>>>>> OK
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
> 
