Thanks for the info too! :)
> On Apr 25, 2018, at 11:12 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>
> Hi,
>
> Thank you, I have looked up the source code of Hcatalog, it seems every time
> when I run hcat -e “query”, it called hcatCli, then it make configuration,
> create and start a session, then dump it after being used. It can’t keep a
> session or connection and don’t have a Cli. The initialization take all the
> time. Therefore, I only can use the thrift API to do my job. Thank you for
> your precious suggestions!
>
> Best regards,
> Hou
>> 在 2018年4月24日,下午7:45,Peter Vary <pv...@cloudera.com> 写道:
>>
>> Hi Hou,
>>
>> Kudu uses the Thrift HMS interface, and written in C. An example could be
>> found here:
>> https://github.com/apache/kudu/tree/master/src/kudu/hms
>> <https://github.com/apache/kudu/tree/master/src/kudu/hms>
>>
>> As for parametrizing Hcatalog I have only found this:
>> https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties
>>
>> <https://cwiki.apache.org/confluence/display/Hive/HCatalog+Configuration+Properties>
>> But have not find anything there which might help you there.
>>
>> Peter
>>
>>> On Apr 24, 2018, at 10:51 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>>>
>>> Hi, Peter:
>>> I have started a standalone metastore server and it indeed short that part
>>> of time, it does connection instead of initialization. But I still have
>>> some questions,
>>> First, I believe the Hcatalog must be quick because it is a mature product
>>> and I have not seen others complaining about this problem, is there some
>>> configuration which controls starting new session or how to keep a session
>>> connected to the HMS, in the log below it started a new session and
>>> connected twice.
>>> Second, I am very interested in using the HMS thrift API, but I could not
>>> found an example of how to use it in C/C++ to access hive table info. Do
>>> you know some link about it?
>>> Really thank you for your time!!
>>>
>>> Best regards,
>>> Hou
>>>
>>> $time ./hcat.py -e "use default; show table extended like haha;"
>>> 18/04/24 15:47:08 INFO conf.HiveConf: Found configuration file
>>> file:/usr/local/hive/conf/hive-site.xml
>>> 18/04/24 15:47:10 WARN util.NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory:
>>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>>> 18/04/24 15:47:10 INFO session.SessionState: Created local directory:
>>> /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f
>>> 18/04/24 15:47:10 INFO session.SessionState: Created HDFS directory:
>>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f/_tmp_space.db
>>> 18/04/24 15:47:10 INFO ql.Driver: Compiling
>>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5):
>>> use default
>>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with
>>> URI thrift://localhost:9083
>>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore,
>>> current connections: 1
>>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema:
>>> Schema(fieldSchemas:null, properties:null)
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling
>>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5);
>>> Time taken: 1.591 seconds
>>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not
>>> creating a lock manager
>>> 18/04/24 15:47:12 INFO ql.Driver: Executing
>>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5):
>>> use default
>>> 18/04/24 15:47:12 INFO sqlstd.SQLStdHiveAccessController: Created
>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>> [sessionString=6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f, clientType=HIVECLI]
>>> 18/04/24 15:47:12 WARN session.SessionState: METASTORE_FILTER_HOOK will be
>>> ignored, since hive.security.authorization.manager is set to instance of
>>> HiveAuthorizerFactory.
>>> 18/04/24 15:47:12 INFO hive.metastore: Mestastore configuration
>>> hive.metastore.filter.hook changed from
>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore,
>>> current connections: 0
>>> 18/04/24 15:47:12 INFO hive.metastore: Trying to connect to metastore with
>>> URI thrift://localhost:9083
>>> 18/04/24 15:47:12 INFO hive.metastore: Opened a connection to metastore,
>>> current connections: 1
>>> 18/04/24 15:47:12 INFO hive.metastore: Connected to metastore.
>>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing
>>> command(queryId=kousouda_20180424154710_e0443fb2-3930-4dc3-9965-25a9f98807a5);
>>> Time taken: 0.119 seconds
>>> OK
>>> 18/04/24 15:47:12 INFO ql.Driver: OK
>>> Time taken: 1.728 seconds
>>> 18/04/24 15:47:12 INFO ql.Driver: Compiling
>>> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59):
>>> show table extended like haha
>>> 18/04/24 15:47:12 INFO ql.Driver: Semantic Analysis Completed
>>> 18/04/24 15:47:12 INFO ql.Driver: Returning Hive schema:
>>> Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from
>>> deserializer)], properties:null)
>>> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Initializing operator
>>> LIST_SINK[0]
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed compiling
>>> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59);
>>> Time taken: 0.166 seconds
>>> 18/04/24 15:47:12 INFO ql.Driver: Concurrency mode is disabled, not
>>> creating a lock manager
>>> 18/04/24 15:47:12 INFO ql.Driver: Executing
>>> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59):
>>> show table extended like haha
>>> 18/04/24 15:47:12 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
>>> 18/04/24 15:47:12 INFO exec.DDLTask: pattern: haha
>>> 18/04/24 15:47:12 INFO exec.DDLTask: results : 1
>>> 18/04/24 15:47:12 INFO ql.Driver: Completed executing
>>> command(queryId=kousouda_20180424154712_99e6e25d-0505-44f1-a429-5ce45b0cae59);
>>> Time taken: 0.187 seconds
>>> OK
>>> 18/04/24 15:47:12 INFO ql.Driver: OK
>>> 18/04/24 15:47:12 INFO Configuration.deprecation: mapred.input.dir is
>>> deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
>>> 18/04/24 15:47:12 INFO mapred.FileInputFormat: Total input paths to process
>>> : 1
>>> 18/04/24 15:47:12 INFO exec.ListSinkOperator: Closing operator LIST_SINK[0]
>>> tableName:haha
>>> owner:kousouda
>>> location:hdfs://localhost:8020/user/hive/warehouse/haha
>>> inputformat:org.apache.hadoop.mapred.TextInputFormat
>>> outputformat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>> columns:struct columns { i32 id}
>>> partitioned:false
>>> partitionColumns:
>>> totalNumberFiles:2
>>> totalFileSize:4
>>> maxFileSize:2
>>> minFileSize:2
>>> lastAccessTime:1524535110334
>>> lastUpdateTime:1524535113101
>>>
>>> Time taken: 0.394 seconds
>>> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory:
>>> /tmp/hive/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with scheme
>>> hdfs
>>> 18/04/24 15:47:12 INFO session.SessionState: Deleted directory:
>>> /tmp/hive/java/kousouda/6c7e97ad-c9dd-4c5e-9636-ab9d4e47d76f on fs with
>>> scheme file
>>> 18/04/24 15:47:12 INFO hive.metastore: Closed a connection to metastore,
>>> current connections: 0
>>>
>>> real 0m5.593s
>>> user 0m8.645s
>>> sys 0m0.523s
>>>> 在 2018年4月23日,下午3:57,Peter Vary <pv...@cloudera.com> 写道:
>>>>
>>>> Hi,
>>>>
>>>> Disclaimer: I am not too familiar with the webhcat yet.
>>>> From the logs, I see, that:
>>>> - the first 3 seconds spent on starting a new session, and maybe a driver
>>>> - this can be reduced, if the session is already there, and the
>>>> HiveServer2 is started (but do not know if webhcat could use HS2, or reuse
>>>> sessions) - this delay could be avoided if you use any of the 3 solutions
>>>> suggested in my last mail.
>>>> - the next 3 seconds spent on initializing the metastore. This can be
>>>> reduced if a standalone metastore is started, and the webhcat is
>>>> configured to access this metastore.
>>>>
>>>> Hope this helps,
>>>> Peter
>>>>
>>>>> On Apr 23, 2018, at 9:27 AM, 侯宗田 <zongtian...@icloud.com> wrote:
>>>>>
>>>>> Thank you very much for your reply, I am wondering whether I use the
>>>>> webhcat rightly, I don’t think it is normal to create all the directories
>>>>> and objects to get a table describ and take 8 seconds. The webhcat should
>>>>> not be so slow, Or it is because I forget to start some server which can
>>>>> respond immediately?
>>>>>> 在 2018年4月23日,下午3:06,Peter Vary <pv...@cloudera.com> 写道:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Alexander Kolbasov has a project which might interest you (keeping in
>>>>>> mind,
>>>>>> that this is not production ready - more like a proof of concept):
>>>>>> https://github.com/akolb1/gometastore/blob/master/hmstool/doc/hmstool.md
>>>>>>
>>>>>> Also you can use HMS thrift API directly to access the MetaStore, or if
>>>>>> you
>>>>>> can/want write java code, you can use HiveMetastoreClient class to do it
>>>>>> in
>>>>>> java.
>>>>>>
>>>>>> I am not sure about the performance gains compared to HCat, but currently
>>>>>> there are no faster interfaces for HMS that I know of.
>>>>>>
>>>>>> Regards,
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>> 侯宗田 <zongtian...@icloud.com> ezt írta (időpont: 2018. ápr. 23., Hét
>>>>>> 2:40):
>>>>>>
>>>>>>> Can anyone give me some suggestions? I have been stuck in this problem
>>>>>>> for
>>>>>>> several days. Need help!!
>>>>>>>> 在 2018年4月22日,下午9:38,侯宗田 <zongtian...@icloud.com> 写道:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am writing a application which needs the metastore about hive tables.
>>>>>>> I have used webhcat to get the information about tables and process
>>>>>>> them.
>>>>>>> But a simple request takes over eight seconds to respond on localhost.
>>>>>>> Why
>>>>>>> is this so slow, and how can I fix it or is there other way I can
>>>>>>> extract
>>>>>>> the metadata in C?
>>>>>>>>
>>>>>>>> $ time curl -s '
>>>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>>>> <
>>>>>>> http://localhost:50111/templeton/v1/ddl/database/default/table/haha?user.name=ctdean
>>>>>>>> '
>>>>>>>> {"columns":
>>>>>>>> [{"name":"id","type":"int"}],
>>>>>>>> "database":"default",
>>>>>>>> "table":"haha"}
>>>>>>>>
>>>>>>>> real 0m8.400s
>>>>>>>> user 0m0.053s
>>>>>>>> sys 0m0.019s
>>>>>>>> it seems to run a hcat.py, and it create a bunch of things then clear
>>>>>>> them, it takes very long time, does anyone have some ideas about it??
>>>>>>> Any
>>>>>>> suggestions will be very appreciated!
>>>>>>>>
>>>>>>>> $hcat.py -e "use default; desc haha; "
>>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>>> SLF4J: Found binding in
>>>>>>> [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>> SLF4J: Found binding in
>>>>>>> [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings <
>>>>>>> http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
>>>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>>>>>> 18/04/21 16:38:13 INFO conf.HiveConf: Found configuration file
>>>>>>> file:/usr/local/hive/conf/hive-site.xml
>>>>>>>> 18/04/21 16:38:15 WARN util.NativeCodeLoader: Unable to load
>>>>>>> native-hadoop library for your platform... using builtin-java classes
>>>>>>> where
>>>>>>> applicable
>>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created local directory:
>>>>>>> /tmp/hive/java/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668
>>>>>>>> 18/04/21 16:38:16 INFO session.SessionState: Created HDFS directory:
>>>>>>> /tmp/hive/kousouda/05096382-f9b6-4dae-aee2-dfa6750c0668/_tmp_space.db
>>>>>>>> 18/04/21 16:38:16 INFO ql.Driver: Compiling
>>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>>>> use default
>>>>>>>> 18/04/21 16:38:17 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>>> 18/04/21 16:38:17 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>>>> called
>>>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>>>> hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>>>>> 18/04/21 16:38:18 INFO DataNucleus.Persistence: Property
>>>>>>> datanucleus.cache.level2 unknown - will be ignored
>>>>>>>> 18/04/21 16:38:18 INFO metastore.ObjectStore: Setting MetaStore object
>>>>>>> pin classes with
>>>>>>> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>>>> underlying DB is MYSQL
>>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added admin role in
>>>>>>> metastore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: Added public role in
>>>>>>> metastore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: No user is added in
>>>>>>> admin role, since config is empty
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_all_functions
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr cmd=get_all_functions
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database:
>>>>>>>> default
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr cmd=get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Semantic Analysis Completed
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Returning Hive schema:
>>>>>>> Schema(fieldSchemas:null, properties:null)
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed compiling
>>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>>>> Time taken: 3.936 seconds
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Concurrency mode is disabled, not
>>>>>>> creating a lock manager
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Executing
>>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62):
>>>>>>> use default
>>>>>>>> 18/04/21 16:38:20 INFO sqlstd.SQLStdHiveAccessController: Created
>>>>>>> SQLStdHiveAccessController for session context : HiveAuthzSessionContext
>>>>>>> [sessionString=05096382-f9b6-4dae-aee2-dfa6750c0668, clientType=HIVECLI]
>>>>>>>> 18/04/21 16:38:20 WARN session.SessionState: METASTORE_FILTER_HOOK will
>>>>>>> be ignored, since hive.security.authorization.manager is set to
>>>>>>> instance of
>>>>>>> HiveAuthorizerFactory.
>>>>>>>> 18/04/21 16:38:20 INFO hive.metastore: Mestastore configuration
>>>>>>> hive.metastore.filter.hook changed from
>>>>>>> org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to
>>>>>>> org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Cleaning up thread
>>>>>>> local RawStore...
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr cmd=Cleaning up thread local RawStore...
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Done cleaning up
>>>>>>> thread local RawStore
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Starting task [Stage-0:DDL] in serial
>>>>>>> mode
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database:
>>>>>>>> default
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr cmd=get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: Opening raw store
>>>>>>> with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: ObjectStore, initialize
>>>>>>> called
>>>>>>>> 18/04/21 16:38:20 INFO metastore.MetaStoreDirectSql: Using direct SQL,
>>>>>>> underlying DB is MYSQL
>>>>>>>> 18/04/21 16:38:20 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>>> 18/04/21 16:38:20 INFO metastore.HiveMetaStore: 0: get_database:
>>>>>>>> default
>>>>>>>> 18/04/21 16:38:20 INFO HiveMetaStore.audit: ugi=kousouda
>>>>>>> ip=unknown-ip-addr cmd=get_database: default
>>>>>>>> 18/04/21 16:38:20 INFO ql.Driver: Completed executing
>>>>>>> command(queryId=kousouda_20180421163816_58c38a44-25e3-4665-8bb5-a9b17fdf2d62);
>>>>>>> Time taken: 0.202 seconds
>>>>>>>> OK
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>