How about disable kerberos and give a try? The time cost looks like waiting for 
something to timeout, for example DNS reverse lookup.

________________________________
From: Alan Gates <alanfga...@gmail.com>
Sent: Friday, May 17, 2019 8:05:04 AM
To: user@hive.apache.org
Subject: Re: Any HIVE DDL statement takes minutes to execute

[Warning]: This email originated from an external source. Do not open links or 
attachments unless you know the content is safe.
[경고]: 본 이메일은 회사 외부에서 유입되었습니다. 내용이 안전한지 확인하기 전까지는 링크나 첨부파일을 열지 마십시오.

I can think of two things that could take a long time in creating a table, 
database operations or file system operations.  The perf timers inside the 
metastore only measure the entire metadata operation, not the file part and the 
db part, so it will be hard to tell where the time is being spent.  When a 
table is first created the metastore prints a debug message to the logs that 
says "create_table" (you have to having logging set to DEBUG to see this).  
This will tell you when the metastore started processing the create table.  
Between creating the directory for the table and connecting to the RDBMS to 
create the entry for it, the createtime for the table is set.  A describe table 
extended should show you the create time of the table (or you can directly 
query the TBLS table in the RDBMS to find it as well).  Finally, when the 
metastore is done creating the table there is another entry in the log that 
starts with "create_table".  All three of these timestamps are generated on the 
same machine, so clock syncing won't be an issue.  These three timestamps 
should give you an idea of whether the majority of the time is being spent 
creating the directory or creating an entry in the database.

Which logs you need to look in to find the debug statements depends on whether 
you have a separate Hive Metastore Thrift service running or you have Hive 
Server2 directly communicating with the RDBMS.

Alan.

On Thu, May 16, 2019 at 1:42 AM Iulian Mongescu 
<iulian.monge...@grx.ro<mailto:iulian.monge...@grx.ro>> wrote:

Hi Mich,



First thank  you for taking the time to look over this problem. Now regarding 
the questions :



  1.  I can confirm there are no locks on metastore DB ;
  2.  About duration of the queries, in my previous mail I just gave some 
examples and I  can confirm that I run those queries on the metastore db server 
and also from the hive node that I’m using to test and the results are similar, 
almost instant response on all queries;
  3.  And yes, this apply only on DDL statements and is constant problem, not a 
random delay;
  4.  Regarding the network communication blocking, there is no firewall or a 
network performance issue between hive node and metastore db. As I said at the 
previous point, I run all queries also manually using mysql cmd client from the 
hive node and the response was almost instant;



Thank you,

Iulian



From: Mich Talebzadeh 
<mich.talebza...@gmail.com<mailto:mich.talebza...@gmail.com>>
Sent: Thursday, May 16, 2019 11:20 AM
To: user <user@hive.apache.org<mailto:user@hive.apache.org>>
Subject: Re: Any HIVE DDL statement takes minutes to execute




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw<https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linkedin.com%2Fprofile%2Fview%3Fid%3DAAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw&data=02%7C01%7Cpfeng%40coupang.com%7C2206927930314b82742608d6da5b60b4%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C636936483336458998&sdata=1YdUBCcyHDkJCSYsszG012O1coPtA4GMXIzah1LxpvM%3D&reserved=0>



http://talebzadehmich.wordpress.com<https://kor01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ftalebzadehmich.wordpress.com&data=02%7C01%7Cpfeng%40coupang.com%7C2206927930314b82742608d6da5b60b4%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C636936483336458998&sdata=M%2BopViIO8TSkYqj3LPjLDj4fpj4bnsOyCKCQvSVb%2FMc%3D&reserved=0>



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



I don't know much about MySQL but assuming it has tools to see the activity in 
the back end, what locks are you seeing in the database itself plus the 
duration of time that the command is executed on RDBMS etc.



Does this only apply to the DDL statements?



It is either some locking/blocking in the back end or the network connection 
between your Hadoop and the RDBMS causing the issue



I just tested DDL for external table in Hive through Oracle database and there 
was no issue.



HTH





On Thu, 16 May 2019 at 08:16, Iulian Mongescu 
<iulian.monge...@grx.ro<mailto:iulian.monge...@grx.ro>> wrote:

Hi Alan,



I’m using MySQL (Mariadb) for the metastore and I was thinking on this 
possibility too but from all my tests on metastore database that I run, every 
query is almost instant.

For example :

SELECT * FROM `TBLS`  ->  Query took 0.0001 seconds.

INSERT INTO `TBLS` ->  Query took 0.0020 seconds

DELETE FROM `TBLS` -> Query took 0.0021 seconds



Thank you,

Iulian



From: Alan Gates <alanfga...@gmail.com<mailto:alanfga...@gmail.com>>
Sent: Wednesday, May 15, 2019 9:51 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Any HIVE DDL statement takes minutes to execute



What are you using as the RDBMS for your metastore?  A first place I'd look is 
if the communications with the RDBMS are slow for some reason.



Alan.



On Wed, May 15, 2019 at 10:34 AM Iulian Mongescu 
<iulian.monge...@grx.ro<mailto:iulian.monge...@grx.ro>> wrote:

Hello,



I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a problem 
with hive as any DDL statement that I run takes minutes to execute but any DML 
run in normal limits. I checked the logs but I didn’t find anything that seems 
related with this problem and I would appreciate any help to debug this issue.



Please find bellow some examples with DDL&DML queries and their durations:



------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 
(...) STORED AS ORC LOCATION 
'/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (184.191 seconds)

------------------------------------

0: jdbc:hive2://hdpx03:10000/> show tables;

+-----------+--+

| tab_name |

+-----------+--+

| agenti1 |

+-----------+--+

1 row selected (0.358 seconds)

---------------------------------

0: jdbc:hive2://hdpx03:10000/> select count(*) as total from agenti1 where 1;

INFO : Tez session hasn't been created yet. Opening session

INFO : Dag name: select count(*) as total from agenti1 wh...1(Stage-1)

INFO : Status: Running (Executing on YARN cluster with App id 
application_1552674174918_0002)



--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.48 s

--------------------------------------------------------------------------------

+--------+--+

| total |

+--------+--+

| 1960 |

+--------+--+

1 row selected (15.853 seconds)



-------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> drop table agenti1;

No rows affected (184.164 seconds)

--------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 
(...) STORED AS ORC LOCATION 
'/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (190.288 seconds)



Thanks,



Iulian


Reply via email to