Re: doubt about locking mechanism in Hive

Edward Capriolo Tue, 09 Sep 2014 07:50:03 -0700

We use our own library: simple constructs such as files in HDFS that work
like pid/lock files. A file like /flags/tablea/process1 could mean "hey, I'm
working on table a, leave it alone". This accomplishes the exact same thing
with less fuss, and it is also much easier for an external process,
scheduler, or shell script to integrate with. I doubt many people use Hive
locking as flow control for a scheduling system.
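
For illustration, a minimal sketch of the flag-file idea in Python. This is
not the library mentioned above: it uses the local filesystem as a stand-in
for HDFS, and the function names (acquire_flag, table_is_busy, release_flag)
and the flag root directory are hypothetical. On HDFS you would need an
equivalent create-if-absent operation from whichever client you use.

```python
import os
import tempfile


def acquire_flag(flag_root, table, process):
    """Create <flag_root>/<table>/<process>; return True if we created it."""
    table_dir = os.path.join(flag_root, table)
    os.makedirs(table_dir, exist_ok=True)
    flag_path = os.path.join(table_dir, process)
    try:
        # O_CREAT | O_EXCL makes creation fail if the flag already exists,
        # so two processes cannot both claim the same flag.
        fd = os.open(flag_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        # Record the owner's pid, as a pid file would.
        f.write(str(os.getpid()))
    return True


def table_is_busy(flag_root, table):
    """Return True if any process currently holds a flag on the table."""
    table_dir = os.path.join(flag_root, table)
    return os.path.isdir(table_dir) and bool(os.listdir(table_dir))


def release_flag(flag_root, table, process):
    """Remove the flag; a missing flag is not treated as an error."""
    try:
        os.remove(os.path.join(flag_root, table, process))
    except FileNotFoundError:
        pass


if __name__ == "__main__":
    # Hypothetical flag root; a temp dir stands in for /flags on HDFS.
    root = tempfile.mkdtemp()
    if acquire_flag(root, "tablea", "process1"):
        try:
            print("rewriting tablea; busy =", table_is_busy(root, "tablea"))
        finally:
            release_flag(root, "tablea", "process1")
```

A writer would call acquire_flag before rewriting a table and release_flag
afterwards, while downstream jobs check table_is_busy before selecting. The
scheme is advisory: it only helps if every job checks the flags.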


On Tue, Sep 9, 2014 at 3:25 AM, wzc <wzc1...@gmail.com> wrote:

> Hi,
> We also encounter this in Hive 0.13. We need to enable concurrency in our
> daily ETL workflows (to keep a child ETL job from reading its parent ETL
> job's output while the parent is still running).
> We found that in Hive 0.13, sometimes when you open the Hive CLI shell it
> outputs the message "conflicting lock present for default mode EXCLUSIVE"
> and waits for some locks to be released. We haven't encountered this in
> Hive 0.11 and are still trying to figure it out.
>
>
>
> 2014-08-25 15:21 GMT+08:00 Sourygna Luangsay <sluang...@pragsis.com>:
>
>> Many thanks, Edward, for this complete answer.
>>
>> So, if I understand you correctly, the main idea is simply to disable
>> concurrency in Hive.
>>
>> My question now is: is this something most Hive users do by default?
>>
>> Can somebody else share their own experience?
>>
>>
>>
>> Regards,
>>
>>
>>
>> *Sourygna Luangsay*
>>
>>
>>
>> *From:* Edward Capriolo [mailto:edlinuxg...@gmail.com]
>> *Sent:* Friday, August 22, 2014 16:07
>> *To:* user@hive.apache.org
>> *Subject:* Re: doubt about locking mechanism in Hive
>>
>>
>>
>> IMHO locking support should be turned off by default. I would argue that
>> if you require this feature often, you may be designing your systems
>> improperly.
>>
>> You really should not have that many situations where you need locking in
>> a write (mostly) once file system. The only time I have ever used it was
>> when I had a process completely rewriting the contents of a table and I
>> needed downstream jobs not to select from that table while it was in an
>> inconsistent state. Having it on by default is a bad idea. You have pointed
>> out a case where a simple select query attempts to acquire locks it
>> does not need. That puts strain on more systems and creates more chances
>> for issues.
>>
>>
>>
>> One of the big design philosophy issues I tend to have with Hive lately
>> is that we have this pool of users (like myself) who use Hive for its
>> original purpose: to query write-once text files and create aggregations.
>>
>> Then there are other groups attempting to implement very complicated
>> semantics around streaming, transactions, locking, whatever. Then you have
>> tools like Cloudera Manager giving configuration warnings such as:
>>
>> " Hive: Hive is not configured with ZooKeeper Service. As a result,
>> hive-site will not contain hive.zookeeper.quorum, which can lead to
>> corruption in concurrency scenarios."
>>
>> I think this statement is incorrect AND is BAD advice. Then users such
>> as yourself reach a conclusion like "I should turn on locking", because no
>> one would ever assume that ....
>>
>> !!!SELECTING 1 ROW FROM A TABLE WOULD CAUSE 1100 LOCKS TO BE ACQUIRED!!!!
>>
>> ::rant over:: I am not saying that hive locking is bad, but I am saying I
>> leave it off and turn it on when I need it on a per query basis.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Aug 22, 2014 at 8:48 AM, Sourygna Luangsay <sluang...@pragsis.com>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> I am having some trouble with the locking/concurrency mechanism of Hive
>> when doing a large select and trying to create a table at the same time.
>>
>> My version of Hive is 0.13.
>>
>>
>>
>> What I am trying to do is the following:
>>
>>
>>
>> 1)      In a hive shell:
>> use mydatabase;
>> select * from competence limit 1;
>>
>> This table has 1100 partitions, so with hive.support.concurrency=true the
>> query needs at least 90s to execute. (I know this is a silly query: I
>> should rather do a select * where “a partition”… The purpose of this
>> query is to reproduce the problem easily with a query that takes a long
>> time to execute.)
>>
>>
>>
>> 2)      In another hive shell, while the first query is executing:
>> use mydatabase;
>> create table probsourygna (foo string) ROW FORMAT DELIMITED FIELDS
>> TERMINATED BY '\t'  STORED AS TEXTFILE ;
>>
>> The problem is that the “create table” does not execute until the first
>> query (select) has finished.
>>
>> And we can see messages of the following type:
>>
>> conflicting lock present for mydatabase mode EXCLUSIVE
>>
>> conflicting lock present for mydatabase mode EXCLUSIVE
>>
>> …
>>
>>
>>
>> (1 line every 60 s)
>>
>>
>>
>>
>>
>> It seems to me that the first query takes a shared lock at the database
>> (mydatabase) level.
>>
>> The second query then tries to acquire an exclusive lock at the database
>> level, fails, and retries every 60s.
>>
>>
>>
>> Am I right? (The documentation at
>> https://cwiki.apache.org/confluence/display/Hive/Locking says nothing
>> about locks at the database level.)
>>
>> Is there any solution to my problem, that is, a way to keep a long
>> “select” from blocking a “create” query without disabling Hive
>> concurrency?
>>
>>
>>
>> Regards,
>>
>>
>>
>> *Sourygna Luangsay*
>>
>>
>> AVISO CONFIDENCIAL
>> Este correo y la información contenida o adjunta al mismo es privada y
>> confidencial y va dirigida exclusivamente a su destinatario. Pragsis
>> informa a quien pueda haber recibido este correo por error que contiene
>> información confidencial cuyo uso, copia, reproducción o distribución está
>> expresamente prohibida. Si no es Vd. el destinatario del mismo y recibe
>> este correo por error, le rogamos lo ponga en conocimiento del emisor y
>> proceda a su eliminación sin copiarlo, imprimirlo o utilizarlo de ningún
>> modo.
>> CONFIDENTIALITY WARNING.
>> This message and the information contained in or attached to it are
>> private and confidential and intended exclusively for the addressee.
>> Pragsis informs to whom it may receive it in error that it contains
>> privileged information and its use, copy, reproduction or distribution is
>> prohibited. If you are not an intended recipient of this E-mail, please
>> notify the sender, delete it and do not read, act upon, print, disclose,
>> copy, retain or redistribute any portion of this E-mail.
>>
>>
>>
>>
>
>
