We use our own library built on simple constructions: files in HDFS that work like pid/lock files. A file like /flags/tablea/process1 could mean "hey, I'm working on table a, leave it alone". It accomplishes the exact same thing with less fuss, and it is also much easier for an external process, scheduler, or shell script to integrate with. I doubt many people use Hive locking as flow control for a scheduling system.
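To make the flag-file idea concrete, here is a minimal sketch. It is not our actual library: it uses a local directory as a stand-in for HDFS, and the names acquire_flag, release_flag, and holders are made up for illustration. On a real cluster you would replace the os calls with your HDFS client, assuming it can create a file without overwriting an existing one.

```python
import os
import tempfile


def acquire_flag(root, table, process):
    """Create <root>/<table>/<process>, e.g. /flags/tablea/process1.

    Hypothetical helper. O_CREAT | O_EXCL makes the create atomic on a
    local filesystem, so a second attempt with the same name raises
    FileExistsError instead of silently succeeding.
    """
    table_dir = os.path.join(root, table)
    os.makedirs(table_dir, exist_ok=True)
    path = os.path.join(table_dir, process)
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    with os.fdopen(fd, "w") as f:
        # Record the pid, like a classic pid file, to help debug stale flags.
        f.write(str(os.getpid()))
    return path


def holders(root, table):
    """Return the sorted names of processes currently flagging the table."""
    table_dir = os.path.join(root, table)
    if not os.path.isdir(table_dir):
        return []
    return sorted(os.listdir(table_dir))


def release_flag(path):
    """Remove the flag so downstream jobs may touch the table again."""
    os.remove(path)


if __name__ == "__main__":
    flags_root = tempfile.mkdtemp()  # stand-in for /flags in HDFS
    flag = acquire_flag(flags_root, "tablea", "process1")
    print("busy:", holders(flags_root, "tablea"))
    release_flag(flag)
    print("busy:", holders(flags_root, "tablea"))
```

A scheduler or shell script would call something like holders() before starting a downstream job and wait while the list is non-empty. Note that this is advisory only, like a pid file: the check and the downstream read are not one atomic step, so it only works if every job agrees to follow the convention.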
On Tue, Sep 9, 2014 at 3:25 AM, wzc <wzc1...@gmail.com> wrote:

> Hi,
> We also encounter this in Hive 0.13. We need to enable concurrency in
> daily ETL workflows (to keep a child ETL job from reading its parent ETL
> job's output while the parent is still running).
> We found that in Hive 0.13, sometimes when you open the Hive CLI shell it
> outputs the message "conflicting lock present for default mode EXCLUSIVE"
> and waits for some locks to be released. We haven't encountered this in
> Hive 0.11 and are still trying to figure it out.
>
> 2014-08-25 15:21 GMT+08:00 Sourygna Luangsay <sluang...@pragsis.com>:
>
>> Many thanks Edward for this complete answer.
>>
>> So the main idea is simply to disable concurrency in Hive, if I
>> understand you correctly.
>>
>> My question now is: is this something most Hive users do by default?
>>
>> Can somebody else share their own experience?
>>
>> Regards,
>>
>> *Sourygna Luangsay*
>>
>> *From:* Edward Capriolo [mailto:edlinuxg...@gmail.com]
>> *Sent:* Friday, August 22, 2014 16:07
>> *To:* user@hive.apache.org
>> *Subject:* Re: doubt about locking mechanism in Hive
>>
>> IMHO locking support should be turned off by default. I would argue that
>> if you require this feature often, you may be designing your systems
>> improperly.
>>
>> You really should not have that many situations where you need locking in
>> a write (mostly) once file system. The only time I have ever used it is
>> when I had a process completely rewriting the contents of a table and I
>> needed downstream things not to select from this table while it was in an
>> inconsistent state. Having it on by default is a bad idea. You have pointed
>> out a case where a simple select query attempts to acquire locks it
>> does not need. That puts strain on more systems and creates more chances
>> for issues.
>>
>> One of the big design philosophy issues I tend to have with Hive lately
>> is that we have this pool of users (like myself) who use Hive for its
>> original purpose: to query write-once text files and create aggregations.
>>
>> Then there are other groups attempting to implement very complicated
>> semantics around streaming, transactions, locking, whatever. Then you have
>> tools like Cloudera Manager giving configuration warnings such as:
>>
>> "Hive: Hive is not configured with ZooKeeper Service. As a result,
>> hive-site will not contain hive.zookeeper.quorum, which can lead to
>> corruption in concurrency scenarios."
>>
>> I think this statement is incorrect AND is BAD advice. Then users such
>> as yourself reach a conclusion like "I should turn on locking", because no
>> one would ever assume that ....
>>
>> !!!SELECTING 1 ROW FROM A TABLE WOULD CAUSE 1100 LOCKS TO BE ACQUIRED!!!!
>>
>> ::rant over:: I am not saying that Hive locking is bad, but I am saying I
>> leave it off and turn it on when I need it, on a per-query basis.
>>
>> On Fri, Aug 22, 2014 at 8:48 AM, Sourygna Luangsay <sluang...@pragsis.com>
>> wrote:
>>
>> Hi,
>>
>> I am having some trouble with the locking/concurrency mechanism of Hive
>> when doing a large select and trying to create a table at the same time.
>>
>> My version of Hive is 0.13.
>>
>> What I am trying to do is the following:
>>
>> 1) In a hive shell:
>> use mydatabase;
>> select * from competence limit 1; # this table has 1100 partitions.
>> So with hive.support.concurrency=true, it needs at least 90s to execute (I
>> know, this is a silly query: I should rather do a select * where “a
>> partition”… The purpose of this query is to reproduce the problem easily
>> by having a query that takes a long time to execute)
>>
>> 2) In another hive shell, while the first query is executing:
>> use mydatabase;
>> create table probsourygna (foo string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
>>
>> The problem is that the “create table” does not execute until the first
>> query (select) has finished.
>>
>> And we can see messages of the following type:
>>
>> conflicting lock present for mydatabase mode EXCLUSIVE
>>
>> conflicting lock present for mydatabase mode EXCLUSIVE
>>
>> …
>>
>> (1 line every 60 s)
>>
>> It seems to me that the first query puts a shared lock at the database
>> (mydatabase) level.
>>
>> Then the second query tries to acquire an exclusive lock at the database
>> level (it fails and retries every 60 s).
>>
>> Am I right? (When I look at the documentation,
>> https://cwiki.apache.org/confluence/display/Hive/Locking , it says
>> nothing about locks at the database level.)
>>
>> Is there any solution to my problem? (Preventing a long “select” from
>> blocking a “create” query, without disabling Hive's concurrency support.)
>>
>> Regards,
>>
>> *Sourygna Luangsay*
>>
>> AVISO CONFIDENCIAL
>> Este correo y la información contenida o adjunta al mismo es privada y
>> confidencial y va dirigida exclusivamente a su destinatario. Pragsis
>> informa a quien pueda haber recibido este correo por error que contiene
>> información confidencial cuyo uso, copia, reproducción o distribución está
>> expresamente prohibida. Si no es Vd. el destinatario del mismo y recibe
>> este correo por error, le rogamos lo ponga en conocimiento del emisor y
>> proceda a su eliminación sin copiarlo, imprimirlo o utilizarlo de ningún
>> modo.
>> CONFIDENTIALITY WARNING.
>> This message and the information contained in or attached to it are
>> private and confidential and intended exclusively for the addressee.
>> Pragsis informs to whom it may receive it in error that it contains
>> privileged information and its use, copy, reproduction or distribution is
>> prohibited. If you are not an intended recipient of this E-mail, please
>> notify the sender, delete it and do not read, act upon, print, disclose,
>> copy, retain or redistribute any portion of this E-mail.