[ 
https://issues.apache.org/jira/browse/HIVE-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823504#comment-16823504
 ] 

Todd Lipcon commented on HIVE-21506:
------------------------------------

bq.  My understanding is that we are not yet blocked by the concurrency checks 
when acquiring locks, but the bottleneck is simply the number of HMS/RDBMS 
calls implementing that.

Agreed with that, and with the general idea that we should understand the 
workload. That said, I don't think we need a specific workload to agree on the 
central observation that most queries against Hive are read-only, given our 
focus on warehousing and datamart applications (Hive isn't an OLTP database by 
any stretch). I did a spot check on the ratio of read-only to DML queries in 
some customer profile datasets I have, and it ranges from 300:1 for some 
customers down to about 1:1. The average is 7:1. 

> Memory based TxnHandler implementation
> --------------------------------------
>
>                 Key: HIVE-21506
>                 URL: https://issues.apache.org/jira/browse/HIVE-21506
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Peter Vary
>            Priority: Major
>
> The current TxnHandler implementations use the backend RDBMS to store 
> all Hive lock and transaction data, so multiple TxnHandler instances can 
> run simultaneously and serve requests. The continuous 
> communication/locking done on the RDBMS side puts serious load on the backend 
> databases and also restricts the possible throughput.
> If it is possible to have only a single active TxnHandler instance (in the 
> current design, a single HMS), then we can provide much better performance 
> by using only Java-based locking. We still have to store the committed write 
> transactions in the RDBMS (or later in some other persistent storage), but 
> other lock and transaction operations could remain memory-only.
> The most important drawbacks of this solution are that we definitely lose 
> scalability when one instance of TxnHandler is no longer able to serve the 
> requests (see NameNode), and that we lose fault tolerance, in the sense that 
> ongoing transactions have to be terminated when the TxnHandler fails. If 
> these drawbacks are acceptable in certain situations, then we can provide 
> better throughput for the users.
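
For illustration only, here is a minimal Java sketch of the idea in the
description above: open transactions are tracked purely in memory under a
Java lock, and only committed write transactions reach persistent storage.
The class name, the method names, and the in-memory list standing in for the
RDBMS are all hypothetical. None of this is Hive's actual TxnHandler API, and
lock acquisition, heartbeats, and failover are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a single-instance, memory-based transaction handler. */
final class InMemoryTxnSketch {
    private long nextTxnId = 1;

    // Open transactions live only in memory: txnId -> has this txn written anything?
    private final Map<Long, Boolean> openTxns = new HashMap<>();

    // Stand-in for the RDBMS table that records committed write transactions.
    private final List<Long> persistedCommittedWrites = new ArrayList<>();

    /** Opens a transaction without touching persistent storage. */
    public synchronized long openTxn() {
        long id = nextTxnId++;
        openTxns.put(id, false);
        return id;
    }

    /** Records that the transaction performed a write (DML). */
    public synchronized void markWrite(long txnId) {
        if (openTxns.replace(txnId, true) == null) {
            throw new IllegalStateException("unknown or closed txn " + txnId);
        }
    }

    /** Commits; only a write transaction causes a (simulated) RDBMS call. */
    public synchronized void commit(long txnId) {
        Boolean wrote = openTxns.remove(txnId);
        if (wrote == null) {
            throw new IllegalStateException("unknown or closed txn " + txnId);
        }
        if (wrote) {
            persistedCommittedWrites.add(txnId);
        }
    }

    /** Aborts; nothing is persisted. */
    public synchronized void abort(long txnId) {
        openTxns.remove(txnId);
    }

    public synchronized int openCount() {
        return openTxns.size();
    }

    public synchronized List<Long> committedWrites() {
        return new ArrayList<>(persistedCommittedWrites);
    }
}
```

In this sketch a read-only transaction never touches the simulated store,
which is where the throughput gain would come from if most queries are
read-only. Everything in openTxns is lost if the process dies, which matches
the fault-tolerance drawback noted in the description.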



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
