Whether you need to obtain a read lock depends on the guarantees you want
to make to your readers. Obtaining the lock does a couple of things your
users might want (a sketch follows the list):
1) It will prevent DDL statements such as DROP TABLE from removing the
data while they are reading it.
2) It will prevent the compactor from removing the versions of the delta
files they are reading.
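For what it's worth, here is a minimal sketch of taking that shared read
lock through the metastore client. The ReadLockExample class name, the
one-second polling interval, and the error handling are all illustrative,
not anything Hive prescribes:

    import java.net.InetAddress;
    import java.util.Collections;

    import org.apache.hadoop.hive.metastore.IMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.LockComponent;
    import org.apache.hadoop.hive.metastore.api.LockLevel;
    import org.apache.hadoop.hive.metastore.api.LockRequest;
    import org.apache.hadoop.hive.metastore.api.LockResponse;
    import org.apache.hadoop.hive.metastore.api.LockState;
    import org.apache.hadoop.hive.metastore.api.LockType;

    public class ReadLockExample {
      // Takes a SHARED_READ lock at TABLE level: other readers and writers
      // can proceed, but DROP TABLE and the compactor's cleaner are held off.
      public static long acquireReadLock(IMetaStoreClient client,
                                         String db, String table) throws Exception {
        LockComponent component =
            new LockComponent(LockType.SHARED_READ, LockLevel.TABLE, db);
        component.setTablename(table);

        LockRequest request = new LockRequest(
            Collections.singletonList(component),
            System.getProperty("user.name"),
            InetAddress.getLocalHost().getHostName());

        LockResponse response = client.lock(request);
        // The request may queue behind other locks; poll until it is granted.
        while (response.getState() == LockState.WAITING) {
          Thread.sleep(1000);
          response = client.checkLock(response.getLockid());
        }
        if (response.getState() != LockState.ACQUIRED) {
          throw new IllegalStateException("Lock not acquired: " + response.getState());
        }
        return response.getLockid();
      }
    }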
The other step you'll want to take is to heartbeat the lock. To stop dead
clients from holding locks forever, the DbLockManager times locks out after
300 seconds (the default; it's configurable via hive.txn.timeout). To keep
your lock alive you'll need to call IMetaStoreClient.heartbeat on a regular
basis, as sketched below.
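A sketch of that heartbeat, assuming the lock was taken without opening a
transaction (hence the txnid argument of 0) and using a 75 second interval
chosen to stay well inside the default timeout; the LockHeartbeater class
name is illustrative:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.hive.metastore.IMetaStoreClient;

    public class LockHeartbeater {
      // Heartbeats the lock every 75 seconds, comfortably inside the 300
      // second default timeout. Call shutdownNow() on the returned executor
      // once the read has finished.
      public static ScheduledExecutorService start(final IMetaStoreClient client,
                                                   final long lockId) {
        ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(new Runnable() {
          public void run() {
            try {
              // txnid is 0 because the lock was taken without a transaction.
              client.heartbeat(0, lockId);
            } catch (Exception e) {
              // If heartbeats fail the lock may time out and be released; a
              // real implementation should abort the read rather than carry on.
              throw new RuntimeException("Lock heartbeat failed", e);
            }
          }
        }, 75, 75, TimeUnit.SECONDS);
        return executor;
      }
    }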
Alan.
Elliot West <tea...@gmail.com>
April 17, 2015 at 8:05
Hi, I'm working on a Cascading Tap that reads the data that backs a
transactional Hive table. I've successfully utilised the in-built
OrcInputFormat functionality to read and merge the deltas with the
base and optionally pull in the RecordIdentifiers. However, I'm now
considering what other steps I may need to take to collaborate with an
active Hive instance that could be writing to or compacting the table
as I'm trying to read it.
I recently became aware of the need to obtain a list of valid
transaction IDs, but now wonder whether I must also acquire a read lock
on the table. I'm thinking that the set of interactions for reading this
data may look something like the following (a combined sketch follows
the list):
1. Obtain ValidTxnList from the meta store:
org.apache.hadoop.hive.metastore.IMetaStoreClient.getValidTxns()
2. Set the ValidTxnList in the Configuration:
conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.toString());
3. Acquire a read lock:
org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)
4. Use OrcInputFormat to read the data
5. Finally, release the lock:
org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)
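Putting those five steps together, a minimal sketch might look like the
following, reusing the hypothetical ReadLockExample and LockHeartbeater
helpers sketched earlier in the thread; 'my_db' and 'my_table' are
placeholders, and the OrcInputFormat read itself is elided:

    import java.util.concurrent.ScheduledExecutorService;

    import org.apache.hadoop.hive.common.ValidTxnList;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.IMetaStoreClient;

    public class TransactionalTableRead {
      public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        IMetaStoreClient client = new HiveMetaStoreClient(conf);

        // 1. Obtain the set of transactions that are valid to read.
        ValidTxnList validTxns = client.getValidTxns();

        // 2. Publish it in the Configuration so OrcInputFormat can filter
        //    out deltas from open or aborted transactions.
        conf.set(ValidTxnList.VALID_TXNS_KEY, validTxns.toString());

        // 3. Acquire the read lock and keep it alive with heartbeats.
        long lockId = ReadLockExample.acquireReadLock(client, "my_db", "my_table");
        ScheduledExecutorService heartbeater = LockHeartbeater.start(client, lockId);
        try {
          // 4. Read the base and deltas with OrcInputFormat using 'conf' (elided).
        } finally {
          // 5. Stop heartbeating and release the lock, even if the read failed.
          heartbeater.shutdownNow();
          client.unlock(lockId);
          client.close();
        }
      }
    }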
Can you advise on whether the lock is needed, whether this is the
correct way of managing the lock, and whether there are any other
steps I need to take to interact appropriately with the data
underpinning a 'live' transactional table?
Thanks - Elliot.