Hi, I'm working on a Cascading Tap that reads the data that backs a
transactional Hive table. I've successfully utilised the in-built
OrcInputFormat functionality to read and merge the deltas with the base and
optionally pull in the RecordIdentifiers. However, I'm now considering what
other steps I may need to take to coexist safely with an active Hive instance
that could be writing to or compacting the table while I'm trying to read it.
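
To illustrate the sort of read I mean, here is a minimal sketch (it assumes the
Hive 1.x AcidUtils and AcidInputFormat APIs; exact names and signatures may
differ between versions, and split planning, delete events and error handling
are glossed over):

    import java.io.IOException;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hive.common.ValidTxnList;
    import org.apache.hadoop.hive.ql.io.AcidInputFormat;
    import org.apache.hadoop.hive.ql.io.AcidUtils;
    import org.apache.hadoop.hive.ql.io.RecordIdentifier;
    import org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
    import org.apache.hadoop.hive.ql.io.orc.OrcStruct;

    public class AcidOrcReadSketch {

      /** Reads the merged base + delta rows for one bucket of a partition directory. */
      public static void readBucket(Configuration conf, Path partitionDir, int bucket,
          ValidTxnList validTxns) throws IOException {
        // Work out which base and delta directories are current for the given
        // transaction list.
        AcidUtils.Directory dir = AcidUtils.getAcidState(partitionDir, conf, validTxns);
        Path base = dir.getBaseDirectory(); // may be null if never compacted
        List<AcidUtils.ParsedDelta> deltas = dir.getCurrentDirectories();
        Path[] deltaPaths = new Path[deltas.size()];
        for (int i = 0; i < deltas.size(); i++) {
          deltaPaths[i] = deltas.get(i).getPath();
        }

        // Merge the deltas with the base, keeping the RecordIdentifier for each row.
        AcidInputFormat.RawReader<OrcStruct> reader =
            new OrcInputFormat().getRawReader(conf, true, bucket, validTxns, base, deltaPaths);
        RecordIdentifier identifier = reader.createKey();
        OrcStruct value = reader.createValue();
        try {
          while (reader.next(identifier, value)) {
            // identifier carries (transaction id, bucket, row id); value is the ACID
            // event row. Delete events would still need to be filtered out here.
          }
        } finally {
          reader.close();
        }
      }
    }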

I recently became aware of the need to obtain a list of valid transaction
IDs, but I now wonder whether I must also acquire a read lock on the table. I'm
thinking that the set of interactions for reading this data may look something
like the following (sketched in code after the list):


   1. Obtain ValidTxnList from the meta store:
   org.apache.hadoop.hive.metastore.IMetaStoreClient.getValidTxns()

   2. Set the ValidTxnList in the Configuration:
   conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.toString());

   3. Acquire a read lock:
   org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)

   4. Use OrcInputFormat to read the data

   5. Finally, release the lock:
   org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)
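
Put together, I imagine the client-side choreography looking something like the
sketch below (assuming the Hive 1.x metastore Thrift classes; the
LockComponent/LockRequest construction and the user/host details are
illustrative, and in practice a WAITING lock response would need to be polled
via checkLock):

    import java.net.InetAddress;
    import java.util.Collections;

    import org.apache.hadoop.hive.common.ValidTxnList;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.IMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.LockComponent;
    import org.apache.hadoop.hive.metastore.api.LockLevel;
    import org.apache.hadoop.hive.metastore.api.LockRequest;
    import org.apache.hadoop.hive.metastore.api.LockResponse;
    import org.apache.hadoop.hive.metastore.api.LockState;
    import org.apache.hadoop.hive.metastore.api.LockType;
    import org.apache.hadoop.mapred.JobConf;

    public class TransactionalReadSketch {

      public void readTable(HiveConf hiveConf, JobConf jobConf, String dbName,
          String tableName) throws Exception {
        IMetaStoreClient metaStoreClient = new HiveMetaStoreClient(hiveConf);

        // 1. Obtain the ValidTxnList from the metastore.
        ValidTxnList validTxns = metaStoreClient.getValidTxns();

        // 2. Publish it in the job configuration so that OrcInputFormat only
        //    considers committed transactions when selecting base and delta files.
        jobConf.set(ValidTxnList.VALID_TXNS_KEY, validTxns.toString());

        // 3. Acquire a shared read lock on the table.
        LockComponent lockComponent =
            new LockComponent(LockType.SHARED_READ, LockLevel.TABLE, dbName);
        lockComponent.setTablename(tableName);
        LockRequest lockRequest = new LockRequest(Collections.singletonList(lockComponent),
            System.getProperty("user.name"), InetAddress.getLocalHost().getHostName());
        LockResponse lockResponse = metaStoreClient.lock(lockRequest);
        if (lockResponse.getState() != LockState.ACQUIRED) {
          throw new IllegalStateException("Could not acquire read lock: "
              + lockResponse.getState());
        }

        try {
          // 4. Run the Cascading flow that reads the data via OrcInputFormat here.
        } finally {
          // 5. Release the lock and tidy up.
          metaStoreClient.unlock(lockResponse.getLockid());
          metaStoreClient.close();
        }
      }
    }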


Can you advise on whether the lock is needed, whether this is the correct
way of managing the lock, and whether there are any other steps I need to take
to interact appropriately with the data underpinning a 'live' transactional
table?

Thanks - Elliot.
