Hi, I'm working on a Cascading Tap that reads the data that backs a transactional Hive table. I've successfully utilised the in-built OrcInputFormat functionality to read and merge the deltas with the base and optionally pull in the RecordIdentifiers. However, I'm now considering what other steps I may need to take to collaborate with an active Hive instance that could be writing to or compacting the table as I'm trying to read it.
I recently became aware of the need to obtain a list of valid transaction IDs, but I now wonder whether I must also acquire a read lock on the table. I'm thinking that the set of interactions for reading this data may look something like the following (I've appended a rough Java sketch of this flow at the end of this message):

1. Obtain a ValidTxnList from the metastore: org.apache.hadoop.hive.metastore.IMetaStoreClient.getValidTxns()
2. Set the ValidTxnList in the Configuration: conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.toString());
3. Acquire a read lock: org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)
4. Use OrcInputFormat to read the data
5. Finally, release the lock: org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)

Can you advise on whether the lock is needed, whether this is the correct way of managing the lock, and whether there are any other steps I need to take to appropriately interact with the data underpinning a 'live' transactional table?

Thanks - Elliot.
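P.S. Here is the rough, untested sketch I mentioned above, covering steps 1, 2, 3, and 5. The database, table, and user names are placeholders, and I'm assuming that LockRequestBuilder/LockComponentBuilder is the intended way to construct a shared read-lock request — please correct me if that's not how it's meant to be used:

```java
import org.apache.hadoop.hive.common.ValidTxnList;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.LockComponentBuilder;
import org.apache.hadoop.hive.metastore.LockRequestBuilder;
import org.apache.hadoop.hive.metastore.api.LockRequest;
import org.apache.hadoop.hive.metastore.api.LockResponse;
import org.apache.hadoop.hive.metastore.api.LockState;

public class AcidReadSketch {

  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    IMetaStoreClient client = new HiveMetaStoreClient(conf);

    // 1. Obtain the set of transactions that are valid at this point in time.
    ValidTxnList validTxns = client.getValidTxns();

    // 2. Make the ValidTxnList available to the read via the Configuration.
    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxns.writeToString());

    // 3. Acquire a shared read lock on the table ("my_db"/"my_table"/"elliot" are placeholders).
    LockRequest lockRequest = new LockRequestBuilder()
        .setUser("elliot")
        .addLockComponent(new LockComponentBuilder()
            .setDbName("my_db")
            .setTableName("my_table")
            .setShared()
            .build())
        .build();
    LockResponse lockResponse = client.lock(lockRequest);
    while (lockResponse.getState() == LockState.WAITING) {
      // Poll until the lock is granted.
      Thread.sleep(1000L);
      lockResponse = client.checkLock(lockResponse.getLockid());
    }
    long lockId = lockResponse.getLockid();

    try {
      // 4. Run the Cascading flow that reads the base and delta files with OrcInputFormat.
      // runFlow(conf);  // placeholder for the actual read
    } finally {
      // 5. Release the lock once the read has completed.
      client.unlock(lockId);
    }
    client.close();
  }
}
```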