The other issue is that an external system cannot control when the compactor runs. The compactor rewrites deltas into the base files and thus erases the intermediate states that would interest you. The mapping of writeids (table specific) to transaction ids (system wide) is also cleaned up intermittently, again erasing history. And there is no way to get the mapping from writeids to transaction ids from outside of Hive.
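
To illustrate the compaction point, here is a minimal sketch in Python. It is a toy model only, not Hive's on-disk format or compactor code: the `Delta` tuple and the `read_since` and `compact` functions are hypothetical names made up for this example, and it assumes a compaction keeps just the newest version of each row.

```python
from typing import Dict, List, Tuple

# Toy model only: each delta is (write_id, key, value).
# This is an illustration, not how Hive stores ACID data.
Delta = Tuple[int, str, str]


def read_since(deltas: List[Delta], last_seen_write_id: int) -> List[Delta]:
    """Return every change written after the consumer's last seen write id."""
    return [d for d in deltas if d[0] > last_seen_write_id]


def compact(deltas: List[Delta]) -> List[Delta]:
    """Keep only the newest version of each key (assumed compaction behavior)."""
    latest: Dict[str, Delta] = {}
    for delta in sorted(deltas):  # ascending write id, so later writes win
        latest[delta[1]] = delta
    return sorted(latest.values())


deltas = [
    (1, "order-1", "created"),
    (2, "order-1", "shipped"),
    (3, "order-1", "delivered"),
]

# A consumer that last read at write id 1 asks for everything newer.
before = read_since(deltas, 1)
after = read_since(compact(deltas), 1)

print(before)  # both later versions are still visible
print(after)   # only the final version survives compaction
```

If the compactor runs between two reads, the consumer never sees the "shipped" state, and nothing outside Hive can delay the compactor to prevent that.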

Alan.

On Mon, May 6, 2019 at 6:23 AM Bhargav Bipinchandra Naik (Seller Platform-BLR) <bhargav.n...@flipkart.com> wrote:

> We have a scenario where we want to consume only delta updates from Hive
> tables.
> - Multiple producers are updating data in Hive table
> - Multiple consumer reading data from the Hive table
>
> Consumption pattern:
> - Get all data that has been updated since last time I read.
>
> Is there any mechanism in Hive 3.0 which can enable above consumption
> pattern?
>
> I see there is a construct of row__id(writeid, bucketid, rowid) in ACID
> tables.
> - Can row__id be used in this scenario?
> - How is the "writeid" generated?
> - Is there some meta information which captures the time when the rows
> were actually visible for read?
>