The other issue is that an external system cannot control when the
compactor runs. The compactor rewrites deltas into the base files and
thus erases the intermediate states that would interest you. The mapping
of writeids (table specific) to transaction ids (system wide) is also
cleaned intermittently, again erasing history. And there is no way to get
the mapping from writeids to transaction ids from outside of Hive.
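
To illustrate the compaction point, here is a minimal toy sketch in Python.
It is not Hive code and does not reflect Hive's actual file layout or
compactor implementation. The names Delta, compact and changes_since are
hypothetical and exist only for this illustration. It assumes a consumer
that remembers the last writeid it read and asks for everything newer.

```python
from typing import Dict, List, NamedTuple, Optional


class Delta(NamedTuple):
    """One toy change record (hypothetical, not Hive's delta file format)."""
    write_id: int          # stand-in for a table-specific writeid
    key: str               # stand-in for a row identifier
    value: Optional[str]   # None models a delete


def compact(base: Dict[str, str], deltas: List[Delta]) -> Dict[str, str]:
    """Fold deltas into a new base, keeping only the final state per key."""
    new_base = dict(base)
    for d in sorted(deltas, key=lambda d: d.write_id):
        if d.value is None:
            new_base.pop(d.key, None)   # a delete removes the row entirely
        else:
            new_base[d.key] = d.value   # later writes overwrite earlier ones
    return new_base


def changes_since(deltas: List[Delta], last_seen: int) -> List[Delta]:
    """What an incremental consumer wants: every change after last_seen."""
    return [d for d in deltas if d.write_id > last_seen]


deltas = [
    Delta(1, "a", "v1"),
    Delta(2, "a", "v2"),   # update of row "a"
    Delta(3, "b", "v1"),   # insert of row "b"
    Delta(4, "b", None),   # delete of row "b"
]

# Before compaction the consumer can still see each individual change.
before = changes_since(deltas, last_seen=1)

# Compaction folds everything into the base and discards the deltas.
base = compact({}, deltas)
after = changes_since([], last_seen=1)

print(len(before), base, after)
```

In this toy model, a consumer that last read at writeid 1 and comes back
after compaction finds only the final state of row "a". The insert and
delete of row "b" are no longer observable, and the consumer has no say in
when that happens.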

Alan.

On Mon, May 6, 2019 at 6:23 AM Bhargav Bipinchandra Naik (Seller
Platform-BLR) <bhargav.n...@flipkart.com> wrote:

> We have a scenario where we want to consume only delta updates from Hive
> tables.
> - Multiple producers are updating data in the Hive table
> - Multiple consumers are reading data from the Hive table
>
> Consumption pattern:
> - Get all data that has been updated since last time I read.
>
> Is there any mechanism in Hive 3.0 which can enable the above
> consumption pattern?
>
> I see there is a construct of row__id(writeid, bucketid, rowid) in ACID
> tables.
> - Can row__id be used in this scenario?
> - How is the "writeid" generated?
> - Is there some meta information which captures the time when the rows
> were actually visible for read?
>
