Hi Alan,
Are write_ids monotonically increasing? Are write_ids accessible in the hive query? For e.g.: select * from table_name where write_id > N; Basically I am trying to understand if I can use write_id to consume only updated rows. Store the maximum write_id(X) seen in the result and next time query for all rows with row_id greater than X. Thanks, Bhargav On Fri, May 17, 2019 at 10:37 PM Alan Gates <alanfga...@gmail.com> wrote: > Sorry, looks like you sent this earlier and I missed it. > > A couple of things. One, write_id is per transaction per table. So for > table T, all rows written in w1 will have the same write_id, though they > will each have their own monotonically increasing row_ids. Row_ids are > scoped by a write_id, so if both w1 and w2 insert a 100 rows, w1 would have > write_id 1, and row_ids 0-99 while w2's rows would have write_id 2 and > row_ids 0-99. > > Two, If w1 and w2 both attempted to update or delete (not insert) records > from the same partition of table T, then w1 would fail at commit time > because it would see that w2 had already committed and there's a possible > conflict. This avoids lost updates and deleted records magically > reappearing. > > Alan. > > On Fri, May 17, 2019 at 4:44 AM Bhargav Bipinchandra Naik (Seller > Platform-BLR) <bhargav.n...@flipkart.com> wrote: > >> Is the following scenario supported? >> >> *timestamp:* t1 < t2 < t3 < t4 < t5 < t6 >> >> *w1 -* transaction which updates subset of rows in table T {start_time: >> t1, end_time: t5} >> *w2 -* transaction which updates subset of rows in table T {start_time: >> t2, end_time: t3} >> *r1 - *job which reads rows from table T {start_time: t4} >> *r2 - *job which reads rows from table T {start_time: t6} >> >> - Is the write_id strictly increasing number across rows? >> - Is the write_id a version number per row and not a global construct? >> - Will the subset of rows updated by w1 have write_ids greater than >> write_ids of row updated by w2? >> >> Say if job r1 consumed the data at t4 had maximum write_id 100. >> Will rows updated by job w1 (end_time: t5) always have write_id > 100? >> >> Basically I need some kind of checkpoint using which the next run of the >> read job can read only the data updated since the checkpoint. >> >> Thanks, >> -Bhargav >> >> >> >> >>