Sorry, looks like you sent this earlier and I missed it. A couple of things. One, write_id is per transaction per table. So for table T, all rows written in w1 will have the same write_id, though each will have its own monotonically increasing row_id. Row_ids are scoped by a write_id, so if w1 and w2 each insert 100 rows, w1's rows would have write_id 1 and row_ids 0-99, while w2's rows would have write_id 2 and row_ids 0-99.
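To make that concrete, here is a minimal sketch. On a full ACID (transactional) table, recent Hive versions expose a virtual row__id column, a struct of writeid, bucketid, and rowid, so you can observe the numbering directly. The table, values, and exact write_ids below are illustrative assumptions, not promised output:

    -- hypothetical transactional table, single bucket for simplicity
    CREATE TABLE t (id INT, val STRING)
      STORED AS ORC
      TBLPROPERTIES ('transactional'='true');

    -- w1: one transaction; every row it writes to t shares one write_id
    INSERT INTO t VALUES (1, 'a'), (2, 'b');   -- say this commits as write_id 1

    -- w2: the next transaction to write t gets the next write_id
    INSERT INTO t VALUES (3, 'c'), (4, 'd');   -- say this commits as write_id 2

    -- row__id is a struct<writeid:bigint, bucketid:int, rowid:bigint>
    SELECT row__id.writeid, row__id.rowid, id, val FROM t;
    -- expected shape: w1's rows show writeid=1 with rowid 0,1;
    -- w2's rows show writeid=2 with rowid 0,1 (row_ids restart per write_id)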
Two, if w1 and w2 both attempted to update or delete (not insert) records in the same partition of table T, then w1 would fail at commit time, because it would see that w2 had already committed and there is a possible conflict. This avoids lost updates and deleted records magically reappearing. (There's a small sketch of this scenario below the quoted mail.)

Alan

On Fri, May 17, 2019 at 4:44 AM Bhargav Bipinchandra Naik (Seller Platform-BLR) <bhargav.n...@flipkart.com> wrote:
> Is the following scenario supported?
>
> timestamps: t1 < t2 < t3 < t4 < t5 < t6
>
> w1 - transaction which updates a subset of rows in table T {start_time: t1, end_time: t5}
> w2 - transaction which updates a subset of rows in table T {start_time: t2, end_time: t3}
> r1 - job which reads rows from table T {start_time: t4}
> r2 - job which reads rows from table T {start_time: t6}
>
> - Is the write_id a strictly increasing number across rows?
> - Is the write_id a version number per row, and not a global construct?
> - Will the subset of rows updated by w1 have write_ids greater than the write_ids of rows updated by w2?
>
> Say the read job r1 consumed the data at t4 and saw a maximum write_id of 100.
> Will rows updated by w1 (end_time: t5) always have write_id > 100?
>
> Basically I need some kind of checkpoint with which the next run of the read job can read only the data updated since the checkpoint.
>
> Thanks,
> -Bhargav
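P.S. Here is the sketch of the point-two conflict case, using the same illustrative table as above and two concurrent sessions. Which transaction loses depends only on commit order, and the exact error text varies by Hive version:

    -- both sessions update rows in the same partition of t
    -- (for an unpartitioned table like t, the whole table is the conflict unit)

    -- session 1 (w1): starts first, runs long, commits last
    UPDATE t SET val = 'x' WHERE id = 1;

    -- session 2 (w2): starts after w1 but commits first
    UPDATE t SET val = 'y' WHERE id = 1;   -- succeeds

    -- when w1 reaches its commit, the write-set check finds w2's committed
    -- update to the same partition and aborts w1 with a write-conflict
    -- error, rather than silently overwriting w2's change (a lost update)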