> On May 20, 2020, 3:16 p.m., Peter Varga wrote: > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java > > Lines 686 (patched) > > <https://reviews.apache.org/r/72528/diff/1/?file=2232587#file2232587line686> > > > > I have concerns here, but I am not sure if they are well founded or > > not. I think this will break what the outside world thinks of snapshot > > isolation. I might have a hypothetical client that inserts lots of data in > > a source table and sometimes issue a merge statement from the source to the > > target table. They have some requirement that the target table can not have > > partial data regarding some property. Example they inserting sales data, > > and the target table can not contain half the data of a day, it can either > > have all or none. So what the clients does, it will issue the inserts into > > the source table synchronously ordered by the date and when it gets to a > > next day it issue a merge statement asynchronously and continues to inserts > > the data for the next day synchronously. And it might think that it is save > > to do so, since the merge statement has a snapshot it will not see the data > > inserted afterwards. But with this change it will break. > > It might not be the best example, since how would the client know when > > the snapshot is actually captured. But I am not familiar enough with the > > ecosystem, does anything use the Hive by issuing the compile and run > > separately? Because there you could be sure before this change, that the > > compilation order also meant snapshot order. So summarized, I don't know > > what the outside world excepts of the snapshot isolation. > > Denys Kuzmenko wrote: > insert into source and merge from source into target won't conflict with > each other, they touch different tables. Maybe I missing something here... > > Peter Varga wrote: > My example was not perfect. I don't mean that it will conflict with the > insert into the source table. It can conflict with some other client's > transaction. My main point is, after the conflict is noticed and you > regenerate the snapshot it will starts to read results from transactions that > were opened and committed after the original query was compiled, and I'm just > trying to figure out, what kinf of problems can it cause, if any. In my > example you start to read records inserted later, but what if somebody added > a new partition since the compilation, wouldn't it cause problem? > > Denys Kuzmenko wrote: > probably there might be an issue as we won't create any locks for the > newly created partition, however we'll start reading it. > instead of rollback & retry on Hive side we might consider to just fail > and let the user re-try.
however it still leaves the question what happens now in Hive when somebody adds a new partition (insert with dynamic partitioning) since the compilation (merge insert). I'll test this out. - Denys ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/72528/#review220838 ----------------------------------------------------------- On May 19, 2020, 11:19 a.m., Denys Kuzmenko wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/72528/ > ----------------------------------------------------------- > > (Updated May 19, 2020, 11:19 a.m.) > > > Review request for hive, Jesús Camacho Rodríguez, Peter Varga, and Peter Vary. > > > Bugs: HIVE-23503 > https://issues.apache.org/jira/browse/HIVE-23503 > > > Repository: hive-git > > > Description > ------- > > ValidTxnManager doesn't consider txns opened and committed between snapshot > generation and locking when evaluating ValidTxnListState. This cause issues > like duplicate insert in case of concurrent merge insert & insert. > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/Driver.java e70c92eef4 > ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java a8c83fc504 > ql/src/java/org/apache/hadoop/hive/ql/ValidTxnManager.java 7d49c57dda > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java > 0383881acc > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java > 600289f837 > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 8a15b7cc5d > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java > 65df9c2ba9 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java > 887d4303f4 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java > 312936efa8 > storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java > b8ff03f9c4 > storage-api/src/java/org/apache/hadoop/hive/common/ValidTxnList.java > d4c3b09730 > > > Diff: https://reviews.apache.org/r/72528/diff/1/ > > > Testing > ------- > > DbTxnManager tests. > > > Thanks, > > Denys Kuzmenko > >