There's only been one significant change in ACID that requires different implementations. In ACID v1, a delta file contained inserts, updates, and deletes together. In ACID v2, delta files are split: inserts go in one file, deletes in another, and an update is written as an insert plus a delete. This change went into Hive 3, so you have to upgrade your ACID tables when upgrading from Hive 2 to 3.
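For illustration, the on-disk difference looks roughly like this (directory and file names below are schematic, not taken from a real table). In ACID v1 there is one kind of delta directory, with insert, update, and delete records mixed together in its bucket files:

    warehouse/t/delta_0000005_0000005/bucket_00000

In ACID v2 insert records and delete records go to separate directories, and an update writes a row to each:

    warehouse/t/delta_0000005_0000005/bucket_00000
    warehouse/t/delete_delta_0000005_0000005/bucket_00000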
You can see info on ACID v1 at
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

You can get a start on understanding ACID v2 with
https://issues.apache.org/jira/browse/HIVE-14035, which has the design
documents. I don't guarantee the implementation completely matches the
design, but you can at least get an idea of the intent and follow the JIRA
stream from there to see what was implemented.

Alan.

On Sat, Mar 9, 2019 at 3:25 AM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> Hi,
>
> > The issue is that outside readers don't understand which records in
> > the delta files are valid and which are not. Theoretically all this
> > is possible, as outside clients could get the valid transaction list
> > from the metastore and then read the files, but no one has done this
> > work.
>
> I guess each Hive version (1, 2, 3) differs in how it manages delta
> files, doesn't it? This means Pig or Spark would need to implement three
> different ways of dealing with Hive.
>
> Is there any documentation that would help a developer to implement
> those specific connectors?
>
> Thanks
>
> On Wed, Mar 06, 2019 at 09:51:51AM -0800, Alan Gates wrote:
> > Pig is in the same place as Spark, in that the tables need to be
> > compacted first. The issue is that outside readers don't understand
> > which records in the delta files are valid and which are not.
> >
> > Theoretically all this is possible, as outside clients could get the
> > valid transaction list from the metastore and then read the files, but
> > no one has done this work.
> >
> > Alan.
> >
> > On Wed, Mar 6, 2019 at 8:28 AM Abhishek Gupta <abhila...@gmail.com> wrote:
> >
> > Hi,
> >
> > Do Hive ACID tables for Hive version 1.2 possess the capability of
> > being read into Apache Pig using HCatLoader or Spark using SQLContext?
> > For Spark, it seems it is only possible to read ACID tables if the
> > table is fully compacted, i.e. no delta folders exist in any partition.
> > Details in the following JIRA:
> >
> > https://issues.apache.org/jira/browse/SPARK-15348
> >
> > However, I wanted to know if it is supported at all in Apache Pig to
> > read ACID tables in Hive.
>
> --
> nicolas
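A footnote on the "valid transaction list" idea discussed in the thread above: the first step an outside reader would need is already exposed by the Java metastore client. The sketch below is untested and only illustrative; the thrift URI and transaction id are placeholders, and the hard part (actually decoding delta rows and matching their transaction ids against the list) is not shown.

    // Rough, untested sketch: fetch the currently valid transactions from the
    // metastore. "metastore-host" and the transaction id are placeholders.
    import org.apache.hadoop.hive.common.ValidTxnList;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

    public class ValidTxnSketch {
      public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        // Point at your metastore instance.
        conf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://metastore-host:9083");
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
          // Snapshot of which transactions are committed/open/aborted right now.
          ValidTxnList validTxns = client.getValidTxns();
          // An external reader would compare the transaction id carried in each
          // ACID row against this list before emitting the row.
          long txnIdFromDeltaRow = 42L; // placeholder
          System.out.println("txn " + txnIdFromDeltaRow + " visible? "
              + validTxns.isTxnValid(txnIdFromDeltaRow));
        } finally {
          client.close();
        }
      }
    }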