Thanks, Alan, for the clarifications. Hive has improved so much that it has lost its old friends in the process. I hope that one day all the friends will speak together again, with Pig, Spark, and Presto reading and writing ACID tables together.
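For anyone picking this up later, here is a minimal sketch of the ACID v2 read path as I understand it from the quoted explanation below: inserts and deletes live in separate deltas, an update is a delete plus an insert, and a reader only trusts events written by transactions in the valid list. Everything in it is an assumption made for illustration: the InsertEvent and DeleteEvent types, the read_snapshot function, and the in-memory lists are hypothetical and are not Hive's file format or API. A real reader would get the valid list from the metastore and read the delta files themselves.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Hypothetical row identity: (writing transaction, bucket, row id).
RowKey = Tuple[int, int, int]


@dataclass(frozen=True)
class InsertEvent:
    """One record from an insert delta (simplified, illustrative only)."""
    write_id: int  # transaction that wrote the row
    bucket: int
    row_id: int
    value: str

    @property
    def key(self) -> RowKey:
        return (self.write_id, self.bucket, self.row_id)


@dataclass(frozen=True)
class DeleteEvent:
    """One record from a delete delta (simplified, illustrative only)."""
    write_id: int   # transaction that issued the delete
    target: RowKey  # identity of the row being deleted


def read_snapshot(inserts: Iterable[InsertEvent],
                  deletes: Iterable[DeleteEvent],
                  valid_write_ids: Set[int]) -> List[str]:
    """Return the values visible to a reader that trusts only valid_write_ids."""
    # Deletes issued by transactions outside the valid list are ignored.
    deleted = {d.target for d in deletes if d.write_id in valid_write_ids}
    # Keep rows written by valid transactions that no valid delete removed.
    return [i.value for i in inserts
            if i.write_id in valid_write_ids and i.key not in deleted]


# Transaction 1 inserts two rows; transaction 2 updates "b" to "b2",
# which shows up as a delete of the old row plus a new insert;
# transaction 3 deletes "a" but is assumed not to be in the valid list.
inserts = [
    InsertEvent(1, 0, 0, "a"),
    InsertEvent(1, 0, 1, "b"),
    InsertEvent(2, 0, 0, "b2"),
]
deletes = [
    DeleteEvent(2, (1, 0, 1)),
    DeleteEvent(3, (1, 0, 0)),
]

print(read_snapshot(inserts, deletes, {1, 2}))
```

The point of the sketch is only that an outside reader has to apply the valid transaction list to both kinds of delta before merging them, which is the part Pig and Spark skip today by requiring compacted tables.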

On Sat, Mar 09, 2019 at 02:23:48PM -0800, Alan Gates wrote:
> There's only been one significant change in ACID that requires different
> implementations. In ACID v1 delta files contained inserts, updates, and
> deletes. In ACID v2 delta files are split so that inserts are placed in one
> file, deletes in another, and updates are an insert plus a delete. This
> change was put into Hive 3, so you have to upgrade your ACID tables when
> upgrading from Hive 2 to 3.
>
> You can see info on ACID v1 at
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
>
> You can get a start understanding ACID v2 with
> https://issues.apache.org/jira/browse/HIVE-14035
> This has design documents. I don't guarantee the implementation completely
> matches the design, but you can at least get an idea of the intent and
> follow the JIRA stream from there to see what was implemented.
>
> Alan.
>
> On Sat, Mar 9, 2019 at 3:25 AM Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > Hi,
> >
> > > The issue is that outside readers don't understand which records in
> > > the delta files are valid and which are not. Theoretically all this
> > > is possible, as outside clients could get the valid transaction list
> > > from the metastore and then read the files, but no one has done this
> > > work.
> >
> > I guess each Hive version (1, 2, 3) differs in how it manages delta
> > files, doesn't it? This means Pig or Spark would need to implement 3
> > different ways of dealing with Hive.
> >
> > Is there any documentation that would help a developer implement
> > those specific connectors?
> >
> > Thanks
> >
> > On Wed, Mar 06, 2019 at 09:51:51AM -0800, Alan Gates wrote:
> > > Pig is in the same place as Spark, that the tables need to be
> > > compacted first. The issue is that outside readers don't understand
> > > which records in the delta files are valid and which are not.
> > >
> > > Theoretically all this is possible, as outside clients could get the
> > > valid transaction list from the metastore and then read the files,
> > > but no one has done this work.
> > >
> > > Alan.
> > >
> > > On Wed, Mar 6, 2019 at 8:28 AM Abhishek Gupta <abhila...@gmail.com> wrote:
> > > > Hi,
> > > >
> > > > Do Hive ACID tables in Hive version 1.2 possess the capability of
> > > > being read into Apache Pig using HCatLoader, or into Spark using
> > > > SQLContext?
> > > > For Spark, it seems it is only possible to read ACID tables if the
> > > > table is fully compacted, i.e. no delta folders exist in any
> > > > partition. Details are in the following JIRA:
> > > >
> > > > https://issues.apache.org/jira/browse/SPARK-15348
> > > >
> > > > However, I wanted to know if reading Hive ACID tables is supported
> > > > at all in Apache Pig.
> >
> > --
> > nicolas

--
nicolas