There's only been one significant change in ACID that requires different
implementations.  In ACID v1, delta files contained inserts, updates, and
deletes.  In ACID v2, the delta files are split so that inserts are placed
in one file, deletes in another, and an update is written as an insert
plus a delete.
This change was put into Hive 3, so you have to upgrade your ACID tables
when upgrading from Hive 2 to 3.
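
If it helps to see that on disk, here is a rough, hypothetical sketch of
how an outside reader might classify the directories under an ACID table
or partition.  It assumes the usual base_N / delta_x_y / delete_delta_x_y
directory naming; the real logic lives in Hive's AcidUtils, so treat this
as an illustration of the layout rather than a reference implementation.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch only: list an ACID table/partition directory and label what
    // each subdirectory holds.  In v1 a delta_x_y directory mixes inserts,
    // updates, and deletes; in v2 it holds inserts only, and the deletes
    // (including the delete half of an update) live in delete_delta_x_y.
    public class ListAcidDirs {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus st : fs.listStatus(new Path(args[0]))) {
          String name = st.getPath().getName();
          if (name.startsWith("base_")) {
            System.out.println("compacted base:           " + name);
          } else if (name.startsWith("delete_delta_")) {
            System.out.println("v2 delete delta:          " + name);
          } else if (name.startsWith("delta_")) {
            System.out.println("delta (v2: inserts only): " + name);
          }
        }
      }
    }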

You can see info on ACID v1 at
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

You can get a start on understanding ACID v2 with
https://issues.apache.org/jira/browse/HIVE-14035, which has the design
documents.  I don't guarantee the implementation completely matches the
design, but you can at least get an idea of the intent and follow the JIRA
stream from there to see what was implemented.
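
On the point below about outside readers getting the valid transaction
list from the metastore and then reading the files themselves, here is a
very rough, untested sketch of the first half of that.  The class and
method names (HiveMetaStoreClient, getValidTxns, ValidTxnList) are from
the Hive client API as I remember it, so double check them against the
version you target; the hard part, merging the base and delta files
against that list, is not shown.

    import org.apache.hadoop.hive.common.ValidTxnList;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

    // Sketch only: fetch the list of valid transactions from the metastore.
    // An external reader would then scan the base and delta files itself and
    // keep a record only if validTxns.isTxnValid(recordTxnId) returns true.
    public class ValidTxnSketch {
      public static void main(String[] args) throws Exception {
        HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
        try {
          ValidTxnList validTxns = client.getValidTxns();
          System.out.println("Valid txn list: " + validTxns.writeToString());
        } finally {
          client.close();
        }
      }
    }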

Alan.

On Sat, Mar 9, 2019 at 3:25 AM Nicolas Paris <nicolas.pa...@riseup.net>
wrote:

> Hi,
>
> > The issue is that outside readers don't understand which records in
> > the delta files are valid and which are not. Theoretically all this
> > is possible, as outside clients could get the valid transaction list
> > from the metastore and then read the files, but no one has done this
> > work.
>
> I guess each Hive version (1, 2, 3) differs in how it manages delta
> files, doesn't it? This means Pig or Spark needs to implement three
> different ways of dealing with Hive.
>
> Is there any documentation that would help a developer implement those
> specific connectors?
>
> Thanks
>
>
> On Wed, Mar 06, 2019 at 09:51:51AM -0800, Alan Gates wrote:
> > Pig is in the same place as Spark, in that the tables need to be
> > compacted first.  The issue is that outside readers don't understand
> > which records in the delta files are valid and which are not.
> >
> > Theoretically all this is possible, as outside clients could get the
> > valid transaction list from the metastore and then read the files, but
> > no one has done this work.
> >
> > Alan.
> >
> > On Wed, Mar 6, 2019 at 8:28 AM Abhishek Gupta <abhila...@gmail.com>
> > wrote:
> >
> >     Hi,
> >
> >     Do Hive ACID tables in Hive version 1.2 have the capability of
> >     being read into Apache Pig using HCatLoader or into Spark using
> >     SQLContext?
> >     For Spark, it seems it is only possible to read ACID tables if the
> >     table is fully compacted, i.e. no delta folders exist in any
> >     partition.  Details in the following JIRA:
> >
> >     https://issues.apache.org/jira/browse/SPARK-15348
> >
> >     However, I wanted to know whether it is supported at all in Apache
> >     Pig to read ACID tables in Hive.
> >
>
> --
> nicolas
>
