Hi,

> The issue is that outside readers don't understand which records in
> the delta files are valid and which are not. Theoretically all this
> is possible, as outside clients could get the valid transaction list
> from the metastore and then read the files, but no one has done this
> work.
I guess each Hive version (1, 2, 3) differs in how it manages delta files, doesn't it? This means Pig or Spark would need to implement three different ways of dealing with Hive. Is there any documentation that would help a developer implement those specific connectors?

Thanks

On Wed, Mar 06, 2019 at 09:51:51AM -0800, Alan Gates wrote:
> Pig is in the same place as Spark, that the tables need to be compacted
> first. The issue is that outside readers don't understand which records
> in the delta files are valid and which are not.
>
> Theoretically all this is possible, as outside clients could get the
> valid transaction list from the metastore and then read the files, but
> no one has done this work.
>
> Alan.
>
> On Wed, Mar 6, 2019 at 8:28 AM Abhishek Gupta <abhila...@gmail.com> wrote:
>
> > Hi,
> >
> > Do Hive ACID tables for Hive version 1.2 possess the capability of
> > being read into Apache Pig using HCatLoader or Spark using SQLContext?
> > For Spark, it seems it is only possible to read ACID tables if the
> > table is fully compacted, i.e. no delta folders exist in any partition.
> > Details in the following JIRA:
> >
> > https://issues.apache.org/jira/browse/SPARK-15348
> >
> > However I wanted to know if it is supported at all in Apache Pig to
> > read ACID tables in Hive

--
nicolas
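For anyone considering the approach Alan describes, the core idea is: fetch the valid transaction list (high-water mark plus the set of still-open or aborted transaction ids) from the metastore, then keep only delta-file records whose writing transaction is valid. The sketch below is a hypothetical Python illustration of that filtering semantics only; Hive's real reader is Java and uses its `ValidTxnList` machinery, and all names here are illustrative, not Hive APIs.

```python
# Hypothetical sketch of the visibility rule an external Hive ACID reader
# would need: a record written by transaction T is visible iff T committed
# at or below the high-water mark and T is not in the exception list of
# open/aborted transactions. Names and data shapes are illustrative.

def is_txn_visible(txn_id, high_watermark, exception_txns):
    """True if the transaction's writes should be visible to this reader."""
    return txn_id <= high_watermark and txn_id not in exception_txns

def visible_records(delta_records, high_watermark, exception_txns):
    """Filter delta-file records, each tagged with the txn id that wrote it."""
    return [rec for rec in delta_records
            if is_txn_visible(rec["txn_id"], high_watermark, exception_txns)]

# Example: txns 1-6 wrote records; txn 3 is still open, txn 5 aborted,
# and the snapshot's high-water mark is 5 (so txn 6 is too new to see).
records = [{"txn_id": t, "row": "r%d" % t} for t in range(1, 7)]
print(visible_records(records, high_watermark=5, exception_txns={3, 5}))
```

This is only the record-filtering half of the problem; a real connector would also have to enumerate delta directories per partition and merge base and delta files, which is the unimplemented work Alan refers to.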