Re: A proposal for read-only external table for cloud-native Hive deployment

2019-08-06 Thread Thai Bui
Hello all -- This thread is old, but I wanted to post an update with newer information without spamming the dev forum. To recap: My previous discussion was about proposing read-only transaction support for Hive using external tables. This could be supported using insert-on

Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-26 Thread Thai Bui
> My suggestion does require a change to your ETL process, but it doesn't require you to copy the data into HDFS or to create storage clusters. Hive managed tables can reside in S3 with no problem.

Thanks for pointing this out. I totally forgot that managed tables could have a location ext

Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-25 Thread Gopal Vijayaraghavan
> reuse the transactional_properties and add 'read_only' as a new value. With read-only tables, all INSERT, UPDATE, DELETE statements will fail at Hive front-end.

This is actually a common ask when it comes to OnPrem -> Cloud REPL streams, to avoid diverging. The replicated data ha

Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-25 Thread Alan Gates
My suggestion does require a change to your ETL process, but it doesn't require you to copy the data into HDFS or to create storage clusters. Hive managed tables can reside in S3 with no problem.

Alan.

On Thu, Apr 25, 2019 at 2:18 PM Thai Bui wrote:
> Your suggested workflow will work and it w

Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-25 Thread Thai Bui
Your suggested workflow will work, but it would require us to re-ETL data from S3 to multiple clusters. This is a cumbersome approach since most of our data resides on S3 and clusters are somewhat transient in nature (on the order of a few months for a redeployment & don't have

Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-24 Thread Alan Gates
Would a workflow like the following work then:
1. Non-Hive tool produces data
2. Do a Hive load into a managed table. This effectively takes a snapshot of the data.
3. Now you still have the data for Non-Hive tools to operate on, and in Hive you get all the Hive 3 goodness.
This would introduce a

Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-24 Thread Thai Bui
As I understand it, read-only ACID tables only work if your table is a managed table (so you'll have to create your table with CREATE TABLE .. TBLPROPERTIES ('transactional_properties'='insert_only')) and Hive will control the data layout. Unfortunately, in my case, I'm concerned with external table

Re: A proposal for read-only external table for cloud-native Hive deployment

2019-04-24 Thread Alan Gates
Have you looked at the insert only ACID tables in Hive 3 ( https://issues.apache.org/jira/browse/HIVE-14535 )? These were designed specifically with the cloud in mind, since the way Hive traditionally adds new data doesn't work well in the cloud. And they do not require ORC, they work with any fi