> My suggestion does require a change to your ETL process, but it doesn't
> require you to copy the data into HDFS or to create storage clusters. Hive
> managed tables can reside in S3 with no problem.
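(For concreteness, a minimal sketch of that setup -- table, column, and
bucket names below are made up, and it assumes ACID is already enabled on
the cluster:)

    -- Hypothetical managed ACID table whose storage location is an S3
    -- path instead of the default HDFS warehouse directory.
    CREATE TABLE orders (
      order_id BIGINT,
      status   STRING
    )
    STORED AS ORC
    LOCATION 's3a://example-bucket/warehouse/orders'
    TBLPROPERTIES ('transactional'='true');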
Thanks for pointing this out. I totally forgot that managed tables could
have a location externally specified. I think we can cope with this
approach in the short term, but in the long term an ETL-less approach with
read-only transactional support for external tables is much preferable,
mainly to avoid duplicate copies of the data.

> This is actually a common ask when it comes to OnPrem -> Cloud REPL
> streams, to avoid diverging.
>
> The replicated data having its own updates is very problematic for CDC
> style ACID replication into the cloud.

It's a common problem when the pattern is replicating data everywhere and
the users (such as analysts) don't know its full implications, which is
what we are trying to avoid in the first place. But sometimes it's
unavoidable if you are going on-prem -> cloud. With ACID support for
read-only tables (see the sketch at the end of this mail), we'll give
users an option to "try it out" before fully committing to an ETL process
to copy/optimize the data.

On Thu, Apr 25, 2019 at 4:54 PM Gopal Vijayaraghavan <gop...@apache.org>
wrote:

> > reuse the transactional_properties and add 'read_only' as a new value.
> > With read-only tables, all INSERT, UPDATE, DELETE statements will fail
> > at Hive front-end.
>
> This is actually a common ask when it comes to OnPrem -> Cloud REPL
> streams, to avoid diverging.
>
> The replicated data having its own updates is very problematic for CDC
> style ACID replication into the cloud.
>
> Ranger authorization works great for this, though it is all-or-nothing
> right now.
>
> At some point in the future, I wish I could lock up specific fields from
> being updated in ACID.
>
> Cheers,
> Gopal
>

--
Thai
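P.S. For anyone skimming the thread, roughly the kind of DDL the read-only
proposal implies. This is purely a sketch: the 'read_only' value and
transactional external tables don't exist in Hive today, and the table and
bucket names are placeholders.

    -- Proposed (not yet implemented): a read-only transactional table
    -- over replicated data. Under the proposal, any INSERT, UPDATE, or
    -- DELETE against it would be rejected by the Hive front-end.
    CREATE EXTERNAL TABLE replicated_orders (
      order_id BIGINT,
      status   STRING
    )
    STORED AS ORC
    LOCATION 's3a://example-bucket/replicated/orders'
    TBLPROPERTIES (
      'transactional'='true',
      'transactional_properties'='read_only'
    );

    -- Expected to fail at compile time under the proposal:
    -- INSERT INTO replicated_orders VALUES (1, 'new');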