> My suggestion does require a change to your ETL process, but it doesn't
> require you to copy the data into HDFS or to create storage clusters. Hive
> managed tables can reside in S3 with no problem.
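(For concreteness, a minimal sketch of that setup -- table, column, and
bucket names below are made up, and it assumes ACID is already enabled on
the cluster:)

    -- Hypothetical managed ACID table whose storage location is an S3
    -- path instead of the default HDFS warehouse directory.
    CREATE TABLE orders (
      order_id BIGINT,
      status   STRING
    )
    STORED AS ORC
    LOCATION 's3a://example-bucket/warehouse/orders'
    TBLPROPERTIES ('transactional'='true');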
Thanks for pointing this out. I totally forgot that managed tables could
have a location externally specified. I think we can cope with this
approach in the short term, but in the long term an ETL-less approach with
read-only transactional support for external tables is much preferable,
mainly to avoid duplicate copies of the data.

> This is actually a common ask when it comes to OnPrem -> Cloud REPL
> streams, to avoid diverging.
>
> The replicated data having its own updates is very problematic for CDC
> style ACID replication into the cloud.

It's a common problem when the pattern is replicating data everywhere and
the users (such as analysts) don't know its full implications, which is
what we are trying to avoid in the first place. But sometimes it's
unavoidable if you are going on-prem -> cloud. With ACID support for
read-only tables (see the sketch at the end of this mail), we'll give
users an option to "try it out" before fully committing to an ETL process
to copy/optimize the data.

On Thu, Apr 25, 2019 at 4:54 PM Gopal Vijayaraghavan <gop...@apache.org>
wrote:

> > reuse the transactional_properties and add 'read_only' as a new value.
> > With read-only tables, all INSERT, UPDATE, DELETE statements will fail
> > at Hive front-end.
>
> This is actually a common ask when it comes to OnPrem -> Cloud REPL
> streams, to avoid diverging.
>
> The replicated data having its own updates is very problematic for CDC
> style ACID replication into the cloud.
>
> Ranger authorization works great for this, though it is all-or-nothing
> right now.
>
> At some point in the future, I wish I could lock up specific fields from
> being updated in ACID.
>
> Cheers,
> Gopal
>

--
Thai
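P.S. For anyone skimming the thread, roughly the kind of DDL the read-only
proposal implies. This is purely a sketch: the 'read_only' value and
transactional external tables don't exist in Hive today, and the table and
bucket names are placeholders.

    -- Proposed (not yet implemented): a read-only transactional table
    -- over replicated data. Under the proposal, any INSERT, UPDATE, or
    -- DELETE against it would be rejected by the Hive front-end.
    CREATE EXTERNAL TABLE replicated_orders (
      order_id BIGINT,
      status   STRING
    )
    STORED AS ORC
    LOCATION 's3a://example-bucket/replicated/orders'
    TBLPROPERTIES (
      'transactional'='true',
      'transactional_properties'='read_only'
    );

    -- Expected to fail at compile time under the proposal:
    -- INSERT INTO replicated_orders VALUES (1, 'new');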