As I understand it, insert-only ACID tables only work if the table is a
managed table (so you have to create it with CREATE TABLE
.. TBLPROPERTIES ('transactional'='true',
'transactional_properties'='insert_only') ), and Hive then controls the
data layout.
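
For concreteness, here is a minimal sketch of the kind of managed table I
mean. The table name, columns and file format are placeholders I made up,
and I am assuming the 'transactional'='true' property is also needed
alongside 'transactional_properties', which is how I have usually seen
insert-only tables declared:

```sql
-- Hypothetical managed, insert-only ACID table (names are placeholders).
-- No EXTERNAL keyword and no LOCATION clause: Hive owns the data layout.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
STORED AS PARQUET
TBLPROPERTIES (
  'transactional'='true',
  'transactional_properties'='insert_only'
);
```

Data has to be written through Hive (INSERT, LOAD DATA) for the
transactional bookkeeping to stay consistent, which is exactly the
constraint that does not fit my setup.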

Unfortunately, in my case, I'm concerned with external tables where data is
written by other tools such as Spark, PySpark, Sqoop or older Hive clusters
and Hadoop-based systems to cloud storage such as S3. My wish is to have
materialized views and query result caching work directly on that data,
provided the table is registered as an external, read-only table in Hive
3 via the same ACID mechanism.

On Wed, Apr 24, 2019 at 3:35 PM Alan Gates <alanfga...@gmail.com> wrote:

> Have you looked at the insert only ACID tables in Hive 3 (
> https://issues.apache.org/jira/browse/HIVE-14535 )?  These were designed
> specifically with the cloud in mind, since the way Hive traditionally adds
> new data doesn't work well in the cloud.  And they do not require ORC, they
> work with any file format.
>
> Alan.
>
> On Wed, Apr 24, 2019 at 12:04 PM Thai Bui <blquyt...@gmail.com> wrote:
>
> > Hello all,
> >
> > Hive 3 has brought significant changes to the community with the support
> > for ACID tables as default managed tables. With ACID tables, we can use
> > features such as materialized views, query result caching for BI tools,
> > and more. But for non-ACID tables, such as external tables, Hive doesn't
> > support any of these advanced features, which makes a majority of
> > cloud-native users like me sad :(.
> >
> > I propose we should support a more limited version of read-only external
> > tables such that materialized views and query result caching would work.
> > For example:
> >
> > CREATE EXTERNAL TABLE table_name (..) STORED AS ORC
> > LOCATION 's3://some-bucket/some-dir'
> > TBLPROPERTIES ('read-only'='true');
> >
> > In such tables, any data modification operation such as INSERT or UPDATE
> > would fail, while DDL operations that add or remove partitions, such as
> > "ALTER TABLE ... ADD PARTITION", would succeed. This would make it
> > possible for Hive to invalidate the cache and materialized views even
> > when the table is an external table.
> >
> > Let me know what you think, and maybe I can start writing a wiki
> > document describing the approach in greater detail.
> >
> > Thanks,
> > Thai
> >
>


-- 
Thai
