Hello all -- This thread is old, but I wanted to post an update with
newer information here rather than spam the dev forum with too much information.
To recap: My previous discussion was about proposing read-only transaction
support for Hive using external tables. This could be supported using
insert-on
>
> My suggestion does require a change to your ETL process, but it doesn't
> require you to copy the data into HDFS or to create storage clusters. Hive
> managed tables can reside in S3 with no problem.
Thanks for pointing this out. I totally forgot that managed tables could
have a location ext
>reuse the transactional_properties and add 'read_only' as a new value. With
>read-only tables, all INSERT, UPDATE, DELETE statements will fail at Hive
>front-end.
This is actually a common ask when it comes to OnPrem -> Cloud REPL streams, to
avoid diverging.
The replicated data ha
My suggestion does require a change to your ETL process, but it doesn't
require you to copy the data into HDFS or to create storage clusters. Hive
managed tables can reside in S3 with no problem.
Alan.
On Thu, Apr 25, 2019 at 2:18 PM Thai Bui wrote:
> Your suggested workflow will work and it w
Your suggested workflow will work and it would require us to re-ETL data
from S3 to all over the place to multiple clusters. This is a cumbersome
approach since most of our data reside on S3 and clusters are somewhat
transient in nature (on the order of a few months for a redeployment &
don't have
Would a workflow like the following work then:
1. Non-Hive tool produces data
2. Do a Hive load into a managed table. This effectively takes a snapshot
of the data.
3. Now you still have the data for Non-Hive tools to operate on, and in
Hive you get all the Hive 3 goodness.
This would introduce a
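
For illustration, the three-step workflow above could be sketched in HiveQL roughly as follows. This is only a sketch under stated assumptions: the table names, columns, and the s3a://example-bucket path are all hypothetical, and it uses INSERT ... SELECT from an external staging table rather than a literal LOAD DATA, because LOAD DATA INPATH moves the source files and step 3 expects them to remain available to the non-Hive tools.

```sql
-- Step 1 (assumed): a non-Hive tool has written Parquet files under this
-- hypothetical S3 prefix. The external table only points at them.
CREATE EXTERNAL TABLE sales_raw (
  id      BIGINT,
  amount  DOUBLE,
  sold_at TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3a://example-bucket/raw/sales/';

-- Step 2: a managed, transactional table that Hive owns and lays out itself.
CREATE TABLE sales_snapshot (
  id      BIGINT,
  amount  DOUBLE,
  sold_at TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Copy the current contents of the raw files into the managed table.
-- The files under sales_raw are left untouched (step 3).
INSERT INTO TABLE sales_snapshot
SELECT id, amount, sold_at FROM sales_raw;
```

The snapshot is only as fresh as the last INSERT, so the copy would need to be re-run whenever the non-Hive tool produces new data.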
As I understand it, insert-only ACID tables only work if your table is a managed
table (so you'll have to create your table with CREATE TABLE
... TBLPROPERTIES ('transactional_properties'='insert_only')) and Hive will
control the data layout.
Unfortunately, in my case, I'm concerned with external table
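
For reference, the managed insert-only table mentioned above might be declared roughly like this. It is a sketch, not a definitive recipe: the table name and columns are hypothetical, and it assumes 'transactional'='true' has to be set alongside 'transactional_properties' (the message above only mentions the latter).

```sql
-- Hypothetical managed table using the insert-only ACID flavor from Hive 3.
-- Insert-only tables are not tied to ORC; Parquet is used here as an example.
CREATE TABLE events_insert_only (
  id      BIGINT,
  payload STRING
)
STORED AS PARQUET
TBLPROPERTIES (
  'transactional'='true',
  'transactional_properties'='insert_only'
);

-- Each INSERT is written as its own transaction.
INSERT INTO TABLE events_insert_only VALUES (1, 'first event');
```

Because the table is managed, Hive decides the directory layout under the table location, which is the constraint raised above for data that other tools also need to read.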
Have you looked at the insert-only ACID tables in Hive 3
(https://issues.apache.org/jira/browse/HIVE-14535)? These were designed
specifically with the cloud in mind, since the way Hive traditionally adds
new data doesn't work well in the cloud. And they do not require ORC, they
work with any fi