Hi Valentine,

I think your issue is related to
https://issues.apache.org/jira/browse/HIVE-28198.
IMO, other upstream engines (Trino/Spark) will also encounter
this issue. As a workaround, you can set the property
metastore.metadata.transformer.class to empty to disable the transformer;
the behavior (create table) of hive4 will then be the same as in hive3.
|
  <property>
    <name>metastore.metadata.transformer.class</name>
    <value> </value>
  </property>
|




Thanks,
Butao Zhang
---- Replied Message ----
| From | l<mr.tols...@gmail.com> |
| Date | 12/7/2024 18:42 |
| To | <dev@hive.apache.org> |
| Subject | Transition from hive3 to hive4 |
Thank you for developing hive. We have all been waiting for hive4 to
come out for a long time.
We have a huge dwh with hive3, in our assembly we are switching to hive4.
We do not understand how it works, asf advised us to contact you.

A little introduction: we do not have acid tables. We just have
parquet tables that we create via create table and create external
table. We have 2 problem areas:

Hive DB:
Hive3 has just location=hive.warehouse.dir (both managed and unmanaged
tables are stored here)
Hive4 has 2 paths:
1 - location=hive.warehouse.external.dir for unmanaged tables?
2 - managedlocation=hive.warehouse.dir for managed tables?

If we already have storage on hive3, what should we do?
As far as I understand, we should take the hive3 path from
hive.warehouse.dir and write it to managedlocation, and create a new
path for location via hive.warehouse.external.dir. Is that right?

Next, we want to understand by what principle, and through which ddl
statements, tables are written to the location dir versus the
managedlocation dir.
Let me remind you that we do not have acid tables.

Hive DDL:
We noticed that all the tables we created now show up as create
external table in their ddl.

The only difference between them is that some have
translated_to_external=true and external.table.purge=true,
and some do not. Should we consider the tables that do not have
external.table.purge=true to be unmanaged?

It does not matter whether we write create table or create external
table: the ddl always shows create external table, and the tables are
written exclusively to the location dir. We were unable to write a
table to the managedlocation dir. When we write create external table,
the ddl has no external.table.purge=true. When we write create table,
the ddl has external.table.purge=true. But there is not a single table,
neither old nor new, whose ddl shows create table.
That is, all tables automatically become create external table, even
new ones, although we did not ask for it. In addition, some tables
became managed, but by what principle is also unclear. We are completely
confused and discouraged by the behavior of hive4.

Guys, help us figure it out, we have a petabyte of data and thousands
of users, but we cannot explain to them now how hive4 works.

Valentine Smith
Big Data Solution Architect
