Hi Eugene,

the TBL_ID in TBLS table is handled by Datanucleus, so AUTO_INCREMENT won't
help, since the TBL_ID is not defined as AUTO_INCREMENT.

Datanucleus uses SEQUENCE_TABLE to store the actual value for primary keys.
In this table this two rows is what you need to modify:

org.apache.hadoop.hive.metastore.model.MDatabase
org.apache.hadoop.hive.metastore.model.MTable

e.g:
update SEQUENCE_TABLE set NEXT_VAL = 10000  where
SEQUENCE_NAME='org.apache.hadoop.hive.metastore.model.MTable';
and do it for org.apache.hadoop.hive.metastore.model.Database as well.

After that if you create a table the TBL_ID will be used from this value.
Datanucleus uses caching (default 10) so maybe the next tables will still
use the old value. Try to create 10 simple table like this:

create table test1 (i int);
...
create table test10 (i int);
and then drop them and check the TBL_ID.

*Before doing this I recommend to create a backup from the Metastore DB!!*

Also check this:
https://community.cloudera.com/t5/Support-Questions/How-to-migrate-Hive-Table-From-one-cluster-to-another/m-p/235145

Regards,

Zoltan Ratkai

On Sun, Oct 22, 2023 at 5:39 PM Eugene Miretsky <eug...@badal.io> wrote:

> Hey!
>
> Looking for a way to control the ids (DB_ID and TABLE_ID) of newly
> created  databases and tables.
>
> We have a somewhat complicated use case where we replicate the metastore
> (and data) from a source Hive cluster to a target cluster. However new
> tables can be added on both source and target. We need a way to avoid
> unique Id collision. One way would be to make sure all databases/tables
> created in the target Hive start from a higher Id.
>
> We have tried to set AUTO_INCREAMENT='10000' on a metastore MySQL db, but
> it doesn't work. This makes us think the Id is generated by the Metastore
> code itself, but we cannot find the right place in the code, or if it is
> possible to control the logic.
>
> Any advice would be appreciated.
>
> Cheers,
>

Reply via email to