We extracted data from Oracle to csv first, already convert schema objects from 
Oracle to Postgresql too.   Then use COPY from csv to Postgres.
The point is about the 2 options to how to make the data load fast, pg_dump 
only used to dump metadata in Postgres to rebuild index and recreate 
constraints.
  The questions is  instead of drop index and create index, we check update 
pg_index set indisready=false and reindex again after load.

From: Jeff Janes <jeff.ja...@gmail.com>
Sent: Sunday, June 19, 2022 4:01 AM
To: James Pang (chaolpan) <chaol...@cisco.com>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: reindex option for tuning load large data



On Fri, Jun 17, 2022 at 1:34 AM James Pang (chaolpan) 
<chaol...@cisco.com<mailto:chaol...@cisco.com>> wrote:
Hi ,

  We plan to migrate large database from Oracle to Postgres(version 13.6, OS 
Redhat8 Enterprise), we are checking options to make data load in Postgres 
fast. Data volume is about several TB,  thousands of indexes,  many large table 
with partitions.  We want to make data load running fast and avoid miss any 
indexes when reindexing. There are 2 options about reindex. Could you give some 
suggestions about the 2 options, which option is better.


  1.  Create tables and indexes( empty database) ,   update pg_index set 
indisready=false and inisvalid=false,  then  load data use COPY from csv ,  
then reindex table …

Where did this idea come from?  This is likely to destroy your database.

2).  Use pg_dump to dump meta data only , then copy “CREATE INDEX … sql “
        Drop indexes before data load
       After data load, increase max_parallel_maintenance_workers, 
maintenance_work_mem

       Run CREATE INDEX … sql  to leverage parallel create index feature.

pg_dump doesn't run against Oracle, so where is the thing you are running 
pg_dump against coming from?

If you already have a fleshed out schema in PostgreSQL, you should dump the 
sections separately (with --section=pre-data and --section=post-data) to get 
the commands to build the objects which should be run before and after the data 
is loaded.

Cheers,

Jeff

Reply via email to