On Thu, Sep 5, 2024 at 4:14 PM Lok P <loknath...@gmail.com> wrote:

> Hi,
>
> We have a requirement to create approx 50 billion rows in a
> partitioned table (~1 billion rows per partition, 200+ GB daily
> partitions) for a performance test. We are currently using the
> 'INSERT INTO <target table_partition> SELECT ... FROM
> <source_table_partition> or <some transformed query>;' method. We
> dropped all indexes and constraints first and then did the load.
> Still, it takes 2-3 hours to populate one partition.

At three hours, that's 92,593 records/second (1 billion rows / 10,800
seconds). Seems pretty slow. How much of that time is taken by <some
transformed query>? How big are the records? How fast is the hardware?

> Is there a faster way to achieve this?

Testing is the only way to know for sure.

> A few teammates suggest using the COPY command with a file load
> instead, saying it will be faster. So I wanted to understand what it
> does differently behind the scenes compared to an INSERT ... SELECT,
> which deals only with the SQL engine.

COPY is highly optimized for buffered operation. INSERT... maybe not
so much. But if the source data is already in a table, that would
require piping the data to stdout and then back into the database:

    psql appdb -c "COPY (SELECT ...) TO STDOUT;" \
      | psql appdb -c "COPY some_table FROM STDIN;"

Use binary mode, so text conversion isn't required. Maybe that's
faster, maybe not.
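For instance, a binary-mode version of that pipe might look like the
sketch below; appdb, source_p1, and target_p1 are hypothetical names,
and binary COPY requires the SELECT's output columns to match the
target table's column types exactly:

    # Stream one partition through COPY in binary format, so neither
    # side pays for text conversion. All object names are placeholders.
    psql appdb -c "COPY (SELECT * FROM source_p1) TO STDOUT (FORMAT binary)" \
      | psql appdb -c "COPY target_p1 FROM STDIN (FORMAT binary)"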
> Additionally, when we were trying to create indexes post data load
> on one partition, it took 30+ minutes. Any possible way to make it
> faster?
>
> Is there any way to drive the above things in parallel by utilizing
> full database resources?

Put the destination tables in a different tablespace on a different
controller.

> It's postgres 15.4

Why not 15.8?
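A minimal sketch of that, assuming a second volume mounted at
/mnt/ctrl2/pgdata and a date-range-partitioned parent named target_tbl
(every name here is hypothetical):

    -- Tablespace on the volume behind the second controller.
    CREATE TABLESPACE fast_load LOCATION '/mnt/ctrl2/pgdata';

    -- Create the daily partition directly in that tablespace, so the
    -- load's writes land on different hardware than the source reads.
    CREATE TABLE target_tbl_p20240905
        PARTITION OF target_tbl
        FOR VALUES FROM ('2024-09-05') TO ('2024-09-06')
        TABLESPACE fast_load;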