On Mon, Jul 13, 2020 at 11:20 AM Sebastian Dressler <sebast...@swarm64.com> wrote:
> Hi Henrique,
>
> On 13. Jul 2020, at 16:23, Henrique Montenegro <typ...@gmail.com> wrote:
>
> [...]
>
> * Insert the data from the `users` table into the `users_no_dups` table
>
> ```
> insert into users_no_dups (
>     created_ts,
>     user_id,
>     name,
>     url
> ) (
>     select
>         created_ts,
>         user_id,
>         name,
>         url
>     from
>         users
> ) on conflict do nothing
> ```
>
> How do you check constraints here? Is this enforced with UK/PK?

The unique key is supposed to do the constraint enforcement here (I put a
rough sketch of the table below). The `users` table will contain duplicate
data and holds at most 1 million records. I then insert it into the
`users_no_dups` table with `on conflict do nothing` so that the duplicates
are ignored and discarded.

> Running the above loop worked fine for about 12 hours. Each file was
> taking about 30 seconds to be processed. About 4 seconds to create the
> `users` table and have the CSV data loaded into it and anything between
> 20 and 30 seconds to insert the data from `users` into `users_no_dups`.
>
> Do you see anything suspicious in the logs, i.e. something in the realms
> of running out of transaction IDs?

I set the log level to debug1. I haven't seen anything that caught my
attention, but I am not really sure what to look for, so perhaps I missed
it. Any suggestions on what to look for or any specific log configuration
to use? (I also put a wraparound check query below; let me know if that is
the right thing to watch.)

> [...]
>
> Recreating the table now isn't really providing any improvements. I
> tried recreating it with a `fillfactor` of `10`, but it was taking too
> long and too much space (the table had 300GB with the fillfactor set to
> 30; with it set to 10 it went up to almost 1TB).
>
> To me it sounds like the UK/PK is getting too much to write. A possible
> solution could be to start partitioning the table.

I thought about partitioning it, but I can't figure out what to partition
on. The `user_id` column is a number that is somewhat random, so I don't
know what kind of ranges I would use for it. I will try to look at the
values again and see if there is something I could perhaps use as a range.
Any other suggestions?
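One idea that occurred to me while writing this: since `user_id` is
essentially random, hash partitioning might be a better fit than range
partitioning. A rough sketch of what I have in mind is below; the
`users_no_dups_part` name and the partition count of 8 are arbitrary, the
column types are approximate, and it needs PostgreSQL 11 or newer for
`partition by hash`:

```
-- rough sketch: hash-partition the table on the random user_id
create table users_no_dups_part (
    created_ts timestamptz,
    user_id    bigint not null,
    name       text,
    url        text,
    primary key (user_id)  -- must include the partition key
) partition by hash (user_id);

-- create one child table per hash bucket
do $$
begin
    for i in 0..7 loop
        execute format(
            'create table users_no_dups_part_%s
                 partition of users_no_dups_part
                 for values with (modulus 8, remainder %s)',
            i, i);
    end loop;
end $$;
```

If I understand it correctly, the insert with `on conflict do nothing`
would keep working unchanged against the parent table, and each child's
unique index would be roughly an eighth of the current one, so each index
write should touch a much smaller btree. Does that sound reasonable?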
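Also, going back to your transaction ID question: would this be the right
way to keep an eye on it? (Just my attempt based on the documentation.)

```
-- distance of each database from transaction ID wraparound;
-- aggressive freezing kicks in once this passes
-- autovacuum_freeze_max_age (200 million by default)
select datname, age(datfrozenxid) as xid_age
from pg_database
order by xid_age desc;
```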
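And for completeness, since you asked about UK/PK: the table looks roughly
like the sketch below. I am writing this from memory, so the column types
are approximate and the key may not be exactly this, but it is the unique
index that `on conflict do nothing` resolves against:

```
-- sketch from memory; exact column types and key may differ
create unlogged table users_no_dups (
    created_ts timestamptz,
    user_id    bigint not null,
    name       text,
    url        text
);

-- the arbiter index for "on conflict do nothing"
create unique index users_no_dups_user_id_idx on users_no_dups (user_id);
```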
> [...]
>
> ```
> ssl = off
> shared_buffers = 8GB
> work_mem = 12GB
> maintenance_work_mem = 12GB
> max_stack_depth = 4MB
> synchronous_commit = off
> wal_writer_flush_after = 128MB
> max_wal_size = 32GB
> min_wal_size = 80MB
> effective_cache_size = 96GB
> ```
>
> Another suggestion would be to increase the min_wal_size here, but since
> you use UNLOGGED tables it does not matter much.
>
> Information about the machine:
>
> ```
> Processor: 2x Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz (6 cores each,
> 12 threads)
> RAM: 256GB
>
> Disk1: 2TB SSD SATA-3 Samsung Evo 860
> Disk2: 6TB HDD SATA-3 Seagate Exos Enterprise 7200RPM
> Disk3: 8TB HDD SATA-3 Seagate Exos Enterprise 7200RPM
>
> Disk1 and Disk2 are configured as a single logical volume.
> ```
>
> Just curious: does that mean you mix up SSD + HDD?

Yeah, I did that. Probably not very smart of me; I plan on undoing it
soon. I assume it is not what is causing my issue, though, since the
tablespace where the table is stored is on `Disk3`, which is not part of
the logical volume.

> Cheers,
> Sebastian
>
> --
> Sebastian Dressler, Solution Architect
> +49 30 994 0496 72 | sebast...@swarm64.com

Thanks!

Henrique
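PS: In case it is useful, a quick way to double-check where the tables
live (an empty `tablespace` column means the cluster default):

```
-- confirm which tablespace each table sits on
select schemaname, tablename, tablespace
from pg_tables
where tablename in ('users', 'users_no_dups');
```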