On Mon, Jul 13, 2020 at 12:50 PM Sebastian Dressler <sebast...@swarm64.com> wrote:
> Hi Henrique, > > On 13. Jul 2020, at 18:42, Henrique Montenegro <typ...@gmail.com> wrote: > > On Mon, Jul 13, 2020 at 11:20 AM Sebastian Dressler <sebast...@swarm64.com> > wrote: > > >> Running the above loop worked fine for about 12 hours. Each file was >> taking >> about 30 seconds to be processed. About 4 seconds to create the `users` >> table >> and have the CSV data loaded into it and anything between 20 and 30 >> seconds to >> insert the data from `users` into `users_no_dups`. >> >> >> Do you see anything suspicious in the logs, i.e. something in the realms >> of running out of transaction IDs? >> > > I set the log to debug1. I haven't seen anything that called my attention, > but I am not really sure what to look for, so perhaps I missed it. Any > suggestions on what to look for or any specific log configuration to do? > > > Not necessarily, if you'd run out of tx IDs you would notice that cleary, > I guess. I also think that this is not the issue. > > > >> >> [...] >> >> >> Recreating the table now isn't really providing any improvements. I tried >> recreating it with a `fillfactor` of `10`, but it was taking too long and >> too >> much space (the table had 300GB with the fillfactor set to 30; with it >> set to >> 10 it went up to almost 1TB). >> >> >> To me it sounds like the UK/PK is getting too much to write. A possible >> solution could be to start partitioning the table. >> > > I thought about partitioning it, but I can't figure out on what. The > `user_id` column is a number that is somewhat random so I don't know what > kinds of range I would use for it. I will try to look at the values again > and see if there is something that I could perhaps use as a range. Any > other suggestions? > > > Depending on granularity, maybe partition on `created_ts`? > I could give it a try. The reason I didn't try that yet was that I thought that since the UK is on the `user_id` column it wouldn't give me any benefit, but I can't really justify why I was thinking that. I would assume that the constraint would be validated against the index and not the whole table, so this might work. I will give it a try. Thanks! Henrique > > Cheers, > Sebastian > > -- > > Sebastian Dressler, Solution Architect > +49 30 994 0496 72 | sebast...@swarm64.com > > Swarm64 AS > Parkveien 41 B | 0258 Oslo | Norway > Registered at Brønnøysundregistrene in Norway under Org.-Number 911 662 787 > CEO/Geschäftsführer (Daglig Leder): Thomas Richter; Chairman/Vorsitzender > (Styrets Leder): Dr. Sverre Munck > > Swarm64 AS Zweigstelle Hive > Ullsteinstr. 120 | 12109 Berlin | Germany > Registered at Amtsgericht Charlottenburg - HRB 154382 B > >