Re: Inherited an 18TB DB & need to backup
Also, once you get into multi-TB data storage, the S3 bill gets a little
harder to digest.

On Fri, May 15, 2020 at 11:49 Andreas 'ads' Scherbaum wrote:

> On Fri, May 15, 2020 at 7:52 PM Ravi Krishna wrote:
>
>>> Why should the backup land in S3, and not local somewhere?
>>> Any good reason why one should pay for the additional storage and
>>> transfer costs?
>>
>> Good question. The key point in my statement was "db of this size".
>>
>> The problem with local backup is that space is not infinite. If your
>> business requires you to store backups for, say, 7 years, storing them
>> locally will be a problem. In one large financial company I used to work
>> for, full backups were used to store old data (except for the last 30
>> days, where WAL logs were kept for real point-in-time recovery). We used
>> to store full backups for about 60 days and then send the older backups
>> to off-site storage. Nothing is free.
>>
>> I remember a case where the business asked us to restore a database from
>> a date two years prior, as they had to look at old data. It took us close
>> to 96 hours to give the users the required database.
>>
>> S3 storage is ridiculously cheap. Off-site storage companies like Iron
>> Mountain should find their client base ditching them big time.
>
> If your database is running somewhere in the cloud, then yes, that might
> make sense. If your database runs in your own data center, then usually
> you also have disk space available there. Plus a transfer out of your
> data center will take time.
>
> There is no "per se" recommendation to move data to S3. And there might be
> additional requirements like data protection laws, encryption
> requirements, etc.
>
> --
> Andreas 'ads' Scherbaum
> German PostgreSQL User Group
> European PostgreSQL User Group - Board of Directors
> Volunteer Regional Contact, Germany - PostgreSQL Project

--
T: @Thaumion
IG: Thaumion
scot...@gmail.com
Re: "Go" (lang) standard driver
I have been using pgx and it seems to be working well. If you're just
starting out, I recommend using the v4 version.

On Tue, Aug 18, 2020 at 5:53 AM Stephen Frost wrote:

> Greetings,
>
> * Olivier Gautherot (ogauthe...@gautherot.net) wrote:
> > On Tue, 18 Aug 2020 at 09:36, Tony Shelver wrote:
> > > ---------- Forwarded message ---------
> > > From: Tony Shelver
> > > Date: Tue, 18 Aug 2020 at 09:33
> > > Subject: Re: "Go" (lang) standard driver
> > > To: Edson Richter
> > >
> > > A quick Google search found https://github.com/lib/pq. It has 6.1K
> > > stars, so I would guess it is fairly well used.
> >
> > I have used it successfully and it performed perfectly. Go for it.
>
> It also says that it's not actively developed and recommends pgx
> instead...
>
> Thanks,
>
> Stephen

--
T: @Thaumion
IG: Thaumion
scot...@gmail.com
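For anyone picking pgx up from this thread, here is a minimal sketch of
connecting and running a query with pgx v4. The connection string,
credentials, and database name are placeholders, not anything from the
thread.

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/jackc/pgx/v4"
    )

    func main() {
        ctx := context.Background()

        // Placeholder connection string; substitute your own host, user,
        // and database.
        conn, err := pgx.Connect(ctx, "postgres://app_user:secret@localhost:5432/appdb")
        if err != nil {
            log.Fatalf("connect: %v", err)
        }
        defer conn.Close(ctx)

        // Simple round trip to confirm the driver and connection work.
        var version string
        if err := conn.QueryRow(ctx, "select version()").Scan(&version); err != nil {
            log.Fatalf("query: %v", err)
        }
        fmt.Println(version)
    }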
Re: Optimizing Postgresql ILIKE while query
Also, a leading wildcard prevents a normal index from being used. It's best
to avoid LIKE/ILIKE patterns of the form '%TERM' where you can.

On Mon, Oct 22, 2018 at 12:23 AM Alban Hertroys wrote:

> > On 22 Oct 2018, at 7:56, aman gupta wrote:
> >
> > Issue:
> >
> > We have a base table which contains 22M records, and we created a view
> > on top of it. Querying the view with an ILIKE clause took 44 seconds,
> > and with a LIKE clause 20 seconds.
> >
> > Query:
> >
> > fm_db_custom_db=# EXPLAIN (ANALYZE, TIMING OFF)
> > select destination, hostname, inputfilename, inputtime,
> >   logicalservername, outputfilename, outputtime, processinglink,
> >   source, totalinputbytes, totalinputcdrs, totaloutputbytes,
> >   totaloutputcdrs
> > from mmsuper.test_20m_view
> > where inputfilename ilike
> >   '%SDPOUTPUTCDR_4001_BLSDP09_ADM_4997_18-10-15-02549.ASN%';
>
> Perhaps, when you have a question about timing, you shouldn't turn off the
> timing in the query plan? Now we can't see where the time is spent.
>
> That's all sequential scans that each remove a significant number of rows.
> That probably costs a significant amount of time to do.
>
> It looks like you don't have any indices on the underlying table(s) at
> all. I'd start there and then look at the ILIKE problem again. By that
> time, Pavel's suggestion of a trigram index on that text field is probably
> spot on.
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
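When the leading wildcard can't be avoided, a trigram index is the usual
answer, and that is what Pavel's suggestion above refers to. A rough sketch,
assuming the filtered column lives on a base table behind the view (the
table name below is hypothetical; the thread only names the view):

    -- pg_trgm lets a GIN index serve LIKE and ILIKE even with a leading wildcard.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Index the column used in the ILIKE filter on the underlying table.
    CREATE INDEX CONCURRENTLY base_table_inputfilename_trgm_idx
        ON mmsuper.base_table USING gin (inputfilename gin_trgm_ops);

    -- After ANALYZE, the original query against the view can switch from a
    -- sequential scan to a bitmap index scan on this index.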
Optimizing Database High CPU
Hi,

We are running a PostgreSQL 9.4.18 database and we are noticing high CPU
usage. Nothing is critical at the moment, but if we were to scale up what
we are doing, I feel we are going to run into issues.

It is a 2 x 6-core machine, 128GB RAM, RAID 10 HDD.

The iostat metrics for the HDD look minimal (< 10% util), and available
memory seems to be fine. The CPU utilization is what is bothering me:

user  5-7%
sys   50-70%  - seems high
wa    < 0.5%

So, trying to troubleshoot the possible causes of the high CPU:

- Number of concurrent connections averages 50 to 100 - seems high,
  although we max out at 200.
- No long-running queries.
- Streaming replication to a backup server.
- High-update tables - we have about 4 tables with a high volume of updates.

The high update rate is what I think is causing the issue, and I found that
setting fillfactor to a lower value might help. The internet says you then
need to do a VACUUM FULL, which I am trying to avoid, but if it needs to be
done we can schedule it.

Just want to make sure I am chasing the correct rabbit hole. Are there any
statistics I could run to see whether a setting change would help?

Best,
Scott

--
T: @Thaumion
IG: Thaumion
scot...@gmail.com
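One place to start on the fillfactor question is checking how many of the
updates on those busy tables are already HOT, since a lower fillfactor
mainly pays off by leaving free space on each page so more updates can be
HOT. A hedged sketch (the ALTER TABLE target is a made-up name, since the
thread doesn't name the tables):

    -- HOT-update ratio per table: if n_tup_hot_upd is far below n_tup_upd
    -- on the busy tables, a lower fillfactor may help.
    SELECT relname,
           n_tup_upd,
           n_tup_hot_upd,
           round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
    FROM pg_stat_user_tables
    ORDER BY n_tup_upd DESC
    LIMIT 10;

    -- Lower the fillfactor on a high-update table (table name is hypothetical).
    -- This only applies to newly written pages; a rewrite (VACUUM FULL or
    -- pg_repack) is needed for existing pages to pick it up.
    ALTER TABLE busy_table SET (fillfactor = 70);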
Re: Optimizing Database High CPU
Alright, will try the upgrade.

> Is it a few transactions updating a lot of rows each, or many transactions
> updating a few rows each?

It is a lot of transactions updating a few rows each. Then I will look into
a connection pooler. Thanks for the response.

On Wed, Feb 27, 2019 at 2:01 PM Michael Lewis wrote:

>> If those 50-100 connections are all active at once, yes, that is high.
>> They can easily spend more time fighting each other over LWLocks,
>> spinlocks, or cachelines rather than doing useful work. This can be
>> exacerbated when you have multiple sockets rather than all cores in a
>> single socket. And these problems are likely to present as high Sys
>> times.
>>
>> Perhaps you can put up a connection pooler which will allow 100
>> connections to all think they are connected at once, but forces only 12
>> or so to actually be active at one time, making the others transparently
>> queue.
>
> Can you expound on this or refer me to someplace to read up on this?
>
> Context (I don't want to thread-jack, though): I think I am seeing similar
> behavior in our environment at times, with queries that normally take
> seconds taking 5+ minutes at times of high load. I see many queries
> showing buffer_mapping as the LWLock type in snapshots, but don't know if
> that may be expected. In our environment PgBouncer will accept several
> hundred connections and allow up to 100 at a time to be active on the
> database; the databases are VMs with ~16 CPUs allocated (some more, some
> less, multi-tenant and manually sharded). It sounds like you are
> advocating for a connection max very close to the number of cores. I'd
> like to better understand the pros/cons of that decision.

--
T: @Thaumion
IG: Thaumion
scot...@gmail.com
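Before settling on pooler limits, it can help to see how many of those
50-100 connections are actually active at any instant rather than idle. A
quick sketch (pg_stat_activity.state exists on 9.2 and later, so it works on
the 9.4 server in this thread):

    -- Snapshot of connection states; a pooler mainly pays off when the
    -- number of "active" backends regularly exceeds the core count.
    SELECT state, count(*)
    FROM pg_stat_activity
    GROUP BY state
    ORDER BY count(*) DESC;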
Re: Optimizing Database High CPU
Hey,

So I finally found the culprit. It turns out to be THP (transparent huge
pages) fighting with itself. After running the following on Ubuntu:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

the load average instantly went from 30 to 3. Also, make sure the setting
gets re-applied on reboot, since it does not persist by default.

Anyway, I just wanted to give a follow-up on the issue in case anyone else
is having the same problem.

On Mon, Mar 4, 2019 at 12:03 PM Jeff Janes wrote:

> On Wed, Feb 27, 2019 at 5:01 PM Michael Lewis wrote:
>
>>> If those 50-100 connections are all active at once, yes, that is high.
>>> They can easily spend more time fighting each other over LWLocks,
>>> spinlocks, or cachelines rather than doing useful work. This can be
>>> exacerbated when you have multiple sockets rather than all cores in a
>>> single socket. And these problems are likely to present as high Sys
>>> times.
>>>
>>> Perhaps you can put up a connection pooler which will allow 100
>>> connections to all think they are connected at once, but forces only 12
>>> or so to actually be active at one time, making the others transparently
>>> queue.
>>
>> Can you expound on this or refer me to someplace to read up on this?
>
> Just based on my own experimentation. This is not a blanket
> recommendation, but specific to the situation where we already suspect
> there is contention and the server is too old to have the
> pg_stat_activity.wait_event column.
>
>> Context (I don't want to thread-jack, though): I think I am seeing
>> similar behavior in our environment at times, with queries that normally
>> take seconds taking 5+ minutes at times of high load. I see many queries
>> showing buffer_mapping as the LWLock type in snapshots, but don't know if
>> that may be expected.
>
> It sounds like your processes are fighting to reserve buffers in
> shared_buffers in which to read data pages. But those data pages are
> probably already in the OS page cache, otherwise reading them from disk
> would be slow enough that you would be seeing some type of IO wait, or
> buffer_io rather than buffer_mapping, as the dominant wait type. So I
> think that means you have most of your data in RAM, but not enough of it
> in shared_buffers. You might be in a rare situation where setting
> shared_buffers to a high fraction of RAM, rather than the usual low
> fraction, is called for. Increasing NUM_BUFFER_PARTITIONS might also be
> useful, but that requires a recompilation of the server. But do these
> spikes correlate with anything known at the application level? A change
> in the mix of queries, or a long report or maintenance operation? Maybe
> the query plans briefly toggle over to using seq scans rather than index
> scans, or vice versa, which drastically changes the block access patterns?
>
>> In our environment PgBouncer will accept several hundred connections and
>> allow up to 100 at a time to be active on the database; the databases are
>> VMs with ~16 CPUs allocated (some more, some less, multi-tenant and
>> manually sharded). It sounds like you are advocating for a connection max
>> very close to the number of cores. I'd like to better understand the
>> pros/cons of that decision.
>
> There are good reasons to allow more than that. For example, your
> application holds some transactions open briefly while it does some
> cogitation on the application side, rather than immediately committing and
> so returning the connection to the connection pool.
> Or your server has a very high IO capacity and benefits from lots of read
> requests in the queue at the same time, so it can keep every spindle busy
> and every rotation productive. But if you have no reason to believe that
> any of those situations apply to you, and you do have evidence of lock
> contention between processes, then I think that limiting the number of
> active processes to the number of cores is a good starting point.
>
> Cheers,
>
> Jeff

--
T: @Thaumion
IG: Thaumion
scot...@gmail.com
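For readers on 9.6 or later, the pg_stat_activity.wait_event column Jeff
mentions makes this kind of LWLock contention much easier to confirm; a
rough sketch of sampling it:

    -- Sample this repeatedly during a slow period; a steady stream of
    -- LWLock / buffer_mapping rows points at shared_buffers contention.
    SELECT wait_event_type, wait_event, count(*)
    FROM pg_stat_activity
    WHERE state = 'active'
    GROUP BY wait_event_type, wait_event
    ORDER BY count(*) DESC;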
Re: The tragedy of SQL
It is kind of a purist's fallacy - the idea that if you could just write
assembly, it would be the best. In reality, a database is used not because
it is the best technical database, but because it is used by many people:
something other developers can pick up and use without reading a 200-page
manual and studying for a year on end. Maybe things would be better if
everyone did that; on the other hand, it might just be wasted effort.

You complain about NoSQL databases, but then you actually advocate for them
by saying SQL is sad. I see Postgres as a traditional RDBMS with its own
specific use cases. Compare that to ClickHouse, which has a very different
use case. Don't compare TimescaleDB either, because even that has
limitations that ClickHouse surpasses at scale - just an example of a
NoSQL-style database.

If you do start a new database, let me know. I would like to see it in
action.

On Tue, Sep 14, 2021 at 9:20 AM Rob Sargent wrote:

> On 9/14/21 10:10 AM, Michael Nolan wrote:
>
>> I started programming in 1967, and over the last 50+ years I've
>> programmed in more languages than I would want to list. I spent a decade
>> writing in FORTRAN on a GA 18/30 (essentially a clone of the IBM 1130)
>> with limited memory space, so you had to write EFFICIENT code, something
>> that is a bit of a lost art these days. I also spent a decade writing in
>> COBOL.
>>
>> I've not found many tasks that I couldn't find a way to write in whatever
>> language I had available to write it in. There may be bad (or at least
>> inefficient) languages, but there are lots of bad programmers.
>> --
>> Mike Nolan
>> htf...@gmail.com
>
> OK, I'm maybe responsible for this thread turning into a diatribe. I
> shouted at the OP 'cause he shouted at us. My mistake, and I apologize.
> I'm probably closer to Mike's "bad programmers" than I would care to
> admit, but I fully believe software is a "people problem" more than most
> of us realize.

--
scot...@gmail.com