Re: Inherited an 18TB DB & need to backup

2020-05-15 Thread Scottix
Also, once you get into multi-TB data storage, the S3 bill gets a little
harder to digest.

On Fri, May 15, 2020 at 11:49 Andreas 'ads' Scherbaum 
wrote:

>
>
> On Fri, May 15, 2020 at 7:52 PM Ravi Krishna 
> wrote:
>
>>
>> Why should the backup land in S3, and not local somewhere?
>> Any good reason why one should pay for the additional storage and
>> transfer costs?
>>
>> Good question. The key point in my statement was "db of this size".
>>
>> The problem with local backup is that space is not infinite. If your
>> business requires you to store backups for, say, 7 years, storing them
>> locally will be a problem.  In one large financial company I used to work
>> at, full backups were used to store old data (except for the last 30 days,
>> where WAL logs were kept for real PITR).  We used to store full backups for
>> about 60 days and then send the older backups to off-site storage.
>> Nothing is free.
>>
>> I remember a case where we were asked by the business to restore a db
>> from a given date two years prior, as they had to look at old data. It took
>> us close to 96 hours to give the users the required database.
>>
>> S3 storage is ridiculously cheap.  Off-site storage companies like Iron
>> Mountain should find their client base ditching them big time.
>>
>
> If your database is running somewhere in the cloud, then yes, that might
> make
> sense. If your database runs in your own data center, then usually you
> also have
> disk space available there. Plus a transfer out of your data center will
> take time.
>
> There is no "per se" recommendation to move data to S3. And there might be
> additional requirements like data protection laws, encryption requirements,
> etc.
>
> --
> Andreas 'ads' Scherbaum
> German PostgreSQL User Group
> European PostgreSQL User Group - Board of Directors
> Volunteer Regional Contact, Germany - PostgreSQL Project
>
-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com


Re: "Go" (lang) standard driver

2020-08-18 Thread Scottix
I have been using pgx and it seems to be working well.

If you're just starting out, I recommend using the v4 version.
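
For anyone trying it, a minimal connection test with pgx v4 looks roughly
like the sketch below (the DATABASE_URL environment variable and the test
query are just assumptions for the example):

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/jackc/pgx/v4"
)

func main() {
	// Connection string comes from the environment here; use your own DSN.
	conn, err := pgx.Connect(context.Background(), os.Getenv("DATABASE_URL"))
	if err != nil {
		fmt.Fprintf(os.Stderr, "unable to connect: %v\n", err)
		os.Exit(1)
	}
	defer conn.Close(context.Background())

	// Simple round trip to verify the connection works.
	var greeting string
	if err := conn.QueryRow(context.Background(), "select 'hello from postgres'").Scan(&greeting); err != nil {
		fmt.Fprintf(os.Stderr, "query failed: %v\n", err)
		os.Exit(1)
	}
	fmt.Println(greeting)
}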

On Tue, Aug 18, 2020 at 5:53 AM Stephen Frost  wrote:

> Greetings,
>
> * Olivier Gautherot (ogauthe...@gautherot.net) wrote:
> > On Tue, 18 Aug 2020 at 09:36, Tony Shelver  wrote:
> > > -- Forwarded message -
> > > From: Tony Shelver 
> > > Date: Tue, 18 Aug 2020 at 09:33
> > > Subject: Re: "Go" (lang) standard driver
> > > To: Edson Richter 
> > >
> > >
> > > A quick Google search found https://github.com/lib/pq.  It has 6.1K
> > > stars, so I would guess it is fairly well used.
> >
> > I have used it successfully and it performed perfectly. Go for it.
>
> It also says that it's not actively developed and recommends pgx
> instead...
>
> Thanks,
>
> Stephen
>


-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com


Re: Optimizing Postgresql ILIKE while query

2018-10-23 Thread Scottix
Also, a leading wildcard prevents ordinary B-tree indexes from being used.
Best to avoid LIKE/ILIKE patterns of the form '%TERM' unless you have a
trigram index (see the sketch below).
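
For reference, a trigram index along the lines Pavel suggested in the quoted
thread might look like the following; the base table name is a placeholder,
since only the view (mmsuper.test_20m_view) appears in the quoted query:

-- pg_trgm lets LIKE/ILIKE '%TERM%' patterns use an index instead of a
-- sequential scan.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Index the underlying base table, not the view.
CREATE INDEX inputfilename_trgm_idx
    ON mmsuper.test_20m_base USING gin (inputfilename gin_trgm_ops);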

On Mon, Oct 22, 2018 at 12:23 AM Alban Hertroys  wrote:

>
>
> > On 22 Oct 2018, at 7:56, aman gupta  wrote:
> >
> > Issue:
> >
> > We have a base table containing 22M records, and we created a view on top
> > of it. Querying the view with an ILIKE clause took 44 seconds, and with a
> > LIKE clause 20 seconds.
> >
> > Query:
> >
> > fm_db_custom_db=# EXPLAIN (ANALYZE, TIMING OFF)
> > select
> destination,hostname,inputfilename,inputtime,logicalservername,outputfilename,outputtime,processinglink,source,totalinputbytes,totalinputcdrs,totaloutputbytes,totaloutputcdrs
> from mmsuper.test_20m_view  where inputfilename ilike
> '%SDPOUTPUTCDR_4001_BLSDP09_ADM_4997_18-10-15-02549.ASN%';
>
> Perhaps, when you have a question about timing, you shouldn't turn off the
> timing in the query plan? Now we can't see where the time is spent.
>
> > 
>
> That's all sequential scans that each remove a significant amount of rows.
> That probably costs a significant amount of time to do.
>
> It looks like you don't have any indices on the underlying table(s) at
> all. I'd start there and then look at the ILIKE problem again. By that
> time, Pavel's suggestion for a trigram index on that text field is probably
> spot-on.
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>
>


Optimizing Database High CPU

2019-02-27 Thread Scottix
Hi, we are running a PostgreSQL 9.4.18 database and are noticing high CPU
usage. Nothing is critical at the moment, but if we were to scale up what we
are doing, I feel we are going to run into issues.

It is a 2 x 6-core machine with 128GB RAM and RAID 10 HDDs.

The iostat metrics for the HDDs look minimal (< 10% util), and available
memory seems to be fine.

The CPU utilization is what is bothering me:
user  5-7%
sys   50-70% - seems high
wa    <0.5%

So, trying to troubleshoot the possible causes of high CPU:
Number of concurrent connections averages 50 to 100 - seems high, although
we max at 200.
No long-running queries.
Streaming replication to a backup server.
High-update tables - we have about 4 tables with a high volume of updates.

The high update rate is what I think is causing the issue. One option I
found is setting fillfactor to a lower value, but the internet says you need
to do a VACUUM FULL for that to take effect on existing data, which I am
trying to avoid; if it needs to be done, we can schedule it. I just want to
make sure I am chasing the correct rabbit hole.

Are there any statistics I could run to see if a setting change would help?
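
For example, one rough way to check whether the updates are HOT, and whether
lowering fillfactor on particular tables might help, is something like the
sketch below (the table name in the ALTER is a placeholder):

-- Share of HOT (heap-only tuple) updates per table; a low percentage on the
-- heavily updated tables is a hint that a lower fillfactor could help.
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 10;

-- fillfactor only affects newly written pages, so existing pages keep their
-- old density until the table is rewritten.
ALTER TABLE my_hot_table SET (fillfactor = 70);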

Best,
Scott

--
T: @Thaumion
IG: Thaumion
scot...@gmail.com



Re: Optimizing Database High CPU

2019-02-28 Thread Scottix
Alright, I will try the upgrade.

> Is it a few transactions updating a lot of rows each, or many transactions 
> updating a few rows each?
It is a lot of transactions updating a few rows each.

Then I will look into a connection pooler.

Thanks for the response.
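
For the record, a minimal transaction-pooling setup along the lines of the
quoted advice might look like this in pgbouncer.ini; the database name, host,
and the pool size of 12 are assumptions for the example, not recommendations:

; Route clients for "mydb" through the pooler.
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Transaction pooling lets ~200 clients share a small number of
; server connections.
pool_mode = transaction
max_client_conn = 200
default_pool_size = 12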

On Wed, Feb 27, 2019 at 2:01 PM Michael Lewis  wrote:
>>
>> If those 50-100 connections are all active at once, yes, that is high.  They 
>> can easily spend more time fighting each other over LWLocks, spinlocks, or 
>> cachelines rather than doing useful work.  This can be exacerbated when you 
>> have multiple sockets rather than all cores in a single socket.  And these 
>> problems are likely to present as high Sys times.
>>
>> Perhaps you can put up a connection pooler which will allow 100 connections 
>> to all think they are connected at once, but forces only 12 or so to 
>> actually be active at one time, making the others transparently queue.
>
>
> Can you expound on this or refer me to someplace to read up on this?
>
> Context, I don't want to thread jack though: I think I am seeing similar 
> behavior in our environment at times with queries that normally take seconds 
> taking 5+ minutes at times of high load. I see many queries showing 
> buffer_mapping as the LwLock type in snapshots but don't know if that may be 
> expected. In our environment PgBouncer will accept several hundred 
> connections and allow up to 100 at a time to be active on the database which 
> are VMs with ~16 CPUs allocated (some more, some less, multi-tenant and 
> manually sharded). It sounds like you are advocating for connection max very 
> close to the number of cores. I'd like to better understand the pros/cons of 
> that decision.



-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com



Re: Optimizing Database High CPU

2019-05-11 Thread Scottix
Hey,
So I finally found the culprit. It turns out to be THP (transparent huge
pages) fighting with itself.

After running the following on Ubuntu:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

The load average instantly went from 30 to 3.

Also make sure you re-apply these settings on reboot, since they do not
persist.
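
One way to do that, for example, is a small systemd unit like the sketch
below; the unit name and path are just an example, and there are other
approaches (rc.local, a tuned profile, etc.):

# /etc/systemd/system/disable-thp.service
[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled; echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=multi-user.target

# Then: systemctl daemon-reload && systemctl enable --now disable-thp.service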

Anyway, I just wanted to give a follow-up on the issue in case anyone else is
having the same problem.

On Mon, Mar 4, 2019 at 12:03 PM Jeff Janes  wrote:

> On Wed, Feb 27, 2019 at 5:01 PM Michael Lewis  wrote:
>
>> If those 50-100 connections are all active at once, yes, that is high.
>>> They can easily spend more time fighting each other over LWLocks,
>>> spinlocks, or cachelines rather than doing useful work.  This can be
>>> exacerbated when you have multiple sockets rather than all cores in a
>>> single socket.  And these problems are likely to present as high Sys times.
>>>
>>> Perhaps you can put up a connection pooler which will allow 100
>>> connections to all think they are connected at once, but forces only 12 or
>>> so to actually be active at one time, making the others transparently queue.
>>>
>>
>> Can you expound on this or refer me to someplace to read up on this?
>>
>
> Just based on my own experimentation.  This is not a blanket
> recommendation,  but specific to the situation that we already suspect
> there is contention, and the server is too old to have 
> pg_stat_activity.wait_event
> column.
>
>
>> Context, I don't want to thread jack though: I think I am seeing similar
>> behavior in our environment at times with queries that normally take
>> seconds taking 5+ minutes at times of high load. I see many queries showing
>> buffer_mapping as the LwLock type in snapshots but don't know if that may
>> be expected.
>>
>
> It sounds like your processes are fighting to reserve buffers in
> shared_buffers in which to read data pages.  But those data pages are
> probably already in the OS page cache, otherwise reading it from disk would
> be slow enough that you would be seeing some type of IO wait, or buffer_io,
> rather than buffer_mapping as the dominant wait type.  So I think that
> means you have most of your data in RAM, but not enough of it in
> shared_buffers.  You might be in a rare situation where setting
> shared_buffers to a high fraction of RAM, rather than the usual low
> fraction, is called for.  Increasing NUM_BUFFER_PARTITIONS might also be
> useful, but that requires a recompilation of the server.  But do these
> spikes correlate with anything known at the application level?  A change in
> the mix of queries, or a long report or maintenance operation?  Maybe the
> query plans briefly toggle over to using seq scans rather than index scans
> or vice versa, which drastically changes the block access patterns?
>
>
>> In our environment PgBouncer will accept several hundred connections and
>> allow up to 100 at a time to be active on the database which are VMs with
>> ~16 CPUs allocated (some more, some less, multi-tenant and manually
>> sharded). It sounds like you are advocating for connection max very close
>> to the number of cores. I'd like to better understand the pros/cons of that
>> decision.
>>
>
> There are good reasons to allow more than that.  For example, your
> application holds some transactions open briefly while it does some
> cogitation on the application-side, rather than immediately committing and
> so returning the connection to the connection pool.  Or your server has a
> very high IO capacity and benefits from lots of read requests in the queue
> at the same time, so it can keep every spindle busy and every rotation
> productive.  But, if you have no reason to believe that any of those
> situations apply to you, but do have evidence that you have lock contention
> between processes, then I think that limiting the number active processes
> to the number of cores is a good starting point.
>
> Cheers,
>
> Jeff
>
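
(Side note for anyone reading this on 9.6 or later, where pg_stat_activity
does have the wait_event columns: the dominant waits can be sampled with a
query along these lines.)

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY 1, 2
ORDER BY count(*) DESC;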


-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com


Re: The tragedy of SQL

2021-09-14 Thread Scottix
It is kind of a purist's fallacy: the point being that if you could just
write ASM code, it would be the best.

In reality, a database is used not because it is technically the best
database, but because it is used by many people: something that other
developers can pick up and use without reading a 200-page manual and studying
for a year on end. Maybe things would be better if everyone did that; on the
other hand, it might just be wasted effort.

You complain about no-SQL databases but then actually advocate for them by
saying SQL is sad. I see Postgres as a traditional RDBMS with specific use
cases. Compare that to ClickHouse, which has a very different use case. Don't
compare timeseriesdb, because even that has limitations that ClickHouse
surpasses at scale. That is just one example of a no-SQL database.

If you do start a new database, let me know. I would like to see that in
action.

On Tue, Sep 14, 2021 at 9:20 AM Rob Sargent  wrote:

> On 9/14/21 10:10 AM, Michael Nolan wrote:
>
> I started programming in 1967, and over the last 50+ years I've programmed
> in more languages than I would want to list.  I spent a decade writing in
> FORTRAN on a GA 18/30 (essentially a clone of the IBM 1130) with limited
> memory space, so you had to write EFFICIENT code, something that is a bit
> of a lost art these days.  I also spent a decade writing in COBOL.
>
> I've not found many tasks that I couldn't find a way to write in whatever
> language I had available to write it in.  There may be bad (or at least
> inefficient) languages, but there are lots of bad programmers.
> --
> Mike Nolan
> htf...@gmail.com
>
> OK, I'm maybe responsible for this thread turning into a diatribe.  I
> shouted at OP 'cause he shouted at us. My mistake, and I apologize.
> I'm probably closer to Mike's "bad programmers" than I would care to admit
> but fully believe software is a "people problem" more than most of us
> realize.
>


-- 
scot...@gmail.com