On 1 July 2014 20:20, Tom Lane wrote:
> Jeff Frost writes:
>>> On Jun 30, 2014, at 4:04 PM, Tom Lane wrote:
>>>> Did you check whether the locks were all on temp tables of the
>>>> ON COMMIT DROP persuasion?
>
>> And indeed it did catch up overnight and the lag increased shortly after a
>> corr
On 1 July 2014 20:20, Tom Lane wrote:
> I don't plan to work on this myself, but perhaps someone with more
> motivation will want to run with these ideas.
I was planning to work on improving performance of replication apply
over the summer, mid July - Aug, so I'll add this to the list.
--
Sim
Andres Freund writes:
> On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
>> It seems like there are three, not mutually exclusive, ways we might
>> address this:
> But I think more importantly it's probably quite possible to hit a
> similar problem without ON COMMIT DROP relations. Say DISCARD TEMP
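For reference, a minimal sketch of the DISCARD TEMP case being pointed at here; the
table names are invented, but DISCARD TEMP drops every temporary table in the session,
and each drop takes an AccessExclusiveLock just as an ON COMMIT DROP table does at
commit:

    -- Illustrative only: session-local temp tables dropped in bulk.
    CREATE TEMP TABLE t1 (x int);
    CREATE TEMP TABLE t2 (x int);

    -- Drops t1 and t2 in one statement; each drop acquires an
    -- AccessExclusiveLock on the table being removed.
    DISCARD TEMP;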
On 2014-07-01 15:20:37 -0400, Tom Lane wrote:
> Jeff Frost writes:
> >> On Jun 30, 2014, at 4:04 PM, Tom Lane wrote:
> >>> Did you check whether the locks were all on temp tables of the
> >>> ON COMMIT DROP persuasion?
>
> > And indeed it did catch up overnight and the lag increased shortly afte
Jeff Frost writes:
>> On Jun 30, 2014, at 4:04 PM, Tom Lane wrote:
>>> Did you check whether the locks were all on temp tables of the
>>> ON COMMIT DROP persuasion?
> And indeed it did catch up overnight and the lag increased shortly after a
> correlating spike in AccessExclusiveLocks that were
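For anyone wanting to reproduce the correlation described above, a hedged sketch of a
sampling query that could be run periodically on the primary (the column aliases are
made up, not taken from the thread):

    -- Count AccessExclusiveLocks at this instant; log the result over time
    -- (e.g. from cron) and compare against standby replay lag.
    SELECT now() AS sampled_at,
           count(*) AS access_exclusive_locks
    FROM pg_locks
    WHERE mode = 'AccessExclusiveLock';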
On Jun 30, 2014, at 4:57 PM, Jeff Frost wrote:
>
> On Jun 30, 2014, at 4:04 PM, Tom Lane wrote:
>
>> Ah ... that's more like a number I can believe something would have
>> trouble coping with. Did you see a noticeable slowdown with this?
>> Now that we've seen that number, of course it's poss
On Jun 30, 2014, at 4:04 PM, Tom Lane wrote:
> Ah ... that's more like a number I can believe something would have
> trouble coping with. Did you see a noticeable slowdown with this?
> Now that we've seen that number, of course it's possible there was an
> even higher peak occurring when you sa
On 2014-06-30 19:04:20 -0400, Tom Lane wrote:
> Jeff Frost writes:
> >>> So it seems like we have a candidate explanation. I'm a bit surprised
> >>> that StandbyReleaseLocks would get this slow if there are only a dozen
> >>> AccessExclusiveLocks in place at any one time, though. Perhaps that
>
Jeff Frost writes:
>>> So it seems like we have a candidate explanation. I'm a bit surprised
>>> that StandbyReleaseLocks would get this slow if there are only a dozen
>>> AccessExclusiveLocks in place at any one time, though. Perhaps that
>>> was a low point and there are often many more?
> Si
On Jun 30, 2014, at 1:46 PM, Jeff Frost wrote:
>> So it seems like we have a candidate explanation. I'm a bit surprised
>> that StandbyReleaseLocks would get this slow if there are only a dozen
>> AccessExclusiveLocks in place at any one time, though. Perhaps that
>> was a low point and there
On Jun 30, 2014, at 1:39 PM, Tom Lane wrote:
>
>
>> Another item of note is that the system catalogs are quite bloated:
>> Would that cause the replica to spin on StandbyReleaseLocks?
>
> AFAIK, no. It's an unsurprising consequence of heavy use of short-lived
> temp tables though.
>
Yah, this h
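As a rough illustration of the catalog bloat being discussed, one way to eyeball the
size of the catalogs that grow fastest under heavy churn of short-lived temp tables
(a sketch, not a query from the thread):

    -- Total on-disk size (including indexes and TOAST) of a few catalogs.
    SELECT c.relname,
           pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'pg_catalog'
      AND c.relname IN ('pg_class', 'pg_attribute', 'pg_type', 'pg_depend')
    ORDER BY pg_total_relation_size(c.oid) DESC;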
Jeff Frost writes:
> On Jun 30, 2014, at 1:15 PM, Andres Freund wrote:
>> So these are probably relations created in uncommitted
>> transactions. Possibly ON COMMIT DROP temp tables?
> That would make sense. There are definitely quite a few of those being used.
Uh-huh. I doubt that the mechan
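For readers unfamiliar with the pattern under discussion, a minimal sketch of an
ON COMMIT DROP temp table (table and column names are invented); on the releases
discussed in this thread, the drop at commit takes an AccessExclusiveLock that is
WAL-logged and must later be released on the standby in StandbyReleaseLocks:

    BEGIN;
    -- Short-lived scratch table, dropped automatically at COMMIT.
    CREATE TEMP TABLE scratch (id int, payload text) ON COMMIT DROP;
    INSERT INTO scratch VALUES (1, 'example');
    -- ... use scratch ...
    COMMIT;  -- scratch is dropped here, under an AccessExclusiveLock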
On Jun 30, 2014, at 1:15 PM, Andres Freund wrote:
> On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
>>
>> On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira wrote:
>>
>>>
>>> On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost wrote:
>>> And if you go fishing in pg_class for any of the oids, you
On 2014-06-30 12:57:56 -0700, Jeff Frost wrote:
>
> On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira wrote:
>
> >
> > On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost wrote:
> > And if you go fishing in pg_class for any of the oids, you don't find
> > anything:
> >
> > That is probably becaus
On Jun 30, 2014, at 12:54 PM, Matheus de Oliveira wrote:
>
> On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost wrote:
> And if you go fishing in pg_class for any of the oids, you don't find
> anything:
>
> That is probably because you are connected to the wrong database. Once you
> connect to th
On Mon, Jun 30, 2014 at 4:42 PM, Jeff Frost wrote:
> And if you go fishing in pg_class for any of the oids, you don't find
> anything:
That is probably because you are connected to the wrong database. Once you
connect to the database of interest, you don't even need to query pg_class,
just cast
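A hedged sketch of the kind of lookup being described here, run after connecting to
the database that owns the locked relations (the column list is illustrative):

    -- regclass resolves the oid to a table name without querying pg_class
    -- directly; oids belonging to other databases simply print as numbers.
    SELECT locktype, database, relation,
           relation::regclass AS relation_name,
           mode, granted
    FROM pg_locks
    WHERE locktype = 'relation';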
On Jun 30, 2014, at 12:32 PM, Tom Lane wrote:
> Jeff Frost writes:
>> Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:
>
>>         mode         | count
>> ---------------------+-------
>>  AccessExclusiveLock |    11
>>  AccessShareLock     |  2089
>>
Jeff Frost writes:
> Sampling pg_locks on the primary shows ~50 locks with ExclusiveLock mode:
>         mode         | count
> ---------------------+-------
>  AccessExclusiveLock |    11
>  AccessShareLock     |  2089
>  ExclusiveLock       |    46
>  RowExclusiveLock
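For completeness, a sketch of the kind of sampling query that produces counts like
those quoted above (not necessarily the exact query that was used):

    SELECT mode, count(*)
    FROM pg_locks
    GROUP BY mode
    ORDER BY count(*) DESC;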
On Jun 30, 2014, at 12:17 PM, Jeff Frost wrote:
>>
>> already is quite helpful.
>>
>> What are you doing on that system? Is there anything requiring large
>> amounts of access exclusive locks on the primary? Possibly large amounts
>> of temporary relations?
>
>
> The last time we did a 100% l
On Jun 30, 2014, at 11:39 AM, Andres Freund wrote:
> On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
>> On Jun 30, 2014, at 10:29 AM, Soni M wrote:
>
>>> It is
>>> 96.62% postgres [.] StandbyReleaseLocks
>>> as Jeff said. It runs for quite a long time, more than 5 minutes I think
>>>
>
2014-06-30 20:34 GMT+02:00 Jeff Frost:
> On Jun 30, 2014, at 10:29 AM, Soni M wrote:
>
>
>
>
> On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund wrote:
>
>>
>> My guess it's a spinlock, probably xlogctl->info_lck via
>> RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
>>
On 2014-06-30 11:34:52 -0700, Jeff Frost wrote:
> On Jun 30, 2014, at 10:29 AM, Soni M wrote:
> > It is
> > 96.62% postgres [.] StandbyReleaseLocks
> > as Jeff said. It runs for quite a long time, more than 5 minutes I think
> >
> > i also use hot standby. we have 4 streaming replica,
On Jun 30, 2014, at 10:29 AM, Soni M wrote:
>
>
>
> On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund wrote:
>
> My guess it's a spinlock, probably xlogctl->info_lck via
> RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
> to show up correctly in profiles...
>
> What wo
On Tue, Jul 1, 2014 at 12:14 AM, Andres Freund wrote:
>
> My guess it's a spinlock, probably xlogctl->info_lck via
> RecoveryInProgress(). Unfortunately inline assembler doesn't always seem
> to show up correctly in profiles...
>
> What worked for me was to build with -fno-omit-frame-pointer - th
On 2014-06-30 19:14:24 +0300, Heikki Linnakangas wrote:
> On 06/30/2014 05:46 PM, Soni M wrote:
> >Here's what 'perf top' said on streaming replica :
> >
> >Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
> > 95.97% postgres [.] 0x002210f3
>
>
On Jun 30, 2014, at 9:14 AM, Heikki Linnakangas wrote:
> On 06/30/2014 05:46 PM, Soni M wrote:
>> Here's what 'perf top' said on streaming replica :
>>
>> Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
>> 95.97% postgres [.] 0x002210f3
>
>
On 06/30/2014 05:46 PM, Soni M wrote:
> Here's what 'perf top' said on streaming replica :
> Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
>  95.97% postgres [.] 0x002210f3
OK, so it's stuck doing something... Can you get a build with debug symbol
Here's what 'perf top' said on streaming replica :
Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
95.97% postgres [.] 0x002210f3
0.41% perf [.] 0x0005f225
0.39% libc-2.12.so
On 06/29/2014 03:43 PM, Soni M wrote:
> top and sar say 100% CPU usage of one core, no sign of I/O wait.
Hmm, I wonder what it's doing then... If you have "perf" installed on
the system, you can do "perf top" to get a quick overview of where the
CPU time is spent.
- Heikki
Here's some data from the standby (this is when replaying takes 44 seconds per
WAL file):
2014-06-29 00:07:36.513 CDT,,,16682,,53af6f46.412a,44,,2014-06-28 20:43:34
CDT,,0,LOG,0,"restartpoint complete: wrote 63187 buffers (3.0%); 0
transaction log file(s) added, 0 removed, 0 recycled; write=209.170
top and sar say 100% CPU usage of one core, no sign of I/O wait. The
database is 1.5 TB in size. RAM on the master is 145 GB; on the slaves it
differs, some have about 16 GB and another also has 145 GB.
Nothing suspicious in the standby's postgres log.
In the master's postgres log:
WARNING,01000,"pgstat wait timeout",,
On 06/29/2014 11:14 AM, Soni M wrote:
> Everything worked fine until Thursday, when we had high load on the master,
> and after that every streaming replica lags further and further behind the
> master, even at night and on weekends when all server load is low. But the
> Slony slave is fine throughout.
What does 'top' on the stand
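For anyone following along, a sketch of how the lag described in this first message is
commonly measured on the standby itself (using the 9.x function names current at the
time of this thread):

    -- Run on the standby: time since the last replayed transaction, plus the
    -- WAL positions received vs. replayed.
    SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay,
           pg_last_xlog_receive_location()         AS receive_location,
           pg_last_xlog_replay_location()          AS replay_location;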