Hi,
On 2025-04-15 13:44:09 +0300, Yura Sokolov wrote:
> 15.04.2025 13:00, Andres Freund wrote:
> > 1) Increasing NUM_XLOGINSERT_LOCKS allows more contention on insertpos_lck
> > and
> > spinlocks scale really badly under heavy contention
> >
> > I think we c
ke
NUM_XLOGINSERT_LOCKS (the ring size needs to be limited, but I didn't see a
disadvantage with using something like MaxConnections * 2).
Greetings,
Andres Freund
wal, wal_init'
(or a subset thereof).
From what I know of btrfs, I don't think you want direct IO though. Possibly
for WAL, but definitely not for data. IIRC it currently can cause corruption.
Greetings,
Andres Freund
CPU, is higher.
You could check what happens if you bind both server and client to the same
CPU socket.
numactl --membind 1 --cpunodebind 1
forces programs to allocate memory and run on a specific CPU socket.
Greetings,
Andres Freund
hat timescale does flush WAL too frequently for some
reason...
Greetings,
Andres Freund
ows=376135 width=20) (actual time=88.336..1377.862 rows=371810
> loops=1)
Given the time spent in the bitmap heap scan, it might be beneficial to
increase effective_io_concurrency some.
> Recheck Cond: ((ts >= '2023-03-16
> 10:51:28.397+01'::timestamp with time zone) AND (ts <= '2023-04-16
> 10:51:28.397+02'::timestamp with time zone))
> Rows Removed by Index Recheck: 2131
> Heap Blocks: lossy=4742
The lossiness might also incur some overhead, so increasing work_mem a bit
will help some.
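Something like the following, untested and with made-up values, just to see
whether it moves the needle (both can be changed per session):

  SET effective_io_concurrency = 64;  -- more prefetching for the bitmap heap scan
  SET work_mem = '64MB';              -- enough for the bitmap to stay exact rather than lossy

Then re-run the EXPLAIN (ANALYZE, BUFFERS) and compare.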
Greetings,
Andres Freund
Hi,
On 2023-02-10 20:45:39 +0100, Pavel Stehule wrote:
> But for significant improvements it needs some form of JIT (Postgres has JIT
> for SQL expressions, but it is not used for PLpgSQL expressions). On the other
> hand, PL/pgSQL is not designed (and usually not used) for extensive numeric
> calcul
need more shared memory. There's a small
degradation of performance due to the increased size of some shared
datastructures, most prominently the lock table for heavyweight locks.
Greetings,
Andres Freund
Hi,
On November 30, 2022 3:47:32 AM PST, Andrew Dunstan wrote:
>
>On 2022-11-29 Tu 16:06, David Rowley wrote:
>> On Wed, 30 Nov 2022 at 03:31, Tom Lane wrote:
>>> Alvaro Herrera writes:
IMO it was a mistake to turn JIT on in the default config, so that's one
thing you'll likely want
at which new connections are made, and thus how quickly the backlog is
filled.
Do you get the same behaviour if you set net.core.somaxconn to higher than the
number of connections? IIRC you need to restart postgres for it to take
effect.
Greetings,
Andres Freund
a simple LEFT JOINed query when
> under high load (> 1000 queries / sec). Furthermore the CPU usage is quite
> high.
We can't say much about Aurora. It's a heavily modified fork of postgres. Did
you reproduce this with vanilla postgres? And if so, do you have it in a form
that somebody could try out?
Greetings,
Andres Freund
Hi,
On March 4, 2022 10:42:39 AM PST, Tom Lane wrote:
>aditya desai writes:
>> One of the service layer app is inserting Millions of records in a table
>> but one row at a time. Although COPY is the fastest way to import a file in
>> a table. Application has a requirement of processing a row an
t he subsequent locks, but instead just swapping in the
currently-last lock.
Greetings,
Andres Freund
(*) IIRC the issue is when writing back we try to write back multiple buffers
at once (using conditional lock acquisition to avoid deadlocks). Those then
are likely released in FIFO order. I think it
Hi,
On October 27, 2021 2:44:56 PM PDT, Ashkil Dighin
wrote:
>Hi,
>Yes, lock contention reduced with postgresql v14.
>Lock acquire reduced from 18% to 10%
>10.49% postgres postgres [.] LWLockAcquire
>5.09%  postgres postgres [.] _bt_compare
>
>Can lock contention be reduced
Hi,
On 2021-10-25 18:38:40 -0600, Michael Lewis wrote:
> On Mon, Oct 25, 2021, 5:36 PM Andres Freund wrote:
> If your hot data set is actually larger than s_b, I'd recommend trying a
> larger s_b. It's plausible that a good chunk of lock contention is from
> that.
> How
mportant
callers are.
> Postgres.conf used in Baremetal
>
> shared_buffers = 128GB (1/4th of RAM size)
> effective_cache_size = 392GB (1/3 or 75% of RAM size)
If your hot data set is actually larger than s_b, I'd recommend trying a
larger s_b. It's plausible that a good chunk of lock contention is from that.
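E.g. something along these lines (the value is purely illustrative, and a
restart is needed for it to take effect):

  ALTER SYSTEM SET shared_buffers = '256GB';
  -- restart the server afterwards, then re-check the LWLockAcquire profile

ALTER SYSTEM needs superuser; editing postgresql.conf directly works just as
well.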
Greetings,
Andres Freund
Hi,
On 2021-04-13 23:04:50 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2021-04-13 19:16:46 -0400, Tom Lane wrote:
> >> Like this? This passes check-world, modulo the one very-unsurprising
> >> regression test change. I've not tried to do any performance testing.
I wonder if there's a realistic chance it could create additional
deadlocks that don't exist right now?
Would it be a problem that we'd still release the locks on catalog
tables early, but not on its toast table?
Greetings,
Andres Freund
s a requested improvement?
The ability to lock a toast table? Yea, it might be worth doing that. I
seem to recall this being discussed not too long ago...
Greetings,
Andres Freund
Hi,
On 2021-04-13 09:33:48 -0700, Paul Friedman wrote:
> I've attached the 2 perf reports. From the 2nd one, I can see lots of
> time waiting for TOAST table locks on the geometry column, but I
> definitely don't fully understand the implications or why LockManager
> would be struggling here.
Oh
m call, which in turn is what lwlocks
end up using on linux when the lock is contended.
Check the second half of:
https://www.postgresql.org/message-id/20210412215738.xytq33wlciljyva5%40alap3.anarazel.de
Greetings,
Andres Freund
ected weren't just the first few
milliseconds of starting those 60 queries, there shouldn't be any
additional "heavyweight locks" taken given the duration of your queries.
The futex profile hopefully will tell us from where that is coming
from...
Greetings,
Andres Freund
perf record --call-graph dwarf -e syscalls:sys_enter_futex -a sleep 3
perf report --no-children --sort comm,symbol
Greetings,
Andres Freund
Hi,
On 2020-04-29 10:50:54 +0200, Laurenz Albe wrote:
> On Wed, 2020-04-29 at 08:54 +0200, Marc Rechté wrote:
> > I am trying to figure out the recommended settings for a PG dedicated
> > machine regarding NUMA.
> >
> > I assume that the shared buffers are using Huge Pages only. Please
> > cor
what I brought up: There'd might not be a lot of common
code/infrastructure between deleting row versions that are invisible due
to no backend having a snapshot to see them (presumably inferred via
xmin/xmax), and newly created row versions within a transaction that are
invisible because there's no snapshot with that cid.
Greetings,
Andres Freund
a tuple into a
tuplequeue requires the input slot to be materialized into a HeapTuple
(should probably be MinimalTuple....), which often the input will not
yet be. So I think it'll often be much worse than 10x.
Greetings,
Andres Freund
Hi,
On 2019-12-16 17:48:16 -0500, Tom Lane wrote:
> Hmm, that's an interesting thought. The OP did say the CPU type,
> but according to Intel's spec page for it [1] the difference between
> base and turbo frequency is only 4.0 vs 4.2 GHz, which doesn't seem
> like enough to explain the results ..
efore it, and see how
> much of the change in your real queries is explained by this one thing (or
> whether any of it is)
In particular, artificial queries will often show bottlenecks that are
not relevant in practice...
> commit b8d7f053c5c2bf2a7e8734fe3327f6a8bc711755
&
ll. From a debugging printout:
Is categories.category_id unique? Does the plan change if you ANALYZE
the tables?
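I.e. something like:

  ANALYZE categories;
  -- and the other tables involved, then re-run the EXPLAIN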
This plan doesn't look like it'd actually take long, if the estimates
are correct.
> What on Earth could be causing this simple query to be running 100% CPU for
> hours?
Is the DELETE actually taking that long, or the query you showed the
explain for, or both?
Greetings,
Andres Freund
so much more performance work that's applicable to everyone using PG,
even without access to special purpose hardware.
Greetings,
Andres Freund
/current/pgstattuple.html
not the pg_stat_user_indexes entry...
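Roughly like this (index name is a placeholder, and the extension has to be
installed first):

  CREATE EXTENSION IF NOT EXISTS pgstattuple;
  SELECT * FROM pgstatindex('your_index_name');  -- btree bloat / fragmentation details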
Greetings,
Andres Freund
nt the preceding message contains
> advice relating to a Federal tax issue, unless expressly stated otherwise the
> advice is not intended or written to be used, and it cannot be used by the
> recipient or any other taxpayer, for the purpose of avoiding Federal tax
> penalties, and was not written to support the promotion or marketing of any
> transaction or matter discussed herein.
GNGNGGRR.
Greetings,
Andres Freund
Hi,
On August 24, 2019 12:41:03 PM PDT, Tom Lane wrote:
>Jeff Janes writes:
>> Most of the time is not after the clock stops, but before the
>stepwise
>> ANALYZE clock starts. If you just do an EXPLAIN rather than EXPLAIN
>> ANALYZE, that is also slow. The giant hash table is created during
>t
Hi,
On 2019-08-20 19:55:56 +0200, Felix Geisendörfer wrote:
> > On 20. Aug 2019, at 19:32, Andres Freund wrote:
> > FWIW, that's not a mis-estimate I'm getting on master ;). Obviously
> > that doesn't actually address your concern...
>
> I suppose this
):
+   33.01%  postgres  postgres      [.] tuplehash_iterate
-   18.39%  postgres  libc-2.28.so  [.] __memset_avx2_erms
   - 90.94% page_fault
        __memset_avx2_erms
        tuplehash_allocate
        tuplehash_create
        BuildTupleHashTableExt
        build_hash_table
        ExecInitAgg
        ExecInitNode
        InitPlan
        standard_ExecutorStart
Greetings,
Andres Freund
e?
> That can matter quite a bit.
Why aren't you surprised? I can easily get 20k+ write transactions/sec
on my laptop, with synchronous_commit=off. With appropriate
shared_buffers and other settings, the disk speed shouldn't matter that
much for an insertion-mostly workload.
Greetings,
Andres Freund
et that?
Are your clients in the same datacenter as your database? Otherwise it
could be that you're mostly seeing latency effects.
Greetings,
Andres Freund
;CREATE TEMPORARY TABLE
pg_stat_database_snap AS SELECT * FROM pg_stat_database;
SELECT datname,
(blks_hit - blks_read)::float/NULLIF(blks_hit, 0)::float
FROM (
SELECT datname,
pd.blks_read - ps.blks_read AS blks_read,
pd.blks_hit - ps.blks_hit AS blks_hit
FROM pg_stat_database pd JOIN pg_stat_database_snap ps USING (datname) )
pd_diff;
Greetings,
Andres Freund
uarantee in practice.
Ivan Kartyshov, reviewed by Tomas Vondra and Robert Haas.
Discussion:
so everything from 10 onwards ought to be fine.
Greetings,
Andres Freund
UPDATE pg_index SET indisvalid = false WHERE indexrelid =
'name_of_index'::regclass AND indisvalid;
EXPLAIN yourquery;
ROLLBACK;
might allow testing that without actually dropping the index. But that
of course requires superuser access.
Greetings,
Andres Freund
Hi,
On 2019-05-07 10:32:45 -0700, Andres Freund wrote:
> pgbench -i -q -s 96 && pgbench -n -c 8 -j 8 -T 100 -P1
possibly also worthwhile to note: Adding -M prepared (which I think
phoronix doesn't specify) makes this considerably faster...
Greetings,
Andres Freund
f
> the program as a whole, not just a single TU at a time. This enables
> it to perform additional aggressive optimization.
Note that the flags described don't enable LTO.
Greetings,
Andres Freund
ns with us.
> Full details about the data types we collect and what we use this for and
> your related rights is set out in our online privacy policy at
> https://www.gam.com/en/legal/privacy-policy.
> Please familiarise yourself with this policy and check it from time to time
> for updates as it supplements this notice.
This is a public list.
Greetings,
Andres Freund
ly - seen that the "planning cost" we add to
the custom plan leads to the generic plan to always be preferred. In
particular for indexed queries, on system that set random_page_cost =
seq_page_cost = 1 (due to SSD or expectation that workload is entirely
cached), the added cost from cached_plan_cost() can be noticeable in
comparison to the estimated cost of the total query.
Greetings,
Andres Freund
suggest testing it with synchronous_commit=off instead. That's about
as fast for this type of workload, doesn't have cluster corruption
issues, the window of a transaction not persisting in case of a crash is
very small, and it can just be set by any user in individual sessions.
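I.e. something as simple as (per session, no restart or superuser needed):

  SET synchronous_commit = off;
  -- run the insert-heavy workload in this session

or set it in postgresql.conf / via ALTER SYSTEM if you want it everywhere.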
Greetings,
Andres Freund
> mergejoinscansel
> initial_cost_mergejoin
> try_mergejoin_path
> add_paths_to_joinrel
> make_join_rel
> join_search_one_level
> standard_join_search
> make_one_rel
> query_planner
> ...
I suspect some of this might be related to < 11 not having the following
commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd
Greetings,
Andres Freund
, 4KB FS
blocks, and 8KB postgres block size.
There still might be some benefit of different FS block sizes, but it's
not going to be related directly to IOPS.
Greetings,
Andres Freund
Hi,
On 2019-03-06 19:21:33 +0100, Tobias Gierke wrote:
> On 06.03.19 18:42, Andres Freund wrote:
> >
> > It's hard to know precisely without running a profile of the
> > workload. My suspicion is that the bottleneck in this query is the use
> > of numeric,
're too complicated to be inlined.
Generally there's definitely advantage in JITing aggregation.
There's a lot of further improvements on the table with better JIT code
generation, I just haven't gotten around implementing those :(
Greetings,
Andres Freund
4) (actual
> time=74.799..74.799 rows=0 loops=1)
> Filter: (SubPlan 1)
> Rows Removed by Filter: 30044
> SubPlan 1
> -> Subquery Scan on subq_1 (cost=0.00..0.02 rows=1 width=0) (actual
> time=0.002..0.002 rows=0 loops=30044)
> -> HashAggregate (cost=0.00..0.01 rows=1 width=20) (actual
> time=0.000..0.000 rows=0 loops=30044)
> Group Key: ref_0.o_entry_d, c_credit
> -> Result (cost=0.00..0.00 rows=0 width=20) (actual
> time=0.000..0.000 rows=0 loops=30044)
> One-Time Filter: false
> Planning Time: 0.350 ms
> Execution Time: 79.237 ms
I think that might be fixed in the latest point release. I screwed up
and made resets of tuple hash tables (and there's 30044 of those here)
more expensive. It's fixed in the latest minor release however.
Greetings,
Andres Freund
g internet, so I appreciate self-contained email.
Greetings,
Andres Freund
if
you want to play around with it?
Greetings,
Andres Freund
On 2018-12-08 15:23:19 -0800, Rob Sargent wrote:
>
>
> > On Dec 8, 2018, at 3:12 PM, Andres Freund wrote:
> >
> > On 2018-12-08 12:06:23 -0800, Jeremy Schneider wrote:
> >> On RDS PostgreSQL, the default is 25% of your server memory. This seems
> >&
On 2018-12-08 12:06:23 -0800, Jeremy Schneider wrote:
> On RDS PostgreSQL, the default is 25% of your server memory. This seems
> to be pretty widely accepted as a good starting point on PostgreSQL.
FWIW, I think it's widely cited, but also bad advice. 25% for an OLTP
workload on a 1TB machine wit
I thought it would also have been related to
> https://www.postgresql.org/message-id/CAMkU%3D1xPqHP%3D7YPeChq6n1v_qd4WGf%2BZvtnR-b%2BgyzFqtJqMMQ%40mail.gmail.com
Neither of these are related to the problem.
Greetings,
Andres Freund
called once for each column in
a select list (or using or ...). IIRC we've hit this once when I was at
citus, too.
We really should be using a more appropriate data structure here - very
likely a hashtable. Unfortunately such a change would likely be a bit
too much to backpatch...
Greetings,
Andres Freund
is much more crucial. If a
lot of those tuples aren't frozen (don't quite know how that works
there), the clog accesses will also kill you if the table was filled
over many transactions, since clog's access characteristics to a lot of
xids are pretty bad with DIO.
Greetings,
Andres Freund
On 2018-08-17 15:21:19 +0200, Alexis Lê-Quôc wrote:
> On Tue, Aug 14, 2018 at 7:50 PM Andres Freund wrote:
>
> > Hi,
> >
> > On 2018-08-14 10:46:45 -0700, Andres Freund wrote:
> > > On 2018-08-14 15:18:55 +0200, Alexis Lê-Quôc wrote:
> > >
ns we will likely write the full page very often. Yes, my hands are
> tied!
Why is that a requirement / how is specifically phrased? Is it a bounded
recovery time?
Greetings,
Andres Freund
Hi,
On 2018-08-14 10:46:45 -0700, Andres Freund wrote:
> On 2018-08-14 15:18:55 +0200, Alexis Lê-Quôc wrote:
> > + 30.25%  26.78%  postgres  postgres  [.] mdnblocks
>
> This I've likely fixed ~two years back:
>
> http://archives
tgres [kernel.kallsyms] [k]
> copy_user_enhanced_fast_string
Possible that a slightly bigger shared buffer would help you.
It'd probably be more helpful to look at a perf report --no-children for
this kind of analysis.
Greetings,
Andres Freund
Hi,
On 2018-07-31 12:56:26 -0400, Jeff Janes wrote:
> On Mon, Jul 30, 2018 at 1:23 PM, Andres Freund wrote:
>
> > On 2018-07-30 07:19:07 -0400, Jeff Janes wrote:
> >
> > > And indeed, in my hands JIT makes it almost 3 times worse.
> >
> > Not in my measure
imization
> time show up there?
As my timings showed, I don't see the slowdown you're reporting. Could
you post a few EXPLAIN ANALYZEs?
Greetings,
Andres Freund
Hi,
On 2018-07-30 18:01:34 +0200, Pavel Stehule wrote:
> look on
> http://www.postgresql-archive.org/PATCH-LLVM-tuple-deforming-improvements-td6029385.html
> thread, please.
Given the results I just posted in the sibling email I don't think those
issues apply here.
Greetings,
Andres Freund
d version essentially now boils down to a near
optimal loop around the intermittent bigint columns (which we deform
because we use a slot - at some point we're going to have to do
better). No checks for the NULL bitmap, no alignment considerations,
all that's optimized away.
Greetings,
Andres Freund
Hi,
On 2018-07-20 08:27:34 -0400, Robert Haas wrote:
> On Thu, Jul 19, 2018 at 4:32 PM, Andres Freund wrote:
> >> 1. Why the error message changes? If there's a good reason, it should
> >> be done as a separate commit, or at least well-documented in the
> >&
Hi,
On 2018-07-18 14:34:34 -0400, Robert Haas wrote:
> On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund wrote:
> > FWIW, here's a rebased version of this patch. Could probably be polished
> > further. One might argue that we should do a bit more wide ranging
> > change
Hi,
On 2017-12-08 13:44:37 -0800, Andres Freund wrote:
> On 2017-12-08 10:17:34 -0800, Andres Freund wrote:
> > the strtoll is libc functionality triggered by pg_atoi(), something I've
> > seen show up in numerous profiles. I think it's probably time to have
> >
e issues for tables. And for functions it's fairly easy
to get into trouble because there's a mismatch between the snapshot
pg_dump uses (a normal transactional snapshot) and the snapshot used to
deparse expressions etc (a fresh catalog snapshot that takes into
account concurrent commits).
Greetings,
Andres Freund
however, I get a planning time of 0.5ms.
How long does planning take if you repeat this? I wonder if a good chunk
of those 1.8s is initial loading of plv8.
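E.g. (illustrative; 'yourquery' stands in for the actual statement):

  \timing on
  EXPLAIN yourquery;   -- first run in a fresh session, includes any one-time library load
  EXPLAIN yourquery;   -- second run should show just the planning cost

If the second run is down to ~0.5ms, the difference is startup overhead rather
than planning proper.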
Greetings,
Andres Freund
e planner
that forgo optimizations if volatile functions are involved. If you
declare the function as immutable, does the problem persist?
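Roughly (function name and argument types are placeholders):

  ALTER FUNCTION your_function(your_argtypes) IMMUTABLE;

Only do that if the function actually is immutable, of course; otherwise you
can get wrong results from folded/cached evaluations.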
Greetings,
Andres Freund
Hi,
On 2017-12-08 10:17:34 -0800, Andres Freund wrote:
> the strtoll is libc functionality triggered by pg_atoi(), something I've
> seen show up in numerous profiles. I think it's probably time to have
> our own optimized version of it rather than relying on libcs.
Attach
of more
reasonable numbers of columns.
The copy implementation deserves some optimization too...
> Here's a tarball with test schema and data:
> http://nohuhu.org/copy_perf.tar.bz2; it's 338k compressed but expands to
> ~50mb. Here's the result of profiling session with perf:
> https://pastebin.com/pjv7JqxD
Thanks!
Greetings,
Andres Freund
ts suggest that the sweet spot is
more likely to be an order of magnitude or two bigger. Depends a bit on
your workload (including size of scans and concurrency) obviously.
Greetings,
Andres Freund
er than disks can read.
Huh? Zlib compresses at a few 10s of MB/s.
Greetings,
Andres Freund
On November 21, 2017 6:49:26 AM PST, Tom Lane wrote:
>Zakhar Shapurau writes:
>
>Presumably, we could examine the grouping sets to identify column(s)
>present in all sets, and then allow the optimization for clauses that
>reference only such columns. Or maybe I'm misreading the comment
>(but t