ges
> with headers and only a few tuples. If any of those are insecure, they
> all are. Therefore, I don't see any reason to treat them differently.
>
We had to special case zero pages and not encrypt them because as far as I
can tell, there is no atomic way to extend a file and initialize it to
Enc(zero) in the same step.
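For illustration, a minimal sketch of the special case on the read side (my own wording, not code from any posted patch): a freshly extended, never-written page is still all zeros on disk and has to be passed through untouched rather than decrypted.

#include <stdbool.h>
#include <string.h>

#define BLCKSZ 8192

/* Hypothetical helper: detect a page that was extended but never written. */
static bool
page_is_all_zero(const char *page)
{
    static const char zero_page[BLCKSZ];

    return memcmp(page, zero_page, BLCKSZ) == 0;
}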
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
contents of the page and they could thereby cause this page to be
ignored.
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
On Wed, 13 Oct 2021 at 00:25, Bruce Momjian wrote:
> On Tue, Oct 12, 2021 at 11:21:28PM +0300, Ants Aasma wrote:
> > On Tue, 12 Oct 2021 at 16:14, Bruce Momjian wrote:
> >
> > Well, how do you detect an all-zero page vs a page that encrypted to all
> >
On Wed, 13 Oct 2021 at 02:20, Bruce Momjian wrote:
> On Wed, Oct 13, 2021 at 12:48:51AM +0300, Ants Aasma wrote:
> > On Wed, 13 Oct 2021 at 00:25, Bruce Momjian wrote:
> >
> > On Tue, Oct 12, 2021 at 11:21:28PM +0300, Ants Aasma wrote:
> > > Page encr
is a much better media for having a
reasoned discussion about technical design decisions.
> > In other words: maybe I'm wrong here, but it looks to me like we're
> > laboriously reinventing the wheel when we could be working on
> > improving the working prototype.
>
>
ize, unless the table can't possibly have that many tuples. It
may make sense to allocate it based on estimated number of dead tuples and
resize if needed.
Regards,
Ants Aasma
Web: https://www.cybertec-postgresql.com
From 6101b360ea85a66aba093f98a83ae335983aa4a5 Mon Sep 17 00:00:00 2001
From: A
/12GB limitation. I'll see if I can
pick up where that thread left off and push it along.
Regards,
Ants Aasma
Web: https://www.cybertec-postgresql.com
rkers_per_gather = 0:
>
> select count(bid) from pgbench_accounts;
>
> no checksums: ~456ms
> with checksums: ~489ms
>
> 456.0/489 = 0.9325
>
> The cost of checksums is about 6.75% here.
>
Can you try with postgres compiled with CFLAGS="-O2 -march=native"? There
On Thu, Mar 28, 2019 at 10:38 AM Christoph Berg wrote:
> Re: Ants Aasma 2019-03-27 <
> ca+csw_twxdrzdn2xsszbxej63dez+f6_hs3qf7hmxfenxsq...@mail.gmail.com>
> > Can you try with postgres compiled with CFLAGS="-O2 -march=native"?
> There's
> > a bit of lo
purpose is to allow an external transaction manager to perform
> atomic global transactions across multiple databases or other transactional
> resources. Unless you're writing a transaction manager, you probably
> shouldn't be using PREPARE TRANSACTION.
Regards,
Ants Aasma
nly a few lwlock acquisitions away and shouldn't make
any material difference. Patch to do so is attached.
Regards,
Ants Aasma
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7375a78ffc..faa9690e48 100644
--- a/src/backend/access/transam/xlog.c
+++ b/s
even though the code has been like that for a very long time. I was
actually mostly worried about extension code run by a logging hook
causing the panic.
Regards,
Ants Aasma
hink this would also fix oracle_fdw crashing when postgres is
compiled with --with-ldap. At least RTLD_DEEPBIND helped. [1]
[1]
https://www.postgresql.org/message-id/CA%2BCSw_tPDYgnzCYW0S4oU0mTUoUhZ9pc7MRBPXVD-3Zbiwni9w%40mail.gmail.com
Ants Aasma
your needs have
outgrown what RDS works well with and you are in for a painful move
sooner or later.
Regards,
Ants Aasma
--
+43-670-6056265
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com
eed [1].
>
> I think trying with something like 500-1000 partitions might be a good
> place to start.
I don't think that will actually help much. 1000 partitions means each
partition gets data from ~50 vehicles. At 60 tuples per page each page
in the partitioned table will contain on av
proach I described, or a buffering microservice in front of
PostgreSQL like Aleksander recommended should fix data locality for
you. If you weren't running on RDS I would even propose using Redis as
the buffer with one key per driver and redis_fdw to make the data
accessible from within PostgreSQL.
Regards,
Ants Aasma
--
+43-670-6056265
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com
ses this
hash function is used for.
Regards,
Ants Aasma
From 912f46be12536985dda7bcfb669d4ec13e79d073 Mon Sep 17 00:00:00 2001
From: Ants Aasma
Date: Mon, 29 Jan 2024 21:07:44 +0200
Subject: [PATCH 2/2] Unaligned fasthash word at a time hashing
About 10% performance benefit on short strings, 50%
On Tue, 30 Jan 2024 at 12:04, John Naylor wrote:
>
> On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma wrote:
> > But given that we know the data length and we have it in a register
> > already, it's easy enough to just mask out data past the end with a
> > shift. See pa
with StrategyRejectBuffer(). So maybe a dynamic
sizing algorithm could be applied to the ringbuffer. Make the buffers
array in strategy capable of holding up to the limit of buffers, but
set ring size conservatively. If we have to flush WAL, double the ring
size (up to the limit). If we loop around the ring without flushing,
decrease the ring size by a small amount to let clock sweep reclaim
them for use by other backends.
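A rough sketch of that sizing loop, with made-up field names (the real
BufferAccessStrategy layout in freelist.c differs):

typedef struct AdaptiveRing
{
    int     ring_size;      /* buffers currently handed out by the ring */
    int     ring_min;       /* conservative initial size */
    int     ring_max;       /* hard limit; the array is allocated this large */
} AdaptiveRing;

/* Reusing a ring buffer forced a WAL flush: grow aggressively. */
static void
ring_grow_on_wal_flush(AdaptiveRing *ring)
{
    ring->ring_size = ring->ring_size * 2;
    if (ring->ring_size > ring->ring_max)
        ring->ring_size = ring->ring_max;
}

/* Wrapped around the ring without flushing: shrink slowly so clock
 * sweep can reclaim the excess buffers for other backends. */
static void
ring_shrink_on_clean_wrap(AdaptiveRing *ring)
{
    if (ring->ring_size > ring->ring_min)
        ring->ring_size--;
}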
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
y other such examples.
Ideally yes, though I am not hopeful of finding a solution that does
this any time soon. Just to take your example: if a nightly
maintenance job wipes out the shared buffer contents while slightly
optimizing its non-time-critical work, and then causes morning user
visible lo
all (what do
> you think of the name?).
A static inline function seems like a less surprising and more type
safe solution for this.
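As a generic illustration of the point (not the actual call under
discussion), a static inline wrapper gives the compiler a real prototype
to check arguments against, while the macro form does not:

static inline int
int_max(int a, int b)
{
    return (a > b) ? a : b;
}

/* versus #define INT_MAX_OF(a, b) ((a) > (b) ? (a) : (b)), which evaluates
 * its arguments twice and accepts mismatched types without any warning */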
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
On Mon, 20 Mar 2023 at 00:59, Melanie Plageman
wrote:
>
> On Wed, Mar 15, 2023 at 6:46 AM Ants Aasma wrote:
> >
> > On Wed, 15 Mar 2023 at 02:29, Melanie Plageman
> > wrote:
> > > As for routine vacuuming and the other buffer access strategies, I thi
On Wed, 13 Mar 2024 at 04:56, Kyotaro Horiguchi wrote:
>
> At Mon, 11 Mar 2024 16:43:32 +0900 (JST), Kyotaro Horiguchi
> wrote in
> > Oh, I once saw the fix work, but seems not to be working after some
> > point. The new issue was a corruption of received WAL records on the
> > first standby, an
rated looks sane. I added the clang
pragma because it insisted on unrolling otherwise and based on how the
instruction dependencies look that is probably not too helpful even
for large cases (needs to be tested). The configure check and compile
flags of course need to be amended for BW.
Regards
On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote:
>
> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote:
> > What about using the masking capabilities of AVX-512 to handle the
> > tail in the same code path? Masked out portions of a load instruction
> > will n
On Tue, 2 Apr 2024 at 00:31, Nathan Bossart wrote:
> On Tue, Apr 02, 2024 at 12:11:59AM +0300, Ants Aasma wrote:
> > What about using the masking capabilities of AVX-512 to handle the
> > tail in the same code path? Masked out portions of a load instruction
> > will not gene
cases,
making the choice easy.
Regards,
Ants Aasma
is manually giving essentially the same result in gcc. As most
distro packages are built using gcc I think it would make sense to
have the extra code if it gives a noticeable benefit for large cases.
The visibility map patch has the same issue, otherwise looks good.
Regards,
Ants Aasma
diff --git
On Mon, 8 Apr 2024 at 16:26, Robert Haas wrote:
> And maybe we need to think of a way to further mitigate this crush of
> last minute commits. e.g. In the last week, you can't have more
> feature commits, or more lines of insertions in your commits, than you
> did in the prior 3 weeks combined. I
d the only
branch is for found/not found. Hoping to have a working prototype of SLRU
on top in the next couple of days.
Regards,
Ants Aasma
2.859s (26.7 GiB/s)
clang-14 -O2 -msse4.1 -mavx2 1.378s (55.4 GiB/s)
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
On Tue, 9 Jan 2024 at 16:03, Peter Eisentraut wrote:
> On 29.11.23 18:15, Nathan Bossart wrote:
> > Using the same benchmark as we did for the SSE2 linear searches in
> > XidInMVCCSnapshot() (commit 37a6e5d) [1] [2], I see the following:
> >
> >    writers    sse2    avx2    %
> >2561
On Tue, 9 Jan 2024 at 18:20, Nathan Bossart wrote:
>
> On Tue, Jan 09, 2024 at 09:20:09AM +0700, John Naylor wrote:
> > On Tue, Jan 9, 2024 at 12:37 AM Nathan Bossart
> > wrote:
> >>
> >> > I suspect that there could be a regression lurking for some inputs
> >> > that the benchmark doesn't look
xlp_tli is not being used to its full potential right
now either. We only check that it's not going backwards, but there is at
least one not very hard to hit way to get postgres to silently replay on
the wrong timeline. [1]
[1]
https://www.postgresql.org/message-id/canwkhkmn3qwacvudzhb6wsvlrtkwebiyso-klfykkqvwuql...@mail.gmail.com
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
eline. Maybe
while at it, we should also track that the next record should be a
checkpoint record for the timeline switch and error out if not. Thoughts?
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
recoverytest.sh
Description: application/shellscript
hat seems
completely broken.
> As you know, when new primary starts a diverged history, the
> recommended way is to blow (or stash) away the archive, then take a
> new backup from the running primary.
My understanding is that backup archives are supposed to remain valid
even after PITR or e
th patch).
So I think the correct approach would still be to have ReadRecord() or
ApplyWalRecord() determine that switching timelines is needed.
--
Ants Aasma
www.cybertec-postgresql.com
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index
ring encryption mode choices given concerns expressed is next.
Currently a viable option seems to be AES-XTS with LSN added into the IV.
XTS doesn't have an issue with predictable IV and isn't totally broken in
case of IV reuse.
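To make the tweak construction concrete, here is a minimal sketch using
OpenSSL's EVP interface with aes-256-xts (my own illustration, not code
from the patch set). The 16-byte tweak is built from the page LSN and
block number, error handling and endianness concerns are ignored, and the
key must be 64 bytes since XTS uses two 256-bit keys:

#include <stdint.h>
#include <string.h>
#include <openssl/evp.h>

static void
encrypt_page_xts(const unsigned char key[64],
                 uint64_t lsn, uint32_t blkno,
                 const unsigned char *plain, unsigned char *cipher, int len)
{
    unsigned char   tweak[16] = {0};
    int             outlen;
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

    memcpy(tweak, &lsn, sizeof(lsn));         /* bytes 0..7: page LSN */
    memcpy(tweak + 8, &blkno, sizeof(blkno)); /* bytes 8..11: block number */

    EVP_EncryptInit_ex(ctx, EVP_aes_256_xts(), NULL, key, tweak);
    EVP_EncryptUpdate(ctx, cipher, &outlen, plain, len);
    EVP_EncryptFinal_ex(ctx, cipher + outlen, &outlen);
    EVP_CIPHER_CTX_free(ctx);
}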
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
On Wed, 6 Oct 2021 at 23:08, Bruce Momjian wrote:
> Yes, I would prefer we don't use the LSN. I only mentioned it since
> Ants Aasma mentioned LSN use above.
>
Is there a particular reason why you would prefer not to use LSN? I
suggested it because in my view having a variable
e LSN be unencrypted and include
> it in the tweak as that would limit the risk from re-use of the same
> tweak over time.
>
Right, my thought was to leave the first 8 bytes of pages, the LSN,
unencrypted and include the value in the tweak. Just tested that OpenSSL
aes-256-xts
vely dense ID space to get the
performance boost, which seems essential to the approach. The latter
issue means that it can't be easily dropped into GIN or B-tree indexes
for ctid storage.
[1] https://github.com/ChenHuajun/pg_roaringbitmap
[2] https://github.com/cybertec-postgresql/pgfaceting
--
Ants Aasma
www.cybertec-postgresql.com
On Tue, 27 Jun 2023 at 07:09, Andres Freund wrote:
> On 2023-06-27 15:33:57 +1200, Thomas Munro wrote:
> > On Tue, Jun 27, 2023 at 2:05 PM Andres Freund wrote:
> > > Unfortunately it scaled way worse at first. This is not an inherent
> > > issue, but
> > > due to an implementation choice in Read
On Tue, 27 Jun 2023 at 18:40, Andres Freund wrote:
> On 2023-06-27 14:49:48 +0300, Ants Aasma wrote:
> > If you want to experiment, here is a rebased version of something I
> > hacked up a couple of years back on the way to Fosdem Pgday. I didn't
> > pursue it further b
apping), that might help.
Just as another point in support of strategy based/extensible tuple
placement, I would at some point try out placing INSERT ON CONFLICT
tuples on the same page as the preceding key in the index. Use case is
in tables with (series, timestamp) primary key to get locality of
access range scanning for a single series. Placement will always be a
tradeoff that is dependent on hardware and workload, and the effect
can be pretty large. For the mentioned use case, if placement can
maintain some semblance of clustering, there will be a 10-100x
reduction in buffers accessed for a relatively minor increase in
bloat.
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
ng
from the O= value that it's claiming to come from to automate replacement
of intermediate certificates, but not trust that every other sub-CA signed
by root and their sub-sub-CA-s are completely honest and secure.
Regards,
Ants Aasma
with overloaded CPU and
a contended spinlock. A process holding the spinlock might easily get
scheduled out leading to excessive spinning by everybody. I think a simple
thing to try would be to replace the spinlock with LWLock.
I did a prototype patch that replaces spinlocks with futexes,
ent that every performance critical spinlock had already been removed.
To be clear, I am not advocating for this patch to get included. I just had
the patch immediately available and it could have confirmed that using a
better lock fixes things.
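For reference, the shape of such a futex lock (my own minimal sketch in
the spirit of that prototype, not the patch itself; 0 = free, 1 = locked,
2 = locked with waiters):

#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void
futex_lock(atomic_int *f)
{
    int c = 0;

    if (atomic_compare_exchange_strong(f, &c, 1))
        return;                              /* uncontended fast path */
    if (c != 2)
        c = atomic_exchange(f, 2);           /* announce that we will wait */
    while (c != 0)
    {
        syscall(SYS_futex, f, FUTEX_WAIT, 2, NULL, NULL, 0);
        c = atomic_exchange(f, 2);
    }
}

static void
futex_unlock(atomic_int *f)
{
    if (atomic_exchange(f, 0) == 2)          /* 2 means somebody may be waiting */
        syscall(SYS_futex, f, FUTEX_WAKE, 1, NULL, NULL, 0);
}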
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
al
flow of the leader. With option 2 data could be read directly into the
shared memory buffer. With future async io support, reading and
looking for tuple boundaries could be performed concurrently.
Regards,
Ants Aasma
Cybertec
rate on splitting input
data to workers. After that any performance issues would be basically
the same as a normal parallel insert workload. There may well be
bottlenecks there, but those could be tackled independently.
Regards,
Ants Aasma
Cybertec
kers, even when inserting to an
unindexed unlogged table. If we get the SIMD line splitting in, it
will be enough to overwhelm most I/O subsystems available today.
Regards,
Ants Aasma
On Mon, 13 Apr 2020 at 23:16, Andres Freund wrote:
> > Still, if the reader does the splitting, then you don't need as much
> > IPC, right? The shared memory data structure is just a ring of bytes,
> > and whoever reads from it is responsible for the rest.
>
> I don't think so. If only one process
ee scan. A highly parallel index only scan on a
fully cached index should create at least some spinlock contention.
Regards,
Ants Aasma
ks.
There is also an implicit assumption here that a maintenance command is a
background task and a normal DML query is a foreground task. This is not
true in all cases; users may want to throttle transactions doing lots of
DML to keep synchronous commit latencies for smaller transactions within
reasonable limits.
As a wild idea for how to handle the throttling: what if, when all our WAL
insertion credits are used up, XLogInsert() sets InterruptPending and the
actual sleep is done inside ProcessInterrupts()?
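A sketch of how that could look; wal_credits and wal_throttle_pending are
hypothetical names I made up for the sketch, and the real
InterruptPending/ProcessInterrupts() machinery obviously has more to it:

#include <stdbool.h>
#include <stdint.h>

extern volatile bool InterruptPending;   /* stands in for the existing flag */

static int64_t wal_credits;              /* hypothetical; refilled once per interval */
static bool    wal_throttle_pending;     /* hypothetical; checked in ProcessInterrupts() */

/* called from XLogInsert() after reserving space for a record */
static void
wal_throttle_account(int64_t record_size)
{
    wal_credits -= record_size;
    if (wal_credits <= 0)
    {
        wal_throttle_pending = true;
        InterruptPending = true;         /* the sleep itself happens in ProcessInterrupts() */
    }
}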
Regards,
Ants Aasma
get of WAL
insertion credits per time interval, and when the credits run out the
process sleeps. With this type of scheme it would be reasonably
straightforward to let UPDATEs being blocked by REINDEX to transfer their
WAL insertion budgets to the REINDEX, making it get a larger piece of the
total throughput pie.
Regards,
Ants Aasma
, you are welcome
> and probably get your name on it:-)
>
There are pretty good approximations for s > 1.0 using the Riemann zeta
function, and Euler derived a formula for the s = 1 case.
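For reference, the approximations I have in mind are the standard ones
(written out here from memory, not taken from the patch):

    H(n, s) = sum_{k=1..n} k^(-s)
            ~= zeta(s) - n^(1-s) / (s - 1)          for s > 1
    H(n, 1) ~= ln(n) + 0.5772156649 + 1 / (2n)      (Euler; the constant is
                                                     the Euler-Mascheroni gamma)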
I also noticed that i is int in this function, but n is int64. That seems
like an oversight.
Regards,
Ants Aasma
e tunable. Ideally the
buffers would be at least big enough to absorb one of the workers getting
scheduled out for a timeslice, which could be up to tens of megabytes.
Regards,
Ants Aasma
[1] https://github.com/geofflangdale/simdcsv/
e larger
than the data buffer, but that doesn't seem like a major issue. Once the line is
buffered and insertion begins, the next worker can start buffering the next tuple.
Regards,
Ants Aasma
tch attached if you'd
like to try that out.
Regards,
Ants Aasma
diff --git a/src/main.cpp b/src/main.cpp
index 9d33a85..2cf775c 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -185,7 +185,6 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
#endif
simd_
On Tue, 18 Feb 2020 at 15:21, Amit Kapila wrote:
>
> On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma wrote:
> >
> > On Tue, 18 Feb 2020 at 12:20, Amit Kapila wrote:
> > > This is something similar to what I had also in mind for this idea. I
> > > had thought of
On Wed, 19 Feb 2020 at 06:22, Amit Kapila wrote:
>
> On Tue, Feb 18, 2020 at 8:08 PM Ants Aasma wrote:
> >
> > On Tue, 18 Feb 2020 at 15:21, Amit Kapila wrote:
> > >
> > > On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma wrote:
> > > >
&
y to fetch the
next state).
I whipped together a quick prototype that uses SIMD and bitmap
manipulations to do the equivalent of CopyReadLineText() in csv mode
including quotes and escape handling, this runs at 0.25-0.5 cycles per
byte.
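The core bitmap trick, sketched here without the SIMD loads and without
escape handling (so a simplification of what the prototype does): build a
64-bit mask of quote positions per chunk, turn it into an inside-quotes
mask with a prefix XOR, and drop newlines that fall inside quoted fields.

#include <stdint.h>

/* prefix XOR: each bit becomes the XOR of itself and all lower bits,
 * so bits strictly between an opening and a closing quote end up set */
static uint64_t
prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/* prev_in_quote is 0 or ~0 depending on whether the previous chunk
 * ended inside a quoted field */
static uint64_t
record_separators(uint64_t newline_bits, uint64_t quote_bits,
                  uint64_t prev_in_quote)
{
    uint64_t in_quotes = prefix_xor(quote_bits) ^ prev_in_quote;

    return newline_bits & ~in_quotes;
}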
Regards,
Ants Aasma
ing the largest portion. Amdahl's law
says that splitting into tuples needs to be made fast before
parallelizing makes any sense.
Regards,
Ants Aasma
[1]
https://www3.stats.govt.nz/2018census/Age-sex-by-ethnic-group-grouped-total-responses-census-usually-resident-population-counts-2006-2013-2018-Censuses-RC-TA-SA2-DHB.zip
lidation scan until something can be evicted.
--
Ants Aasma
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com/
eeper queues).'
>
For reference, a typical datacenter SSD needs a queue depth of 128 to
saturate a single device. [1] Multiply that appropriately for RAID arrays.
Regards,
Ants Aasma
[1]
https://www.anandtech.com/show/12435/the-intel-ssd-dc-p4510-ssd-review-part-1-virtual-raid-on-cpu-vroc-scalability/3
can. With some limits of course as parent
nodes to the parallel index scan can increase the row count by
arbitrary amounts.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
PAUSE in a loop without attempting to grab the lock. In
PostgreSQL it's called only once per retry attempt.
Regards,
Ants Aasma
--
PostgreSQL Senior Consultant
www.cybertec-postgresql.com
Austria (HQ), Wiener Neustadt | Switzerland, Zürich | Estonia,
Tallinn | Uruguay, Montevideo
Facebook: www.fb.com/cybertec.postgresql
Twitter: www.twitter.com/PostgresSupport
bility of
extracting the last seen value and another set-if-greater update
operation.
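A set-if-greater is just a compare-and-swap loop; a minimal sketch (not an
existing pg_atomic_* primitive) that also returns the previously stored
value, covering both capabilities mentioned above:

#include <stdatomic.h>
#include <stdint.h>

static uint64_t
atomic_set_if_greater(_Atomic uint64_t *target, uint64_t value)
{
    uint64_t old = atomic_load(target);

    while (old < value &&
           !atomic_compare_exchange_weak(target, &old, value))
        ;                       /* a failed CAS refreshes old, so just retry */
    return old;
}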
--
Ants Aasma
www.cybertec-postgresql.com
patch was enough to resolve the issue.
--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
On Thu, 14 Nov 2024 at 16:35, Noah Misch wrote:
> Based on a grep of PGXN code, here are some or all of the modules that
> react
> to sizeof(ResultRelInfo):
>
To add to this list, Christoph Berg confirmed that timescaledb test suite
crashes. [1]
Regards,
Ants Aasma
r direction is to extract more memory concurrency. The prefetcher could
batch multiple lookups together so CPU OoO execution has a chance to fire
off multiple memory accesses at the same time.
The other direction is to split off WAL decoding, buffer lookup and maybe
even pinning to a separate process from the main redo loop.
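The first direction could be as simple as keeping a small window of
decoded records ahead of redo and prefetching their buffer-mapping cache
lines before probing; DecodedRecord, buf_hash_bucket() and
redo_one_record() below are hypothetical stand-ins, not existing symbols:

#define PREFETCH_WINDOW 8

typedef struct DecodedRecord DecodedRecord;          /* hypothetical */
extern void *buf_hash_bucket(DecodedRecord *rec);    /* hypothetical */
extern void  redo_one_record(DecodedRecord *rec);    /* hypothetical */

static void
redo_batch(DecodedRecord **records, int nrecords)
{
    for (int i = 0; i < nrecords; i++)
    {
        /* start the memory access several records ahead so multiple
         * cache misses can be in flight at the same time */
        if (i + PREFETCH_WINDOW < nrecords)
            __builtin_prefetch(buf_hash_bucket(records[i + PREFETCH_WINDOW]), 0, 1);
        redo_one_record(records[i]);
    }
}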
--
Ants Aasma
>
or larger values. But before committing
to that approach, I think revisiting the quality of the page checksum
algorithm is due. Quality and robustness were not the highest
priorities when developing it.
--
Ants Aasma
Lead Database Consultant
www.cybertec-postgresql.com
On Thu, 9 Jan 2025 at 22:53, Andres Freund wrote:
> Workstation w/ 2x Xeon Gold 6442Y:
>
> march    mem    result
> native 100246.13766ms @ 33.282 GB/s
> native 10456.08080ms @ 17.962 GB
On Thu, 9 Jan 2025 at 18:25, Andres Freund wrote:
> > I'm curious about this because the checksum code should be fast enough
> > to easily handle that throughput.
>
> It seems to top out at about ~5-6 GB/s on my 2x Xeon Gold 6442Y
> workstation. But we don't have a good ready-made way of testing t
.
This is one of the tricky parts to fix for AIO, as directIO will also
bypass this mechanism. PostgreSQL would need to start issuing those
prefetches itself to not have a regression there.
In a theoretical world, where we would be able to drive prefetches
from an inner B-tree page, the difference
n't really have a machine at hand that can do
anywhere close to this amount of I/O.
I'm asking because if it's the calculation that is slow then it seems
like it's time to compile different ISA extension variants of the
checksum code and select the best one at runtime.
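By runtime selection I mean something along these lines (sketch only; the
checksum_block_* variants are hypothetical, and __builtin_cpu_supports()
is the GCC/clang way to test for the ISA at startup):

#include <stdint.h>

extern uint32_t checksum_block_scalar(const void *data, uint32_t len);
extern uint32_t checksum_block_avx2(const void *data, uint32_t len);
extern uint32_t checksum_block_avx512(const void *data, uint32_t len);

static uint32_t (*checksum_block)(const void *, uint32_t) = checksum_block_scalar;

static void
checksum_select_impl(void)
{
    if (__builtin_cpu_supports("avx512f"))
        checksum_block = checksum_block_avx512;
    else if (__builtin_cpu_supports("avx2"))
        checksum_block = checksum_block_avx2;
}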
--
Ants Aasma