Re: Why is writing JSONB faster than just JSON?

2021-05-02 Thread Dmitry Dolgov
> On Fri, Apr 23, 2021 at 01:56:57AM -0700, Mitar wrote:
> Hi!
> 
> On Thu, Apr 15, 2021 at 12:11 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > My point was that for JSON, after validating that the input is
> > > syntactically correct, we just store it as-received.  So in particular
> > > the amount of whitespace in the value would depend on how the client
> > > had chosen to format the JSON.  This'd affect the stored size of
> > > course, and I think it would have an effect on compression time too.
> >
> > Yes, I got it and just wanted to confirm you were right - this was the
> > reason I've observed slowdown trying to reproduce the report.
> 
> Thank you for trying to reproduce the report. I did a bit more digging
> myself and I am still confused.
>
> ...
>
> So I do not know what is happening and why you cannot reproduce it.

Could you maybe get a profile with perf for both cases? Since they're
executed within a single backend, you can profile only a single pid.
Having a reasonable profiling frequency, --call-graph dwarf and probably
limit events to only user space with precise tagging (cycles:uppp)
should give an impression what's going on.




PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?

2021-05-02 Thread Pól Ua Laoínecháin
Hi all,


On 2019/10/09, I posted a question here concerning PostgreSQL I/O, the
(primarily) Linux fsync problem and my lecturer's attitude to
PostgreSQL (text of that email is at the bottom of this post).


I asked why the fsync issue didn't affect Oracle and/or MySQL for example?

As far as I can see, this was because Oracle uses Direct I/O whereas
PostgreSQL uses Buffered I/O. I know that the issue has been resolved
and no longer affects currently supported versions - but I'm still
curious...


>From here: 
>https://www.percona.com/blog/2019/02/22/postgresql-fsync-failure-fixed-minor-versions-released-feb-14-2019/

> Whereas, writing the modified/dirty buffers to datafiles from shared buffers 
> is always through Buffered IO.


Now, I'm not quite sure that I completely comprehend matters: Is there
a difference between Asynchronous I/O and Buffered I/O?

If so, could some kind person point me to referernces that explain the
difference?


But, my foggy brain aside, I read (can't find URL - paraphrasing):
PostgreSQL is happy to let the kernel take the load off the server and
look after I/O - that's fine, but you'd better be able to trust your
kernel.


However, MySQL also uses Asynchronous I/O by default - or does it?

>From here: 
>https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-diskio.html#:~:text=InnoDB%20uses%20the%20asynchronous%20I,which%20is%20enabled%20by%20default,
we have:

> InnoDB uses the asynchronous I/O subsystem (native AIO) on Linux to perform 
> read-ahead and write requests for data file pages. This behavior is 
> controlled by the innodb_use_native_aio configuration option, which is 
> enabled by default.


Now, I haven't (AFAIK) seen references to problems with this fsync
issue on MySQL. Maybe they're so used to losing data, nobody noticed?
:-)


Seriously though, it does beg the question - why did this cause a
major issue for PostgreSQL but not for MySQL?


Is it because of (a) difference(s) between Asynchronous I/O and
Buffered I/O asked about above?


A couple of pointers (excuse the pun!) about this issue would be
great. Some stuff which shows the difference between Direct I/O and
the others would also be helpful. I seem to remember there was mention
of this for PostgreSQL but that it would be a mulit-year project. Why
so if PostgreSQL can already use Direct I/O for the WAL logs?

Obviously, I can search and I have been - but I'd appreciate material
from people here who can sort the wheat from the chaff and point me to
solid references. If here is not a suitable forum, then kindly
redirect me.

TIA and rgs,


Pól Ua...



=

> 2019/10/09 Is my lecturer wrong about PostgreSQL? I think he is!


I recently started a Masters in Computer Science (and not at the
institution in my email address).

One of my courses is "Advanced Databases" - yummy I thought - it's not
even compulsory for me but I just *_had_* to take this module. The
lecturer is a bit of an Oracle fan-boy (ACE director no less...
hmmm...) and I want(ed) - becoming less enthusiasic by the minute - to
do my dissertation with him. So, we're having a chat and I make plain
my love of good 'ol PostgreSQL as my RDBMS of choice and he tells me
that there are problems with random block corruption with PostgreSQL.
I said "really" and before that conversation could go any further,
another student came over and asked a question.

So, I toddled off and did some research - I had heard something about
this before (vague fuzzy memories) of a problem with the Linux kernel
so I searched for a bit and duly dug up a couple of pages

https://lwn.net/Articles/752063/ : PostgreSQL's fsync() surprise - and

https://news.ycombinator.com/item?id=19238121 : Linux Fsync Issue for
Buffered IO and Its Preliminary Fix for PostgreSQL

So, this week I go back to my lecturer and say, yep, there was some
issue but it was a Linux kernel problem and not PostgreSQL's fault and
has been resolved.

He tells me that he knew about that but that there was another issue
(he had "spoken to people" at meetings!). I said "well, why isn't it
fixed?" and he replied "where's the impetus?" to which I responded
(quite shocked at this stage) something like "well, I know that the
core team values correctness very highly" to which he came back with
"yes, but they have no commercial imperative to fix anything - they
have to wait until somebody is capable enough and interested enough to
do the work". He then muttered something about this mysterious flaw
having been fixed in EnterpriseDB.

At this point, I lost interest. Having lurked on lists and going by my
general "gut feeling" - if there was a serious issue causing
irrecoverable block corruption, I'm pretty sure that it would be "all
hands on deck" until this problem had been solved and "nice-to-haves"
(GENERATED AS... for example) would have been parked till then.

Now, I have four questions:

1) Is my lecturer full of it or does he really have a point?

2) The actual concrete acknowledg

Is this the future of I/O for the RDBMS?

2021-05-02 Thread Pól Ua Laoínecháin
Hi all,


Kind of a followup to my question: PostgreSQL, Asynchronous I/O,
Buffered I/O and why did fsync-gate not affect Oracle or MySQL?, I
have another. The blog below kinda got me thinking about all of this.


I have an interest in NewSQL distributed systems - in particular
CockroachDB, TiDB and YugaByte.

Their architectures are fairly similar - hardly surprising since they
are all (partially) F/LOSS clones of the Google Spanner/F1 systems.

They use an underlying KV store and put an SQL interface above that
and use a Raft (or other) consensus algorithm to coordinate and
whatnot.


While perusing the interweb, I stumbled on this very interesting blog
post from TiDB.


https://pingcap.com/blog/tikv-and-spdk-pushing-the-limits-of-storage-performance

It talks about the Storage Performance Development Kit (SPDK) (spdk.io).

The blog appears to think this system is a panacea and manna from
heaven rolled into one:

> This solution solves the four problems we mentioned earlier: it removes the 
> syscall overhead, uses data structures and caching algorithms more suitable 
> for databases and NVMe disks, and simplifies file system logging.


Given that many/most database processing is I/O bound, I'm just
wondering if this system is all it's cracked up to be, or are we at
the "Mass Media Hype Begins" and am I about to jump off the cliff edge
of the "Peak of Inflated Expectations" and fall headlong into the
Trough of Disillusionment? (see:
https://en.wikipedia.org/wiki/Hype_cycle).

Will this have any implications for PostgreSQL, given that it is a db
that compiles/runs on a large number of systems - or can subsystems
such as this be integrated/included for those chips which support it?
This particular SDK appears to be Intel specific, but if one chip
manufacturer can do it, can't they all (eventually)?


If this isn't the appropriate forum for discussing these matters, then
please indicate a suitable forum.


TIA and rgs,


Pól Ua...




Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?

2021-05-02 Thread Vijaykumar Jain
This wiki page.
It has PR references for mysql and mongo for the fsycnc issue.

Fsync Errors - PostgreSQL wiki


I'd leave the more intellectual brainstorming to the experts.

Also, ask for concrete references / reproducible scenarios for opinions if
you care.
else it leads to rumours :)

Thanks,
Vijay




On Sun, 2 May 2021 at 19:17, Pól Ua Laoínecháin  wrote:

> Hi all,
>
>
> On 2019/10/09, I posted a question here concerning PostgreSQL I/O, the
> (primarily) Linux fsync problem and my lecturer's attitude to
> PostgreSQL (text of that email is at the bottom of this post).
>
>
> I asked why the fsync issue didn't affect Oracle and/or MySQL for example?
>
> As far as I can see, this was because Oracle uses Direct I/O whereas
> PostgreSQL uses Buffered I/O. I know that the issue has been resolved
> and no longer affects currently supported versions - but I'm still
> curious...
>
>
> From here:
> https://www.percona.com/blog/2019/02/22/postgresql-fsync-failure-fixed-minor-versions-released-feb-14-2019/
>
> > Whereas, writing the modified/dirty buffers to datafiles from shared
> buffers is always through Buffered IO.
>
>
> Now, I'm not quite sure that I completely comprehend matters: Is there
> a difference between Asynchronous I/O and Buffered I/O?
>
> If so, could some kind person point me to referernces that explain the
> difference?
>
>
> But, my foggy brain aside, I read (can't find URL - paraphrasing):
> PostgreSQL is happy to let the kernel take the load off the server and
> look after I/O - that's fine, but you'd better be able to trust your
> kernel.
>
>
> However, MySQL also uses Asynchronous I/O by default - or does it?
>
> From here:
> https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-diskio.html#:~:text=InnoDB%20uses%20the%20asynchronous%20I,which%20is%20enabled%20by%20default
> ,
> we have:
>
> > InnoDB uses the asynchronous I/O subsystem (native AIO) on Linux to
> perform read-ahead and write requests for data file pages. This behavior is
> controlled by the innodb_use_native_aio configuration option, which is
> enabled by default.
>
>
> Now, I haven't (AFAIK) seen references to problems with this fsync
> issue on MySQL. Maybe they're so used to losing data, nobody noticed?
> :-)
>
>
> Seriously though, it does beg the question - why did this cause a
> major issue for PostgreSQL but not for MySQL?
>
>
> Is it because of (a) difference(s) between Asynchronous I/O and
> Buffered I/O asked about above?
>
>
> A couple of pointers (excuse the pun!) about this issue would be
> great. Some stuff which shows the difference between Direct I/O and
> the others would also be helpful. I seem to remember there was mention
> of this for PostgreSQL but that it would be a mulit-year project. Why
> so if PostgreSQL can already use Direct I/O for the WAL logs?
>
> Obviously, I can search and I have been - but I'd appreciate material
> from people here who can sort the wheat from the chaff and point me to
> solid references. If here is not a suitable forum, then kindly
> redirect me.
>
> TIA and rgs,
>
>
> Pól Ua...
>
>
>
> =
>
> > 2019/10/09 Is my lecturer wrong about PostgreSQL? I think he is!
>
>
> I recently started a Masters in Computer Science (and not at the
> institution in my email address).
>
> One of my courses is "Advanced Databases" - yummy I thought - it's not
> even compulsory for me but I just *_had_* to take this module. The
> lecturer is a bit of an Oracle fan-boy (ACE director no less...
> hmmm...) and I want(ed) - becoming less enthusiasic by the minute - to
> do my dissertation with him. So, we're having a chat and I make plain
> my love of good 'ol PostgreSQL as my RDBMS of choice and he tells me
> that there are problems with random block corruption with PostgreSQL.
> I said "really" and before that conversation could go any further,
> another student came over and asked a question.
>
> So, I toddled off and did some research - I had heard something about
> this before (vague fuzzy memories) of a problem with the Linux kernel
> so I searched for a bit and duly dug up a couple of pages
>
> https://lwn.net/Articles/752063/ : PostgreSQL's fsync() surprise - and
>
> https://news.ycombinator.com/item?id=19238121 : Linux Fsync Issue for
> Buffered IO and Its Preliminary Fix for PostgreSQL
>
> So, this week I go back to my lecturer and say, yep, there was some
> issue but it was a Linux kernel problem and not PostgreSQL's fault and
> has been resolved.
>
> He tells me that he knew about that but that there was another issue
> (he had "spoken to people" at meetings!). I said "well, why isn't it
> fixed?" and he replied "where's the impetus?" to which I responded
> (quite shocked at this stage) something like "well, I know that the
> core team values correctness very highly" to which he came back with
> "yes, but they have no commercial imperative to fix anything - they
> have to wait until somebody is capable enough 

Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?

2021-05-02 Thread Ron

On 5/2/21 8:46 AM, Pól Ua Laoínecháin wrote:
[snip]

Now, I'm not quite sure that I completely comprehend matters: Is there
a difference between Asynchronous I/O and Buffered I/O?


* Asynchronous (a-syn-chron-ous) is an adjective which means "not together 
with time".
* Buffered means "read more than you need at the moment, and then do 
processing to/from a cache".


They are not mutually exclusive.

Their antonyms are
* Synchronous (syn-chron-ous): together with time.
* Direct, where you read/write only what you need at the moment, directly 
to/from the IO device.


Thus, *async* IO is where you tell the IO subsystem that you need something, 
and then *go off and do something else*; the IO system interrupts you when 
the data has been delivered.

*Sync*hronous IO is where you request IO and then *wait for the data*.

--
Angular momentum makes the world go 'round.


Re: PostgreSQL, Asynchronous I/O, Buffered I/O and why did fsync-gate not affect Oracle or MySQL?

2021-05-02 Thread Peter J. Holzer
On 2021-05-02 14:46:41 +0100, Pól Ua Laoínecháin wrote:
> Now, I'm not quite sure that I completely comprehend matters: Is there
> a difference between Asynchronous I/O and Buffered I/O?

Yes.

Buffered I/O means that there is a buffer (or maybe several layers of
buffers) between the application and the device: A read request might
return data which has been cached from a previous read request without
touching the device. And a write request will return as soon as the data
is written to the buffer (it will be written to the device at some later
time). The opposite of buffered I/O is direct I/O, which always talks
directly to the device and doesn't use any buffers.

Asynchronous I/O refers to a different programming model: Any read or
write request only initiates the data transfer and returns immediately.
The application will later be notified when the request is finished.
This is very different from the traditional (in Unix) synchronous
programming model where a read would block until the data was actually
available and a write would block until the data was safely transferred
to the OS's buffer cache (and can be overwritten by the application).

In theory all four combinations (buffered synchronous, buffered
asynchronous, direct synchronous, direct asynchronous) are possible, but
some OS's may not implement all of them.

hp


-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: Is this the future of I/O for the RDBMS?

2021-05-02 Thread Peter J. Holzer
On 2021-05-02 14:49:44 +0100, Pól Ua Laoínecháin wrote:
> While perusing the interweb, I stumbled on this very interesting blog
> post from TiDB.
> 
> 
> https://pingcap.com/blog/tikv-and-spdk-pushing-the-limits-of-storage-performance
> 
> It talks about the Storage Performance Development Kit (SPDK) (spdk.io).

This sounds certainly interesting. Without reading up on the details,
however, I notice that it implements a file system in user space which
means that the kernel cannot enforce any permissions. This may be ok for
a database (which typically runs as a single OS user anyway), but it is
something to consider. It looks somewhat similar to Oracle's "raw device
tablespaces" of the 1980s. By the time I got involved in database
programming in the late 1990s these were considered obsolete (negligible
performance advantage, but a hassle for the DBA). Maybe with NVME SSDs
and persistent memory like Intel Optane it is time to revisit that idea.

There are less intrusive possibilities, though: Linux has recently
(kernel 5.1 - oh, that is already 2 years old) aquired a new async I/O
API named io_uring, which eliminates the system call overhead. I haven't
played around with it myself, but some blog posts report quite
substantial performance improvements, in some cases approaching the
theoretical limits of the used (very fast) SSDs.


> Will this have any implications for PostgreSQL, given that it is a db
> that compiles/runs on a large number of systems - or can subsystems
> such as this be integrated/included for those chips which support it?
> This particular SDK appears to be Intel specific, but if one chip
> manufacturer can do it, can't they all (eventually)?

I don't think the chipset makes much of a difference. It's an open
source library written in C - it can almost certainly be recompiled for
ARM or whatever. It will almost certainly be linux-specific, though.
It's probably possible to write something similar for Windows, MacOS,
FreeBSD, etc. but I have no idea how hard that may be (maybe you just
have to change a few system calls - maybe you have to rewrite 80 percent
of it).

More important for PostgreSQL is whether something like this can be
incorporated without changing the overall architecture: If you just have
to change a handful of functions performing low-level I/O, it may be
worthwhile even if only a few systems (Linux systems where the DBA is
willing to set up devices for direct access from user space) benefit
from it. If it means rewriting large parts of postgres and then some
platforms cannot be supported at all or only at reduced performance,
this is not an option.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: "invalid contrecord" error on replica

2021-05-02 Thread Adrien Nayrat

Hello,


I encountered a similar issue with pg 13.

TL;DR: The secondary did not received a wal record (CHECKPOINT_SHUTDOWN) which 
corrupted the wal and he failed when he tried to replay it.



For a personal project I have a primary and a secondary with streaming 
replication and replication_slot.


I updated pgbackrest only on primary side, which broke archiving due to a 
different pgbackrest version on primary and secondary.
It filled the filesystem until the primary crashed. So, I updated pgbackrest on 
the secondary, archiving caught up and I did not look at secondary. I thought it 
will caught up too.


Unfortunately not:

FATAL:  could not connect to the primary server: FATAL:  the database system is 
in recovery mode

LOG:  started streaming WAL from primary at AA/A100 on timeline 1
FATAL:  could not receive data from WAL stream : ERROR:  requested starting 
point AA/A100 is ahead of the WAL flush position of this server AA/A0FFFBE8

LOG:  started streaming WAL from primary at AA/A100 on timeline 1
FATAL:  could not receive data from WAL stream : ERROR:  requested starting 
point AA/A100 is ahead of the WAL flush position of this server AA/A0FFFBE8

LOG:  started streaming WAL from primary at AA/A100 on timeline 1
FATAL:  could not receive data from WAL stream : ERROR:  requested starting 
point AA/A100 is ahead of the WAL flush position of this server AA/A0FFFBE8

LOG:  started streaming WAL from primary at AA/A100 on timeline 1
FATAL:  could not receive data from WAL stream : ERROR:  requested starting 
point AA/A100 is ahead of the WAL flush position of this server AA/A0FFFC20

[...]
LOG:  started streaming WAL from primary at AA/A100 on timeline 1
FATAL:  could not receive data from WAL stream : ERROR:  requested starting 
point AA/A100 is ahead of the WAL flush position of this server AA/A0FFFC20

LOG:  started streaming WAL from primary at AA/A100 on timeline 1
FATAL:  could not receive data from WAL stream : ERROR:  requested starting 
point AA/A100 is ahead of the WAL flush position of this server AA/A0FFFC20

LOG:  started streaming WAL from primary at AA/A100 on timeline 1
LOG:  invalid contrecord length 1523 at AA/A0FFFB70
FATAL:  terminating walreceiver process due to administrator command
LOG:  invalid contrecord length 1523 at AA/A0FFFB70
LOG:  invalid contrecord length 1523 at AA/A0FFFB70
[ previous line keep repeating ]

And several hours later:
FATAL:  terminating walreceiver process due to administrator command

No other messages arround, just the same "invalid contrecord length 1523 at 
AA/A0FFFB70" and from time to time "terminating walreceiver process due to 
administrator command".


During all this time, the primary kept WAL files because replication slot was 
inactive, until FS was full again.


Then, I decided to restart both (I just removed some useless files to have more 
place). Hoping the secondary be able to start streaming replication.


Unfortunately, the secondary asked for an old WAL file which where already 
recycled on primary:


LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  shutting down
LOG:  database system is shut down
LOG:  starting PostgreSQL 13.2 (Debian 13.2-1.pgdg90+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

LOG:  listening on IPv4 address "127.0.0.1", port 5433
LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
LOG:  database system was shut down in recovery at 2021-05-02 20:23:49 CEST
LOG:  entering standby mode
LOG:  redo starts at AA/9F29F7C8
LOG:  consistent recovery state reached at AA/A0FFFB70
LOG:  invalid contrecord length 1523 at AA/A0FFFB70
LOG:  database system is ready to accept read only connections
LOG:  started streaming WAL from primary at AA/A000 on timeline 1
FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 
000100AA00A0 has already been removed


(Notice, he started replication at AA/A000 whereas, before the restart, he 
started at AA/A100)


I dig a little bit, on the primary, the restart_lsn of the replication slot was: 
AA/A1004B78


pg_controldata on secondary told:
Latest checkpoint location:   AA/9FC68B48
Latest checkpoint's REDO location:AA/9F29F7C8
Latest checkpoint's REDO WAL file:000100AA009F

I understood why he asked for a WAL before its restart_lsn:

He reached AA/A1004B78, after pg_ctl restart by me, he start replaying from 
"REDO location" at AA/9F29F7C8 until he reached AA/A0FFFB70 then asked for the 
WAL A0 and boom.


Why he stopped at AA/A0FFFB70 whereas the restart_lsn is AA/A1004B78 ?

I looked at 000100AA00A0 on the secondary, it end by:
rmgr: Transaction len (rec/tot): 34/34, tx:   55621934, lsn: 
AA/A0FFFAF8, prev AA/A0FFE7F8, desc: COMMIT 2021-05-01 17:25:51.949124 CEST
rmgr: Heaplen (rec/tot): 76/76, tx:   55621935, lsn: 
AA/A0FFFB20, prev A