On Wed, Jan 10, 2024 at 07:24:50PM -0500, Melanie Plageman wrote: > I have code review feedback as well, but I've saved that for my next email.
Ah, cool. > On Wed, Jan 3, 2024 at 8:11 AM Nazir Bilal Yavuz <byavu...@gmail.com> wrote: >> On Sun, 31 Dec 2023 at 03:58, Michael Paquier <mich...@paquier.xyz> wrote: >> Oh, I understand it now. Yes, that makes sense. >> I thought removing op_bytes completely ( as you said "This patch >> extends it with two more operation sizes, and there are even cases >> where it may be a variable" ) from pg_stat_io view then adding >> something like {read | write | extend}_bytes and {read | write | >> extend}_calls could be better, so that we don't lose any information. > > Upthread, Michael says: > >> I find the use of 1 in this context a bit confusing, because when >> referring to a counter at N, then it can be understood as doing N >> times a operation, > > I didn't understand this argument, so I'm not sure if I agree or > disagree with it. Nazir has mentioned upthread one thing: what should we do for the case where a combination of (io_object,io_context) does I/O with a *variable* op_bytes, because that may be the case for the WAL receiver? For this case, he has mentioned that we should set op_bytes to 1, but that's something I find confusing because it would mean that we are doing read, writes or extends 1 byte at a time. My suggestion would be to use op_bytes = -1 or NULL for the variable case instead, with reads, writes and extends referring to a number of bytes rather than a number of operations. > I think these are the three proposals for handling WAL reads: > > 1) setting op_bytes to 1 and the number of reads is the number of bytes > 2) setting op_bytes to XLOG_BLCKSZ and the number of reads is the > number of calls to pg_pread() or similar > 3) setting op_bytes to NULL and the number of reads is the number of > calls to pg_pread() or similar 3) could be a number of bytes, actually. > Looking at the patch, I think it is still doing 2. The patch disables stats for the WAL receiver, while the startup process reads WAL with XLOG_BLCKSZ, so yeah that's 2) with a trick to discard the variable case. > For an unpopular idea: we could add separate [IOOp]_bytes columns for > all those IOOps for which it would be relevant. It kind of stinks but > it would give us the freedom to document exactly what a single IOOp > means for each combination of BackendType, IOContext, IOObject, and > IOOp (as relevant) and still have an accurate number in the *bytes > columns. Everyone will probably hate us if we do that, though. > Especially because having bytes for the existing IOObjects is an > existing feature. An issue I have with this one is that having multiple tuples for each (object,context) if they have multiple op_bytes leads to potentially a lot of bloat in the view. That would be up to 8k extra tuples just for the sake of op_byte's variability. > A separate question: suppose [1] goes in (to read WAL from WAL buffers > directly). Now, WAL reads are not from permanent storage anymore. Are > we only tracking permanent storage I/O in pg_stat_io? I also had this > question for some of the WAL receiver functions. Should we track any > I/O other than permanent storage I/O? Or did I miss this being > addressed upthread? That's a good point. I guess that this should just be a different IOOp? That's not a IOOP_READ. A IOOP_HIT is also different. > In terms of what I/O we should track in a streaming/asynchronous > world, the options would be: > > 1) track read/write syscalls > 2) track blocks of BLCKSZ submitted to the kernel > 3) track bytes submitted to the kernel > 4) track merged I/Os (after doing any merging in the application) > > I think the debate was largely between 2 and 4. There was some > disagreement, but I think we landed on 2 because there is merging that > can happen at many levels in the storage stack (even the storage > controller). Distinguishing between whether or not Postgres submitted > 2 32k I/Os or 8 8k I/Os could be useful while you are developing AIO, > but I think it might be confusing for the Postgres user trying to > determine why their query is slow. It probably makes the most sense to > still track in block size. > > No matter what solution we pick, you should get a correct number if > you multiply op_bytes by an IOOp (assuming nothing is NULL). Or, > rather, there should be some way of getting an accurate number in > bytes of the amount of a particular kind of I/O that has been done. Yeah, coming back to op_bytes = -1/NULL as a tweak to mean that reads, writes or extends are counted as bytes, because we don't have a fixed operation size for some (object,context) cases. -- Michael
signature.asc
Description: PGP signature