Re: extensible options syntax for replication parser?
Hello Robert,

My 0.02 €:

> It seems to me that we're making the same mistake with the replication
> parser that we've made in various places in the regular parser: using a
> syntax for options that requires that every potential option be a keyword,
> and where every new option requires modifying the grammar. In particular,
> the syntax for the BASE_BACKUP command has accreted a whole lot of cruft
> already and I think that trend is likely to continue. I don't think that
> trying to keep people from adding options is a good strategy,

Indeed.

> so instead I'd like to have a better way to do it. Attached is v1 of a
> patch to refactor things so that parts of BASE_BACKUP and
> CREATE_REPLICATION_SLOT are replaced with a flexible options syntax.

The patch applies cleanly, but compilation fails with:

    repl_gram.y:271:106: error: expected ';' before '}'

Getting rid of "ident_or_keyword", some day, would be a relief.

For boolean options, you are generating (EXPORT_SNAPSHOT TRUE) where
(EXPORT_SNAPSHOT) would do. Allowing (!EXPORT_SNAPSHOT) as shorthand for
(FOO FALSE) would also be nice, if it is not allowed yet. That would let us
get rid of FOO/NOFOO keyword pairs, if any, in favor of FOO and !FOO, so
one keyword is enough per concept.

> There are some debatable decisions here, so I'd be happy to get some
> feedback on whether to go further with this, or less far, or maybe even
> just abandon the idea altogether. I doubt the last one is the right
> course, though: ISTM something's got to be done about the BASE_BACKUP
> case, at least.

ISTM that it would be better to generalize the approach to all commands
which accept options, so that the syntax is homogeneous.

> If people agree with the basic approach, I'll write docs. The intention
> is that we'd continue to support the existing syntax for the existing
> options, but the client tools would be adjusted to use the new syntax if
> the server's new enough, and any new options would be supported only
> through the new syntax.

Yes.

> At some point in the distant future we could retire the old syntax, when
> we've stopped caring about compatibility with pre-14 versions.

Just wondering: ISTM that the patch implies that dumping a v14 db generates
the new syntax, which makes sense. Now I see 4 use cases wrt version:

    #   source    target    comment
    1   v < 14    v < 14    probably the dump would use one of the older versions
    2   v < 14    v >= 14   upgrade
    3   v >= 14   v < 14    downgrade: oops, the output uses the new syntax
    4   v >= 14   v >= 14   ok

Both cross-version usages may be legitimate. In particular, case 3 (oops,
hardware issue, I have to move the db to a server where pg has not been
upgraded) seems not possible because the generated syntax uses the new
approach. Should/could there be some option to tell "please generate vXXX
syntax" to allow that?

-- Fabien.
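As a concrete sketch of what is being discussed (the parenthesized option
spellings come from the examples above; the overall command shapes are
illustrative, not taken from the patch):

    -- today: every option is a dedicated grammar keyword
    CREATE_REPLICATION_SLOT slot1 LOGICAL pgoutput EXPORT_SNAPSHOT

    -- flexible options list: a bare boolean option defaults to TRUE
    CREATE_REPLICATION_SLOT slot1 LOGICAL pgoutput (EXPORT_SNAPSHOT)

    -- proposed negation shorthand, equivalent to (EXPORT_SNAPSHOT FALSE)
    CREATE_REPLICATION_SLOT slot1 LOGICAL pgoutput (!EXPORT_SNAPSHOT)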
Re: Recording test runtimes with the buildfarm
Hello Tom,

> I have in the past scraped the latter results and tried to make sense of
> them. They are *mighty* noisy, even when considering just one animal that
> I know to be running on a machine with little else to do. Maybe averaging
> across the whole buildfarm could reduce the noise level, but I'm not very
> hopeful.

I'd try the median instead of the average, so that bad cases due to animal
overloading are ignored.

-- Fabien.
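For illustration, the median is easy to get with an ordered-set aggregate;
the table and column names below are hypothetical stand-ins for wherever
the scraped buildfarm timings end up:

    -- one row per (animal, test, run) with its measured runtime
    SELECT test_name,
           percentile_cont(0.5) WITHIN GROUP (ORDER BY runtime_ms)
               AS median_runtime_ms
    FROM buildfarm_test_times
    GROUP BY test_name
    ORDER BY test_name;

Unlike avg(), the median is insensitive to a handful of runs that happened
to land on an overloaded host.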
Re: problem with RETURNING and update row movement
Hi Amit-san,

On Thu, Jun 11, 2020 at 6:10 PM Amit Langote wrote:
> Reproduction steps:
>
> create table foo (a int, b int) partition by list (a);
> create table foo1 (c int, b int, a int);
> alter table foo1 drop c;
> alter table foo attach partition foo1 for values in (1);
> create table foo2 partition of foo for values in (2);
> create table foo3 partition of foo for values in (3);
> create or replace function trigfunc () returns trigger language
> plpgsql as $$ begin new.b := 2; return new; end; $$;
> create trigger trig before insert on foo2 for each row execute
> function trigfunc();
> insert into foo values (1, 1), (2, 2), (3, 3);
> update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x
> returning *;
> ERROR:  attribute 5 of type record has wrong type
> DETAIL:  Table has type record, but query expects integer.

Reproduced. Could you add the patch to the next commitfest so that it
doesn't get lost?

Best regards,
Etsuro Fujita
Re: Transactions involving multiple postgres foreign servers, take 2
>> Won't it create an inconsistency in viewing the data from the
>> different servers? Say, such a transaction inserts one row into a
>> local server and another into the foreign server. Now, if we follow
>> the above protocol, the user will be able to see the row from the
>> local server but not from the foreign server.
>
> Yes, you're right. This atomic commit feature doesn't guarantee such
> consistent visibility, so-called atomic visibility. Even if the local
> server is not modified, since a resolver process commits prepared
> foreign transactions one by one, another user could see an inconsistent
> result. Providing globally consistent snapshots to transactions
> involving foreign servers is one of the solutions.

Another approach to the atomic visibility problem is to control snapshot
acquisition timing and commit timing (plus using REPEATABLE READ). At the
REPEATABLE READ transaction isolation level, PostgreSQL assigns a snapshot
at the time the first command is executed in a transaction. If we could
prevent any commit while any transaction is acquiring its snapshot, and
prevent any snapshot acquisition while committing, the visibility
inconsistency which Amit explained can be avoided. This approach was
proposed in an academic paper [1]. A good point of the approach is that we
don't need to modify PostgreSQL at all. The downside is that we need
someone who controls the timings (in [1], a middleware called "Pangea" was
proposed). Also, we need to limit the transaction isolation level to
REPEATABLE READ.

[1] http://www.vldb.org/pvldb/vol2/vldb09-694.pdf

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
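As a reminder of the snapshot timing this scheme relies on (a minimal
sketch; the table name is a placeholder, and the Pangea-style middleware
would decide when each session is allowed to run the marked statements):

    BEGIN ISOLATION LEVEL REPEATABLE READ;
    -- No snapshot exists yet: BEGIN itself does not take one.
    SELECT count(*) FROM some_table;   -- the snapshot is acquired here
    -- Every later statement in this transaction sees that same snapshot.
    COMMIT;

The middleware would serialize the snapshot-acquisition points against all
commit points across the participating servers.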
Re: Internal key management system
On Sat, 13 Jun 2020 at 05:59, Fabien COELHO wrote:
>
> Hello Masahiko-san,
>
>> Summarizing the discussed points so far, I think that the major
>> advantage points of your idea comparing to the current patch's
>> architecture are:
>>
>> * More secure. Because it never loads the KEK in postgres processes we
>> can lower the likelihood of KEK leakage.
>
> Yes.
>
>> * More extensible. We will be able to implement more protocols to
>> outsource other operations to the external place.
>
> Yes.
>
>> On the other hand, here are some downsides and issues:
>>
>> * The external place needs to manage more encryption keys than the
>> current patch does.
>
> Why? If the external place is just a separate process on the same host,
> it would probably manage the very same amount as your patch does.

In the current patch, the external place needs to manage only one key,
whereas postgres needs to manage multiple DEKs. But with your idea, the
external place needs to manage both the KEK and the DEKs.

>> Some cloud key management services are charged by the number of active
>> keys and key operations. So the number of keys postgres requires
>> affects the charges. It'd be worse if we were to have keys per table.
>
> Possibly. Note that you do not have to use a cloud storage paid as a
> service. However, you could do it if there is an interface, because it
> would allow postgres to do so if the user wishes that. That is the point
> of having an interface that can be implemented differently for different
> use cases.

The same is true for the current patch. The user can get the KEK from
anywhere they want using cluster_passphrase_command. But as I mentioned
above, the number of keys that the user manages outside postgres is
different.

>> * If this approach supports only a GET protocol, the user needs to
>> create encryption keys with appropriate ids in advance so that
>> postgres can get keys by id. If postgres TDE creates keys as needed,
>> a CREATE protocol would also be required.
>
> I'm not sure. ISTM that if there is a KMS to manage keys, it could be its
> responsibility to actually create a key, however the client (pg) would
> have to request it, basically say "give me a new key for this id".
>
> This could even work with a "get" command only, if the KMS is expected to
> create a new key when asked for a key which does not exist yet. ISTM that
> the client could (should?) only have to create identifiers for its keys.

Yeah, it depends on the KMS, meaning we would need different extensions for
different KMSes. One KMS might support an interface that creates the key if
it does not exist during GET, but another might support CREATE and GET
separately.

>> * If we need only a GET protocol, the current approach (i.e.
>> cluster_passphrase_command) would be simpler. I imagine the interface
>> between postgres and the extension is a C function.
>
> Yes. ISTM that can be pretty simple, something like:
>
> A guc to define the process to start the interface (having a process
> means that its uid can be changed), which would communicate on its
> stdin/stdout.
>
> A guc to define how to interact with the interface (eg whether DEKs are
> retrieved, or whether the interface is to ask for encryption/decryption,
> or possibly some other modes).
>
> A few functions:
>
> - set_key(<local-id>, <key-id>);
>   # may retrieve the DEK, or only note the local id of some key.
>
> - encode(<local-id>, <data>) -> <encrypted-data>
>   # may fail if no key is associated to local-id
>   # or if the service is down somehow
>
> - decode(<local-id>, <encrypted-data>) -> <data>
>   # could also fail if there is some integrity check associated
>
>> This approach is more extensible
>
> Yep.
>
>> but it also means extensions need to support multiple protocols,
>> leading to increased complexity and development cost.
>
> I do not understand what you mean by "multiple protocols". For me there
> is one protocol, possibly a few commands in this protocol between the
> client (postgres) and the KMS. Anyway, sending "GET <key-id>" to retrieve
> a DEK, for instance, does not sound "complex". Here is some pseudo code:
>
> For get_key:
>
>    if (mode of operation is to have DEKs locally)
>      try
>        send to KMS "get <key-id>"
>        keys[local-id] = answer
>      catch & rethrow possible errors
>    elif (mode is to keep DEKs remote)
>      key_id[local-id] = key-id;
>    else ...
>
> For encode, the code is basically:
>
>    if (has_key(local-id))
>      if (mode of operation is to have DEKs locally)
>        return some_encode(key[local-id], data);
>      elif (mode is to keep DEKs remote)
>        send to KMS "encode key_id[local-id] data"
>        return answer;  # or error
>      else ...
>    else
>      throw error local-id has no associated key;
>
> decode is more or less the same as encode.
>
> There is another thing to consider: how the client "proves" its identity
> to the KMS interface, which might suggest some provisions when starting a
> process, but you already have things in your patch to deal with the KEK,
> which could be turned into some g
Re: Internal key management system
Hello Masahiko-san,

>>> * The external place needs to manage more encryption keys than the
>>> current patch does.
>>
>> Why? If the external place is just a separate process on the same host,
>> probably it would manage the very same amount as your patch.
>
> In the current patch, the external place needs to manage only one key
> whereas postgres needs to manage multiple DEKs. But with your idea, the
> external place needs to manage both KEK and DEKs.

Hmmm. I do not see a good use case for a "management system" which would
only have to manage a singleton. ISTM that one point of using one KEK is to
allow several DEKs under it. Maybe I have missed something in your patch,
but only one key is a very restricted use case.

>>> Some cloud key management services are charged by the number of active
>>> keys and key operations. So the number of keys postgres requires
>>> affects the charges. It'd be worse if we were to have keys per table.
>>
>> Possibly. Note that you do not have to use a cloud storage paid as a
>> service. However, you could do it if there is an interface, because it
>> would allow postgres to do so if the user wishes that. That is the
>> point of having an interface that can be implemented differently for
>> different use cases.
>
> The same is true for the current patch. The user can get the KEK from
> anywhere they want using cluster_passphrase_command.

Yep. Somehow I'm proposing to have a command to get DEKs instead of just
the KEK, otherwise it is not that far.

> But as I mentioned above the number of keys that the user manages
> outside postgres is different.

Yep, and I do not think that the "only one key" approach is good, unless I
really missed something in the patch. From a use case point of view, I
thought that the user could have as many keys as they see fit. Maybe one
per cluster, or database, or table, or even a row if for some reason the
application demands it. I do not think that pg should decide that, among
other things. That is why I'm constantly referring to a "key identifier",
and in the pseudo code I added a "local id" (typically an int).

>>> * If this approach supports only GET protocol, the user needs to
>>> create encryption keys with appropriate ids in advance so that
>>> postgres can get keys by id. If postgres TDE creates keys as needed,
>>> CREATE protocol would also be required.
>>
>> I'm not sure. ISTM that if there is a KMS to manage keys, it could be
>> its responsibility to actually create a key, however the client (pg)
>> would have to request it, basically say "give me a new key for this id".
>>
>> This could even work with a "get" command only, if the KMS is expected
>> to create a new key when asked for a key which does not exist yet. ISTM
>> that the client could (should?) only have to create identifiers for its
>> keys.
>
> Yeah, it depends on KMS, meaning we need different extensions for
> different KMS. A KMS might support an interface that creates key if not
> exist during GET but some KMS might support CREATE and GET separately.

I disagree that it is necessary, but this is debatable. The KMS-side
interface code could take care of that, eg:

    if command is "get X"
        if (X does not exist in KMS)
            create a new key stored in KMS, return it;
        else
            return KMS-stored key;
    ...

So you can still have a "GET"-only interface which adapts to the "final"
KMS. Basically, the glue code which implements the interface for the KMS
can include some logic to adapt to the KMS point of view.

> * If we need only GET protocol, the current approach (i.e.

The point of a discussion is basically to present arguments. My point is
the same as Bruce's.

> I'm concerned about the fact that even if we introduce this approach the
> present data could still be stolen when a postgres process is
> compromised.

Yes, sure.

> It seems to me that your approach is extensible and can protect data
> from threats in addition to threats that the current patch can protect
> against, but it would bring some costs and complexity instead compared
> to the current patch. I'd like to hear opinions from other hackers in
> the community.

I'd like an extensible design for anything that goes into core. As I said
in another mail, if you want to handle a somehow restricted use case, just
provide an external extension and do nothing in core, please. Put in core
something that people with a slightly different use case, or an auditor,
can build on as well. The current patch makes a dozen hard-coded decisions
which it should not, IMHO.

> I think the actual code would help to explain how your approach is not
> complex.

I provided quite some pseudo code, including some Python. I'm not planning
to redevelop the whole thing: my contribution is a review, currently about
the overall design; then, if somehow I agree on the design, I would look at
the code more precisely.

-- Fabien.
[PATCH] Initial progress reporting for COPY command
Hello, as proposed by Pavel Stěhule and discussed on the local Czech
PostgreSQL mailing list
(https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
I have prepared an initial patch for COPY command progress reporting.

A few examples first. "COPY (SELECT * FROM test) TO '/tmp/ids';":

yr=# SELECT * FROM pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
 3347126 | 16384 | yr      |     0 | TO        | t    | f       |         3529943 |             24906226
(1 row)

"COPY test FROM '/tmp/ids';":

yr=# SELECT * FROM pg_stat_progress_copy;
   pid   | datid | datname | relid | direction | file | program | lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
 3347126 | 16384 | yr      | 16385 | FROM      | t    | f       |       121591999 |            957218816
(1 row)

Columns are inspired by the CREATE INDEX progress report system view:

pid - integer - PID of the backend
datid - oid - OID of the related database
datname - name - name of the related database (this seems redundant, since
  the oid should be enough, but it is the same in CREATE INDEX)
relid - oid - OID of the table related to the COPY command; 0 when not
  known (for example when copying to a file)
direction - text - one of "FROM" or "TO", depending on the command used
file - bool - is a file used?
program - bool - is a program used?
lines_processed - bigint - number of processed lines, works for both
  directions (FROM/TO)
file_bytes_processed - number of bytes processed when a file is used
  (file = t), otherwise 0; works for both directions (FROM/TO)

The patch is attached and can also be found at
https://github.com/simi/postgres/pull/5.
Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patch

I have a few initial notes and questions.

I'm using ftell to get the current position in the file to populate
file_bytes_processed, without error handling (ftell can return -1L and
also populate errno on problems).
1. Is that a good way to get the progress of file processing?
2. Is it safe in the given context to not care about errors? If not, what
   should be done on error?

Some columns are not populated on certain COPY commands. For example, when
a file is not used, file_bytes_processed is set to 0. Would it be better to
use NULL instead when the column is not related to the current command? The
same problem exists for the relid column.

I have not found any tests for progress reporting. Are there any? It would
need two backends running (one running COPY, one checking the output of the
report view). Is there any similar test I can take inspiration from? In
theory, it should be possible to use dblink_send_query to run an async COPY
command in the background.

My initial (attached) patch also doesn't introduce documentation for this
system view. I can add that later once this patch is finalized (if that
happens).

copy-progress-v1.patch
Description: Binary data
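For reference, a second session could watch a running COPY with something
like the following (the column names are those defined by the proposed
view; the pg_size_pretty() dressing is just for readability):

    SELECT relid::regclass AS target,
           direction,
           lines_processed,
           pg_size_pretty(file_bytes_processed) AS bytes_done
    FROM pg_stat_progress_copy;

In psql this combines nicely with \watch to poll the view every second
while the COPY runs in the other backend.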
Re: problem with RETURNING and update row movement
On Sun, Jun 14, 2020 at 4:23 PM Etsuro Fujita wrote:
> Hi Amit-san,
>
> On Thu, Jun 11, 2020 at 6:10 PM Amit Langote wrote:
> > Reproduction steps:
> >
> > create table foo (a int, b int) partition by list (a);
> > create table foo1 (c int, b int, a int);
> > alter table foo1 drop c;
> > alter table foo attach partition foo1 for values in (1);
> > create table foo2 partition of foo for values in (2);
> > create table foo3 partition of foo for values in (3);
> > create or replace function trigfunc () returns trigger language
> > plpgsql as $$ begin new.b := 2; return new; end; $$;
> > create trigger trig before insert on foo2 for each row execute
> > function trigfunc();
> > insert into foo values (1, 1), (2, 2), (3, 3);
> > update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x
> > returning *;
> > ERROR:  attribute 5 of type record has wrong type
> > DETAIL:  Table has type record, but query expects integer.
>
> Reproduced. Could you add the patch to the next commitfest so that it
> doesn't get lost?

Done. Thank you for taking a look.
https://commitfest.postgresql.org/28/2597/

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Re: Postgresql13_beta1 (could not rename temporary statistics file) Windows 64bits
posix rename, "renames a file, moving it between directories if required". pgrename, win32 port uses MoveFileEx, to support rename files at Windows side, but, actually don't allow "renames a file, moving it between directories if required". To match the same characteristics as posix rename, we need to add a flag to MoveFileEx (MOVEFILE_COPY_ALLOWED) Which allows, if necessary, to move between volumes, drives and directories. Such a resource seems to decrease the chances of occurring, permission denied, when renaming the temporary statistics file. allow_win32_rename_across_volumes.patch Description: Binary data
Re: hashagg slowdown due to spill changes
Hi,

On 2020-06-12 15:29:08 -0700, Jeff Davis wrote:
> On Fri, 2020-06-12 at 14:37 -0700, Andres Freund wrote:
> > I don't see why it's ok to force an additional projection in the very
> > common case of hashaggs over a few rows. So I think we need to rethink
> > 4cad2534da6.
>
> One possibility is to project only spilled tuples, more similar to
> Melanie's patch from a while ago:
>
> https://www.postgresql.org/message-id/caakru_aefesv+ukqwqu+ioenoil2lju9diuh9br8mbyxuz0...@mail.gmail.com
>
> Which makes sense, but it's also more code.

I'm somewhat inclined to think that we should revert 4cad2534da6 and then
look at how precisely to tackle this in 14. It'd probably make sense to
request small tlists when the number of estimated groups is large, and not
otherwise.

Greetings,

Andres Freund
Re: [PATCH] Initial progress reporting for COPY command
Hi Josef,

On Sun, Jun 14, 2020 at 02:32:33PM +0200, Josef Šimánek wrote:
> Hello, as proposed by Pavel Stěhule and discussed on the local Czech
> PostgreSQL mailing list (
> https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
> I have prepared an initial patch for COPY command progress reporting.

Sounds like a good idea to me.

> I have not found any tests for progress reporting. Are there any? It
> would need two backends running (one running COPY, one checking the
> output of the report view). Is there any similar test I can take
> inspiration from? In theory, it should be possible to use
> dblink_send_query to run an async COPY command in the background.

We don't have any tests in core. I think that making deterministic test
cases is rather tricky here as long as we don't have a more advanced
testing framework that allows us to lock certain code paths and keep
around an expected state until a second session comes around and looks at
the progress catalog (even that would need adding more code to core to
mark the extra point looked at). So I think that it is fine to not focus
on that for this feature. The important parts are the choice of the
progress points and the data sent to MyProc, and both should be chosen
wisely.

> My initial (attached) patch also doesn't introduce documentation for
> this system view. I can add that later once this patch is finalized (if
> that happens).

You may want to add it to the next commit fest:
https://commitfest.postgresql.org/28/
Documentation is necessary, and having some would ease reviews.
--
Michael

signature.asc
Description: PGP signature
Re: Include access method in listTables output
On Tue, Jun 9, 2020 at 6:45 PM Georgios wrote:
> > Please add it to the commitfest at https://commitfest.postgresql.org/28/
>
> Thank you very much for your time. Added to the commitfest as suggested.

The patch applies cleanly, and make check & make check-world pass.
A few comments:

+       if (pset.sversion >= 12)
+               appendPQExpBufferStr(&buf,
+                                    "\n     LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam");

Should we include the pset.hide_tableam check along with the version check?

+       if (pset.sversion >= 12 && !pset.hide_tableam)
+       {
+               appendPQExpBuffer(&buf,
+                                 ",\n  am.amname as \"%s\"",
+                                 gettext_noop("Access Method"));
+       }

The braces can be removed, and the second line can be combined with the
earlier one.

We could include the test in a psql regression file by configuring
HIDE_TABLEAM.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Re: valgrind versus pg_atomic_init()
On Sun, Jun 07, 2020 at 12:23:35AM -0400, Tom Lane wrote:
> I experimented with running "make check" on ARM64 under a reasonably
> bleeding-edge valgrind (3.16.0). One thing I ran into is that
> regress.c's test_atomic_ops fails; valgrind shows the stack trace
>
>    fun:__aarch64_cas8_acq_rel
>    fun:pg_atomic_compare_exchange_u64_impl
>    fun:pg_atomic_exchange_u64_impl
>    fun:pg_atomic_write_u64_impl
>    fun:pg_atomic_init_u64_impl
>    fun:pg_atomic_init_u64
>    fun:test_atomic_uint64
>    fun:test_atomic_ops
>    fun:ExecInterpExpr
>
> Now, this is basically the same thing as is already memorialized in
> src/tools/valgrind.supp:
>
> # Atomic writes to 64bit atomic vars uses compare/exchange to
> # guarantee atomic writes of 64bit variables. pg_atomic_write is used
> # during initialization of the atomic variable; that leads to an
> # initial read of the old, undefined, memory value. But that's just to
> # make sure the swap works correctly.
> {
>     uninitialized_atomic_init_u64
>     Memcheck:Cond
>     fun:pg_atomic_exchange_u64_impl
>     fun:pg_atomic_write_u64_impl
>     fun:pg_atomic_init_u64_impl
> }
>
> so my first thought was that we just needed an architecture-specific
> variant of that. But on thinking more about this, it seems like
> generic.h's version of pg_atomic_init_u64_impl is just fundamentally
> misguided. Why isn't it simply assigning the value with an ordinary
> unlocked write? By definition, we had better not be using this function
> in any circumstance where there might be conflicting accesses

Does something guarantee the write will be globally-visible by the time
the first concurrent accessor shows up? (If not, one could (a) do an
unlocked ptr->value=0, then the atomic write, or (b) revert and improve
the suppression.) I don't doubt it's fine for the ways PostgreSQL uses
atomics today, which generally initialize an atomic before the
concurrent-accessor processes even exist.

> , so I don't
> see why we should need to tolerate a valgrind exception here. Moreover,
> if a simple assignment *isn't* good enough, then surely the spinlock
> version in atomics.c is 100% broken.

Are you saying it would imply a bug in atomics.c
pg_atomic_init_u64_impl(), pg_atomic_compare_exchange_u64_impl(), or
pg_atomic_fetch_add_u64_impl()? Can you explain that more? If you were
referring to unlocked "*(lock) = 0", that is different, since it's safe to
have a delay in propagation of the change from locked state to unlocked
state.
Re: Postgresql13_beta1 (could not rename temporary statistics file) Windows 64bits
On Fri, Jun 12, 2020 at 03:15:52PM -0300, Ranier Vilela wrote:
> Posgres13_beta1 is consistently writing to the logs "could not rename
> temporary statistics file".
> When analyzing the source that writes the log, I simplified the part
> that writes the logs a little.

FWIW, I have been running a server on Windows for some time with pgbench
running in the background, combined with some starts and stops, but I
cannot see that. This log entry uses LOG, which is the level we use for
the TAP tests, and please note that there are four buildfarm animals for
Windows able to run the TAP tests and they don't seem to show that problem
either: drongo, fairywen, jacana and bowerbird. I may be missing something
of course.
--
Michael

signature.asc
Description: PGP signature
Re: valgrind versus pg_atomic_init()
Noah Misch writes:
> On Sun, Jun 07, 2020 at 12:23:35AM -0400, Tom Lane wrote:
>> ... But on thinking more about this, it seems like
>> generic.h's version of pg_atomic_init_u64_impl is just fundamentally
>> misguided. Why isn't it simply assigning the value with an ordinary
>> unlocked write? By definition, we had better not be using this function
>> in any circumstance where there might be conflicting accesses

> Does something guarantee the write will be globally-visible by the time
> the first concurrent accessor shows up? (If not, one could (a) do an
> unlocked ptr->value=0, then the atomic write, or (b) revert and improve
> the suppression.) I don't doubt it's fine for the ways PostgreSQL uses
> atomics today, which generally initialize an atomic before the
> concurrent-accessor processes even exist.

Perhaps it'd be worth putting a memory barrier at the end of the _init
function(s)? As you say, this is hypothetical right now, but that'd be a
cheap improvement.

>> if a simple assignment *isn't* good enough, then surely the spinlock
>> version in atomics.c is 100% broken.

> Are you saying it would imply a bug in atomics.c
> pg_atomic_init_u64_impl(), pg_atomic_compare_exchange_u64_impl(), or
> pg_atomic_fetch_add_u64_impl()? Can you explain that more?

My point was that doing SpinLockInit while somebody else is already trying
to acquire or release the spinlock is not going to work out well. So there
has to be a really clear boundary between "initialization" mode and "use"
mode; which is more or less the same point you make above. In practice, if
that line is so fine that we need a memory sync operation to enforce it,
things are probably broken anyhow.

regards, tom lane
Re: Postgresql13_beta1 (could not rename temporary statistics file) Windows 64bits
On Fri, Jun 12, 2020 at 03:15:52PM -0300, Ranier Vilela wrote:
> Posgres13_beta1, is consistently writing to the logs, "could not rename
> temporary statistics file".
> When analyzing the source that writes the log, I simplified the part
> that writes the logs a little.

What Windows version and compiler?

Please show the full CSV log for this event, and not an excerpt.
Preferably with several lines of "context" for the stats process PID, with
log_min_messages=debug or debug2 and log_error_verbosity=verbose, so that
you get the file location where it's erroring, if you don't already know
that.
https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

> 1. I changed from if else if to if
> 2. For the user, better to have more errors recorded, which can help in
> discovering the problem
> 3. Errors are independent of each other
> 4. If I can't release tmpfile, there's no way to delete it (unlink)
> 5. If I can rename, there is no need to delete it (unlink) tmpfile.
>
> Attached is the patch that proposes these changes.
> Now, the problem has not been solved.

It sounds like you haven't yet found the problem, right? These are all
unrelated changes which are confusing the problem report and discussion.
And they introduce behavior regressions, like renaming files with write
errors on top of known good files.

I think you'll want to 1) identify where the problem is occurring, and
attach a debugger there, and 2) figure out when the problem was
introduced. If this problem doesn't happen under v12:

git log --cherry-pick -p origin/REL_12_STABLE...origin/REL_13_STABLE -- src/backend/postmaster/pgstat.c

or just:

git log -p origin/REL_12_STABLE.. src/backend/postmaster/pgstat.c

You could try git-bisecting between v12..v13, but there are only 30
commits which touched pgstat.c (assuming that's where the ERROR is being
thrown).

Do you have a special value of stats_temp_directory? Or a symlink or
junction at pg_stat_tmp?

> 1. statfile, is it really closed or does it not exist in the directory?
> There is no way to rename a file, which is open and in use.
> 2. Why delete (pgstat_stat_filename), if permanent is true:
> const char * statfile = permanent? PGSTAT_STAT_PERMANENT_FILENAME:
> pgstat_stat_filename;
> statfile is PGSTAT_STAT_PERMANENT_FILENAME and not pgstat_stat_filename

You can find answers to a lot of questions in the git history. In this
case, 70d756970 and 187492b6c.

--
Justin
Re: create database with template doesn't copy database ACL
On Sun, Jun 14, 2020 at 07:26:13AM +0000, Joseph Nahmias wrote:
> On Fri, Jun 12, 2020 at 05:29:51PM -0400, Bruce Momjian wrote:
> > On Fri, Jun 5, 2020 at 02:31:34PM +0000, PG Doc comments form wrote:
> > > The following documentation comment has been logged on the website:
> > >
> > > Page: https://www.postgresql.org/docs/11/sql-createdatabase.html
> > > Description:
> > >
> > > My understanding is that not copying the ACL is the (currently)
> > > expected behavior when issuing CREATE DATABASE newdb WITH TEMPLATE
> > > my_tmpl; It would be useful for the documentation to note this
> > > caveat.
> >
> > Uh, what ACLs are not copied?
>
> The ACL on the database itself. For example:
>
> postgres@postgres[[local]#9655]=# CREATE DATABASE acl_template WITH IS_TEMPLATE = 1;
> CREATE DATABASE
> postgres@postgres[[local]#9655]=# REVOKE ALL ON DATABASE acl_template FROM PUBLIC;
> REVOKE
> postgres@postgres[[local]#9655]=# CREATE DATABASE acl_test WITH TEMPLATE = acl_template;
> CREATE DATABASE
> postgres@postgres[[local]#9655]=# SELECT datname, datacl FROM pg_database WHERE datname LIKE 'acl%';
>    datname    |         datacl
> --------------+-------------------------
>  acl_template | {postgres=CTc/postgres}
>  acl_test     |
> (2 rows)
>
> Here, the ACL on the new acl_test database does NOT match the ACL on the
> acl_template database upon which it is based.

[I am moving this to the hackers list because I am not clear if this is a
documentation problem or a bug.]

Effectively, we have three levels of objects:

1. global, cluster-wide, e.g., tablespaces, users
2. database attributes, e.g., database encoding, database tablespace
3. objects inside of databases

We don't clearly describe it that way though. Looking at the test: psql -a
Re: create database with template doesn't copy database ACL
Bruce Momjian writes:
> I am unclear if we should be copying the CONNECT and TEMPORARY
> attributes or documenting that CREATE DATABASE does not copy them.

We should absolutely not copy them.

As an example, it'd make sense for an admin to revoke CONNECT on a
template database, just to help ensure that nobody modifies it. If that
propagated to every created database, it would be a complete fail.

Moreover, since the ACLs of an object depend quite a bit on who the owner
is, it'd make no sense to copy them to a new object that has a different
owner. The granted-by fields would be wrong, if nothing else.

In practice, CREATE DATABASE never has copied any database-level property
of the template DB, only its contents. (Well, I guess it copies encoding
and collation by default, but those are descriptive of the contents.)

regards, tom lane
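Given that behavior, anyone who wants the template's database-level grants
on the new database has to re-apply them after creation; a sketch (the
role name is a placeholder):

    CREATE DATABASE acl_test WITH TEMPLATE = acl_template;
    -- database-level ACL entries are not inherited, so restore them by hand:
    REVOKE ALL ON DATABASE acl_test FROM PUBLIC;
    GRANT CONNECT, TEMPORARY ON DATABASE acl_test TO app_role;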
Re: create database with template doesn't copy database ACL
On Sun, Jun 14, 2020 at 11:24:56PM -0400, Tom Lane wrote:
> Bruce Momjian writes:
> > I am unclear if we should be copying the CONNECT and TEMPORARY
> > attributes or documenting that CREATE DATABASE does not copy them.
>
> We should absolutely not copy them.
>
> As an example, it'd make sense for an admin to revoke CONNECT on a
> template database, just to help ensure that nobody modifies it.
> If that propagated to every created database, it would be a complete
> fail.
>
> Moreover, since the ACLs of an object depend quite a bit on who the
> owner is, it'd make no sense to copy them to a new object that has a
> different owner. The granted-by fields would be wrong, if nothing else.
>
> In practice, CREATE DATABASE never has copied any database-level
> property of the template DB, only its contents. (Well, I guess it copies
> encoding and collation by default, but those are descriptive of the
> contents.)

Well, I thought we copied everything except things that can be specified
as different in CREATE DATABASE, though I can see why we would not copy
them. Should we document this, or issue a notice about not copying
non-default database attributes?

--
Bruce Momjian https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee
min_safe_lsn column in pg_replication_slots view
Hi,

Per the docs, pg_replication_slots.min_safe_lsn indicates "the minimum LSN
currently available for walsenders". When I executed pg_walfile_name()
with min_safe_lsn, the function returned the name of the last removed WAL
file instead of the minimum available WAL file name. This happens because
min_safe_lsn actually points at the ending position (the boundary byte) of
the last removed WAL file. I guess that some users would want to calculate
the minimum available WAL file name from min_safe_lsn by using
pg_walfile_name(), but the result would be incorrect. Isn't this
confusing? Should min_safe_lsn indicate the boundary byte + 1 instead?

BTW, I just wonder why each row in pg_replication_slots needs to have a
min_safe_lsn column. Basically min_safe_lsn should be the same across all
replication slots.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
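To see the surprise concretely, derive the file name from min_safe_lsn and
compare it with the oldest segment actually present in pg_wal (a sketch;
the slot is whatever already exists on the server):

    SELECT slot_name,
           min_safe_lsn,
           pg_walfile_name(min_safe_lsn) AS derived_file
    FROM pg_replication_slots;

Because min_safe_lsn sits on the boundary byte, pg_walfile_name() maps it
into the already-removed segment rather than the oldest one still on disk.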
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Fri, Jun 12, 2020 at 4:35 PM Amit Kapila wrote:
>
> On Fri, Jun 12, 2020 at 11:38 AM Dilip Kumar wrote:
> >
> > - Currently, while reading/writing the streaming/subxact files we are
> > reporting the wait event, for example
> > 'pgstat_report_wait_start(WAIT_EVENT_LOGICAL_SUBXACT_WRITE);', but
> > BufFileWrite/BufFileRead internally reports the read/write wait event.
> > So I think we can avoid reporting that?
>
> Yes, we can avoid that. No other place using BufFileRead does any
> such reporting.

I agree.

> > Basically, this part I still have to work on; once we get consensus
> > then I can remove those extra wait events from the patch.
>
> Okay, feel free to send an updated patch with the above change.

Sure, I will do that in the next patch set.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: doc examples for pghandler
On Sat, Jun 13, 2020 at 01:19:17PM +0900, Michael Paquier wrote:
> On Fri, Jun 12, 2020 at 10:13:41PM -0400, Tom Lane wrote:
> > On second thought, contrib/ is not quite the right place, because we
> > typically expect modules there to actually get installed, meaning they
> > have to have at least some end-user usefulness. The right place for
> > a toy PL handler is probably src/test/modules/; compare for example
> > src/test/modules/test_parser/, which is serving quite the same sort
> > of purpose as a skeleton text search parser.
>
> +1 for src/test/modules/, and if you can provide some low-level API
> coverage through this module, that's even better.

Sounds good to me. Something more like the attached patch?

Regards,
Mark

--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/

diff --git a/doc/src/sgml/plhandler.sgml b/doc/src/sgml/plhandler.sgml
index e1b0af7a60..7b2c5624c0 100644
--- a/doc/src/sgml/plhandler.sgml
+++ b/doc/src/sgml/plhandler.sgml
@@ -96,62 +96,12 @@
-    This is a template for a procedural-language handler written in C:
-
-#include "postgres.h"
-#include "executor/spi.h"
-#include "commands/trigger.h"
-#include "fmgr.h"
-#include "access/heapam.h"
-#include "utils/syscache.h"
-#include "catalog/pg_proc.h"
-#include "catalog/pg_type.h"
-
-PG_MODULE_MAGIC;
-
-PG_FUNCTION_INFO_V1(plsample_call_handler);
-
-Datum
-plsample_call_handler(PG_FUNCTION_ARGS)
-{
-    Datum retval;
-
-    if (CALLED_AS_TRIGGER(fcinfo))
-    {
-        /*
-         * Called as a trigger function
-         */
-        TriggerData *trigdata = (TriggerData *) fcinfo->context;
-
-        retval = ...
-    }
-    else
-    {
-        /*
-         * Called as a function
-         */
-
-        retval = ...
-    }
-
-    return retval;
-}
-
-    Only a few thousand lines of code have to be added instead of the
-    dots to complete the call handler.
-
-    After having compiled the handler function into a loadable module
-    (see ), the following commands then
-    register the sample procedural language:
-
-CREATE FUNCTION plsample_call_handler() RETURNS language_handler
-    AS 'filename'
-    LANGUAGE C;
-CREATE LANGUAGE plsample
-    HANDLER plsample_call_handler;
-
+    A template for a procedural-language handler written as a C extension
+    is provided in src/test/modules/plsample. This is a
+    working sample demonstrating one way to create a procedural-language
+    handler, process parameters, and return a value. A few thousand lines
+    of additional code may have to be added to complete a fully functional
+    handler.

diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 29de73c060..95144d8d7c 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -9,6 +9,7 @@ SUBDIRS = \
 	commit_ts \
 	dummy_index_am \
 	dummy_seclabel \
+	plsample \
 	snapshot_too_old \
 	test_bloomfilter \
 	test_ddl_deparse \

diff --git a/src/test/modules/plsample/Makefile b/src/test/modules/plsample/Makefile
new file mode 100644
index 00..757b47c785
--- /dev/null
+++ b/src/test/modules/plsample/Makefile
@@ -0,0 +1,20 @@
+# src/test/modules/plsample/Makefile
+
+PGFILEDESC = "PL/Sample - procedural language"
+
+REGRESS = create_pl create_func select_func
+
+EXTENSION = plsample
+EXTVERSION = 0.1
+
+MODULE_big = plsample
+
+OBJS = plsample.o
+
+DATA = plsample.control plsample--0.1.sql
+
+plsample.o: plsample.c
+
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)

diff --git a/src/test/modules/plsample/README b/src/test/modules/plsample/README
new file mode 100644
index 00..7ee213700b
--- /dev/null
+++ b/src/test/modules/plsample/README
@@ -0,0 +1,3 @@
+plsample is an example procedural-language handler. It is a simple functional
+template that demonstrates some of the things that need to be done in order to
+build a fully functional procedural-language handler.

diff --git a/src/test/modules/plsample/expected/create_func.out b/src/test/modules/plsample/expected/create_func.out
new file mode 100644
index 00..df2b915a97
--- /dev/null
+++ b/src/test/modules/plsample/expected/create_func.out
@@ -0,0 +1,5 @@
+CREATE FUNCTION plsample_func(a1 NUMERIC, a2 TEXT, a3 INTEGER[])
+RETURNS TEXT
+AS $$
+  This is function's source text.
+$$ LANGUAGE plsample;

diff --git a/src/test/modules/plsample/expected/create_pl.out b/src/test/modules/plsample/expected/create_pl.out
new file mode 100644
index 00..5365391284
--- /dev/null
+++ b/src/test/modules/plsample/expected/create_pl.out
@@ -0,0 +1,8 @@
+CREATE FUNCTION plsample_call_handler()
+RETURNS language_handler
+AS '$libdir/plsample'
+LANGUAGE C;
+CREATE LANGUAGE plsample
+HANDLER plsample_call_handler;
+COMMENT ON LANGUAGE plsample
+IS 'PL/Sample procedural language';

diff --git a/src/test/modules/plsample/expected/select_func.out
new fi
Re: Asynchronous Append on postgres_fdw nodes.
The patch has a problem with partitionwise aggregates: asynchronous append
does not allow the planner to use partial aggregates. An example is in the
attachment. I can't understand why, since the costs of the partitionwise
join are lower. The initial script and the explains of the query with and
without the patch are attached.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com

frgn2n.sh
Description: application/shellscript

Execution without asynchronous append:

explain analyze SELECT sum(parts.b) FROM parts, second WHERE parts.a = second.a AND second.b < 100;

 Finalize Aggregate  (cost=2144.36..2144.37 rows=1 width=8) (actual time=25.821..25.821 rows=1 loops=1)
   ->  Append  (cost=463.39..2144.35 rows=4 width=8) (actual time=9.495..25.816 rows=4 loops=1)
         ->  Partial Aggregate  (cost=463.39..463.40 rows=1 width=8) (actual time=9.495..9.495 rows=1 loops=1)
               ->  Hash Join  (cost=5.58..463.33 rows=27 width=4) (actual time=0.109..9.486 rows=27 loops=1)
                     Hash Cond: (parts.a = second.a)
                     ->  Seq Scan on part_0 parts  (cost=0.00..363.26 rows=25126 width=8) (actual time=0.018..5.901 rows=25126 loops=1)
                     ->  Hash  (cost=5.24..5.24 rows=27 width=4) (actual time=0.084..0.084 rows=27 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on second_0 second  (cost=0.00..5.24 rows=27 width=4) (actual time=0.014..0.071 rows=27 loops=1)
                                 Filter: (b < 100)
                                 Rows Removed by Filter: 232
         ->  Partial Aggregate  (cost=560.68..560.69 rows=1 width=8) (actual time=6.017..6.017 rows=1 loops=1)
               ->  Foreign Scan  (cost=105.29..560.61 rows=29 width=4) (actual time=6.008..6.011 rows=30 loops=1)
                     Relations: (part_1 parts_1) INNER JOIN (second_1)
         ->  Partial Aggregate  (cost=560.88..560.89 rows=1 width=8) (actual time=5.920..5.920 rows=1 loops=1)
               ->  Foreign Scan  (cost=105.75..560.82 rows=24 width=4) (actual time=5.908..5.912 rows=25 loops=1)
                     Relations: (part_2 parts_2) INNER JOIN (second_2)
         ->  Partial Aggregate  (cost=559.33..559.34 rows=1 width=8) (actual time=4.380..4.381 rows=1 loops=1)
               ->  Foreign Scan  (cost=105.09..559.29 rows=16 width=4) (actual time=4.371..4.373 rows=17 loops=1)
                     Relations: (part_3 parts_3) INNER JOIN (second_3)
 Planning Time: 6.734 ms
 Execution Time: 26.079 ms

Execution with asynchronous append:

explain analyze SELECT sum(parts.b) FROM parts, second WHERE parts.a = second.a AND second.b < 100;

 Finalize Aggregate  (cost=5758.83..5758.84 rows=1 width=8) (actual time=184.849..184.849 rows=1 loops=1)
   ->  Append  (cost=727.82..5758.82 rows=4 width=8) (actual time=11.735..184.843 rows=4 loops=1)
         ->  Partial Aggregate  (cost=727.82..727.83 rows=1 width=8) (actual time=11.735..11.735 rows=1 loops=1)
               ->  Hash Join  (cost=677.34..725.94 rows=753 width=4) (actual time=11.693..11.729 rows=27 loops=1)
                     Hash Cond: (second.a = parts.a)
                     ->  Seq Scan on second_0 second  (cost=0.00..38.25 rows=753 width=4) (actual time=0.024..0.052 rows=27 loops=1)
                           Filter: (b < 100)
                           Rows Removed by Filter: 232
                     ->  Hash  (cost=363.26..363.26 rows=25126 width=8) (actual time=11.644..11.644 rows=25126 loops=1)
                           Buckets: 32768  Batches: 1  Memory Usage: 1238kB
                           ->  Seq Scan on part_0 parts  (cost=0.00..363.26 rows=25126 width=8) (actual time=0.013..5.595 rows=25126 loops=1)
         ->  Partial Aggregate  (cost=1676.97..1676.98 rows=1 width=8) (actual time=58.958..58.958 rows=1 loops=1)
               ->  Hash Join  (cost=1377.15..1440.85 rows=94449 width=4) (actual time=58.922..58.948 rows=30 loops=1)
                     Hash Cond: (second_1.a = parts_1.a)
                     ->  Foreign Scan on second_1  (cost=100.00..153.31 rows=753 width=4) (actual time=0.366..0.374 rows=30 loops=1)
                     ->  Hash  (cost=963.58..963.58 rows=25086 width=8) (actual time=58.534..58.534 rows=24978 loops=1)
Re: create database with template doesn't copy database ACL
Bruce Momjian writes:
> Well, I thought we copied everything except things that can be specified
> as different in CREATE DATABASE, though I can see why we would not copy
> them. Should we document this or issue a notice about not copying
> non-default database attributes?

We do not need a notice for behavior that (a) has stood for twenty years
or so, and (b) is considerably less broken than any alternative would be.
If you feel the docs need improvement, have at that.

regards, tom lane
Re: valgrind versus pg_atomic_init()
Hi,

On 2020-06-14 18:55:27 -0700, Noah Misch wrote:
> Does something guarantee the write will be globally-visible by the time
> the first concurrent accessor shows up?

The function comments say:

 * Has to be done before any concurrent usage..
 *
 * No barrier semantics.

> (If not, one could (a) do an unlocked ptr->value=0, then the atomic
> write, or (b) revert and improve the suppression.) I don't doubt it's
> fine for the ways PostgreSQL uses atomics today, which generally
> initialize an atomic before the concurrent-accessor processes even
> exist.

I think it's unlikely that there are cases where you could safely
initialize the atomic without needing some form of synchronization before
it can be used. If a barrier were needed, what'd guarantee the concurrent
access happened after the initialization in the first place?

Greetings,

Andres Freund
Re: valgrind versus pg_atomic_init()
Hi,

On 2020-06-14 22:30:25 -0400, Tom Lane wrote:
> Noah Misch writes:
> > On Sun, Jun 07, 2020 at 12:23:35AM -0400, Tom Lane wrote:
> >> ... But on thinking more about this, it seems like
> >> generic.h's version of pg_atomic_init_u64_impl is just fundamentally
> >> misguided. Why isn't it simply assigning the value with an ordinary
> >> unlocked write? By definition, we had better not be using this
> >> function in any circumstance where there might be conflicting accesses
>
> > Does something guarantee the write will be globally-visible by the
> > time the first concurrent accessor shows up? (If not, one could (a) do
> > an unlocked ptr->value=0, then the atomic write, or (b) revert and
> > improve the suppression.) I don't doubt it's fine for the ways
> > PostgreSQL uses atomics today, which generally initialize an atomic
> > before the concurrent-accessor processes even exist.
>
> Perhaps it'd be worth putting a memory barrier at the end of the _init
> function(s)? As you say, this is hypothetical right now, but that'd be
> a cheap improvement.

I don't think it'd be that cheap for some cases. There's an atomic for
every shared buffer, so making their initialization full memory barriers
would likely be noticeable for larger shared_buffers values. As you say:

> In practice, if that line is so fine that we need a memory sync
> operation to enforce it, things are probably broken anyhow.

Greetings,

Andres Freund
Re: valgrind versus pg_atomic_init()
Andres Freund writes:
> On 2020-06-14 22:30:25 -0400, Tom Lane wrote:
>> Perhaps it'd be worth putting a memory barrier at the end of the _init
>> function(s)? As you say, this is hypothetical right now, but that'd be
>> a cheap improvement.

> I don't think it'd be that cheap for some cases. There's an atomic for
> every shared buffer, making their initialization full memory barriers
> would likely be noticeable for larger shared_buffers values.

Fair point --- if we did need to do something to make this safer, doing it
at the level of individual atomic values would be the wrong thing anyway.

regards, tom lane
Re: vacuum verbose: show pages marked allvisible/frozen/hintbits
On Sun, 26 Jan 2020 at 23:13, Justin Pryzby wrote:
>
> I'm forking this thread since it's a separate topic, and since keeping
> it in a single branch hasn't made maintaining the patches any easier.
> https://www.postgresql.org/message-id/CAMkU%3D1xAyWnwnLGORBOD%3Dpyv%3DccEkDi%3DwKeyhwF%3DgtB7QxLBwQ%40mail.gmail.com
>
> On Sun, Dec 29, 2019 at 01:15:24PM -0500, Jeff Janes wrote:
> > Also, I'd appreciate a report on how many hint-bits were set, and how
> > many pages were marked all-visible and/or frozen. When I do a manual
> > vacuum, it is more often for those purposes than it is for removing
> > removable rows (which autovac generally does a good enough job of).
>
> The first patch seems simple enough but the 2nd could use critical
> review.

Here are comments on the 0001 patch from a quick review:

-   BlockNumber pages_removed;
+   BlockNumber pages_removed;      /* Due to truncation */
+   BlockNumber pages_allvisible;
+   BlockNumber pages_frozen;

Other code in vacuumlazy.c uses 'all_frozen', so how about pages_allfrozen
instead of pages_frozen?

---
@@ -1549,8 +1558,12 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
    {
        uint8 flags = VISIBILITYMAP_ALL_VISIBLE;

-       if (all_frozen)
+       if (all_frozen) {
            flags |= VISIBILITYMAP_ALL_FROZEN;
+           vacrelstats->pages_frozen++;
+       }

@@ -1979,10 +2000,14 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
        uint8 flags = 0;

        /* Set the VM all-frozen bit to flag, if needed */
-       if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+       if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0) {
            flags |= VISIBILITYMAP_ALL_VISIBLE;
-       if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen)
+           vacrelstats->pages_allvisible++;
+       }
+       if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen) {
            flags |= VISIBILITYMAP_ALL_FROZEN;
+           vacrelstats->pages_frozen++;
+       }

The above changes need to follow the PostgreSQL code format (a newline is
required after the if statement, before the opening brace).

---
        /*
         * If the all-visible page is all-frozen but not marked as such yet,
         * mark it as all-frozen.  Note that all_frozen is only valid if
         * all_visible is true, so we must check both.
         */
        else if (all_visible_according_to_vm && all_visible && all_frozen &&
                 !VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
        {
            /*
             * We can pass InvalidTransactionId as the cutoff XID here,
             * because setting the all-frozen bit doesn't cause recovery
             * conflicts.
             */
            visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
                              vmbuffer, InvalidTransactionId,
                              VISIBILITYMAP_ALL_FROZEN);
        }

We should also count up vacrelstats->pages_frozen here.

For the 0002 patch, how will users be able to make any meaning out of how
many hint bits were updated by vacuum?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Initial progress reporting for COPY command
On 2020/06/14 21:32, Josef Šimánek wrote:
> Hello, as proposed by Pavel Stěhule and discussed on the local Czech
> PostgreSQL mailing list
> (https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
> I have prepared an initial patch for COPY command progress reporting.

Sounds nice!

> file - bool - is a file used?
> program - bool - is a program used?

Are these fields really necessary in a progress view? What values are
reported when STDOUT/STDIN is specified in the COPY command?

> file_bytes_processed - number of bytes processed when a file is used
> (otherwise 0), works for both directions (FROM/TO) when a file is used
> (file = t)

What value is reported when STDOUT/STDIN is specified in the COPY command?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: Review for GetWALAvailability()
At Sat, 13 Jun 2020 01:38:49 +0900, Fujii Masao wrote in
> Hi,
>
> The document explains that the "lost" value that
> pg_replication_slots.wal_status reports means
>
>     some WAL files are definitely lost and this slot cannot be used to
>     resume replication anymore.
>
> However, I observed the "lost" value while inserting lots of records,
> but replication could continue normally. So I wonder if
> pg_replication_slots.wal_status may have a bug.
>
> wal_status is calculated in GetWALAvailability(), and probably I found
> some issues in it.
>
>     keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
>                               wal_segment_size) + 1;
>
> max_wal_size_mb is a number of megabytes. wal_keep_segments is a number
> of WAL segment files. So it's strange to calculate the max of them.

Oops! I don't want to believe I did that, but it's definitely wrong.

> The above should be the following?
>
>     Max(ConvertToXSegs(max_wal_size_mb, wal_segment_size),
>         wal_keep_segments) + 1

Looks reasonable.

>     if ((max_slot_wal_keep_size_mb <= 0 ||
>          max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
>         oldestSegMaxWalSize <= targetSeg)
>         return WALAVAIL_NORMAL;
>
> This code means that wal_status reports "normal" only when
> max_slot_wal_keep_size is negative or larger than max_wal_size.
> Why is this condition necessary? The document explains "normal
> means that the claimed files are within max_wal_size". So whatever
> the max_slot_wal_keep_size value is, IMO "normal" should be
> reported if the WAL files claimed by the slot are within max_wal_size.
> Thoughts?

It was kind of hard to decide. Even when max_slot_wal_keep_size is smaller
than max_wal_size, the segments beyond max_slot_wal_keep_size are not
guaranteed to be kept. In that case the state transits as NORMAL -> LOST,
skipping the "RESERVED" state. Putting aside whether the setting is useful
or not, I thought that that state transition is somewhat abrupt.

> Or, if that condition is really necessary, the document should be
> updated so that a note about the condition is added.

Does the following make sense?

https://www.postgresql.org/docs/13/view-pg-replication-slots.html

  normal means that the claimed files are within max_wal_size.
+ If max_slot_wal_keep_size is smaller than max_wal_size, this state
+ will not appear.

> If the WAL files claimed by the slot exceed max_slot_wal_keep_size
> but none of those WAL files have been removed yet, wal_status seems
> to report "lost". Is this expected behavior? Per the meaning of "lost"
> described in the document, "lost" should be reported only when
> some claimed files have actually been removed, I think. Thoughts?
>
> Or this behavior is expected and the document is incorrect?

In short, it is known behavior, but it was judged as useless to prevent
it. That can happen when checkpointer removes up to the segment that is
being read by a walsender. I think that that doesn't happen (or happens
only within a narrow time window?) for physical replication, but it
happens for logical replication. During development, I once added code to
make walsender exit for that reason, but finally it was moved to
InvalidateObsoleteReplicationSlots as a somewhat different function. With
the current mechanism, there's a case where a once-invalidated slot comes
back to life; we decided to allow that behavior, but forgot to document
it. Anyway, if you see "lost", something bad is happening.

- lost means that some WAL files are definitely lost and this slot
- cannot be used to resume replication anymore.
+ lost means that some required WAL files are removed and this slot is
+ no longer usable after once disconnected during this status.

If it is crucial to say that the "lost" state may come back to the
reserved or normal state:

+ Note that there are cases where the state moves back to reserved or
+ normal state when all wal senders have left the just removed segment
+ before being terminated.

There is a case where the state moves back to the reserved or normal state
when walsenders leave the just-removed segment before being terminated.

> BTW, if we want to implement GetWALAvailability() as the document
> advertises, we can simplify it like the attached POC patch.

I'm not sure it is right that the patch removes wal_keep_segments from the
function.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 700271fd40..199053dd4a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -11240,18 +11240,25 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
        normal means that the claimed files
-       are within max_wal_size.
+       are within max_wal_size. If
+       max_slot_wal_keep_size is smaller than
+
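For reference, the state under discussion is what this query reports (the
slot is whatever exists on the given server):

    SELECT slot_name, wal_status, restart_lsn
    FROM pg_replication_slots;

Watching wal_status step through normal -> reserved -> lost while a slot
falls behind is the easiest way to observe the transitions described above.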
Re: Read access for pg_monitor to pg_replication_origin_status view
Michael Paquier writes: > On Wed, Jun 10, 2020 at 12:35:49PM +0900, Michael Paquier wrote: >> OK, thanks. Then let's wait a couple of days to see if anybody has >> any objections with the removal of the hardcoded superuser check >> for those functions. > Committed the part removing the superuser checks as of cc07264. FWIW, I'd have included a catversion bump in this, to enforce that the modified backend functions are used with matching pg_proc entries. It's not terribly important at this phase of the devel cycle, but still somebody might wonder why the regression tests are failing for them (if they tried to skip an initdb). regards, tom lane
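For readers unfamiliar with the convention: a catversion bump is a one-line edit to src/include/catalog/catversion.h, whose value is a date plus a one-digit counter. The exact value below is only illustrative:

    /* yyyymmddN format; bumping this forces an initdb, so binaries and
     * data directories with mismatched catalog contents fail fast
     * instead of failing strangely in the regression tests */
    #define CATALOG_VERSION_NO	202006151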
Re: min_safe_lsn column in pg_replication_slots view
On Mon, Jun 15, 2020 at 12:40:03PM +0900, Fujii Masao wrote: > BTW, I just wonder why each row in pg_replication_slots needs to have > a min_safe_lsn column? Basically min_safe_lsn should be the same across > all replication slots. Indeed, that's confusing in its current shape. I would buy putting this value into pg_replication_slots if there were some consistency of the data to worry about because of locking issues, but here this data is controlled within info_lck, which is outside the repslot LWLock. So I think that it is incorrect to put this data in this view and that we should remove it, and that instead we had better push for a system view that maps to the contents of XLogCtl. -- Michael
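The locking point is easiest to see in how the source datum is obtained. Roughly paraphrasing XLogGetLastRemovedSegno() in xlog.c (a sketch from memory, so treat the field names as approximate): the datum is global XLogCtl state read under the info_lck spinlock, with the per-slot ReplicationSlotControlLock playing no part, which is why every row of the view ends up carrying the same value:

    XLogSegNo	last_removed_segno;

    /* global state, guarded by the info_lck spinlock ... */
    SpinLockAcquire(&XLogCtl->info_lck);
    last_removed_segno = XLogCtl->lastRemovedSegNo;
    SpinLockRelease(&XLogCtl->info_lck);
    /* ... min_safe_lsn is then derived from this single global value,
     * identically for every slot in the view's output */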
Re: Resetting spilled txn statistics in pg_stat_replication
On Sat, Jun 13, 2020 at 5:07 PM Fujii Masao wrote: > > > On 2020/06/13 14:23, Amit Kapila wrote: > > On Fri, Jun 12, 2020 at 6:11 PM Magnus Hagander wrote: > >> > >> On Fri, Jun 12, 2020 at 10:23 AM Amit Kapila > >> wrote: > >>> > >> > >> > >> The problem with "lifetime of a process" is that it's not predictable. A > >> replication process might "bounce" for any reason, and it is normally not > >> a problem. But if you suddenly lose your stats when you do that, it starts > >> to matter a lot more. Especially when you don't know if it bounced. (Sure, > >> you can look at the backend_start time, but that adds a whole different > >> set of complexities.) > >> > > > > It is not clear to me what is a good way to display the stats for a > > process that has exited or bounced for whatever reason. OTOH, if > > we just display per-slot stats, it is difficult to imagine how the > > user can make any sense out of them, or in other words how such stats can > > be useful to users. > > If we allow users to set logical_decoding_work_mem per slot, > maybe the users can tune it directly from the stats? > How will it behave when the same slot is used from multiple sessions? I think it will be difficult to make sense of the stats for slots unless we also somehow see which process led to those stats, and if we do that, there won't be much difference w.r.t. what we can do now. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [POC] Fast COPY FROM command for the table with foreign partitions
Thanks Andrey for the patch. I am glad that the patch has taken care of some corner cases already, but still more exist. The COPY command constructed doesn't take care of dropped columns. There is code in deparseAnalyzeSql which constructs the list of columns for a given foreign relation. The 0002 patch attached here moves that code to a separate function and reuses it for COPY. If you find that code change useful, please include it in the main patch. (A sketch of what dropped-column handling could look like appears after the patch below.) While working on that, I found two issues. 1. The COPY command constructed an empty columns list when there were no non-dropped columns in the relation. This caused a syntax error. Fixed that in 0002. 2. In the same case, if the foreign table declared locally didn't have any non-dropped columns but the relation that it referred to on the foreign server had some non-dropped columns, the COPY command fails. I added a test case for this in 0002 but haven't fixed it. I think this work is useful. Please add it to the next commitfest so that it's tracked.

On Tue, Jun 2, 2020 at 11:21 AM Andrey Lepikhov wrote: > > Thank you for the answer, > > 02.06.2020 05:02, Etsuro Fujita wrote: > > I think I also thought something similar to this before [1]. Will take a > > look. > > > [1] > > https://www.postgresql.org/message-id/23990375-45a6-5823-b0aa-a6a7a6a957f0%40lab.ntt.co.jp > > > I have looked into the thread. > My first version of the patch was like your idea. But when developing > the "COPY FROM" code, the following issues were discovered: > 1. Two or more partitions can be placed at the same node. We need to > finish COPY into one partition before starting COPY into another partition > at the same node. > 2. On any error we need to send EOF to all started "COPY .. FROM STDIN" > operations. Otherwise the FDW can't cancel the operation. > > Hiding the COPY code under the buffer-management machinery allows us to > generalize that machinery, execute one COPY operation on each buffer, > and simplify error handling. > > As I understand it, the main idea of the thread you mentioned is to add > "COPY FROM" support without changes to the FDW API. > It is possible to remove BeginForeignCopy() and EndForeignCopy() from > the patch. But it is not trivial to change ExecForeignInsert() for > COPY purposes. > All that I can offer here for now is to introduce one new > ExecForeignBulkInsert(buf) routine that will execute a single "COPY FROM > STDIN" operation, send tuples, and close the operation. We can use the > ExecForeignInsert() routine for each buffer tuple if > ExecForeignBulkInsert() is not supported. > > One of the main questions here is using the COPY TO machinery for serializing a > tuple. It is needed (if you take a look at the patch) to > transform the CopyTo() routine into an iterative representation: > start/next/finish. Would that be acceptable? > > In the attachment there is a patch correcting a stupid error. > > -- > Andrey Lepikhov > Postgres Professional > https://postgrespro.com > The Russian Postgres Company -- Best Wishes, Ashutosh Bapat

From 9c4e09bd03cb98b1f84c42c34ce7b76e0a87011c Mon Sep 17 00:00:00 2001 From: "Andrey V. Lepikhov" Date: Fri, 29 May 2020 10:39:57 +0500 Subject: [PATCH 1/2] Fast COPY FROM into the foreign (or sharded) table. 
--- contrib/postgres_fdw/deparse.c| 25 ++ .../postgres_fdw/expected/postgres_fdw.out| 5 +- contrib/postgres_fdw/postgres_fdw.c | 95 contrib/postgres_fdw/postgres_fdw.h | 1 + src/backend/commands/copy.c | 213 -- src/include/commands/copy.h | 5 + src/include/foreign/fdwapi.h | 9 + 7 files changed, 286 insertions(+), 67 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index ad37a74221..427402c8eb 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -1758,6 +1758,31 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte, withCheckOptionList, returningList, retrieved_attrs); } +/* + * Deparse COPY FROM into given buf. + * We need to use list of parameters at each query. + */ +void +deparseCopyFromSql(StringInfo buf, Relation rel) +{ + int attnum; + + appendStringInfoString(buf, "COPY "); + deparseRelation(buf, rel); + appendStringInfoString(buf, " ( "); + + for(attnum = 0; attnum < rel->rd_att->natts; attnum++) + { + appendStringInfoString(buf, NameStr(rel->rd_att->attrs[attnum].attname)); + + if (attnum != rel->rd_att->natts-1) + appendStringInfoString(buf, ", "); + } + + appendStringInfoString(buf, " ) "); + appendStringInfoString(buf, " FROM STDIN "); +} + /* * deparse remote UPDATE statement * diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 82fc1290ef..922c08d2dc 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -8063,8 +8063,9 @@ copy rem2 from stdin; copy rem2 fr
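To make the dropped-column concern concrete, a deparseCopyFromSql() that skips such columns might look roughly like the following — a sketch only, modeled on how deparseAnalyzeSql() filters on attisdropped, not the actual 0002 patch; it assumes the usual deparse.c context (StringInfo, deparseRelation, quote_identifier):

    static void
    deparseCopyFromSql(StringInfo buf, Relation rel)
    {
        TupleDesc	tupdesc = RelationGetDescr(rel);
        bool		first = true;
        int			i;

        appendStringInfoString(buf, "COPY ");
        deparseRelation(buf, rel);

        for (i = 0; i < tupdesc->natts; i++)
        {
            Form_pg_attribute attr = TupleDescAttr(tupdesc, i);

            /* Dropped columns must not appear in the column list. */
            if (attr->attisdropped)
                continue;

            /*
             * Open the list lazily, so that a relation whose columns are
             * all dropped emits no "()" at all, avoiding the syntax error
             * described above (issue 1).
             */
            appendStringInfoString(buf, first ? " (" : ", ");
            first = false;
            appendStringInfoString(buf, quote_identifier(NameStr(attr->attname)));
        }
        if (!first)
            appendStringInfoChar(buf, ')');

        appendStringInfoString(buf, " FROM STDIN");
    }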
Re: [PATCH] Initial progress reporting for COPY command
> I'm using ftell to get the current position in the file to populate > file_bytes_processed, without error handling (ftell can return -1L and also > populate errno on problems). > > 1. Is that a good way to get progress of file processing? IMO, it's better to handle the error cases. One possible case where ftell can return -1 and set errno is when the total bytes processed exceed LONG_MAX. Will your patch handle file_bytes_processed reporting for COPY FROM STDIN cases? For that case, ftell can't be used. Instead of using ftell and worrying about its errors, a simple approach could be to have a uint64 variable in CopyStateData to track the number of bytes read whenever CopyGetData is called. This approach can also handle the case of COPY FROM STDIN (see the sketch below). With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
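A minimal sketch of that suggestion, assuming a new bytes_processed member is added to CopyStateData (the member name here is made up): CopyGetData() already knows how many bytes it handed back regardless of whether the source is a file, STDIN, or a program, so counting there covers all COPY FROM variants:

    /* in CopyStateData (hypothetical new member): */
    uint64		bytes_processed;	/* raw bytes consumed so far */

    /* at the bottom of CopyGetData(), where bytesread has already been
     * computed for every copy_dest case, just before the return: */
    cstate->bytes_processed += bytesread;
    return bytesread;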
Re: hashagg slowdown due to spill changes
On Sun, 2020-06-14 at 11:14 -0700, Andres Freund wrote: > I'm somewhat inclined to think that we should revert 4cad2534da6 and > then look at how precisely to tackle this in 14. I'm fine with that. > It'd probably make sense to request small tlists when the number of > estimated groups is large, and not otherwise. That seems like a nice compromise that would be non-invasive, at least for create_agg_plan(). Regards, Jeff Davis
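The shape such a compromise could take in create_agg_plan() might be roughly the following — purely illustrative, with the cutoff left as an open question, reusing the existing CP_SMALL_TLIST flag that 4cad2534da6 applied to the aggregate's subplan:

    int			flags = CP_LABEL_TLIST;
    Plan	   *subplan;

    /*
     * Request a narrow subplan targetlist only when enough groups are
     * estimated that hashagg spilling is plausible; otherwise keep the
     * cheaper behavior of not forcing a projection below the Agg.
     */
    if (best_path->numGroups > SPILL_LIKELY_GROUPS)	/* hypothetical cutoff */
        flags |= CP_SMALL_TLIST;

    subplan = create_plan_recurse(root, best_path->subpath, flags);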
Re: Transactions involving multiple postgres foreign servers, take 2
On Sun, Jun 14, 2020 at 2:21 PM Tatsuo Ishii wrote: > > >> Won't it create an inconsistency in viewing the data from the > >> different servers? Say, such a transaction inserts one row into a > >> local server and another into the foreign server. Now, if we follow > >> the above protocol, the user will be able to see the row from the > >> local server but not from the foreign server. > > > > Yes, you're right. This atomic commit feature doesn't guarantee such > > consistent visibility, so-called atomic visibility. Okay, I understand that the purpose of this feature is to provide atomic commit, which means the transactions on all servers involved will either commit or roll back. However, I think we should at least see at a high level how the visibility will work, because it might influence the implementation of this feature. > > Even if the local > > server is not modified, since a resolver process commits prepared > > foreign transactions one by one, another user could see an inconsistent > > result. Providing globally consistent snapshots to transactions > > involving foreign servers is one of the solutions. How would it be able to do that? Say, when it decides to take a snapshot, the transaction on the foreign server appears to be committed but the transaction on the local server doesn't, so the consistent data visibility problem mentioned above could still arise. > > Another approach to the atomic visibility problem is to control > snapshot acquisition timing and commit timing (plus using REPEATABLE > READ). In the REPEATABLE READ transaction isolation level, PostgreSQL > assigns a snapshot at the time when the first command is executed in a > transaction. If we could prevent any commit while any transaction is > acquiring a snapshot, and prevent any snapshot acquisition while > committing, the visibility inconsistency which Amit explained can be > avoided. > I think the problem mentioned above can occur with this as well; if I am missing something, can you explain in further detail how it won't create a problem in the scenario I used above? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: TAP tests and symlinks on Windows
On Sat, Jun 13, 2020 at 03:00:54PM +0900, Michael Paquier wrote: > My take would be to actually enforce that as a requirement for 14~ if > that works reliably, and of course not backpatch that change as that's > clearly an improvement and not a bug fix. It would be good to check > the status of each buildfarm member first though. And I would need to > also check my own stuff to begin with.. So, I have been looking at that. And indeed, as Peter said, we are visibly missing one call to perl2host in 010_pg_basebackup.pl. Another thing I spotted is that Win32::Symlink does not allow -l to properly detect whether a path is a symlink, causing one of the tests of pg_basebackup to fail when checking if a tablespace path has been updated. It would be good to get more people to test this patch with environments different from mine. I am also adding Andrew Dunstan in CC as the owner of the buildfarm animals currently running TAP tests, for confirmation about the presence of Win32::Symlink there, as I am afraid it would cause failures: drongo, fairywen, jacana and bowerbird. -- Michael

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 208df557b8..17643489d1 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -17,7 +17,7 @@ my $tempdir = TestLib::tempdir; my $node = get_new_node('main'); # Set umask so test directories and files are created with default permissions -umask(0077); +umask(0077) if (!$windows_os); # Initialize node without replication settings $node->init(extra => ['--data-checksums']); @@ -211,154 +211,168 @@ $node->command_fails( 'pg_basebackup tar with long name fails'); unlink "$pgdata/$superlongname"; -# The following tests test symlinks. Windows doesn't have symlinks, so -# skip on Windows. -SKIP: +# Move pg_replslot out of $pgdata and create a symlink to it. +$node->stop; + +# Set umask so test directories and files are created with group permissions +if (!$windows_os) { - skip "symlinks not supported on Windows", 18 if ($windows_os); - - # Move pg_replslot out of $pgdata and create a symlink to it. - $node->stop; - - # Set umask so test directories and files are created with group permissions - umask(0027); + umask(0027) if (!$windows_os); # Enable group permissions on PGDATA chmod_recursive("$pgdata", 0750, 0640); +} - rename("$pgdata/pg_replslot", "$tempdir/pg_replslot") - or BAIL_OUT "could not move $pgdata/pg_replslot"; - symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot") - or BAIL_OUT "could not symlink to $pgdata/pg_replslot"; +rename("$pgdata/pg_replslot", "$tempdir/pg_replslot") + or BAIL_OUT "could not move $pgdata/pg_replslot"; +symlink("$tempdir/pg_replslot", "$pgdata/pg_replslot") + or BAIL_OUT "could not symlink to $pgdata/pg_replslot"; - $node->start; +$node->start; - # Create a temporary directory in the system location and symlink it - # to our physical temp location. That way we can use shorter names - # for the tablespace directories, which hopefully won't run afoul of - # the 99 character length limit. - my $shorter_tempdir = TestLib::tempdir_short . "/tempdir"; - symlink "$tempdir", $shorter_tempdir; +# Create a temporary directory in the system location and symlink it +# to our physical temp location. That way we can use shorter names +# for the tablespace directories, which hopefully won't run afoul of +# the 99 character length limit. +my $shorter_tempdir = TestLib::tempdir_short . 
"/tempdir"; +my $realTsDir = TestLib::perl2host("$shorter_tempdir/tblspc1"); +symlink "$tempdir", $shorter_tempdir; - mkdir "$tempdir/tblspc1"; - $node->safe_psql('postgres', - "CREATE TABLESPACE tblspc1 LOCATION '$shorter_tempdir/tblspc1';"); - $node->safe_psql('postgres', - "CREATE TABLE test1 (a int) TABLESPACE tblspc1;"); - $node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup2", '-Ft' ], - 'tar format with tablespaces'); - ok(-f "$tempdir/tarbackup2/base.tar", 'backup tar was created'); - my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar"; - is(scalar(@tblspc_tars), 1, 'one tablespace tar was created'); - rmtree("$tempdir/tarbackup2"); +mkdir "$tempdir/tblspc1"; +$node->safe_psql('postgres', + "CREATE TABLESPACE tblspc1 LOCATION '$realTsDir';"); +$node->safe_psql('postgres', + "CREATE TABLE test1 (a int) TABLESPACE tblspc1;"); +$node->command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup2", '-Ft' ], + 'tar format with tablespaces'); +ok(-f "$tempdir/tarbackup2/base.tar", 'backup tar was created'); +my @tblspc_tars = glob "$tempdir/tarbackup2/[0-9]*.tar"; +is(scalar(@tblspc_tars), 1, 'one tablespace tar was created'); +rmtree("$tempdir/tarbackup2"); - # Create an unlogged table to test that forks other than init are not copied. - $node->safe_psql('postgres', - 'CREATE UNLOGGED TABLE tblspc1_unlogged (id int) TABLESPACE tblspc1;' - ); +# Create an unlogged table to test that forks other than init are not co
Re: Read access for pg_monitor to pg_replication_origin_status view
On Mon, Jun 15, 2020 at 12:44:02AM -0400, Tom Lane wrote: > FWIW, I'd have included a catversion bump in this, to enforce that > the modified backend functions are used with matching pg_proc entries. > It's not terribly important at this phase of the devel cycle, but still > somebody might wonder why the regression tests are failing for them > (if they tried to skip an initdb). Thanks, I am fixing that now. (It may be effective to print a T-shirt of that at some point and give it to people scoring the most in this area..) -- Michael