Re: ICU for global collation
Julien Rouhaud wrote: > > > While on that topic, the doc should probably mention that default ICU > > > collations can only be deterministic. > > > > Well, there is no option to do otherwise, so I'm not sure where/how to > > mention that. We usually don't document options that don't exist. ;-) > > Sure, but I'm afraid that users may still be tempted to use ICU locales like > und-u-ks-level2 from the case_insensitive example in the doc and hope that > it will work accordingly. +1. The CREATE DATABASE doc says this currently: icu_locale Specifies the ICU locale ID if the ICU locale provider is used. ISTM that we need to say explicitly that this locale will be used by default to compare all collatable strings, except that it's overruled by a bytewise comparison to break ties in case of equality. The idea is to describe what the backend will do with the setting rather than saying that we don't have a nondeterministic option. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
Re: ICU for global collation
Finnerty, Jim wrote: > In ICU, the "locale" is just the first part of what we can pass to the > "locale" parameter in CREATE COLLATION - the part before the optional '@' > delimiter. The ICU locale does not include the secondary or tertiary > properties, Why not? Please see https://unicode-org.github.io/icu/userguide/locale It says that "a locale consists of one or more pieces of ordered information", the pieces being a language code, a script code, a country code, a variant code, and keywords. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
Re: popcount
David Fetter wrote: +Datum +byteapopcount(PG_FUNCTION_ARGS) +{ + bytea *t1 = PG_GETARG_BYTEA_PP(0); + int len, result; + + len = VARSIZE_ANY_EXHDR(t1); + result = pg_popcount(VARDATA_ANY(t1), len); + + PG_RETURN_INT32(result); +} The input may have more than 2 billion bits set to 1. The biggest possible result should be 8 billion for bytea (1 GB with all bits set to 1). So shouldn't this function return an int8? There is a pre-existing function in core: bit_length(bytea) returning int4, but it errors out for large values (in fact it's a SQL function that returns octet_length($1)*8). For the varbit case, it doesn't seem like it can hold so many bits, although I don't see the limit documented in https://www.postgresql.org/docs/current/datatype-bit.html Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
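For what it's worth, a minimal sketch of an int8-returning variant (just to illustrate the suggestion, not a reviewed patch; the pg_proc entry would have to declare the return type as int8 as well):

#include "postgres.h"
#include "fmgr.h"
#include "port/pg_bitutils.h"

/*
 * Sketch only: same as the patch's byteapopcount(), but accumulating into
 * an int64 so that counts above INT_MAX (possible with a bytea close to
 * 1 GB with all bits set) are not truncated.
 */
Datum
byteapopcount(PG_FUNCTION_ARGS)
{
	bytea	   *t1 = PG_GETARG_BYTEA_PP(0);
	int64		result;

	result = pg_popcount(VARDATA_ANY(t1), VARSIZE_ANY_EXHDR(t1));

	PG_RETURN_INT64(result);
}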
Re: PATCH: Batch/pipelining support for libpq
Alvaro Herrera wrote: > I adapted the test code to our code style. I also removed the "timings" > stuff; I think that's something better left to pgbench. > > (I haven't looked at Daniel's pgbench stuff yet, but I will do that > next.) The patch I posted in [1] was pretty simple, but at the time, query results were always discarded. Now that pgbench can instantiate variables from query results, a script can do: select 1 as var \gset select :var; This kind of sequence wouldn't work in batch mode since it sends queries before getting results of previous queries. So maybe \gset should be rejected when inside a batch section. Or alternatively pgbench should collect results before a variable is reinjected into a query, thereby stalling the pipeline. To do this only when necessary, it would have to track read-write dependencies among variables, which seems overly complicated though. [1] https://www.postgresql.org/message-id/b4e34135-2bd9-4b8a-94ca-27d760da2...@manitou-mail.org Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: PATCH: Batch/pipelining support for libpq
Hi, Here's a new version with the pgbench support included. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 9ce32fb39b..2a94f8f6b9 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3061,6 +3061,30 @@ ExecStatusType PQresultStatus(const PGresult *res); + + + PGRES_BATCH_END + + +The PGresult represents the end of a batch. +This status occurs only when batch mode has been selected. + + + + + + PGRES_BATCH_ABORTED + + +The PGresult represents a batch that's +received an error from the server. PQgetResult +must be called repeatedly, and it will return this status code, +until the end of the current batch, at which point it will return +PGRES_BATCH_END and normal processing can resume. + + + + If the result status is PGRES_TUPLES_OK or @@ -4819,6 +4843,482 @@ int PQflush(PGconn *conn); + + Batch Mode + + + libpq + batch mode + + + + pipelining + in libpq + + + + libpq batch mode allows applications to + send a query without having to read the result of the previously + sent query. Taking advantage of the batch mode, a client will wait + less for the server, since multiple queries/results can be sent/ + received in a single network transaction. + + + + While batch mode provides a significant performance boost, writing + clients using the batch mode is more complex because it involves + managing a queue of pending queries and finding which result + corresponds to which query in the queue. + + + + Using Batch Mode + + +To issue batches the application must switch a connection into batch mode. +Enter batch mode with +or test whether batch mode is active with +. +In batch mode, only asynchronous operations +are permitted, and COPY is not recommended as it +may trigger failure in batch processing. Using any synchronous +command execution functions such as PQfn, +PQexec or one of its sibling functions are error +conditions. + + + + + It is best to use batch mode with libpq in + non-blocking mode. If used + in blocking mode it is possible for a client/server deadlock to occur. + + +The client will block trying to send queries to the server, but the +server will block trying to send results to the client from queries +it has already processed. This only occurs when the client sends +enough queries to fill its output buffer and the server's receive +buffer before switching to processing input from the server, +but it's hard to predict exactly when that will happen. + + + + + Batch mode consumes more memory when send/receive is not done as required, + even in non-blocking mode. + + + + +Issuing Queries + + + After entering batch mode the application dispatches requests using + normal asynchronous libpq functions such as + PQsendQueryParams, PQsendPrepare, + PQsendQueryPrepared, PQsendDescribePortal, + PQsendDescribePrepared. + The asynchronous requests are followed by a + + call to mark the end of the batch. The client needs not + call PQgetResult immediately after + dispatching each operation. + Result processing + is handled separately. + + + + The server executes statements, and returns results, in the order the + client sends them. The server may begin executing the batch before all + commands in the batch are queued and the end of batch command is sent. + If any statement encounters an error the server aborts the current + transaction and skips processing the rest of the batch. + Query processing resumes after the end of the failed batch. 
+ + + + It's fine for one operation to depend on the results of a + prior one. One query may define a table that the next query in the same + batch uses; similarly, an application may create a named prepared statement + then execute it with later statements in the same batch. + + + + +Processing Results + + + To process the result of one batch query, the application calls + PQgetResult repeatedly and handles each result + until PQgetResult returns null. + The result from the next batch query may then be retrieved using + PQgetResult again and the cycle repeated. + The application handles individual statement results as normal. + When the results of all the queries in the batch have been + returned, PQgetResult returns a result + containing the status value PGRES_BA
Re: popcount
Peter Eisentraut wrote: > + /* > +* ceil(VARBITLEN(ARG1)/BITS_PER_BYTE) > +* done to minimize branches and instructions. > +*/ > > I don't know what that is supposed to mean or why that kind of tuning > would be necessary for a user-callable function. Also, the formula just below looks wrong in the current patch (v3): + /* + * ceil(VARBITLEN(ARG1)/BITS_PER_BYTE) + * done to minimize branches and instructions. + */ + len = (VARBITLEN(arg1) + BITS_PER_BYTE + 1) / BITS_PER_BYTE; + p = VARBITS(arg1); + + popcount = pg_popcount((const char *)p, len); It should be len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE If we add 1 to BITS_PER_BYTE instead of subtracting 1, when VARBITLEN(arg1) is a multiple of BITS_PER_BYTE, "len" is one byte too large. The correct formula is already used in include/utils/varbit.h near the definition of VARBITLEN: #define VARBITTOTALLEN(BITLEN) (((BITLEN) + BITS_PER_BYTE-1)/BITS_PER_BYTE + \ VARHDRSZ + VARBITHDRSZ) Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
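To make the off-by-one concrete, here is the arithmetic spelled out for a bit string of exactly two bytes (my own illustration, not code from the patch):

	/* With VARBITLEN(arg1) = 16 and BITS_PER_BYTE = 8: */
	len = (16 + 8 + 1) / 8;		/* = 3, one byte more than the bit data occupies */
	len = (16 + 8 - 1) / 8;		/* = 2, the expected ceil(16/8) */

	/*
	 * With the first formula, pg_popcount() reads one byte past the 16 bits
	 * of data and may count stray bits from whatever byte follows them.
	 */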
Re: Are datcollate/datctype always libc even under --with-icu ?
Chapman Flack wrote: > Next question: the "currently" in that comment suggests that could change, > but is there any present intention to change it, or is this likely to just > be the way it is for the foreseeable future? Some related patches and discussions: * ICU as default collation provider https://commitfest.postgresql.org/21/1543/ * ICU for global collation https://commitfest.postgresql.org/25/2256/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: ICU for global collation
Julien Rouhaud wrote: > If you want a database with an ICU default collation the lc_collate > and lc_ctype should inherit what's in the template database and not > what was provided in the LOCALE I think. You could still probably > overload them in some scenario, but without a list of what isn't > ICU-aware I can't really be sure of how often one might have to do > it. I guess we'd need that when creating a database with a different encoding than the template databases, at least. About what's not ICU-aware, I believe the most significant part in core is the Full Text Search parser. It doesn't care about sorting strings, but it relies on the functions from POSIX <ctype.h>, which depend on LC_CTYPE (it looks, however, like this could be improved by following what has been done in backend/regex/regc_pg_locale.c, which has comparable needs and calls ICU functions when applicable). Also, any extension is potentially concerned. Surely many extensions call functions from ctype.h assuming that the current value of LC_CTYPE works with the data they handle. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: ICU for global collation
Peter Eisentraut wrote: > Unlike in the previous patch, where the ICU > collation name was written in datcollate, there is now a third column > (daticucoll), so we can store all three values. I think some users would want their db-wide ICU collation to be case/accent-insensitive. Postgres users are trained to expect case-sensitive comparisons, but some apps initially made for e.g. MySQL or MS-SQL that use such collations natively would be easier to port to Postgres. IIRC, that was the context for some questions where people were enquiring about db-wide ICU collations. With the current patch, it's not possible, AFAICS, because the user can't tell that the collation is non-deterministic. Presumably this would require another option to CREATE DATABASE and another column to store that bit of information. The "daticucoll" column also suggests that we don't expect to add other collation providers in the future. Maybe a pair of columns like (datcollprovider, datcolllocale) would be more future-proof, or a (datcollprovider, datcolllocale, datcollisdeterministic) triplet if non-deterministic collations are allowed. Also, pg_collation has "collversion" to detect a mismatch between the ICU runtime and existing indexes. I don't see that field for the db-wide ICU collation, so maybe we currently miss the capability to detect the mismatch for the db-wide collation? The lack of these fields overall suggests another idea: when CREATE DATABASE is called with a global ICU collation, what if it somehow inserted the collation into pg_collation in the new database? Then pg_database would just store the collation oid and no other collation-related field would need to be added into it, now or in the future. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: ICU for global collation
Julien Rouhaud wrote: > > The lack of these fields overall suggests another idea: when CREATE > > DATABASE is called with a global ICU collation, what if it somehow > > inserted the collation into pg_collation in the new database? > > Then pg_database would just store the collation oid and no other > > collation-related field would need to be added into it, now > > or in the future. > > I don't think it would be doable given the single-database-per-backend > restriction. By that I understand that CREATE DATABASE is limited to copying a template database and then not writing anything into it beyond that, as it's not even connected to it. I guess there's still the possibility of requiring that the ICU db-wide collation of the new database does exist in the template database, and then the CREATE DATABASE would refer to that collation instead of an independent locale string. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: ICU for global collation
Julien Rouhaud wrote: > > I guess there's still the possibility of requiring that the ICU db-wide > > collation of the new database does exist in the template database, > > and then the CREATE DATABASE would refer to that collation instead of > > an independent locale string. > > That could work. However if having the collation in the template database a > strict requirement the something should also be done for initdb, and it will > probably be a bigger problem. If CREATE DATABASE referred to a collation in the template db, either that collation already exists, or the user would have to add it to the template db with CREATE COLLATION. initdb already populates the template databases with a full set of ICU collations through pg_import_system_collations(). I don't quite see what change you're seeing that would be needed in initdb. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: pgsql: Add libpq pipeline mode support to pgbench
Fabien COELHO wrote: > For consistency with the existing \if … \endif, ISTM that it could have > been named \batch … \endbatch or \pipeline … \endpipeline? "start" mirrors "end". To me, the analogy with \if-\endif is not obvious. Grammatically \if is meant to introduce the expression after it, whereas \startpipeline takes no argument. Functionally \startpipeline can be thought of as "open the valve" and \endpipeline as "close the valve". They're "call-to-action" kinds of commands, and in that sense quite different from the \if-\endif pair. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: insensitive collations
Jim Finnerty wrote: > For a UTF8 encoded, case-insensitive (nondeterministic), accent-sensitive > ICU > collation, a LIKE predicate can be used with a small transformation of the > predicate, and the pattern can contain multi-byte characters: > > from: > > SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%'; > -- ERROR: nondeterministic collations are not supported for LIKE > > to: > > SELECT * FROM locations WHERE lower(location) COLLATE "C" LIKE > lower('midi-Pyrené%'); For prefix matching, there's a simpler way with non-deterministic collations based on the advice in [1]. The trick is that if an ICU collation is assigned to "location", whether it's deterministic or not, SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%'; is equivalent to: SELECT * FROM locations WHERE location BETWEEN 'midi-Pyrené' AND 'midi-Pyrené' || E'\uFFFF'; and that will use a btree index if available. Also, it works with all features of ND-collations and all encodings, not just case insensitivity and UTF-8. Now that doesn't solve LIKE '%midi-Pyrené%', or LIKE '%midi_Pyrené%', but that trick could be a building block for an algorithm implementing LIKE with ND-collations in the future. [1] https://unicode-org.github.io/icu/userguide/collation/architecture.html#generating-bounds-for-a-sort-key-prefix-matching Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: insensitive collations
Jim Finnerty wrote: > Currently nondeterministic collations are disabled at the database level. Deterministic ICU collations are also disabled. > The cited reason was because of the lack of LIKE support and because certain > catalog views use LIKE. But the catalogs shouldn't use the default collation of the database. commit 586b98fdf1aaef4a27744f8b988479aad4bd9a01 provides some details about this: Author: Tom Lane Date: Wed Dec 19 17:35:12 2018 -0500 Make type "name" collation-aware. The "name" comparison operators now all support collations, making them functionally equivalent to "text" comparisons, except for the different physical representation of the datatype. They do, in fact, mostly share the varstr_cmp and varstr_sortsupport infrastructure, which has been slightly enlarged to handle the case. To avoid changes in the default behavior of the datatype, set name's typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that by default comparisons to a name value will continue to use strcmp semantics. (This would have been the case for system catalog columns anyway, because of commit 6b0faf723, but doing this makes it true for user-created name columns as well. In particular, this avoids locale-dependent changes in our regression test results.) In consequence, tweak a couple of places that made assumptions about collatable base types always having typcollation DEFAULT_COLLATION_OID. I have not, however, attempted to relax the restriction that user- defined collatable types must have that. Hence, "name" doesn't behave quite like a user-defined type; it acts more like a domain with COLLATE "C". (Conceivably, if we ever get rid of the need for catalog name columns to be fixed-length, "name" could actually become such a domain over text. But that'd be a pretty massive undertaking, and I'm not volunteering.) Discussion: https://postgr.es/m/15938.1544377...@sss.pgh.pa.us Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: Calendar support in localization
Thomas Munro wrote: > Right, so if this is done by trying to extend Daniel Verite's icu_ext > extension (link given earlier) and Robert's idea of a fast-castable > type, I suppose you might want now()::icu_date + '1 month'::internal > to advance you by one Ethiopic month if you have done SET > icu_ext.ICU_LC_TIME = 'am_ET@calendar=traditional'. I've pushed a calendar branch on icu_ext [1] with preliminary support for non-gregorian calendars through ICU, so far with only format and parse of timetamptz. The ICU locale drives both the localization of field names (language) and the choice of calendar. It looks like this: \set fmt 'dd// HH:mm:ss.SSS zz' WITH list(cal) AS ( values ('gregorian'), ('japanese'), ('buddhist'), ('roc'), ('persian'), ('islamic-civil'), ('islamic'), ('hebrew'), ('chinese'), ('indian'), ('coptic'), ('ethiopic'), ('ethiopic-amete-alem'), ('iso8601'), ('dangi') ), fmt AS ( select cal, icu_format_date(now(), :'fmt', 'fr@calendar='||cal) as now_str, icu_format_date(now()+'1 month'::interval, :'fmt', 'fr@calendar='||cal) as plus_1m from list ) SELECT cal, now_str, icu_parse_date(now_str, :'fmt', 'fr@calendar='||cal) as now_parsed, plus_1m, icu_parse_date(plus_1m, :'fmt', 'fr@calendar='||cal) as plus_1m_parsed FROM fmt; -[ RECORD 1 ]--+--- cal| gregorian now_str| 26/mars/2021 après Jésus-Christ 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 26/avril/2021 après Jésus-Christ 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 2 ]--+--- cal| japanese now_str| 26/mars/0033 Heisei 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 26/avril/0033 Heisei 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 3 ]--+--- cal| buddhist now_str| 26/mars/2564 ère bouddhique 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 26/avril/2564 ère bouddhique 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 4 ]--+--- cal| roc now_str| 26/mars/0110 RdC 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 26/avril/0110 RdC 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 5 ]--+--- cal| persian now_str| 06/farvardin/1400 Anno Persico 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 06/ordibehešt/1400 Anno Persico 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 6 ]--+--- cal| islamic-civil now_str| 12/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 7 ]--+--- cal| islamic now_str| 13/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 8 ]--+--- cal| hebrew now_str| 13/nissan/5781 Anno Mundi 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 14/iyar/5781 Anno Mundi 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 9 ]--+--- cal| chinese now_str| 14/èryuè/0038 78 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 15/sānyuè/0038 78 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 10 ]-+--- cal| indian now_str| 05/chaitra/1943 ère Saka 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 06/vaishākh/1943 ère Saka 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 11 ]-+--- cal| coptic 
now_str| 17/barmahât/1737 après Dioclétien 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01 plus_1m| 18/barmoudah/1737 après Dioclétien 18:22:07.566 UTC+2 plus_1m_parsed | 2021-04-26 18:22:07.566+02 -[ RECORD 12 ]-+--- cal| ethiopic now_str| 17/mägabit/2013 après l’Incarnation 18:22:07.566 UTC+1 now_parsed | 2021-03-26 18:22:07.566+01
Re: Calendar support in localization
Matthias van de Meent wrote: > The results for the Japanese locale should be "0003 Reiwa" instead of > "0033 Heisei", as the era changed in 2019. ICU releases have since > implemented this and other corrections; this specific change was > implemented in the batched release of ICU versions on 2019-04-12. Right. I've run this test on an Ubuntu 18.04 desktop, which comes with libicu60. The current version for my system is 60.2-3ubuntu3.1. Ubuntu maintainers did not pick up the change of the new Japanese era. As a guess, it's because it's not a security fix. This contrasts with the upstream ICU maintainers, who made an exceptional effort to backpatch this down to ICU 53 (exceptional in the sense that they don't do that for bugfixes). >> For instance with the ethiopic calendar, the query above displays today as >> 17/mägabit/2013 and 1 month from now as 18/miyazya/2013, >> while the correct result is probably 17/miyazya/2013 (?) > Seeing the results for Japanese locale, you might want to update your > ICU library, which could fix this potential inconsistency. I agree it's always best to have the latest ICU version, but in the context of Postgres, we have to work with the versions that are typically installed on users' systems. People who have pre-2019 versions will simply be stuck with the previous Japanese era. Anyway, for the specific problem that the interval datatype cannot be used seamlessly across all calendars, it's essentially about how days are mapped into calendars, and it's unrelated to ICU updates AFAIU. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: Calendar support in localization
Surafel Temesgen wrote: > > About intervals, if there were locale-aware functions like > > add_interval(timestamptz, interval [, locale]) returns timestamptz > > or > > sub_timestamp(timestamptz, timestamptz [,locale]) returns interval > > that would use ICU to compute the results according to the locale, > > wouldn't it be good enough? > > > > > Yes it can be enough for now but there are patches proposed to support the > system and application time period which are in SQL standard To clarify, these function signatures are not meant to oppose a core vs extension implementation, nor an ICU vs non-ICU implementation. They're meant to illustrate the case of using specific functions instead of adding specific data types. AFAIU, adding data types comes from the idea that since (non-gregorian-date + interval) doesn't have the same result as (gregorian-date + interval), we could use a different type for non-gregorian-date and so a different "+" operator, maybe even a specific interval type. For the case of temporal tables, I'm not quite familiar with the feature, but I notice that the patch says: +When system versioning is specified two columns are added which +record the start timestamp and end timestamp of each row verson. +The data type of these columns will be TIMESTAMP WITH TIME ZONE. The user doesn't get to choose the data type, so if we were to require specific data types for non-gregorian calendars, that would seemingly complicate things for this feature. This is consistent with the remark upthread that the SQL standard assumes the gregorian calendar. > what it takes to support calendar locally is input/output function > and a converter from and to julian calendar and that may not be that > much hard since most of the world calendar is based on julian or > gregorian calendar[0] The conversions from julian dates are not necessarily hard, but the I/O functions mean having localized names for all days, months, eras of all calendars in all supported languages. If you're thinking of implementing this from scratch (without the ICU dependency), where would these names come from? OTOH if we're using ICU, then why bother reinventing the julian-to-calendars conversions that ICU already does? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: insensitive collations
Jim Finnerty wrote: > SET client_encoding = WIN1252; > [...] > postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt__'; -- > the wildcard is applied byte by byte instead of character by character, so > the 2-byte accented character is matched only by 2 '_'s >location > > Franche-Comté > (1 row) The most plausible explanation is that the client-side text is encoded in UTF-8, rather than WIN1252 as declared. If you added length('Franche-Comté') to the above query, I suspect it would show that the string is one character longer than expected, and octet_length('Franche-Comté') would be two bytes longer than expected. Also dumping the contents of the "location" column with convert_to() would show that the accents have been wrongly translated, if the explanation of the encoding snafu is correct. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: Add header support to text format and matching feature
Rémi Lapeyre wrote: > Here’s a new version that fix all the issues. Here's a review of v4. The patch improves COPY in two ways: - COPY TO and COPY FROM now accept "HEADER ON" for the TEXT format (previously it was only for CSV) - COPY FROM also accepts "HEADER match" to tell that there's a header and that its contents must match the columns of the destination table. This works for both the CSV and TEXT formats. The syntax for the columns is the same as for the data and the match is case-sensitive. The first improvement when submitted alone (in 2018 by Simon Muller) has been judged not useful enough or even hazardous without any "match" feature. It was returned with feedback in 2018-10 and resubmitted by Rémi in 2020-02 with the match feature. The patches apply cleanly, "make check" and "make check-world" pass. In my tests it works fine except for one crash that I can reproduce on a fresh build and default configuration with: $ cat >file.txt i 1 $ psql postgres=# create table x(i int); CREATE TABLE postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) COPY 1 postgres=# \copy x(i) from file.txt with (header match) PANIC: ERRORDATA_STACK_SIZE exceeded server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. I suspect the reason is the way that PG_TRY/PG_CATCH is used, see below. Code comments: +/* + * Represents whether the header must be absent, present or present and match. + */ +typedef enum CopyHeader +{ + COPY_HEADER_ABSENT, + COPY_HEADER_PRESENT, + COPY_HEADER_MATCH +} CopyHeader; + /* * This struct contains all the state variables used throughout a COPY * operation. For simplicity, we use the same struct for all variants of COPY, @@ -136,7 +146,7 @@ typedef struct CopyStateData boolbinary; /* binary format? */ boolfreeze; /* freeze rows on loading? */ boolcsv_mode; /* Comma Separated Value format? */ - boolheader_line;/* CSV or text header line? */ + CopyHeader header_line;/* CSV or text header line? */ After the redefinition into this enum type, there are still a bunch of references to header_line that treat it like a boolean: 1190: if (cstate->header_line) 1398: if (cstate->binary && cstate->header_line) 2119: if (cstate->header_line) 3635: if (cstate->cur_lineno == 0 && cstate->header_line) It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum, but maybe it's not good style to count on that. + PG_TRY(); + { + if (defGetBoolean(defel)) + cstate->header_line = COPY_HEADER_PRESENT; + else + cstate->header_line = COPY_HEADER_ABSENT; + } + PG_CATCH(); + { + char*sval = defGetString(defel); + + if (!cstate->is_copy_from) + PG_RE_THROW(); + + if (pg_strcasecmp(sval, "match") == 0) + cstate->header_line = COPY_HEADER_MATCH; + else + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), +errmsg("header requires a boolean or \"match\""))); + } + PG_END_TRY(); It seems wrong to use a PG_CATCH block for this. I understand that it's because defGetBoolean() calls ereport() on non-booleans, but then it should be split into an error-throwing function and a non-error-throwing lexical analysis of the boolean, the above code calling the latter. 
Besides, the comments in elog.h above PG_TRY say that "the error recovery code can either do PG_RE_THROW to propagate the error outwards, or do a (sub)transaction abort. Failure to do so may leave the system in an inconsistent state for further processing." Maybe this is what happens with the repeated uses of "match" eventually failing with ERRORDATA_STACK_SIZE exceeded. -HEADER [ boolean ] +HEADER { match | true | false } This should be enclosed in square brackets because HEADER with no argument is still accepted. + names from the table. On input, the first line is discarded when set to
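For illustration, the split could take roughly the following shape (untested sketch; it assumes parse_bool() from utils/builtins.h as the non-error-throwing boolean lexer, and that a bare HEADER keeps meaning "true"):

	/*
	 * Sketch: let parse_bool() do the lexing so that a non-boolean value is
	 * not an error by itself; only genuinely invalid values reach ereport().
	 * No PG_TRY/PG_CATCH involved.
	 */
	if (defel->arg == NULL)
		cstate->header_line = COPY_HEADER_PRESENT;
	else
	{
		char	   *sval = defGetString(defel);
		bool		bool_val;

		if (parse_bool(sval, &bool_val))
			cstate->header_line = bool_val ?
				COPY_HEADER_PRESENT : COPY_HEADER_ABSENT;
		else if (cstate->is_copy_from && pg_strcasecmp(sval, "match") == 0)
			cstate->header_line = COPY_HEADER_MATCH;
		else
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("header requires a boolean or \"match\"")));
	}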
EDB builds Postgres 13 with an obsolete ICU version
Hi, As a follow-up to bug #16570 [1] and other previous discussions on the mailing-lists, I'm checking out PG13 beta for Windows from: https://www.enterprisedb.com/postgresql-early-experience and it ships with the same obsolete ICU 53 that was used for PG 10,11,12. Besides not having the latest Unicode features and fixes, ICU 53 ignores the BCP 47 tags syntax in collations used as examples in Postgres documentation, which leads to confusion and false bug reports. The current version is ICU 67. I don't see where the suggestion to upgrade it before the next PG release should be addressed but maybe some people on this list do know or have the leverage to make it happen? [1] https://www.postgresql.org/message-id/16570-58cc04e1a6ef3c3f%40postgresql.org Best regards, -- Daniel Vérité PostgreSQL-powered mailer: https://www.manitou-mail.org Twitter: @DanielVerite
Re: Re: csv format for psql
David Steele wrote: > Do you know when you'll have an updated patch that addresses the minor > issues brought up in review and the concern above? Here's an update addressing the issues discussed: - fieldsep and recordsep are used, no more fieldsep_csv - the command line option is --csv without short option and is equivalent to -P format=csv -P fieldsep=',' - pset output formats are reordered alphabetically on display - a couple more cases in the regression tests Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 10b9795..c984a9c 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -152,6 +152,16 @@ EOF + --csv + + + Switches to csv output mode. This is equivalent to \pset format + csv followed by \pset fieldsep ','. + + + + + -d dbname --dbname=dbname @@ -246,7 +256,7 @@ EOF Use separator as the - field separator for unaligned output. This is equivalent to + field separator for unaligned and csv outputs. This is equivalent to \pset fieldsep or \f. @@ -382,7 +392,7 @@ EOF Use separator as the - record separator for unaligned output. This is equivalent to + record separator for unaligned and csv outputs. This is equivalent to \pset recordsep. @@ -558,7 +568,7 @@ EOF Set the field separator for unaligned output to a zero byte. This is - equvalent to \pset fieldsep_zero. + equivalent to \pset fieldsep_zero. @@ -1937,9 +1947,9 @@ Tue Oct 26 21:40:57 CEST 1999 -Sets the field separator for unaligned query output. The default -is the vertical bar (|). It is equivalent to -\pset fieldsep. +Sets the field separator for unaligned and csv query outputs. The +default is the vertical bar (|). It is equivalent +to \pset fieldsep. @@ -2546,8 +2556,8 @@ lo_import 152801 fieldsep - Specifies the field separator to be used in unaligned output - format. That way one can create, for example, tab- or + Specifies the field separator to be used in unaligned and csv output + formats. That way one can create, for example, tab- or comma-separated output, which other programs might prefer. To set a tab as field separator, type \pset fieldsep '\t'. The default field separator is @@ -2584,9 +2594,13 @@ lo_import 152801 format - Sets the output format to one of unaligned, - aligned, wrapped, - html, asciidoc, + Sets the output format to one of + unaligned, + aligned, + csv, + wrapped, + html, + asciidoc, latex (uses tabular), latex-longtable, or troff-ms. @@ -2601,6 +2615,15 @@ lo_import 152801 format). + csv format writes columns separated + by fieldsep, applying the CSV quoting rules + described in RFC-4180 and compatible with the CSV format + of the COPY command. + The header with column names is output unless the + tuples_only parameter is on. + Title and footers are not printed. + + aligned format is the standard, human-readable, nicely formatted text output; this is the default. @@ -2747,8 +2770,8 @@ lo_import 152801 recordsep - Specifies the record (line) separator to use in unaligned - output format. The default is a newline character. + Specifies the record (line) separator to use in unaligned or + csv output formats. The default is a newline character. 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 3560318..1d8cc96 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -3603,6 +3603,9 @@ _align2string(enum printFormat in) case PRINT_TROFF_MS: return "troff-ms"; break; + case PRINT_CSV: + return "csv"; + break; } return "unknown"; } @@ -3658,25 +3661,27 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) { if (!value) ; - else if (pg_strncasecmp("unaligned", value, vallen) == 0) - popt->topt.format = PRINT_UNALIGNED; else if (pg_strncasecmp("aligned", value, vallen) == 0) popt->topt.form
Re: Re: csv format for psql
Pavel Stehule wrote: > It should not be hard. All formats can has '|' like now, and csv can have a > ',' - then if field separator is not explicit, then default field separator > is used, else specified field separator is used. > > You can see my idea in attached patch With that patch, consider this sequence: postgres=# \pset format csv Output format is csv. postgres=# \pset fieldsep Field separator is "|". postgres=# select 1 as a,2 as b; a,b 1,2 Here psql reports that fieldsep is "|" and right away is using something else in the output. That doesn't look good. You may object that it's fixable by tweaking the output of \pset, \pset fieldsep, and \? variables so that it knows that the current output format is going to use a "hidden" default separator, and then these commands should display that value instead. But that'd be somewhat playing whack-a-mole, as the following sequence would now be possible, with '|' being used as the separator instead of the ',' reported just above: postgres=# \pset format csv Output format is csv. postgres=# \pset fieldsep Field separator is ",". postgres=# \a Output format is aligned. postgres=# select 1 as a,2 as b; a | b ---+--- 1 | 2 Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: [HACKERS] PATCH: Batch/pipelining support for libpq
Kyotaro HORIGUCHI wrote: > A disucssion on psql batch mode was held in another branch of > this thread. How do we treat that? There's a batch mode for pgbench in a patch posted in [1], with \beginbatch and \endbatch commands, but nothing for psql AFAICS. psql is more complicated because currently it uses a blocking PQexec() call at its core. Craig mentioned psql integration in [2] and [3]. Also a script can have inter-query dependencies such as in insert into table(...) returning id \gset update othertable set col= :id where ...; which is a problem in batch mode, as we don't want to send the update before the right value for :id is known. Whether we want to support these dependencies and how needs discussion. For instance we might not support them at all, or create a synchronization command that collects all results of queries sent so far, or do it implicitly when a variable is injected into a query... This looks like substantial work that might be best done separately from the libpq patch. [1] https://www.postgresql.org/message-id/b4e34135-2bd9-4b8a-94ca-27d760da2...@manitou-mail.org [2] https://www.postgresql.org/message-id/CAMsr+YGLhaDkjymLuNVQy4MrSKQoA=F1vO=aN8XQf30N=aq...@mail.gmail.com [3] https://www.postgresql.org/message-id/CAMsr+YE6BK4iAaQz=ny3xdnblhnnz_4tp-ptjqbnnpszmgo...@mail.gmail.com Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Re: csv format for psql
Pavel Stehule wrote: > implemented in attached patch Consider your initial output of \pset, when no option is specified:

$ ./psql
psql (11devel)
Type "help" for help.
postgres=# \pset
border                   1
columns                  0
expanded                 off
fieldsep                 not used
fieldsep_zero            ERROR
footer                   on
format                   aligned
linestyle                ascii
null                     ''
numericlocale            off
pager                    1
pager_min_lines          0
recordsep                '\n'
recordsep_zero           off
reset                    ERROR
tableattr
title
tuples_only              off
unicode_border_linestyle single
unicode_column_linestyle single
unicode_header_linestyle single

These entries with ERROR correspond in fact to no error at all, or we have to pretend that the default state of psql is erroneous, which doesn't make sense. Also "reset" is not a variable, it seems to be a command, so it probably shouldn't be there in the first place. More generally, I'd think the point of reusing "fieldsep" was to reuse the concept, not reimplement it, let alone change bits of behavior of the unaligned mode along the way. With this patch, again without specifying any option, just looking at what fieldsep is leads to this:

postgres=# \pset fieldsep
User didn't specified field separator.
Current format doesn't specify default field separator.

If this is the way to "solve" the fact that a user has to do \pset fieldsep ',' to get commas in csv mode, then IMV the proposed solution is clearly worse than the stated problem, and worse than simply adding fieldsep_csv to be independent from the unaligned mode. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Re: csv format for psql
Pavel Stehule wrote: > b) the list of pset options is bloating - every possible new format can > introduce fieldsep_X option What new format? The usefulness of fieldsep does not extend outside of xSV, and it's no surprise that there have been no other use for a fieldsep-like variable until now, with the other formats supported in psql. In fact it's even absurd for any format to use a non-fixed separator at a place that is key to parsing it unambiguously. We could even support only the comma and make it non-configurable based on the fact it's Comma-Separated-Values, not Whatever-Separated-Values, except that won't do much to serve the users' interests, as the reality is that people use various separators. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
David G. Johnston wrote: > Could someone post how captions, rows-only, and footer pset settings factor > into this? Specifically are they fixed to on/off or will they hide/show if > users request them explicitly? This is described in the doc with: + csv format writes columns separated + by fieldsep, applying the CSV quoting rules + described in RFC-4180 and compatible with the CSV format + of the COPY command. + The header with column names is output unless the + tuples_only parameter is on. + Title and footers are not printed. + Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
David G. Johnston wrote: > Or, really, just make --csv take an optional argument which, if present, is > the delimiter. I don't think psql can support optional arguments because psql --option foo would be ambiguous, since foo could be the option's value or the name of a database to connect to. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Isaac Morland wrote: > The actual reason I'm posting this is because some of the discussion has > made me a bit confused: there is already a CSV format defined for the COPY > command and used by the psql \copy. I just want to check that what is being > discussed here would use the exact same format as the existing CSV COPY Please see the start of the thread at https://www.postgresql.org/message-id/a8de371e-006f-4f92-ab72-2bbe3ee78f03%40manitou-mail.org where what is proposed is compared to what \copy and COPY already provide. About the CSV acronym itself, the COPY documentation https://www.postgresql.org/docs/current/static/sql-copy.html already refers to it as "Comma Separated Values", even though as we know, the delimiter is user-configurable. There's no difference in psql that would justify a different wording. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
David G. Johnston wrote: > Unaligned format could grow its own fieldsep if it wanted to but it can > also just use the default provided fieldsep var whose default value is > pipe. If it did grow one it would need to understand "not set" in order to > preserve existing behavior. We don't have that problem with csv. fieldsep is already owned by the unaligned format. No other format uses fieldsep currently. Why would it need to grow its own? I don't quite see what you mean here. For csv, Fabien and Peter expressed the opinion that we shouldn't create another fieldsep-like variable specifically for it, but instead reuse fieldsep. That's what my latest patch does. Now it turns out that sharing fieldsep comes with some problems.

1. Since a single variable can't have two different default values, we either have to accept '|' as a default csv separator, or introduce some tricks to automagically commute the value.

2. When a user does \pset fieldsep ';' there are doubts about whether this should affect the current mode only, or both, or even the other one when the user goes back and forth between formats. Any proposal where a \pset var value affects another pset variable as a side effect is inconsistent with what has been done until now with these variables, and I expect that side effects are not exactly sought after in general.

3. The command line cannot express: use character 'X' for unaligned and 'Y' for csv for the rest of the session. The current patch provides '--csv' to select both the csv format and a comma separator, but when combined with -F, the end result depends on the relative position of the options, which may feel unintuitive or buggy. Again we could tweak that, but only by being inconsistent with how we process other options.

Personally I think the benefit of sharing fieldsep is not worth these problems, but I'm waiting for the discussion to continue with more opinions. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Pavel Stehule wrote: > so we can to have > > \pset format xxx > > and set of local possibly changed defaults > > \pset csv_fieldsep , > \pset csv_tuplesonly on > \pset unaligned_fieldsep | > \pset unaligned_tuplesonly off tuples_only (\t) is a single setting that is currently used by all formats. It makes sense as it is and I don't quite see what we would gain by "exploding" it. There's also "footer" that goes in tandem with "tuples_only", to switch off the footer while keeping the header and title. Whatever change is considered to "tuples_only", "footer" must be considered with it. For the csv format, tuples_only=off is interpreted as "output the header" and tuples_only=on as "don't output the header". This is consistent with what other formats do. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Peter Eisentraut wrote: > Another thought: Isn't CSV just the same as unaligned output plus some > quoting? Could we add a quote character setting and then define --csv > to be quote-character = " and fieldsep = , ? Plus footer set to off. So --csv would be like:

\pset format unaligned
\pset fieldsep ','
\pset footer off
\pset unaligned_quote_character '"'

I guess we'd add a \csv command that does these together. There would still be some discomfort with some of the issues discussed upthread. For instance psql -F ';' --csv is likely to reset fieldsep to ',', ending up with a different outcome than psql --csv -F';'. And there's the question of how to come back from \csv to another output, given that it has overwritten "footer", which is used across all formats, and fieldsep, which is used by unaligned. So a user might need to figure out the intricacies behind \csv anyway, if it's not an independent format. On the whole I'm inclined to resubmit the patch with fieldsep_csv and some minor changes based on the rest of the discussion. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Isaac Morland wrote: > OK, mostly trying to avoid commenting because I doubt I have much to add. > But. If I ask for CSV and don't specify any overrides, I expect to get > "C"omma separated values, not some other character. More specifically, if I > say --csv I expect to get files that are identical with what I would get if > I used COPY ... CSV. Actually, COPY ... CSV HEADER, given that psql shows > column headings. This also implies that I expect the quoting and related > details that are associated with CSV. Yes, this is what the current patch does. Only the first version used the same default delimiter as the unaligned mode, but nobody liked that part. As for the header, it's driven by how it's already done for other formats in psql, with the tuples_only setting, which implies that column names are output by default as you mention. Both settings must be changeable, the exact details of how/when is what's being discussed. The output might still differ compared to COPY in that line endings depend on the client-side OS. There's also the minor issue of a single \ by itself on a line, which gets quoted by COPY and not by psql. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Daniel Verite wrote: > The output might still differ compared to COPY in that line endings > depend on the client-side OS. There's also the minor issue > of a single \ by itself on a line, which gets quoted by COPY > and not by psql. I meant \. or backslash followed by period. This sequence means nothing in particular in csv outside of COPY, but it means end-of-data for COPY. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Proposal: Adding json logging
David Arnold wrote: > Not claiming this assumption does imply parsing of a *rolling* set > of log lines with *previously unkown cardinality*. That's expensive > on computing resources. I don't have actual numbers, but it doesn't > seem too far fetched, neither. > I filed a question to the author of fluent-bit to that extend which > you can consult here: > https://github.com/fluent/fluent-bit/issues/564 Let's see what > Eduardo has to inform us about this... fluent-bit does not appear to support CSV, as mentioned in https://github.com/fluent/fluent-bit/issues/459 which got flagged as an enhancement request some time ago. In CSV a line break inside a field is easy for a parser to process, because (per https://tools.ietf.org/html/rfc4180): "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes" So there is no look-ahead to do. In a character-by-character loop, when encountering a line break, either the current field started with a double quote that hasn't been closed yet, in which case the line break is part of the content, or it did not, in which case the line break ends the current record. What doesn't quite work is parsing CSV with a regex; that's discussed in some detail here, for instance: https://softwareengineering.stackexchange.com/questions/166454/can-the-csv-format-be-defined-by-a-regex Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
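To illustrate that no look-ahead is involved, here is a minimal character-by-character loop (my own sketch, not fluent-bit code); a single flag is enough to classify each line break:

#include <stdbool.h>
#include <stdio.h>

/*
 * Count CSV records on a stream.  A newline ends a record only when we are
 * not inside a quoted field.  An escaped quote ("") toggles the flag twice,
 * leaving it unchanged, so it needs no special case for this purpose.
 */
static long
count_csv_records(FILE *f)
{
	bool		in_quotes = false;
	long		records = 0;
	int			c;

	while ((c = fgetc(f)) != EOF)
	{
		if (c == '"')
			in_quotes = !in_quotes;
		else if (c == '\n' && !in_quotes)
			records++;			/* unquoted newline: record separator */
		/* a newline while in_quotes is true is part of the field content */
	}
	return records;
}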
Re: Proposal: Adding json logging
David Arnold wrote: > Interesting, does that implicitly mean the whole log event would get > transmitted as a "line" (with CRLF) in CSV. To me it's implied by the doc at: https://www.postgresql.org/docs/current/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG > In the affirmative scenario, this then would work for a true streaming > aggregator (if CSV was supported). Assuming a real CSV parser tailing the log, there shouldn't be any trouble with that. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: [PATCH] vacuumlo: print the number of large objects going to be removed
Michael Paquier wrote: > Sure. However do we need to introduce this much complication as a > goal for this patch though whose goal is just to provide hints about > the progress of the work done by vacuumlo? Yeah, I went off on a tangent when realizing that ~500 lines of C client-side code in vacuumlo could be turned into ~50 lines of plpgsql in a block. That was not meant as an objection to the patch (besides, I followed the plpgsql approach and got disappointed with the performance of lo_unlink() in a loop compared to the client-side equivalent, so I won't bother -hackers with this idea anymore, until I figure out why it's not faster and if I can do something about it). One comment about the patch: + long to_delete = 0; ... + to_delete = strtol(PQcmdTuples(res), NULL, 10); I believe the maximum number of large objects is almost 2^32, and as a count above 2^31 may not fit into a signed long, shouldn't we use an unsigned long instead? This would also apply to the preexisting "deleted" variable. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
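For illustration, the minimal change I have in mind (untested sketch; an unsigned long covers the ~2^32 limit even on platforms where long is 32 bits):

	unsigned long	to_delete = 0;
	...
	/* PQcmdTuples() returns the affected-row count as a decimal string */
	to_delete = strtoul(PQcmdTuples(res), NULL, 10);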
Create collation reporting the ICU locale display name
Hi, The 'locale' or 'lc_collate/lc_ctype' argument of an ICU collation may have a complicated syntax, especially with non-deterministic collations, and input mistakes in these names will not necessarily be detected as such by ICU. The "display name" of a locale is a simple way to get human-readable feedback about the characteristics of that locale. pg_import_system_collations() already push these as comments when creating locales en masse. I think it would be nice to have CREATE COLLATION report this information as feedback in the form of a NOTICE message. PFA a simple patch implementing that. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c index 919e092483..ad011d149c 100644 --- a/src/backend/commands/collationcmds.c +++ b/src/backend/commands/collationcmds.c @@ -36,6 +36,10 @@ #include "utils/syscache.h" +#ifdef USE_ICU +static char *get_icu_locale_comment(const char *localename); +#endif + typedef struct { char *localename; /* name of locale, as per "locale -a" */ @@ -232,6 +236,16 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e if (!OidIsValid(newoid)) return InvalidObjectAddress; +#ifdef USE_ICU + if (collprovider == COLLPROVIDER_ICU) + { + char *display_name = get_icu_locale_comment(collcollate); + if (display_name != NULL) + ereport(NOTICE, + (errmsg("ICU locale: \"%s\"", display_name))); + } +#endif + /* * Check that the locales can be loaded. NB: pg_newlocale_from_collation * is only supposed to be called on non-C-equivalent locales.
Re: Create collation reporting the ICU locale display name
Michael Paquier wrote: > On Wed, Sep 11, 2019 at 04:53:16PM +0200, Daniel Verite wrote: > > I think it would be nice to have CREATE COLLATION report this > > information as feedback in the form of a NOTICE message. > > PFA a simple patch implementing that. > > Why is that better than the descriptions provided with \dO[S]+ in > psql? There is no description for collations created outside of pg_import_system_collations(). Example: db=# create collation mycoll(provider=icu, locale='fr-FR-u-ks-level1'); NOTICE: ICU locale: "French (France, colstrength=primary)" db=# \x auto db=# \dO+ List of collations -[ RECORD 1 ]--+-- Schema | public Name | mycoll Collate| fr-FR-u-ks-level1 Ctype | fr-FR-u-ks-level1 Provider | icu Deterministic? | yes Description| The NOTICE above is with the patch. Otherwise, the "display name" is never shown nor stored anywhere AFAICS. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Create collation reporting the ICU locale display name
Michael Paquier wrote: > Or could it make sense to provide a system function which returns a > collation description for at least an ICU-provided one? We could make > use of that in psql for example. If we prefer having a function over the instant feedback effect of the NOTICE, the function might look like icu_collation_attributes() [1] from the icu_ext extension. It returns a set of (attribute,value) tuples, among which the displayname is one of the values. An advantage of this approach is that you may execute the function before creating the collation, instead of creating the collation, realizing there was something wrong in your locale/lc_collate argument, dropping the collation and trying again. Another advantage would be the possibility of localizing the display name, leaving the localization as a choice to the user. Currently get_icu_locale_comment() forces "en" as the language because it wants results in US-ASCII, but a user-callable function could have the language code as an optional argument. When not being forced, the language has a default value obtained by ICU from the environment (so that would be from where the postmaster is started in our case), and is also settable with uloc_setDefault(). Example with icu_ext functions: test=> select icu_set_default_locale('es'); icu_set_default_locale es test=> select value from icu_collation_attributes('en-US-u-ka-shifted') where attribute='displayname'; value inglés (Estados Unidos, alternate=shifted) This output tends to reveal mistakes with tags, which is why I thought to expose it as a NOTICE. It addresses the case of a user who wouldn't suspect an error, so the "in-your-face" effect is intentional. With the function approach, the user must be proactive. An example of a mistake I found myself making is forgetting the '-u-' before the collation tags, which doesn't error out but is detected relatively easily with the display name. -- wrong test=> select value from icu_collation_attributes('de-DE-ks-level1') where attribute='displayname'; value - German (Germany, KS_LEVEL1) -- right test=> select value from icu_collation_attributes('de-DE-u-ks-level1') where attribute='displayname'; value --- German (Germany, colstrength=primary) [1] https://github.com/dverite/icu_ext#icu_collation_attributes Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Create collation reporting the ICU locale display name
Tom Lane wrote: > I think that's a useful function, but it's a different function from > the one first proposed, which was to tell you the properties of a > collation you already installed (which might not be ICU, even). > Perhaps we should have both. The pre-create use case would look like: SELECT * FROM describe_collation(locale_string text, collprovider "char") Post-creation, one could do: SELECT d.* FROM pg_catalog.pg_collation, describe_collation(collcollate, collprovider) AS d WHERE oid = :OID; Possibly it could exist as SELECT * FROM describe_collation(oid) but that's essentially the same function. Or am I missing something about why we'd need two functions? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Create collation reporting the ICU locale display name
Tom Lane wrote: > > This output tend to reveal mistakes with tags, which is why I thought > > to expose it as a NOTICE. It addresses the case of a user > > who wouldn't suspect an error, so the "in-your-face" effect is > > intentional. With the function approach, the user must be > > proactive. > > That argument presupposes (a) manual execution of the creation query, > and (b) that the user pays close attention to the NOTICE output. > Unfortunately, I think our past over-use of notices has trained > people to ignore them. What about DEBUG1 as the level? Surely we can draw a line somewhere beyond which the benefit of getting that information surpasses the annoyance factor that you're foreseeing? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: ICU for global collation
Hi, When trying databases defined with ICU locales, I see that backends that serve such databases seem to have their LC_CTYPE inherited from the environment (as opposed to a per-database fixed value). That's a problem for the backend code that depends on libc functions that themselves depend on LC_CTYPE, such as the full text search parser and dictionaries. For instance, if you start the instance with a C locale (LC_ALL=C pg_ctl...) , and tries to use FTS in an ICU UTF-8 database, it doesn't work: template1=# create database "fr-utf8" template 'template0' encoding UTF8 locale 'fr' collation_provider 'icu'; template1=# \c fr-utf8 You are now connected to database "fr-utf8" as user "daniel". fr-utf8=# show lc_ctype; lc_ctype -- fr (1 row) fr-utf8=# select to_tsvector('été'); ERROR: invalid multibyte character for locale HINT: The server's LC_CTYPE locale is probably incompatible with the database encoding. If I peek into the "real" LC_CTYPE when connected to this database, I can see it's "C": fr-utf8=# create extension plperl; CREATE EXTENSION fr-utf8=# create function lc_ctype() returns text as '$ENV{LC_CTYPE};' language plperl; CREATE FUNCTION fr-utf8=# select lc_ctype(); lc_ctype -- C Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: insensitive collations
Peter Eisentraut wrote: [v7-0001-Collations-with-nondeterministic-comparison.patch] +GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation) { + if (collation && !lc_ctype_is_c(collation) && collation != DEFAULT_COLLATION_OID) + { +pg_locale_tlocale = pg_newlocale_from_collation(collation); + +if (locale && !locale->deterministic) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("nondeterministic collations are not supported for LIKE"))); + } This test gets side-stepped when pattern_fixed_prefix() in selfuncs.c returns Pattern_Prefix_Exact, and the code optimizes the operation by converting it to a bytewise equality test, or a bytewise range check in the index with Pattern_Type_Prefix. Here's a reproducer: === create collation ciai (locale='und-u-ks-level1', deterministic=false, provider='icu'); create table w(t text collate "C"); insert into w select md5(i::text) from generate_series(1,1) as i; insert into w values('abc'); create index indexname on w(t ); select t from w where t like 'ABC' collate ciai; t --- (0 rows) select t from w where t like 'ABC%' collate ciai; t --- (0 rows) === For the LIKE operator, I think the fix should be that like_fixed_prefix() should always return Pattern_Prefix_None for non-deterministic collations. For regular expressions, pg_set_regex_collation() is called at some point when checking for a potential prefix, and since it errors out with non-deterministic collations, this issue is taken care of already. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: insensitive collations
Peter Eisentraut wrote: > Older ICU versions (<54) don't support all the locale customization > options, so many of my new tests in collate.icu.utf8.sql will fail on > older systems. What should we do about that? Have another extra test file? Maybe stick to the old-style syntax for the regression tests? The declarations that won't work as expected with older ICU versions would be: CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false); 'und-u-ks-level2' is equivalent to 'und@colStrength=secondary' CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false); 'und-u-ks-level1-kc-true' => 'und@colStrength=primary;colCaseLevel=yes' Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
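For illustration, sticking to the old-style locale syntax would mean writing those regression-test declarations as below (untested sketch, simply transposing the equivalences above):

  CREATE COLLATION case_insensitive
    (provider = icu, locale = 'und@colStrength=secondary', deterministic = false);

  CREATE COLLATION ignore_accents
    (provider = icu, locale = 'und@colStrength=primary;colCaseLevel=yes', deterministic = false);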
Re: insensitive collations
Peter Eisentraut wrote: > The problem is not the syntax but that the older ICU versions don't > support the *functionality* of ks-level2 or colStrength=secondary. If > you try it, you will simply get a normal case-sensitive behavior. My bad, I see now that the "old locale extension syntax" was actually introduced at the same time as the "language tag syntax" in ICU 54: http://bugs.icu-project.org/trac/ticket/8260 With previous versions, we'd need to call ucol_setAttribute(), with the attributes and values defined here: http://icu-project.org/apiref/icu4c/ucol_8h.html For instance, to get colStrength=secondary: ucol_setAttribute(coll, UCOL_STRENGTH, UCOL_SECONDARY, &status); which I've just checked gives the expected result with ICU 4.2. These attributes are flagged as "Stable: ICU 2.0" up to "Stable: ICU 2.8" (for UCOL_NUMERIC_COLLATION). So if we really wanted to have these functionalities with pre-54 ICU, we could, but that would mean implementing an interface to pass to CREATE COLLATION the attributes/values we want to support. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Willing to fix a PQexec() in libpq module
Tom Lane wrote: > Unfortunately, if the default behavior doesn't change, then there's little > argument for doing this at all. The security reasoning behind doing > anything in this area would be to provide an extra measure of protection > against SQL-injection attacks on carelessly-written clients, and of course > it's unlikely that a carelessly-written client would get changed to make > use of a non-default feature. A patch introducing an "allow_multiple_queries" GUC to control this was proposed and eventually rejected for lack of consensus some time ago (there were also some concerns about the implementation that may have played against it): https://www.postgresql.org/message-id/CALAY4q_eHUx%3D3p1QUOvabibwBvxEWGm-bzORrHA-itB7MBtd5Q%40mail.gmail.com About the effectiveness of this feature, there's a valid use case in which it's not the developers who decide to set this GUC, but the DBA or the organization deploying the application. That applies to applications that, of course, do not intentionally use multiple queries per command. Being optional, it would provide a certain level of protection against SQL injections without changing the application or libpq, and without breaking backward compatibility. But both in this thread and the other thread, the reasoning about the GUC seems to rest on the premise that applications would need to be updated or developers need to be aware of it, as if they _had_ to issue SET allow_multiple_queries TO off/on, rather than being subjected to it, as imposed upon them by postgresql.conf or the database settings. Compare this to, say, lo_compat_privileges: an application typically doesn't get to decide whether it's "on". It's for a superuser to decide which databases or which users must operate with this setting "on". Why wouldn't that model work for disallowing multiple queries per command? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
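To make the analogy concrete: assuming such an allow_multiple_queries GUC existed (it doesn't today; the name comes from the rejected patch), a DBA could impose it much like lo_compat_privileges, without the application being aware of it:

  -- hypothetical GUC, shown only to illustrate the deployment model
  ALTER DATABASE appdb SET allow_multiple_queries TO off;
  ALTER ROLE webapp IN DATABASE appdb SET allow_multiple_queries TO off;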
Re: psql - add SHOW_ALL_RESULTS option
Fabien COELHO wrote: > >> I'd go further and suggest that there shouldn't be a variable > >> controlling this. All results that come in should be processed, period. > > > > I agree with that. > > I kind of agree as well, but I was pretty sure that someone would complain > if the current behavior was changed. If queries in a compound statement must be kept silent, they can be converted to CTEs or DO-blocks to produce the same behavior without having to configure anything in psql. That cost on users doesn't seem too bad, compared to introducing a knob in psql, and presumably maintaining it forever. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
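For instance, a script that wants the first result to stay silent can wrap it in a DO block instead of relying on psql discarding it (sketch; setup_step() and final_result() are placeholder names):

  -- instead of:  SELECT setup_step() \; SELECT final_result();
  DO $$
  BEGIN
    PERFORM setup_step();  -- executed, result discarded server-side
  END
  $$;
  SELECT final_result();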
Re: psql - add SHOW_ALL_RESULTS option
Fabien COELHO wrote: > Attached a "do it always version", which does the necessary refactoring. > There is seldom new code, it is rather moved around, some functions are > created for clarity. Thanks for the update! FYI you forgot to remove that bit: --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -3737,7 +3737,7 @@ psql_completion(const char *text, int start, int end) else if (TailMatchesCS("\\set", MatchAny)) { if (TailMatchesCS("AUTOCOMMIT|ON_ERROR_STOP|QUIET|" - "SINGLELINE|SINGLESTEP")) + "SINGLELINE|SINGLESTEP|SHOW_ALL_RESULTS")) Also copydml does not seem to be exercised with combined queries, so do we need this chunk: --- a/src/test/regress/sql/copydml.sql +++ b/src/test/regress/sql/copydml.sql @@ -70,10 +70,10 @@ drop rule qqq on copydml_test; create function qqq_trig() returns trigger as $$ begin if tg_op in ('INSERT', 'UPDATE') then -raise notice '% %', tg_op, new.id; +raise notice '% % %', tg_when, tg_op, new.id; return new; else -raise notice '% %', tg_op, old.id; +raise notice '% % %', tg_when, tg_op, old.id; return old; end if; Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: psql - add SHOW_ALL_RESULTS option
Fabien COELHO wrote: > sh> /usr/bin/psql > psql (12beta2 ...) > fabien=# \set FETCH_COUNT 2 > fabien=# SELECT 1234 \; SELECT 5432 ; > fabien=# > > same thing with pg 11.4, and probably down to every version of postgres > since the feature was implemented... > > I think that fixing this should be a separate bug report and patch. I'll > try to look at it. That reminds me that it was already discussed in [1]. I should add the proposed fix to the next commitfest. [1] https://www.postgresql.org/message-id/flat/a0a854b6-563c-4a11-bf1c-d6c6f924004d%40manitou-mail.org Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: [HACKERS] proposal: psql command \graw
Pavel Stehule wrote: > We are able to generate html/tex/markdown formats on client side. CSV is > more primitive, but much more important data format, so it should not be a > problem. But I remember some objections related to code duplication. While experimenting with adding csv as an output format (similar to unaligned except that it quotes with CSV rules), I find out that we can already do this: \o | client-side-program with arguments or \o /path/to/file.csv COPY (select query) TO STDOUT CSV [HEADER] ; \o This can be used to route multiple resultsets into multiple programs or files without the minor drawbacks of \copy (single line formatting, no variables interpolation), or psql -c "COPY (query) TO STDOUT.." >file.csv which is a bit rigid (single redirect). OTOH I notice that this simpler form with a right-side argument of \g: =# COPY (select 1) TO STDOUT CSV HEADER \g /tmp/file.csv outputs to the console rather into a file. Not sure why it acts differently than \o and whether it's intentional. Anyway I think the above sequence with \o already does everything that is being proposed as an alternative to \graw, or am I missing something? What \o + COPY CSV lacks compared to a csv output format is the ability to export results from meta-commands in csv, such as \g in \x mode or \gx, \crosstabview, \d, \l... The other problem with \o |program + COPY CSV is that it's not easy to discover that combination. It's more obvious if we have csv explicitly listed as an output format. I'll post the patch implementing that in a new thread and we'll see how it goes. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
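To make the multiple-destinations case concrete, a sequence like this (paths are only examples) routes two result sets to a file and to a program respectively:

  \o /tmp/small_tables.csv
  COPY (SELECT relname, relpages FROM pg_class WHERE relpages < 10) TO STDOUT CSV HEADER;
  \o |gzip -c > /tmp/big_tables.csv.gz
  COPY (SELECT relname, relpages FROM pg_class WHERE relpages >= 10) TO STDOUT CSV HEADER;
  \o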
csv format for psql
Hi, This patch implements csv as an output format in psql (\pset format csv). It's quite similar to the unaligned format, except that it applies CSV quoting rules (obviously!) and that it prints no footer and no title. As with unaligned, a header with column names is output unless tuples_only is on. It also supports the fieldsep/fielsep_zero and recordsep/recordsep_zero settings. Most of times, the need for CSV is covered by \copy or COPY with the CSV option, but there are some cases where it would be more practical to have it as an output format in psql. * \copy does not interpolate psql variables and is a single-line command, so making a query fit these contraints can be cumbersome. It can be got around by defining a temporary view and \copy from that view, but that doesn't work in a read-only context such as when connected to a standby. * the server-side COPY TO STDOUT can also be used from psql, typically with psql -c "COPY (query) TO STDOUT CSV" > file.csv, but that's too simple to extract multiple result sets per script. COPY is also more rigid than psql in the options to delimit fields and records. * copy with csv can't help for the output of meta-commands such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql does work with these. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7ea7edc..ebb3d35 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -246,7 +246,7 @@ EOF Use separator as the - field separator for unaligned output. This is equivalent to + field separator for unaligned and csv outputs. This is equivalent to \pset fieldsep or \f. @@ -376,7 +376,7 @@ EOF Use separator as the - record separator for unaligned output. This is equivalent to + record separator for unaligned and csv outputs. This is equivalent to \pset recordsep. @@ -551,8 +551,8 @@ EOF --field-separator-zero - Set the field separator for unaligned output to a zero byte. This is - equvalent to \pset fieldsep_zero. + Set the field separator for unaligned and csv outputs to a zero byte. + This is equivalent to \pset fieldsep_zero. @@ -562,8 +562,9 @@ EOF --record-separator-zero - Set the record separator for unaligned output to a zero byte. This is - useful for interfacing, for example, with xargs -0. + Set the record separator for unaligned and csv outputs to a zero byte. + This is useful for interfacing, for example, with + xargs -0. This is equivalent to \pset recordsep_zero. @@ -1918,9 +1919,9 @@ Tue Oct 26 21:40:57 CEST 1999 -Sets the field separator for unaligned query output. The default -is the vertical bar (|). It is equivalent to -\pset fieldsep. +Sets the field separator for unaligned and csv query outputs. The +default is the vertical bar (|). It is equivalent +to \pset fieldsep. @@ -2527,8 +2528,8 @@ lo_import 152801 fieldsep - Specifies the field separator to be used in unaligned output - format. That way one can create, for example, tab- or + Specifies the field separator to be used in unaligned and csv output + formats. That way one can create, for example, tab- or comma-separated output, which other programs might prefer. To set a tab as field separator, type \pset fieldsep '\t'. The default field separator is @@ -2541,8 +2542,8 @@ lo_import 152801 fieldsep_zero - Sets the field separator to use in unaligned output format to a zero - byte. 
+ Sets the field separator to use in unaligned or csv output formats to + a zero byte. @@ -2565,9 +2566,13 @@ lo_import 152801 format - Sets the output format to one of unaligned, - aligned, wrapped, - html, asciidoc, + Sets the output format to one of + unaligned, + aligned, + csv, + wrapped, + html, + asciidoc, latex (uses tabular), latex-longtable, or troff-ms. @@ -2582,6 +2587,12 @@ lo_import 152801 format). + csv format is similar to + unaligned, except that column contents are + enclosed in double quotes and quoted when necessary according to the + rules of the CSV format, and that no title or footer are printed. + + aligned format is the standard, human-readable, nicely formatte
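For illustration, a session with the proposed format could look like this (output shown approximately; fieldsep is set explicitly to a comma since the patch keeps the unaligned defaults):

  \pset format csv
  \pset fieldsep ','
  SELECT 1 AS id, 'hello, world' AS msg;
  id,msg
  1,"hello, world"

As with the unaligned format, the header line disappears when tuples_only is on.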
Re: csv format for psql
Fabien COELHO wrote: > > This patch implements csv as an output format in psql > > (\pset format csv). > > Would you consider registering it in the next CF? Done here: https://commitfest.postgresql.org/17/1500/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Pavel Stehule wrote: > This format is too important, so some special short or long option can be > practical (it will be printed in help) > > some like --csv I guess -C/--csv could be used, like there already is -H/--html. > I found one issue - PostgreSQL default field separator is "|". Maybe good > time to use more common "," ? > > Or when field separator was not explicitly defined, then use "," for CSV, > and "|" for other. Although it can be little bit messy Currently there's a strong analogy between the unaligned mode and csv mode. In particular they use the existing pset variables fieldsep, fieldsep_zero, recordsep, recordsep_zero in the same way. If we want to make csv special with regard to the delimiters, that complicates the user interface. For instance, if fieldsep was changed automatically by \pset format csv, it's not obvious if/when we should switch it back to its previous state, and how the fieldsep switch done automatically would mix or conflict with other \pset commands issued by the user. Or we would need to duplicate these variables. Or duplicate only fieldsep, having a fieldsep_csv, leaving out the other variables and not being as close to the unaligned mode. These options don't appeal to me much compared to the simplicity of the current patch. Also, although the comma is the separator defined by RFC 4180 and the default for COPY CSV, people also use the semicolon extensively (because that's what Excel does, I guess), which somewhat mitigates the importance of comma as the default value. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: proposal: alternative psql commands quit and exit
Bruce Momjian wrote: > One open issue is the existing help display is inaccurate on Windows: > > Use \\? for help or press control-C to clear the input buffer. ! #ifndef WIN32 ! puts(_("Use control-D to quit.")); ! #else ! puts(_("Use control-C to quit.")); ! #endif But Control-C exiting on Windows is a bug, isn't it? Shouldn't we try to fix it to behave like in Unix rather than documenting it? Also, the fact that Control-D can quit in the middle of a multiline query without any confirmation is a usability problem, because you can always fat-finger a Ctrl+key. By comparison, bash doesn't accept it and emits the same error as if a script was improperly terminated. Example: $ cat ' > [Hit Ctrl+D here] bash: unexpected EOF while looking for matching `'' bash: syntax error: unexpected end of file $ There's also the issue that, in general, communicating different hints and advice depending on the host operating system is not ideal. Because people ask "how do I do such and such in psql?", they do not ask "how do I do such and such in psql in Windows?". Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: proposal: alternative psql commands quit and exit
Bruce Momjian wrote: > > Also, the fact that Control-D can quit in the middle of a > > multiline query without any confirmation is a usability problem, because > > you can always fat-finger a Ctrl+key. By comparison, bash doesn't > > accept it and emits the same error as if a script was improperly > > terminated. Example: > > > > $ cat ' > > > [Hit Ctrl+D here] bash: unexpected EOF while looking for matching `'' > > bash: syntax error: unexpected end of file > > $ BTW, I've missed that Ctrl+D behavior also depends on IGNOREEOF. From https://www.postgresql.org/docs/current/static/app-psql.html IGNOREEOF If set to 1 or less, sending an EOF character (usually Control+D) to an interactive session of psql will terminate the application. If set to a larger numeric value, that many consecutive EOF characters must be typed to make an interactive session terminate. If the variable is set to a non-numeric value, it is interpreted as 10. The default is 0. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
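So a defensive setting for anyone prone to fat-fingering Ctrl+D could simply be, for instance in ~/.psqlrc:

  \set IGNOREEOF 5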
Re: csv format for psql
Pavel Stehule wrote: > > I guess -C/--csv could be used, like there already is -H/--html. > > > > I prefer just long option only. Maybe in future we can use short "C" for > something else better. There is only few free short commands. Looking at parse_psql_options(), currently psql has 35 long options and all of them have a corresponding single-character invocation. Unless there is a consensus for it, I don't feel like it's time to create the precedent that some options deserve short forms and others don't. Sure at some point we'll run out of letters, but looking back ten years ago, on Jan 2008 psql had 31 options, so the rate of adding new ones does not look worrying. Besides I like the fact that -C can be seen as a drop-in replacement for -A, because in most cases, it's just a safer version of -A, as it deals with the situation that the separator might be in the contents, which is the main weakness of -A. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Pavel Stehule wrote: > some like "\pset format csv , header_off If we did that, we'd need to reconsider the interactions of this with \t on|off and \pset footer on|off and how to keep things consistent with the unaligned format. I feel like it's not worth the trouble. We can still add this later if users are confused by the interface, but that interface is already well established with the unaligned format. > 2. if we support csv format, then we can introduce \gcsv xxx | gnuplot I thought committers rejected this because they didn't want a new variant of \g. The csv format makes it possible to not add any new \g-style metacommand and to inject csv into the existing ones. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
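As an example of injecting csv into the existing commands, with the csv format in place one can already write (sketch):

  \pset format csv
  SELECT x, x*x AS square FROM generate_series(1, 100) AS x \g /tmp/squares.csv

or use \g |program to pipe the same output straight into gnuplot or any other consumer.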
Re: ICU for global collation
Peter Eisentraut wrote: > I looked into this problem. The way to address this would be adding > proper collation support to the text search subsystem. See the TODO > markers in src/backend/tsearch/ts_locale.c for starting points. These > APIs spread out to a lot of places, so it will take some time to finish. > In the meantime, I'm pausing this thread and will set the CF entry as RwF. Even if the FTS code is improved in that regard, any extension code calling libc functions that depend on LC_CTYPE is still going to be potentially problematic, in particular when LC_CTYPE happens to be set to a different encoding than the database. Couldn't we simply invent per-database GUC options, as in ALTER DATABASE myicudb SET libc_lc_ctype TO 'value'; ALTER DATABASE myicudb SET libc_lc_collate TO 'value'; where libc_lc_ctype/libc_lc_collate would specifically set the values in the LC_CTYPE and LC_COLLATE environment vars of any backend serving the corresponding database? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: updating unaccent.rules for Arabic letters
kerbrose khaled wrote: > I would like to update unaccent.rules file to support Arabic letters. so > could someone help me or tell me how could I add such contribution. I > attached the file including the modifications, only the last 4 lines. The Arabic letters are found in the Unicode block U+0600 to U+06FF (https://www.fileformat.info/info/unicode/block/arabic/list.htm) There has been no coverage of this block until now by the unaccent module. Since Arabic uses several diacritics [1] , it would be best to figure out all the transliterations that should go in and let them in one go (plus coding that in the Python script). The canonical way to unaccent is normally to apply a Unicode transformation: NFC -> NFD and remove the non-spacing marks. I've tentatively did that with each codepoint in the 0600-06FF block in SQL with icu_transform in icu_ext [2], and it produces the attached result, with 60 (!) entries, along with Unicode names for readability. Does that make sense to people who know Arabic? For the record, here's the query: WITH block(cp) AS (select * FROM generate_series(x'600'::int,x'6ff'::int) AS cp), dest AS (select cp, icu_transform(chr(cp), 'any-NFD;[:nonspacing mark:] any-remove; any-NFC') AS unaccented FROM block) SELECT chr(cp) as "src", icu_transform(chr(cp), 'Name') as "srcName", dest.unaccented as "dest", icu_transform(dest.unaccented, 'Name') as "destName" FROM dest WHERE chr(cp) <> dest.unaccented; [1] https://en.wikipedia.org/wiki/Arabic_diacritics [2] https://github.com/dverite/icu_ext#icu_transform Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite unaccent-arabic-block.utf8.output Description: Binary data
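As a concrete illustration, the alef forms carrying hamza or madda, and the hamza-carrying waw and yeh, would yield entries like these (per their NFD decompositions with the nonspacing marks removed; each line is source character, whitespace, replacement):

  أ	ا
  إ	ا
  آ	ا
  ؤ	و
  ئ	ي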
Making psql error out on output failures
Hi, When exporting data with psql -c "..." >file or select ... \g file inside psql, post-creation output errors are silently ignored. The problem can be seen easily by creating a small ramdisk and filling it over capacity: $ sudo mount -t tmpfs -o rw,size =1M tmpfs /mnt/ramdisk $ psql -d postgres -At \ -c "select repeat('abc', 100)" > /mnt/ramdisk/file $ echo $? 0 $ ls -l /mnt/ramdisk/file -rw-r--r-- 1 daniel daniel 1048576 Dec 16 15:57 /mnt/ramdisk/file $ df -h /mnt/ramdisk/ Filesystem Size Used Avail Use% Mounted on tmpfs 1.0M 1.0M 0 100% /mnt/ramdisk The output that should be 3M byte long is truncated as expected, but we got no error message and no error code from psql, which is obviously not nice. The reason is that PrintQuery() and the code below it in fe_utils/print.c call puts(), putc(), fprintf(),... without checking their return values or the result of ferror() on the output stream. If we made them do that and had the printing bail out at the first error, that would be a painful change, since there are a lot of such calls: $ egrep -w '(fprintf|fputs|fputc)' fe_utils/print.c | wc -l 326 and the call sites are in functions that have no error reporting paths anyway. To start the discussion, here's a minimal patch that checks ferror() in PrintQueryTuples() to raise an error when the printing is done (better late than never). The error message is generic as opposed to using errno, as I don't know whether errno has been clobbered at this point. OTOH, I think that the error indicator on the output stream is not cleared by successful writes after some other previous writes have failed. Are there opinions on whether this should be addressed simply like in the attached, or a large refactoring of print.c to bail out at the first error would be better, or other ideas on how to proceed? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index 9c0d53689e..993484d092 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -718,6 +718,7 @@ static bool PrintQueryTuples(const PGresult *results) { printQueryOpt my_popt = pset.popt; + bool result = true; /* one-shot expanded output requested via \gx */ if (pset.g_expanded) @@ -736,6 +737,12 @@ PrintQueryTuples(const PGresult *results) printQuery(results, &my_popt, fout, false, pset.logfile); + if (ferror(fout)) + { + pg_log_error("Error printing tuples"); + result = false; + } + if (is_pipe) { pclose(fout); @@ -745,9 +752,16 @@ PrintQueryTuples(const PGresult *results) fclose(fout); } else + { printQuery(results, &my_popt, pset.queryFout, false, pset.logfile); + if (ferror(pset.queryFout)) + { + pg_log_error("Error printing tuples"); + result = false; + } + } - return true; + return result; }
Re: Unicode normalization SQL functions
Peter Eisentraut wrote: > Also, there is a way to optimize the "is normalized" test for common > cases, described in UTR #15. For that we'll need an additional data > file from Unicode. In order to simplify that, I would like my patch > "Add support for automatically updating Unicode derived files" > integrated first. Would that explain that the NFC/NFKC normalization and "is normalized" check seem abnormally slow with the current patch, or should it be regarded independently of the other patch? For instance, testing 1 short ASCII strings: postgres=# select count(*) from (select md5(i::text) as t from generate_series(1,1) as i) s where t is nfc normalized ; count --- 1 (1 row) Time: 2573,859 ms (00:02,574) By comparison, the NFD/NFKD case is faster by two orders of magnitude: postgres=# select count(*) from (select md5(i::text) as t from generate_series(1,1) as i) s where t is nfd normalized ; count --- 1 (1 row) Time: 29,962 ms Although NFC/NFKC has a recomposition step that NFD/NFKD doesn't have, such a difference is surprising. I've tried an alternative implementation based on ICU's unorm2_isNormalized() /unorm2_normalize() functions (which I'm currently adding to the icu_ext extension to be exposed in SQL). With these, the 4 normal forms are in the 20ms ballpark with the above test case, without a clear difference between composed and decomposed forms. Independently of the performance, I've compared the results of the ICU implementation vs this patch on large series of strings with all normal forms and could not find any difference. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR
Dent John wrote: > I’ve made a revision of this patch. Some comments: * the commitfest app did not extract the patch from the mail, possibly because it's buried in the MIME structure of the mail (using plain text instead of HTML messages might help with that). The patch has no status in http://commitfest.cputube.org/ probably because of this too. * unnest-refcursor-v3.patch needs a slight rebase because this chunk in the Makefile fails - regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \ + refcursor.o regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \ * I'm under the impression that refcursor_unnest() is functionally equivalent to a plpgsql implementation like this: create function unnest(x refcursor) returns setof record as $$ declare r record; begin loop fetch x into r; exit when not found; return next r; end loop; end $$ language plpgsql; but it would differ in performance, because internally a materialization step could be avoided, but only when the other patch "Allow FunctionScans to pipeline results" gets in? Or are performance benefits expected right away with this patch? * --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -5941,7 +5941,7 @@ array_fill_internal(ArrayType *dims, ArrayType *lbs, /* - * UNNEST + * UNNEST (array) */ This chunk looks unnecessary? * some user-facing doc would be needed. * Is it good to overload "unnest" rather than choosing a specific function name? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
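For completeness, assuming the plpgsql definition sketched above, using it in FROM requires a column definition list since it returns setof record, e.g.:

  BEGIN;
  DECLARE c CURSOR FOR SELECT oid::int AS i, relname::text AS r FROM pg_class;
  SELECT * FROM unnest('c'::refcursor) AS t(i int, r text) LIMIT 3;
  ROLLBACK;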
Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR
Dent John wrote: > Yes. That’s at least true if unnest(x) is used in the FROM. If it’s used in > the SELECT, actually it can get the performance benefit right away At a quick glance, I don't see it called in the select-list in any of the regression tests. When trying it, it appears to crash (segfault): postgres=# begin; BEGIN postgres=# declare c cursor for select oid::int as i, relname::text as r from pg_class; DECLARE CURSOR postgres=# select unnest('c'::refcursor); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. The build is configured with: ./configure --enable-debug --with-icu --with-perl --enable-depend Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Making psql error out on output failures
David Z wrote: > $ psql -d postgres -At -c "select repeat('111', 100)" > > /mnt/ramdisk/file The -A option selects the unaligned output format and -t switches to the "tuples only" mode (no header, no footer). > Test-2: delete the "file", run the command within psql console, > $ rm /mnt/ramdisk/file > $ psql -d postgres In this invocation there's no -A and -t, so the beginning of the output is going to be a left padded column name that is not in the other output. The patch is not involved in that difference. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR
Dent John wrote: > It’s crashing when it’s checking that the returned tuple matches the > declared return type in rsinfo->setDesc. Seems rsinfo->setDesc gets > overwritten. So I think I have a memory management problem. What is the expected result anyway? A single column with a "record" type? FWIW I notice that with plpgsql, this is not allowed to happen: CREATE FUNCTION cursor_unnest(x refcursor) returns setof record as $$ declare r record; begin loop fetch x into r; exit when not found; return next r; end loop; end $$ language plpgsql; begin; declare c cursor for select oid::int as i, relname::text as r from pg_class; select cursor_unnest('c'); ERROR: set-valued function called in context that cannot accept a set CONTEXT: PL/pgSQL function cursor_unnest(refcursor) line 8 at RETURN NEXT Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Making psql error out on output failures
David Zhang wrote: > If I change the error log message like below, where "%m" is used to pass the > value of strerror(errno), "could not write to output file:" is copied from > function "WRITE_ERROR_EXIT". > - pg_log_error("Error printing tuples"); > + pg_log_error("could not write to output file: %m"); > then the output message is something like below, which, I believe, is more > consistent with pg_dump. The problem is that errno may not be reliable to tell us what was the problem that leads to ferror(fout) being nonzero, since it isn't saved at the point of the error and the output code may have called many libc functions between the first occurrence of the output error and when pg_log_error() is called. The linux manpage on errno warns specifically about this: NOTES A common mistake is to do if (somecall() == -1) { printf("somecall() failed\n"); if (errno == ...) { ... } } where errno no longer needs to have the value it had upon return from somecall() (i.e., it may have been changed by the printf(3)). If the value of errno should be preserved across a library call, it must be saved: This other bit from the POSIX spec [1] is relevant: "The value of errno shall be defined only after a call to a function for which it is explicitly stated to be set and until it is changed by the next function call or if the application assigns it a value." To use errno in a way that complies with the above, the psql code should be refactored. I don't know if having a more precise error message justifies that refactoring. I've elaborated a bit about that upthread with the initial submission. Besides, I'm not even sure that errno is necessarily set on non-POSIX platforms when fputc or fputs fails. That's why this patch uses the safer approach to emit a generic error message. [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/errno.html Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: psql - add SHOW_ALL_RESULTS option
Alvaro Herrera wrote: > if this patch enables other psql features, it might be a good step > forward. Yes. For instance if the stored procedures support gets improved to produce several result sets, how is psql going to benefit from it while sticking to the old way (PGresult *r = PQexec(query)) of executing queries that discards N-1 out of N result sets? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: psql - add SHOW_ALL_RESULTS option
Tom Lane wrote: > I'm not really holding my breath for that to happen, considering > it would involve fundamental breakage of the wire protocol. > (For example, extended query protocol assumes that Describe > Portal only needs to describe one result set. There might be > more issues, but that one's bad enough.) I'm not sure that CALL can be used at all with the extended protocol today (just like multiple queries per statement, except that for these, I'm sure). My interpretation is that the extended protocol deliberately leaves out the possibility of multiple result sets because it doesn't fit with how it's designed, and if you want that, you can just use the old protocol's Query message. There is no need to break anything or invent anything; on the contrary, the older way already provides it. Considering these 3 ways to use libpq to send queries: 1. using old protocol with PQexec: only one resultset. 2. using old protocol with PQsendQuery+looping on PQgetResult: same as #1 except multiple result sets can be processed. 3. using extended protocol: not for multiple result sets, not for copy, possibly not for other things, but can use bind parameters, binary format, pipelining,... The current patch is about using #2 instead of #1. There have been patches about doing bits of #3 in some cases (binary output, maybe parameters too?) and none eventually got in. ISTM that the current situation is that psql has been stuck at #1 since forever, so it's not fully using the capabilities of the protocol, either old or new. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Making psql error out on output failures
David Zhang wrote: > Yes, I agree with you. For case 2 "select repeat('111', 100) \g > /mnt/ramdisk/file", it can be easily fixed with more accurate error > message similar to pg_dump, one example could be something like below. > But for case 1 "psql -d postgres -At -c "select repeat('111', 100)" > > /mnt/ramdisk/file" , it may require a lot of refactoring work. I don't quite see why you make that distinction? The relevant bits of code are common, it's all the code in src/fe_utils/print.c called from PrintQuery(). Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: making the backend's json parser work in frontend code
Robert Haas wrote: > With the format I proposed, you only have to worry that the > file name might contain a tab character, because in that format, tab > is the delimiter It could be CSV, which has this problem already solved, is easier to parse than JSON, certainly no less popular, and is not bound to a specific encoding. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Unicode normalization SQL functions
Peter Eisentraut wrote: > Here is an updated patch set that now also implements the "quick check" > algorithm from UTR #15 for making IS NORMALIZED very fast in many cases, > which I had mentioned earlier in the thread. I found a bug in unicode_is_normalized_quickcheck() which is triggered when the last codepoint of the string is beyond U+1. On encountering it, it does: + if (is_supplementary_codepoint(ch)) + p++; When ch is the last codepoint, it makes p point to the ending zero, but the subsequent p++ done by the for loop makes it miss the exit and go into over-reading. But anyway, what's the reason for skipping the codepoint following a codepoint outside of the BMP? I've figured that it comes from porting the Java code in UAX#15: public int quickCheck(String source) { short lastCanonicalClass = 0; int result = YES; for (int i = 0; i < source.length(); ++i) { int ch = source.codepointAt(i); if (Character.isSupplementaryCodePoint(ch)) ++i; short canonicalClass = getCanonicalClass(ch); if (lastCanonicalClass > canonicalClass && canonicalClass != 0) { return NO;} int check = isAllowed(ch); if (check == NO) return NO; if (check == MAYBE) result = MAYBE; lastCanonicalClass = canonicalClass; } return result; } source.length() is the length in UTF-16 code units, in which a surrogate pair counts for 2. This would be why it does if (Character.isSupplementaryCodePoint(ch)) ++i; it's meant to skip the 2nd UTF-16 code of the pair. As this does not apply to the 32-bit pg_wchar, I think the two lines above in the C implementation should just be removed. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Making psql error out on output failures
David Zhang wrote: > The error "No space left on device" can be logged by fprintf() which is > redefined as pg_fprintf() when print_aligned_text() is called Are you sure? I don't find that redefinition. Besides print_aligned_text() also calls putc and puts. > Will that be a better solution if redefine fputs similar to fprintf and > log the exact error when first time discovered? I think we can assume it's not acceptable to have pg_fprintf() to print anything to stderr, or to set a flag through a global variable. So even if using pg_fprintf() for all the printing, no matter how (through #defines or otherwise), there's still the problem that the error needs to be propagated up the call chain to be processed by psql. > The concern is that if we can't provide a more accurate > information to the end user when error happens, sometimes the > end user might got even confused. It's better to have a more informative message, but I'm for not having the best be the enemy of the good. The first concern is that at the moment, we have no error report at all in the case when the output can be opened but the error happens later along the writes. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Changes to pg_dump/psql following collation "C" in the catalog
Tom Lane wrote: > "Daniel Verite" writes: > > One consequence of using the "C" collation in the catalog versus > > the db collation is that pg_dump -t with a regexp may not find > > the same tables as before. It happens when these conditions are > > all met: > > - the collation of the database is not "C" > > - the regexp has locale-dependant parts > > - the names to match include characters that are sensitive to > > locale-dependant matching > > Hm, interesting. > > > It seems that to fix that, we could qualify the references to columns such > > as "relname" and "schema_name" with COLLATE "default" clauses in the > > queries that use pattern-matching in client-side tools, AFAICS > > pg_dump and psql. > > Seems reasonable. I was initially worried that this might interfere with > query optimization, but some experimentation says that the planner > successfully derives prefix index clauses anyway (which is correct, > because matching a fixed regex prefix doesn't depend on locale). > > It might be better to attach the COLLATE clause to the pattern constant > instead of the column name; that'd be less likely to break if sent to > an older server. > > > Before going any further with this idea, is there agreement that it's an > > issue to address and does this look like the best way to do that? > > That is a question worth asking. We're going to be forcing people to get > used to this when working directly in SQL, so I don't know if masking it > in a subset of tools is really a win or not. I think psql and pg_dump need to adjust, just like the 3rd party tools will, at least those that support collation-aware search in the catalog. PFA a patch that implements the slight changes needed. I'll add an entry for it in the next CF. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c index 5c1732a..69ac6f9 100644 --- a/src/fe_utils/string_utils.c +++ b/src/fe_utils/string_utils.c @@ -808,6 +808,8 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions, * to limit the set of objects returned. The WHERE clauses are appended * to the already-partially-constructed query in buf. Returns whether * any clause was added. + * The pattern matching uses the collation of the database through explicit + * COLLATE "default" clauses. * * conn: connection query will be sent to (consulted for escaping rules). * buf: output parameter. @@ -971,17 +973,18 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern, appendPQExpBuffer(buf, "(%s OPERATOR(pg_catalog.~) ", namevar); appendStringLiteralConn(buf, namebuf.data, conn); +appendPQExpBufferStr(buf, " COLLATE \"default\""); appendPQExpBuffer(buf, "\nOR %s OPERATOR(pg_catalog.~) ", altnamevar); appendStringLiteralConn(buf, namebuf.data, conn); -appendPQExpBufferStr(buf, ")\n"); +appendPQExpBufferStr(buf, " COLLATE \"default\" )\n"); } else { appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", namevar); appendStringLiteralConn(buf, namebuf.data, conn); -appendPQExpBufferChar(buf, '\n'); +appendPQExpBufferStr(buf, " COLLATE \"default\"\n"); } } } @@ -997,7 +1000,7 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern, WHEREAND(); appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", schemavar); appendStringLiteralConn(buf, schemabuf.data, conn); - appendPQExpBufferChar(buf, '\n'); + appendPQExpBufferStr(buf, " COLLATE \"default\"\n"); } } else
Re: Changes to pg_dump/psql following collation "C" in the catalog
Chapman Flack wrote: > >> "Daniel Verite" writes: > >>> One consequence of using the "C" collation in the catalog versus > >>> the db collation > > As an intrigued Person Following At Home, I was happy when I found > this little three-message thread had more context in [1] and [2]. :) > > -Chap > > [1] https://postgr.es/m/15938.1544377...@sss.pgh.pa.us > [2] https://postgr.es/m/5978.1544030...@sss.pgh.pa.us Yes. The concrete issue that the patch addresses can be illustrated with this example: psql (12devel) Type "help" for help. postgres=# show lc_collate ; lc_collate - fr_FR.UTF-8 (1 row) postgres=# create table année_2018(); CREATE TABLE postgres=# \dt '\\w+_\\d+' psql: error: Did not find any relation named "\w+_\d+". In previous versions it would have found the table with the accent in the name. With 12devel it doesn't, because the match is done with the collation of the column, now "C", which does not consider the accented character to be a letter. This also affects pg_dump with the -t and -n switches that accept patterns and I think pretty much all \d* commands that accept patterns too. The purpose of the fix is for the patterns to give the same results as before. It's done by simply adding explicit collate clauses to use the collation of the database for these comparisons. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
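Concretely, with the fix the pattern test that psql ends up sending looks roughly like this (heavily simplified; the real query has more conditions and handles the schema part separately):

  SELECT c.relname
  FROM pg_catalog.pg_class c
  WHERE c.relname OPERATOR(pg_catalog.~) '^(\w+_\d+)$' COLLATE pg_catalog."default";

so the regexp is again evaluated with the database collation rather than with the "C" collation of the relname column.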
Re: Changes to pg_dump/psql following collation "C" in the catalog
Tom Lane wrote: > Hm, if that's as much as we have to touch, I think there's a good > argument for squeezing it into v12 rather than waiting. The point > here is mostly to avoid a behavior change from pre-v12 Yes. I was mentioning the next CF because ISTM that nowadays non-committers are expected to file patches in there, committers picking up patches both in the current and next CF based on their evaluation of priorities. But if you plan to process this one shortly, a CF entry is probably superfluous. > Just looking at the patch, I wonder whether it doesn't need some > server-version checks. At the very least this would break with > pre-9.1 servers, which lack COLLATE altogether. PFA a new version adding the clause for only 12 and up, since the previous versions are not concerned, and as you mention, really old versions would fail otherwise. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c index 5c1732a..fe88d73 100644 --- a/src/fe_utils/string_utils.c +++ b/src/fe_utils/string_utils.c @@ -808,6 +808,8 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions, * to limit the set of objects returned. The WHERE clauses are appended * to the already-partially-constructed query in buf. Returns whether * any clause was added. + * The pattern matching uses the collation of the database through explicit + * COLLATE "default" clauses when needed (server version 12 and above). * * conn: connection query will be sent to (consulted for escaping rules). * buf: output parameter. @@ -971,16 +973,22 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern, appendPQExpBuffer(buf, "(%s OPERATOR(pg_catalog.~) ", namevar); appendStringLiteralConn(buf, namebuf.data, conn); +if (PQserverVersion(conn) >= 12) + appendPQExpBufferStr(buf, " COLLATE \"default\""); appendPQExpBuffer(buf, "\nOR %s OPERATOR(pg_catalog.~) ", altnamevar); appendStringLiteralConn(buf, namebuf.data, conn); +if (PQserverVersion(conn) >= 12) + appendPQExpBufferStr(buf, " COLLATE \"default\""); appendPQExpBufferStr(buf, ")\n"); } else { appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", namevar); appendStringLiteralConn(buf, namebuf.data, conn); +if (PQserverVersion(conn) >= 12) + appendPQExpBufferStr(buf, " COLLATE \"default\""); appendPQExpBufferChar(buf, '\n'); } } @@ -997,6 +1005,8 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern, WHEREAND(); appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", schemavar); appendStringLiteralConn(buf, schemabuf.data, conn); + if (PQserverVersion(conn) >= 12) +appendPQExpBufferStr(buf, " COLLATE \"default\""); appendPQExpBufferChar(buf, '\n'); } }
Re: Changes to pg_dump/psql following collation "C" in the catalog
Tom Lane wrote: > > PFA a new version adding the clause for only 12 and up, since the > > previous versions are not concerned, and as you mention, really old > > versions would fail otherwise. > > Pushed with some fiddling with the comments, and changing the collation > names to be schema-qualified for paranoia's sake. Thanks ! Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Cleanup/remove/update references to OID column
Justin Pryzby wrote: > Cleanup/remove/update references to OID column... > > ..in wake of 578b229718e8f. Just spotted a couple of other references that need updates: #1. In catalogs.sgml: attnum int2 The number of the column. Ordinary columns are numbered from 1 up. System columns, such as oid, have (arbitrary) negative numbers. oid should be replaced by xmin or some other system column. #2. In ddl.sgml, when describing ctid: The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows. "The OID" used to refer to an entry above in that list, now it's not clear what it refers to. "serial number" also sounds somewhat obsolete now that IDENTITY is supported. The last sentence could be changed to: "A primary key should be used to identify logical rows". Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: PostgreSQL pollutes the file system
Andreas Karlsson wrote: > The Debian packagers already use pg_createcluster for their script which > wraps initdb, and while pg_initdb is a bit misleading (it creates a > cluster rather than a database) it is not that bad. But that renaming wouldn't achieve anything in relation to the stated goal, since initdb is not in the $PATH in Debian/Ubuntu systems. It's part of the version-specific binaries located in /usr/lib/postgresql/$VERSION/bin, which are not meant to be called directly, but by the pg_*cluster* scripts that you mention, or pg_wrapper. Moreover, aside from package-specific issues, initdb can already be invoked through "pg_ctl initdb" since 2010 and version 9.0, as mentioned in: https://www.postgresql.org/docs/9.0/app-initdb.html This evolution was the result of discussions pretty much like the present thread. 9 years later, who bothers to use or recommend the new form? AFAICS nobody cares. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Cleanup/remove/update references to OID column
Andres Freund wrote: > Yes, I was planning to commit that soon-ish. There still seemed > review / newer versions happening, though, so I was thinking of waiting > for a bit longer. You might want to apply this trivial one in the same batch: index 452f307..7cfb67f 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -428,7 +428,7 @@ main(int argc, char **argv) InitDumpOptions(&dopt); - while ((c = getopt_long(argc, argv, "abBcCd:E:f:F:h:j:n:N:oOp:RsS:t:T:U:vwWxZ:", + while ((c = getopt_long(argc, argv, "abBcCd:E:f:F:h:j:n:N:Op:RsS:t:T:U:vwWxZ:", long_options, &optindex)) != -1) { switch (c) "o" in the options list is a leftover. Leaving it in getopt_long() has the effect that pg_dump -o fails (per the default case in the switch), but it's missing the expected error message (pg_dump: invalid option -- 'o') Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Trouble with FETCH_COUNT and combined queries in psql
Hi, When FETCH_COUNT is set, queries combined in a single request don't work as expected: \set FETCH_COUNT 10 select pg_sleep(2) \; select 1; No result is displayed, the pg_sleep(2) is not run, and no error is shown. That's disconcerting. The sequence that is sent under the hood is: #1 BEGIN #2 DECLARE _psql_cursor NO SCROLL CURSOR FOR select pg_sleep(2) ; select 1; #3 CLOSE _psql_cursor #4 ROLLBACK The root problem is in deciding that a statement can be run through a cursor if the query text starts with "select" or "values" (in is_select_command() in common.c), while not knowing about multiple queries in the buffer, which are not compatible with the cursor approach. When sending #2, psql expects the PQexec("DECLARE...") to yield a PGRES_COMMAND_OK, but it gets a PGRES_TUPLES_OK instead. Given that, it abandons the cursor, rolls back the transaction (if it opened it), and clears out the results of the second select without displaying them. If there was already a transaction open, the problem is worse because it doesn't roll back and we're silently missing an SQL statement that was possibly meant to change the state of the data, as in BEGIN; SELECT compute_something() \; select get_results(); END; Does anyone have thoughts about how to fix this? ATM I don't see a plausible fix that does not involve having the parser store the information that it's a multiple-query command and pass it down somehow to is_select_command(). Or a more modern approach could be to give up on the cursor-based method in favor of PQsetSingleRowMode(). That might be too big a change for a bug fix, though. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
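In the meantime, a workaround is to disable the cursor mode around such statements, since any FETCH_COUNT value of zero or less falls back to the normal fetch-it-all-and-print mode:

  \set FETCH_COUNT 0
  select pg_sleep(2) \; select 1;
  \set FETCH_COUNT 10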
Re: Trouble with FETCH_COUNT and combined queries in psql
Fabien COELHO wrote: > I added some stuff to extract embedded "\;" for pgbench "\cset", which has > been removed though, but it is easy to add back a detection of "\;", and > also to detect select. If the position of the last select is known, the > cursor can be declared in the right place, which would also solve the > problem. Thanks, I'll extract the necessary bits from your patch. I don't plan to go as far as injecting a DECLARE CURSOR inside the query, but rather just forbid the use of the cursor in the combined-queries case. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Trouble with FETCH_COUNT and combined queries in psql
Tom Lane wrote: > Keep in mind that a large part of the reason why the \cset patch got > bounced was exactly that its detection of \; was impossibly ugly > and broken. Don't expect another patch using the same logic to > get looked on more favorably. Looking at the end of the discussion about \cset, it seems what you were against was not much how the detection was done rather than how and why it was used thereafter. In the case of the present bug, we just need to know whether there are any \; query separators in the command string. If yes, then SendQuery() doesn't get to use the cursor technique to avoid any risk with that command string, despite FETCH_COUNT>0. PFA a simple POC patch implementing this. > Having said that, I did like the idea of maybe going over to > PQsetSingleRowMode instead of using an explicit cursor. That > would represent a net decrease of cruftiness here, instead of > layering more cruft on top of what's already a mighty ugly hack. It would also work with queries that start with a CTE, and queries like (UPDATE/INSERT/DELETE.. RETURNING), that the current way with the cursor cannot handle. But that looks like a project for PG13, whereas a fix like the attached could be backpatched. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index bd28444..78641e0 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -1405,7 +1405,8 @@ SendQuery(const char *query) results = NULL; /* PQclear(NULL) does nothing */ } else if (pset.fetch_count <= 0 || pset.gexec_flag || - pset.crosstab_flag || !is_select_command(query)) + pset.crosstab_flag || pset.num_semicolons > 0 || + !is_select_command(query)) { /* Default fetch-it-all-and-print mode */ instr_time before, diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c index 3ae4470..e5af7a2 100644 --- a/src/bin/psql/mainloop.c +++ b/src/bin/psql/mainloop.c @@ -426,6 +426,7 @@ MainLoop(FILE *source) /* execute query unless we're in an inactive \if branch */ if (conditional_active(cond_stack)) { + pset.num_semicolons = psql_scan_get_escaped_semicolons(scan_state); success = SendQuery(query_buf->data); slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR; pset.stmt_lineno = 1; @@ -502,6 +503,7 @@ MainLoop(FILE *source) /* should not see this in inactive branch */ Assert(conditional_active(cond_stack)); + pset.num_semicolons = psql_scan_get_escaped_semicolons(scan_state); success = SendQuery(query_buf->data); /* transfer query to previous_buf by pointer-swapping */ diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h index 5be5091..2755927 100644 --- a/src/bin/psql/settings.h +++ b/src/bin/psql/settings.h @@ -110,6 +110,8 @@ typedef struct _psqlSettings char *inputfile; /* file being currently processed, if any */ uint64 lineno; /* also for error reporting */ uint64 stmt_lineno; /* line number inside the current statement */ + int num_semicolons; /* number of query separators (\;) found + inside the current statement */ bool timing; /* enable timing of all queries */ diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l index 850754e..ee32547 100644 --- a/src/fe_utils/psqlscan.l +++ b/src/fe_utils/psqlscan.l @@ -694,8 +694,15 @@ other . * substitution. We want these before {self}, also. 
*/ -"\\"[;:] { - /* Force a semi-colon or colon into the query buffer */ +"\\"; { + /* Count semicolons in compound commands */ + cur_state->escaped_semicolons++; + /* Force a semicolon into the query buffer */ + psqlscan_emit(cur_state, yytext + 1, 1); +} + +"\\": { + /* Force a colon into the query buffer */ psqlscan_emit(cur_state, yytext + 1, 1); } @@ -1066,6 +1073,9 @@ psql_scan(PsqlScanState state, /* Set current output target */ state->output_buf = query_buf; + /* Reset number of escaped semicolons seen */ + state->escaped_semicolons = 0; + /* Set input source */ if (state->buffer_stack != NULL) yy_switch_to_buffer(state->buffer_stack->buf, state->scanner); @@ -1210,6 +1220,16 @@ psql_scan_reset(PsqlScanState state) } /* + * Return the number of escaped semicolons in the lexed string seen by the + * previous psql_scan call. + */ +int +psql_scan_get_escaped_semicolons(PsqlScanState state) +{ + return state->escaped_semicolons; +} + +/* * Reselect this lexer (psqlscan.l) after using another one. * * Currently and for foreseeable uses, it's sufficient to reset to INITIAL diff --git a/src/include/fe_utils/psqlscan.h b/src/include/fe_utils/psqlscan.h index c3a1d58..395ac78 100644 --- a/src/include/fe_utils/psqlscan.h +++ b/src/include/fe_utils/psqlscan.h @@ -83,6 +83,8 @@ extern PsqlScanResult psql_scan(PsqlScanState state, extern void psql_scan_reset(PsqlScanState state); +extern int
RE: psql - add SHOW_ALL_RESULTS option
Fabien COELHO wrote:

> >> IMHO this new setting should be on by default: few people know about \; so
> >> it would not change anything for most, and I do not see why those who use
> >> it would not be interested by the results of all the queries they asked
> >> for.
> > I agree with your opinion.
>
> Ok. I did not yet change the default in the attached version, though.

I'd go further and suggest that there shouldn't be a variable controlling this. All results that come in should be processed, period. It's not just about \; either: if the ability of CALL to produce multiple result sets gets implemented (it was posted as a POC during v11 development), this will be needed too.

> This attached version does:
> - ensure that warnings appear just before its
> - add the entry in psql's help
> - redefine the function boundary so that timing is cleaner
> - include somehow improved tests

\errverbose seems to no longer work with the patch:

test=> select 1/0;
psql: ERROR:  division by zero
test=> \errverbose
There is no previous error.

as opposed to this output with PG11:

test=> \errverbose
ERROR:  22012: division by zero
LOCATION:  int4div, int.c:820

\errverbose probably has no regression tests because its output includes these ever-changing line numbers, which is why `make check` could not catch this regression.

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
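As a sketch of the behavior being discussed, assuming the setting keeps the name SHOW_ALL_RESULTS and is enabled (output abridged), every result of a combined query would be printed rather than only the last one:

\set SHOW_ALL_RESULTS on
select 1 as a \; select 2 as b;
 a
---
 1
(1 row)

 b
---
 2
(1 row)

Today, only the second result set ("b") would be displayed.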
Re: COPY FROM WHEN condition
Pavel Stehule wrote:

> > SELECT x.a, sum(x.b)
> > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> > numeric, c text, d date, e json) )
> > WHERE x.d >= '2018-11-01'
> >
> >
> Without some special feature this example is not extra useful. It is based
> on copy on server that can use only super user with full rights.

And if superuser, one might use the file_fdw foreign data wrapper [1] to get the same results without the need for the explicit COPY in the query.

BTW, this brings into question whether the [WHEN condition] clause should be included in the options of file_fdw, as it supports pretty much all COPY options. Also, should psql's \copy gain the option too?

[1] https://www.postgresql.org/docs/current/static/file-fdw.html

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
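For the record, a sketch of that file_fdw alternative, reusing the layout of the example above (the server name and file path are of course arbitrary):

CREATE EXTENSION file_fdw;
CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE foo (a integer, b numeric, c text, d date, e json)
  SERVER files OPTIONS (filename '/path/to/foo.txt', format 'csv');

SELECT x.a, sum(x.b)
  FROM foo AS x
 WHERE x.d >= '2018-11-01'
 GROUP BY x.a;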
Re: csv format for psql
Michael Paquier wrote: > If you can produce a new version, please feel free to post it. Here's a rebased version with a couple regression tests added per the discussions during the previous CF. Now at https://commitfest.postgresql.org/20/1861/ Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 13a8b68..98147ef 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -672,6 +672,10 @@ COPY count CSV Format + +CSV +in COPY + This format option is used for importing and exporting the Comma diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index eb9d93a..7617c5e 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -152,6 +152,20 @@ EOF + --csv + + + + CSV + in psql + + Switches to CSV output mode. This is equivalent + to \pset format csv. + + + + + -d dbname --dbname=dbname @@ -2557,6 +2571,19 @@ lo_import 152801 + fieldsep_csv + + + Specifies the field separator to be used in the + CSV format. When the separator appears in a field + value, that field is output inside double quotes according to + CSV rules. To set a tab as field separator, type + \pset fieldsep_csv '\t'. The default is a comma. + + + + + fieldsep_zero @@ -2584,12 +2611,11 @@ lo_import 152801 format - Sets the output format to one of unaligned, - aligned, wrapped, - html, asciidoc, + Sets the output format to one of aligned, + asciidoc, csv, html, latex (uses tabular), - latex-longtable, or - troff-ms. + latex-longtable, troff-ms, + unaligned, or wrapped. Unique abbreviations are allowed. (That would mean one letter is enough.) @@ -2597,14 +2623,27 @@ lo_import 152801 unaligned format writes all columns of a row on one line, separated by the currently active field separator. This is useful for creating output that might be intended to be read - in by other programs (for example, tab-separated or comma-separated - format). + in by other programs. aligned format is the standard, human-readable, nicely formatted text output; this is the default. + csv format writes columns separated by + commas, applying the quoting rules described in RFC 4180. + Alternative separators can be selected with + \pset fieldsep_csv. + The output is compatible with the CSV format of the + COPY command. The header with column names + is output unless the tuples_only parameter is + on. Title and footers are not printed. + Each row is terminated by the system-dependent end-of-line character, + which is typically a single newline (\n) for + Unix-like systems or a carriage return and newline sequence + (\r\n) for Microsoft Windows. + + wrapped format is like aligned but wraps wide data values across lines to make the output fit in the target column width. 
The target width is determined as described under diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 5b4d54a..8d0ad71 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1941,8 +1941,8 @@ exec_command_pset(PsqlScanState scan_state, bool active_branch) int i; static const char *const my_list[] = { - "border", "columns", "expanded", "fieldsep", "fieldsep_zero", - "footer", "format", "linestyle", "null", + "border", "columns", "expanded", "fieldsep", "fieldsep_csv", + "fieldsep_zero", "footer", "format", "linestyle", "null", "numericlocale", "pager", "pager_min_lines", "recordsep", "recordsep_zero", "tableattr", "title", "tuples_only", @@ -3584,6 +3584,9 @@ _align2string(enum printFormat in) case PRINT_TROFF_MS: return "troff-ms"; break; + case PRINT_CSV: + return "csv"; + break; } return "unknown"; } @@ -3639,25 +3642,27 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) { if (!
Re: COPY FROM WHEN condition
David Fetter wrote:

> It also seems like a violation of separation of concerns to couple
> FEBE to grammar, so there'd need to be some way to do those things
> separately, too.

After re-reading psql/copy.c, I withdraw what I said upthread: it doesn't appear necessary to add anything to support the WHEN condition with \copy.

\copy does have a dedicated mini-parser, but it doesn't try to recognize every option: it's only concerned with getting the bits of information that are needed to perform the client-side work:

- whether it's a copy from or to
- the exact form and value of the 'filename' argument immediately after from or to: 'filename' | PROGRAM 'command' | stdin | stdout | pstdin | pstdout

It doesn't really care what the options are, just where they are in the buffer, so they can be copied into the COPY SQL statement. From the code:

 * The documented syntax is:
 * \copy tablename [(columnlist)] from|to filename [options]
 * \copy ( query stmt ) to filename [options]

The WHEN clause would be part of the [options], which are handled as simply as this in parse_slash_copy():

/* Collect the rest of the line (COPY options) */
token = strtokx(NULL, "", NULL, NULL, 0, false, false, pset.encoding);
if (token)
	result->after_tofrom = pg_strdup(token);

So unless there's something particular in the WHEN clause expression that could make this strtokx() invocation error out, this should work directly.

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
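To illustrate, with a hypothetical WHEN clause (the server-side syntax is still being discussed in this thread), a line like:

\copy tab from '/tmp/data.csv' with (format csv) when (d >= '2018-11-01')

would be rewritten by parse_slash_copy() into roughly:

COPY tab FROM STDIN with (format csv) when (d >= '2018-11-01')

with psql opening /tmp/data.csv itself and feeding its contents on the COPY data channel, the trailing options being passed through verbatim.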
Re: csv format for psql
Michael Paquier wrote: > Ordering them in alphabetical order is a good idea due to the high > number of options available, and more would pile up even if this > separates a bit "aligned" and "unaligned", so I have have separated > those diffs from the core patch and committed it, leaving the core > portion of the patch aside for later. Here's a rebased version following these changes. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml index 13a8b68..98147ef 100644 --- a/doc/src/sgml/ref/copy.sgml +++ b/doc/src/sgml/ref/copy.sgml @@ -672,6 +672,10 @@ COPY count CSV Format + +CSV +in COPY + This format option is used for importing and exporting the Comma diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index a1ca940..2897486 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -152,6 +152,20 @@ EOF + --csv + + + + CSV + in psql + + Switches to CSV output mode. This is equivalent + to \pset format csv. + + + + + -d dbname --dbname=dbname @@ -2557,6 +2571,19 @@ lo_import 152801 + fieldsep_csv + + + Specifies the field separator to be used in the + CSV format. When the separator appears in a field + value, that field is output inside double quotes according to + CSV rules. To set a tab as field separator, type + \pset fieldsep_csv '\t'. The default is a comma. + + + + + fieldsep_zero @@ -2584,11 +2611,16 @@ lo_import 152801 format - Sets the output format to one of aligned, - asciidoc, html, + Sets the output format to one of + aligned, + asciidoc, + csv, + html, latex (uses tabular), - latex-longtable, troff-ms, - unaligned, or wrapped. + latex-longtable, + troff-ms, + unaligned, + or wrapped. Unique abbreviations are allowed. (That would mean one letter is enough.) @@ -2596,14 +2628,27 @@ lo_import 152801 unaligned format writes all columns of a row on one line, separated by the currently active field separator. This is useful for creating output that might be intended to be read - in by other programs (for example, tab-separated or comma-separated - format). + in by other programs. aligned format is the standard, human-readable, nicely formatted text output; this is the default. + csv format writes columns separated by + commas, applying the quoting rules described in RFC 4180. + Alternative separators can be selected with + \pset fieldsep_csv. + The output is compatible with the CSV format of the + COPY command. The header with column names + is output unless the tuples_only parameter is + on. Title and footers are not printed. + Each row is terminated by the system-dependent end-of-line character, + which is typically a single newline (\n) for + Unix-like systems or a carriage return and newline sequence + (\r\n) for Microsoft Windows. + + wrapped format is like aligned but wraps wide data values across lines to make the output fit in the target column width. 
The target width is determined as described under diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 0dea54d..ea064ab 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1941,8 +1941,8 @@ exec_command_pset(PsqlScanState scan_state, bool active_branch) int i; static const char *const my_list[] = { - "border", "columns", "expanded", "fieldsep", "fieldsep_zero", - "footer", "format", "linestyle", "null", + "border", "columns", "expanded", "fieldsep", "fieldsep_csv", + "fieldsep_zero", "footer", "format", "linestyle", "null", "numericlocale", "pager", "pager_min_lines", "recordsep", "recordsep_zero", "tableattr", "title", "tuples_only", @@ -3566,6 +3566,9 @@ _align2string(enum printFormat in) case PRINT_ASCIIDOC: return "asciidoc"; break; + case PRINT_CSV: + return "csv"; +
Doc patch on psql output formats
Hi,

psql's documentation has this mention about output formats:

"Unique abbreviations are allowed. (That would mean one letter is enough.)"

but "one letter is enough" has not been true since 9.3, which added "latex-longtable" sharing the same start as "latex", and then 9.5 added "asciidoc" with the same first letter as "aligned".

When a non-unique abbreviation is used, psql uses the first match in an arbitrary order defined in do_pset() by a cascade of pg_strncasecmp(). (The recent commit add9182e reorders them alphabetically, but it turns out that it doesn't change the relative positions of "aligned" / "asciidoc" or "latex" / "latex-longtable", so \pset format a and \pset format l will continue to select "aligned" and "latex" as before.)

Anyway, "Unique abbreviations are allowed" deserves to be changed as well. PFA a doc patch to say simply "Abbreviations are allowed".

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index a1ca940..7dd934f 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2589,8 +2589,7 @@ lo_import 152801
 latex (uses tabular),
 latex-longtable, troff-ms,
 unaligned, or wrapped.
- Unique abbreviations are allowed. (That would mean one letter
- is enough.)
+ Abbreviations are allowed.
 unaligned format writes all columns of a row on one
Re: Doc patch on psql output formats
Tom Lane wrote: > > When a non-unique abbreviation is used, psql uses the first > > match in an arbitrary order defined in do_pset() by > > a cascade of pg_strncasecmp(). > > Ugh. Should we not fix the code so that it complains if there's > not a unique match? I would bet that the code was also written > on the assumption that any abbrevation must be unique. There's a backward compatibility issue, since a script issuing \pset format a would now fail instead of setting the format to "aligned". The doc patch is meant to describe the current behavior. OTOH if we don't fail with non-unique matches, there is the risk that in the future, a new format matching /^a/ will come alphabetically before "aligned", and will be picked up instead of "aligned". In both cases using abbreviations in scripts is not a great idea. Anyway I will look into the changes required in do_pset to implement the error on multiple matches, if that's the preferred behavior. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
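Concretely, the sort of script line whose behavior would change (a sketch of the current versus proposed outcome):

\pset format a    -- today silently selects "aligned"; would become an ambiguity error
\pset format u    -- still unambiguous, keeps selecting "unaligned"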
Re: csv format for psql
Michael Paquier wrote: > -C could also be useful for other things, say compression. The patch does not grab any short letter, as the consensus went that way. > + pset.popt.topt.fieldSepCsv = pg_strdup(","); > Let's store that in a variable instead of hardcoding it. I guess it could go into a #define in psql/settings.h, along with these: #define DEFAULT_FIELD_SEP "|" #define DEFAULT_RECORD_SEP "\n" > In the regression tests, "col 9" is wanted with a newline? Yes, since we support column names with embedded newlines (even though it's hard to think of a legit use case for that) and CSV fields support embedded newlines too. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
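Something along these lines, with the macro name being only a suggestion:

/* in src/bin/psql/settings.h, next to the existing defaults */
#define DEFAULT_FIELD_SEP_CSV ","

/* and where the printQueryOpt defaults are initialized */
pset.popt.topt.fieldSepCsv = pg_strdup(DEFAULT_FIELD_SEP_CSV);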
Re: csv format for psql
Michael Paquier wrote: > +/* check for value being non-empty and with an MB length of 1 */ > +if (*value == '\0' || value[PQmblen(value, pset.encoding)] != '\0') > > It seems to me that this can just be replaced with that: > if (strlen(value) != 1) The above is meant to accept a multibyte character as the separator, in which case strlen(value) would be greater than 1. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
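In other words, the check is meant to accept exactly one character, whether it occupies one byte or several (a sketch, assuming a UTF-8 client encoding):

\pset fieldsep_csv ';'     -- accepted: one single-byte character
\pset fieldsep_csv '§'     -- accepted: one character, two bytes in UTF-8
\pset fieldsep_csv ';;'    -- rejected: more than one character
\pset fieldsep_csv ''      -- rejected: empty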
Alternative to \copy in psql modelled after \g
Hi, Currently \copy cannot span multiple lines (like any meta-command) and cannot use psql variables whereas \g can do both. The POC patch attached implements two meta-commands \copyfrom and \copyto that are to COPY what \g is to any other query: - they take the COPY query already var-expanded from the query buffer, which must mention FROM STDIN or TO STDOUT. - they accept an argument declaring the local data source or destination, either a filename or a program (|command args) or empty for stdin/stdout. By contrast \copy has a specific parser to extract the data source or dest from its line of arguments, plus whether it's a COPY FROM or TO, and build a COPY query from that. Examples of use 1. $ psql -v filename="/path/data-$(date -I).csv" COPY (SELECT * FROM table WHERE ...) TO STDOUT (FORMAT csv) \copyto :filename 2. -- copy only the first 100 lines COPY table FROM stdin \copyfrom |head -n 100 /data/datafile.txt 3. $ cat script.sql COPY table1 FROM stdin; -- copy inline data data line data line \. -- copy data from psql's stdin COPY table2 FROM stdin \copyfrom # copy both in-script and out-of-script data $ psql -f script.sql < table2.data Comments? Does that look useful as an alternative to \copy ? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 0dea54d..7398dec 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -65,6 +65,8 @@ static backslashResult exec_command_cd(PsqlScanState scan_state, bool active_bra const char *cmd); static backslashResult exec_command_conninfo(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_copy(PsqlScanState scan_state, bool active_branch); +static backslashResult exec_command_copyfrom(PQExpBuffer query_buf, PsqlScanState scan_state, bool active_branch); +static backslashResult exec_command_copyto(PQExpBuffer query_buf, PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_copyright(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_crosstabview(PsqlScanState scan_state, bool active_branch); static backslashResult exec_command_d(PsqlScanState scan_state, bool active_branch, @@ -305,6 +307,10 @@ exec_command(const char *cmd, status = exec_command_conninfo(scan_state, active_branch); else if (pg_strcasecmp(cmd, "copy") == 0) status = exec_command_copy(scan_state, active_branch); + else if (pg_strcasecmp(cmd, "copyfrom") == 0) + status = exec_command_copyfrom(query_buf, scan_state, active_branch); + else if (pg_strcasecmp(cmd, "copyto") == 0) + status = exec_command_copyto(query_buf, scan_state, active_branch); else if (strcmp(cmd, "copyright") == 0) status = exec_command_copyright(scan_state, active_branch); else if (strcmp(cmd, "crosstabview") == 0) @@ -634,6 +640,53 @@ exec_command_copy(PsqlScanState scan_state, bool active_branch) } /* + * \copyfrom -- run a COPY FROM command + * The COPY query is obtained from the query buffer + * The argument is 'filename' as in \copy + */ +static backslashResult +exec_command_copyfrom(PQExpBuffer query_buf, PsqlScanState scan_state, bool active_branch) +{ + bool success = true; + + if (active_branch) + { + char *opt = psql_scan_slash_option(scan_state, + OT_FILEPIPE, NULL, false); + + success = do_copy_query(query_buf->data, opt, copy_from); + resetPQExpBuffer(query_buf); + free(opt); + } + else + ignore_slash_filepipe(scan_state); + + return success ? 
PSQL_CMD_SKIP_LINE : PSQL_CMD_ERROR; +} +/* + * \copyto -- run a COPY TO command + */ +static backslashResult +exec_command_copyto(PQExpBuffer query_buf, PsqlScanState scan_state, bool active_branch) +{ + bool success = true; + + if (active_branch) + { + char *opt = psql_scan_slash_option(scan_state, + OT_FILEPIPE, NULL, false); + + success = do_copy_query(query_buf->data, opt, copy_to); + resetPQExpBuffer(query_buf); + free(opt); + } + else + ignore_slash_filepipe(scan_state); + + return success ? PSQL_CMD_SKIP_LINE : PSQL_CMD_ERROR; +} + +/* * \copyright -- print copyright notice */ static backslashResult diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c index 555c633..95c5c5e 100644 --- a/src/bin/psql/copy.c +++ b/src/bin/psql/copy.c @@ -260,28 +260,11 @@ error: } -/* - * Execute a \copy command (frontend copy). We have to open a file (or execute - * a command), then submit a COPY query to the backend and either feed it data - * from the file or route its response into the file. - */ -bool -do_copy(const char *args) +static +FILE* +open_copy_stream(struct copy_options *options) { - PQExpBufferData query; - FILE *copystream; - struct copy_options *options; - bool success; - - /* parse options */ - options = parse_slash_copy(args); - - if (!options) - return false; - - /* prepare to read or write the target file */ - if (options->file && !
Re: csv format for psql
Michael Paquier wrote:

> Still what's the point except complicating the code? We don't care
> about anything fancy in the backend-side ProcessCopyOptions() when
> checking cstate->delim, and having some consistency looks like a good
> thing to me.

The backend has its reasons that don't apply to the psql output format, mostly import performance according to [1]. It's not that nobody wants delimiters outside of US-ASCII, as people do ask for that sometimes:

https://www.postgresql.org/message-id/f02ulk%242r3u%241%40news.hub.org
https://github.com/greenplum-db/gpdb/issues/1246

> However there is no option to specify
> an escape character, no option to specify a quote character, and it is
> not possible to force quotes for all values. Those are huge advantages
> as any output can be made compatible with other CSV variants. Isn't
> what is presented too limited?

The guidelines that the patch has been following are those of RFC 4180 [2], with two exceptions: the field separator, which we can define, and the end of lines, which are OS-dependent instead of the fixed CRLF that the IETF seems to see as the norm. The only reference to escaping in the RFC is:

"If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote"

The problem with using non-standard QUOTE or ESCAPE is that it's a violation of the format that goes further than choosing a separator different from comma, which is already a pain point. We can always add these options later if there is demand. I suspect it will never happen. I looked at the 2004 archives from when CSV was added to COPY (around commit 862b20b38, in case anyone cares to look), but I couldn't find a discussion on these options; all I could find is that they were present from the start. But again, COPY is concerned with importing data that preexists, even if it's weird, whereas a psql output format is not.

[1] https://www.postgresql.org/message-id/4C9D2BC5.1080006%40optonline.net
[2] https://tools.ietf.org/html/rfc4180

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
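To illustrate the only escaping that the RFC requires, a field containing the separator and a double quote would come out like this:

field value:  He said "hi", then left
CSV output:   "He said ""hi"", then left"

The field is enclosed in double quotes because it contains the separator and a double quote, and the embedded double quote is doubled.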
Re: Alternative to \copy in psql modelled after \g
David G. Johnston wrote: > Do I understand correctly that you are proposing a slightly less > verbose alternative of: > > \o :filename > COPY TO STDOUT > \o Not strictly the same because of this: \o or \out [ filename ] “Query results” includes all tables, command responses, and notices obtained from the database server, as well as output of various backslash commands that query the database (such as \d); but not error messages. If for instance psql received a notification between the two \o it would end up in the file, which it wouldn't with \copyto. The same is true for SELECT... \g file as opposed to \o file Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
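So the more verbose alternative under discussion would look like this (a sketch reusing the :filename variable from the earlier examples):

\o :filename
COPY (select * from table1) TO STDOUT (FORMAT csv);
\o

versus the proposed

COPY (select * from table1) TO STDOUT (FORMAT csv) \copyto :filename

with the difference that, while the \o redirection is active, notices and the output of other backslash commands that query the database would also end up in the file.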
Re: Alternative to \copy in psql modelled after \g
David G. Johnston wrote: > POLA, for me at least, would be for \g [filename] to do exactly what > you are describing with the \copyto feature. I admit that if we could improve \g to handle COPY, it would be more elegant than the current proposal adding two meta-commands. But the copy-workflow and non-copy-workflow are different, and in order to know which one to start, \g would need to analyze the query to determine whether it's a COPY FROM, COPY TO or something else. psql parses queries syntactically, but not semantically AFAIK, and I suspect we don't want to start doing that, as it breaks a separation of concerns. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Michael Paquier wrote:

> - The experience is confusing, as the psql format uses different options
> than the backend to do the same things:
> -- tuples_only instead of HEADER.
> -- fieldsep_csv instead of DELIMITER
> -- null is an equivalent of the one with the same name, which is
> actually consistent.
> -- encoding is also an equivalent of ENCODING.

That conveniently ignores the fact that "\pset format somename" as a command doesn't take any additional option, contrary to COPY. We can't do:

\pset format csv (delimiter=';')

If we chose "delimiter" as an option name, it would have to coexist with some 20 other names in the \pset namespace, where it would be too vague, whereas "fieldsep_csv" makes it clear what it applies to. "tuples_only" is preexisting, and I don't see how the comment that it's not called "header" could be actionable.

Overall, you seem to posit that we should mimic the entire COPY TO interface to implement 'psql --csv'. But the starting point is that 'psql --csv' is just a slightly different (safer) variant of 'psql -At', which is not concerned at all with being consistent with COPY. The model for the csv output format is the unaligned output format, not COPY.

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
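The kind of usage being targeted, as a sketch (table name arbitrary):

$ psql -At -F ',' -c 'select * from clients' > clients.txt   # unaligned: no quoting, separators in the data are ambiguous
$ psql --csv -c 'select * from clients' > clients.csv        # csv: RFC 4180 quoting, header line included

On the command line, --csv is simply the equivalent of \pset format csv.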
Re: Doc patch on psql output formats
Tom Lane wrote: > > but "one letter is enough" is not true since 9.3 that added > > "latex-longtable" sharing the same start as "latex", and then > > 9.5 added "asciidoc" with the same first letter as "aligned". > > Yeah, that text has clearly outstayed its welcome. > > > When a non-unique abbreviation is used, psql uses the first > > match in an arbitrary order defined in do_pset() by > > a cascade of pg_strncasecmp(). > > Ugh. Should we not fix the code so that it complains if there's > not a unique match? I would bet that the code was also written > on the assumption that any abbrevation must be unique. Here's a patch making "\pset format" reject ambiguous abbreviations. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index a1ca940..6e6d0f4 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -2589,8 +2589,7 @@ lo_import 152801 latex (uses tabular), latex-longtable, troff-ms, unaligned, or wrapped. - Unique abbreviations are allowed. (That would mean one letter - is enough.) + Unique abbreviations are allowed. unaligned format writes all columns of a row on one diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 0dea54d..1f23aaf 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -3637,28 +3637,52 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) /* set format */ if (strcmp(param, "format") == 0) { + static const struct fmt + { + const char *name; + enum printFormat number; + } formats[] = + { + {"aligned", PRINT_ALIGNED}, + {"asciidoc", PRINT_ASCIIDOC}, + {"html", PRINT_HTML}, + {"latex", PRINT_LATEX}, + {"latex-longtable", PRINT_LATEX_LONGTABLE}, + {"troff-ms", PRINT_TROFF_MS}, + {"unaligned", PRINT_UNALIGNED}, + {"wrapped", PRINT_WRAPPED} + }; + if (!value) ; - else if (pg_strncasecmp("aligned", value, vallen) == 0) - popt->topt.format = PRINT_ALIGNED; - else if (pg_strncasecmp("asciidoc", value, vallen) == 0) - popt->topt.format = PRINT_ASCIIDOC; - else if (pg_strncasecmp("html", value, vallen) == 0) - popt->topt.format = PRINT_HTML; - else if (pg_strncasecmp("latex", value, vallen) == 0) - popt->topt.format = PRINT_LATEX; - else if (pg_strncasecmp("latex-longtable", value, vallen) == 0) - popt->topt.format = PRINT_LATEX_LONGTABLE; - else if (pg_strncasecmp("troff-ms", value, vallen) == 0) - popt->topt.format = PRINT_TROFF_MS; - else if (pg_strncasecmp("unaligned", value, vallen) == 0) - popt->topt.format = PRINT_UNALIGNED; - else if (pg_strncasecmp("wrapped", value, vallen) == 0) - popt->topt.format = PRINT_WRAPPED; else { - psql_error("\\pset: allowed formats are aligned, asciidoc, html, latex, latex-longtable, troff-ms, unaligned, wrapped\n"); - return false; + int match_count = 0; + int first_match = 0; + + /* +* Count the number of left-anchored matches. Exactly one match +* must be found, otherwise error out. +*/ + for (int i = 0; i < lengthof(formats); i++) + { + if (pg_strncasecmp(formats[i].name, value, vallen) == 0) + { + if (++match_count > 1) + { + psql_error("\\pset: ambiguous abbreviation: \"%s\"\n", value); + return false; + } + first_match = i; + } + } + if (match_count == 0) + { + psql_error("\\pset: allowed formats are aligned, asciidoc, html, latex, latex-longtable, troff-ms, unaligned, wrapped\n"); + return false; + } +
Re: Doc patch on psql output formats
Tom Lane wrote: > Pushed. (I simplified the code a bit by using just one state variable, > and also made the error message more verbose.) Thanks! Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: csv format for psql
Tom Lane wrote:

> And, in fact, right now *none* of psql's table output formats is both
> unambiguous and reasonably simple/popular to use. So the astonishing
> thing about this patch, IMO, is that we didn't do it a decade ago.

Yeah, that's what motivated this submission in the first place.

> I feel that if we allow multi-byte characters here, we might as well
> take the training wheels off and just say you can use any separator
> string you want, as long as it doesn't contain double quote, \r, or \n.

The reason behind disallowing multiple characters was the likelihood that they are mistakes. For instance, this example came up at some point in the discussion:

\pset fieldsep_csv ,,

To me the probability that a user has fat-fingered this is pretty high, and this would silently produce a really bogus file. Another kind of mistake comes from the difficulty of properly quoting on the command line:

psql --csv -P fieldsep_csv='\t'

would be interpreted as a two-character separator despite being obviously not the user's intention.

About disallowing characters beyond US-ASCII, I can't find a similar justification. COPY does not allow them, but that is justified (in the archives) by the fear of being slower when importing, which is not a concern here.

> We could avoid this self-inflicted confusion by choosing a different
> parameter name. I'd be good with "csv_fieldsep" or "csvfieldsep".

+1

> Or we could kill both issues by hard-wiring the separator as ','.

Ideally people would understand that they can use -A for any delimiter but no quoting, or --csv with strict quoting; in the latter case a fixed delimiter is fine, since it's going to be safely quoted, so its presence in the data is not a problem. But I'm not too confident that everyone would understand it that way, even if it's well explained in the doc. When one is told "please produce CSV files with semicolons as separators", it's simpler to just produce that rather than arguing that these requirements are probably ill-advised.

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
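As an example of why a hard-wired comma would technically be safe, a sketch of the csv output for a value containing the separator:

select 'Dupont, Jean' as name, 42 as id;
name,id
"Dupont, Jean",42

The value containing a comma gets quoted, so no ambiguity is introduced; the objection is only that some consumers insist on a different separator.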
Re: Doc patch on psql output formats
Tom Lane wrote:

> As of HEAD, it's impossible to select latex output format
> at all:
>
> regression=# \pset format latex
> \pset: ambiguous abbreviation "latex" matches both "latex" and
> "latex-longtable"

Oops!

> We could fix that by adding a special case to accept an exact match
> immediately.

Personally I would favor that one, but to me the problem is that allowing abbreviations doesn't really work well in the long run, that is, if new format names keep being added in the future.

In interactive mode, tab-completion seems good enough (maybe \pset was not tab-completed in the past, when abbreviations were initially implemented?). In non-interactive mode, anything abbreviated may clash in the future with another format, so there's a forward-compatibility hazard that should motivate avoiding them in scripts as well.

Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite