Re: ICU for global collation

2022-03-15 Thread Daniel Verite
Julien Rouhaud wrote:

> > > While on that topic, the doc should probably mention that default ICU
> > > collations can only be deterministic.
> >
> > Well, there is no option to do otherwise, so I'm not sure where/how to
> > mention that.  We usually don't document options that don't exist. ;-)
> 
> Sure, but I'm afraid that users may still be tempted to use ICU locales like
> und-u-ks-level2 from the case_insensitive example in the doc and hope that
> it will work accordingly.

+1.

The CREATE DATABASE doc says this currently:

icu_locale

Specifies the ICU locale ID if the ICU locale provider is used.


ISTM that we need to say explicitly that this locale will be used by
default to compare all collatable strings, except that a bytewise
comparison is applied to break ties when strings compare as equal.

The idea is to describe what the backend will do with the setting
rather than saying that we don't have a nondeterministic option.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: ICU for global collation

2022-03-15 Thread Daniel Verite
Finnerty, Jim wrote:

> In ICU, the "locale" is just the first part of what we can pass to the
> "locale" parameter in CREATE COLLATION - the part before the optional '@'
> delimiter.  The ICU locale does not include the secondary or tertiary
> properties,

Why not? Please see
https://unicode-org.github.io/icu/userguide/locale

It says that "a locale consists of one or more pieces of ordered
information", the pieces being a language code, a script code, a
country code, a variant code, and keywords.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: popcount

2020-12-30 Thread Daniel Verite
David Fetter wrote:

+Datum
+byteapopcount(PG_FUNCTION_ARGS)
+{
+   bytea   *t1 = PG_GETARG_BYTEA_PP(0);
+   int len, result;
+
+   len = VARSIZE_ANY_EXHDR(t1);
+   result = pg_popcount(VARDATA_ANY(t1), len);
+
+   PG_RETURN_INT32(result);
+}

The input may have more than 2 billion bits set to 1. The biggest possible
result should be 8 billion for bytea (1 GB with all bits set to 1).
So shouldn't this function return an int8?
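
For illustration, a minimal sketch of an int8-returning variant
(hypothetical, not the submitted code; it just widens the result
before returning it):

Datum
byteapopcount(PG_FUNCTION_ARGS)
{
	bytea	   *t1 = PG_GETARG_BYTEA_PP(0);
	int			len;
	int64		result;

	len = VARSIZE_ANY_EXHDR(t1);
	/* pg_popcount() counts the 1 bits over the whole datum */
	result = (int64) pg_popcount(VARDATA_ANY(t1), len);

	PG_RETURN_INT64(result);
}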

There is a pre-existing function in core: bit_length(bytea) returning int4,
but it errors out for large values (in fact it's a SQL function that returns
octet_length($1)*8).

For the varbit case, it doesn't seem like it can hold so many bits, although
I don't see the limit documented in
https://www.postgresql.org/docs/current/datatype-bit.html


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: PATCH: Batch/pipelining support for libpq

2020-11-14 Thread Daniel Verite
Alvaro Herrera wrote:

> I adapted the test code to our code style.  I also removed the "timings"
> stuff; I think that's something better left to pgbench.
> 
> (I haven't looked at Daniel's pgbench stuff yet, but I will do that
> next.)

The patch I posted in [1] was pretty simple, but at the time, query
results were always discarded. Now that pgbench can instantiate
variables from query results, a script can do:
  select 1 as var \gset
  select :var;
This kind of sequence wouldn't work in batch mode since it
sends queries before getting results of previous queries.

So maybe \gset should be rejected when inside a batch section.

Or alternatively pgbench should collect results before a variable is
reinjected into a query, thereby stalling the pipeline.
To do this only when necessary, it would have to track read-write
dependencies among variables, which seems overly complicated though.

[1]
https://www.postgresql.org/message-id/b4e34135-2bd9-4b8a-94ca-27d760da2...@manitou-mail.org

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: PATCH: Batch/pipelining support for libpq

2020-11-23 Thread Daniel Verite
 Hi,

Here's a new version with the pgbench support included.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 9ce32fb39b..2a94f8f6b9 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3061,6 +3061,30 @@ ExecStatusType PQresultStatus(const PGresult *res);

   
  
+
+ 
+  PGRES_BATCH_END
+  
+   
+The PGresult represents the end of a 
batch.
+This status occurs only when batch mode has been selected.
+   
+  
+ 
+
+ 
+  PGRES_BATCH_ABORTED
+  
+   
+The PGresult represents a batch that's
+received an error from the server.  
PQgetResult
+must be called repeatedly, and it will return this status code,
+until the end of the current batch, at which point it will return
+PGRES_BATCH_END and normal processing can 
resume.
+   
+  
+ 
+
 
 
 If the result status is PGRES_TUPLES_OK or
@@ -4819,6 +4843,482 @@ int PQflush(PGconn *conn);
 
  
 
+ 
+  Batch Mode
+
+  
+   libpq
+   batch mode
+  
+
+  
+   pipelining
+   in libpq
+  
+
+  
+   libpq batch mode allows applications to
+   send a query without having to read the result of the previously
+   sent query.  Taking advantage of the batch mode, a client will wait
+   less for the server, since multiple queries/results can be sent/
+   received in a single network transaction.
+  
+
+  
+   While batch mode provides a significant performance boost, writing
+   clients using the batch mode is more complex because it involves
+   managing a queue of pending queries and finding which result
+   corresponds to which query in the queue.
+  
+
+  
+   Using Batch Mode
+
+   
+To issue batches the application must switch a connection into batch mode.
+Enter batch mode with 
+or test whether batch mode is active with
+.
+In batch mode, only asynchronous 
operations
+are permitted, and COPY is not recommended as it
+may trigger failure in batch processing.  Using any synchronous
+command execution functions such as PQfn,
+PQexec or one of its sibling functions are error
+conditions.
+   
+
+   
+
+ It is best to use batch mode with libpq in
+ non-blocking mode. If used
+ in blocking mode it is possible for a client/server deadlock to occur.
+  
+   
+The client will block trying to send queries to the server, but the
+server will block trying to send results to the client from queries
+it has already processed. This only occurs when the client sends
+enough queries to fill its output buffer and the server's receive
+buffer before switching to processing input from the server,
+but it's hard to predict exactly when that will happen.
+   
+  
+
+
+ Batch mode consumes more memory when send/receive is not done as required,
+ even in non-blocking mode.
+
+   
+
+   
+Issuing Queries
+
+
+ After entering batch mode the application dispatches requests using
+ normal asynchronous libpq functions such as
+ PQsendQueryParams, 
PQsendPrepare,
+ PQsendQueryPrepared, 
PQsendDescribePortal,
+ PQsendDescribePrepared.
+ The asynchronous requests are followed by a
+ 
+ call to mark the end of the batch. The client needs not
+ call PQgetResult immediately after
+ dispatching each operation.
+ Result processing
+ is handled separately.
+
+
+
+ The server executes statements, and returns results, in the order the
+ client sends them.  The server may begin executing the batch before all
+ commands in the batch are queued and the end of batch command is sent.
+ If any statement encounters an error the server aborts the current
+ transaction and skips processing the rest of the batch.
+ Query processing resumes after the end of the failed batch.
+
+
+
+ It's fine for one operation to depend on the results of a
+ prior one. One query may define a table that the next query in the same
+ batch uses; similarly, an application may create a named prepared 
statement
+ then execute it with later statements in the same batch.
+
+   
+
+   
+Processing Results
+
+
+ To process the result of one batch query, the application calls
+ PQgetResult repeatedly and handles each result
+ until PQgetResult returns null.
+ The result from the next batch query may then be retrieved using
+ PQgetResult again and the cycle repeated.
+ The application handles individual statement results as normal.
+ When the results of all the queries in the batch have been
+ returned, PQgetResult returns a result
+ containing the status value PGRES_BATCH_END.

Re: popcount

2021-01-18 Thread Daniel Verite
Peter Eisentraut wrote:

> +   /*
> +* ceil(VARBITLEN(ARG1)/BITS_PER_BYTE)
> +* done to minimize branches and instructions.
> +*/
> 
> I don't know what that is supposed to mean or why that kind of tuning 
> would be necessary for a user-callable function.

Also, the formula just below looks wrong in the current patch  (v3):

+ /*
+  * ceil(VARBITLEN(ARG1)/BITS_PER_BYTE)
+  * done to minimize branches and instructions.
+  */
+ len = (VARBITLEN(arg1) + BITS_PER_BYTE + 1) / BITS_PER_BYTE;
+ p = VARBITS(arg1);
+
+ popcount = pg_popcount((const char *)p, len);

It should be:
 len = (VARBITLEN(arg1) + BITS_PER_BYTE - 1) / BITS_PER_BYTE

If we add 1 to BITS_PER_BYTE instead of subtracting 1, then when
VARBITLEN(arg1) is a multiple of BITS_PER_BYTE, "len" is one byte too
large: for instance, with 16 bits, (16 + 8 + 1) / 8 yields 3 bytes
instead of the correct 2.

The correct formula is already used in include/utils/varbit.h near the
definition of VARBITLEN:

#define VARBITTOTALLEN(BITLEN)  (((BITLEN) + BITS_PER_BYTE-1)/BITS_PER_BYTE +
\
 VARHDRSZ +
VARBITHDRSZ)


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Are datcollate/datctype always libc even under --with-icu ?

2021-12-23 Thread Daniel Verite
Chapman Flack wrote:

> Next question: the "currently" in that comment suggests that could change,
> but is there any present intention to change it, or is this likely to just
> be the way it is for the foreseeable future?

Some related patches and discussions:

* ICU as default collation provider
https://commitfest.postgresql.org/21/1543/

* ICU for global collation
https://commitfest.postgresql.org/25/2256/


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: ICU for global collation

2022-01-07 Thread Daniel Verite
Julien Rouhaud wrote:

> If you want a database with an ICU default collation the lc_collate
> and lc_ctype should inherit what's in the template database and not
> what was provided in the LOCALE I think.  You could still probably
> overload them in some scenario, but without a list of what isn't
> ICU-aware I can't really be sure of how often one might have to do
> it.

I guess we'd need that when creating a database with a different
encoding than the template databases, at least.

About what's not ICU-aware, I believe the most significant part in
core is the Full Text Search parser.
It doesn't care about sorting strings, but it relies on the functions
from POSIX <ctype.h>, which depend on LC_CTYPE
(it looks, however, like this could be improved by following
what has been done in backend/regex/regc_pg_locale.c, which has
comparable needs and calls ICU functions when applicable).
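
As a rough illustration of that dispatch pattern (a simplified sketch
with a hypothetical function name, not the actual code in
regc_pg_locale.c), character classification can route to ICU when the
locale is ICU-provided and otherwise fall back to the LC_CTYPE-based
libc functions:

static bool
pg_wc_isalpha_sketch(pg_wchar c, pg_locale_t locale)
{
#ifdef USE_ICU
	/* ICU-provided locale: use ICU's Unicode-aware classification */
	if (locale && locale->provider == COLLPROVIDER_ICU)
		return u_isalpha((UChar32) c);
#endif
	/* otherwise rely on libc, which depends on the LC_CTYPE setting */
	return iswalpha((wint_t) c);
}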

Also, any extension is potentially concerned. Surely many
extensions call functions from ctype.h assuming that
the current value of LC_CTYPE works with the data they handle.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: ICU for global collation

2022-01-10 Thread Daniel Verite
Peter Eisentraut wrote:

> Unlike in the previous patch, where the ICU 
> collation name was written in datcollate, there is now a third column 
> (daticucoll), so we can store all three values.

I think some users would want their db-wide ICU collation to be
case/accent-insensitive. Postgres users are trained to expect
case-sensitive comparisons, but some apps initially made for
e.g. MySQL or MS-SQL that use such collations natively would be easier
to port to Postgres.
IIRC, that was the context for some questions where people were
enquiring about db-wide ICU collations.

With the current patch, it's not possible, AFAICS, because the user
can't declare the collation as non-deterministic. Presumably this
would require another option to CREATE DATABASE and another
column to store that bit of information.

The "daticucol" column also suggests that we don't expect to add
other collation providers in the future. Maybe a pair of columns like
(datcollprovider, datcolllocale) would be more future-proof,
or a (datcollprovider, datcolllocale, datcollisdeterministic)
triplet if non-deterministic collations are allowed.

Also, pg_collation has "collversion" to detect a mismatch between
the ICU runtime and existing indexes. I don't see that field
for the db-wide ICU collation, so maybe we currently miss the capability
to detect the mismatch for the db-wide collation?

The lack of these fields overall suggests another idea: when CREATE
DATABASE is called with a global ICU collation, what if it somehow
inserted the collation into pg_collation in the new database?
Then pg_database would just store the collation OID and no other
collation-related field would need to be added to it, now
or in the future.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: ICU for global collation

2022-01-10 Thread Daniel Verite
Julien Rouhaud wrote:

> > The lack of these fields overall suggest the idea that when CREATE
> > DATABASE is called with a global ICU collation, what if it somehow
> > inserted the collation into pg_collation in the new database?
> > Then pg_database would just store the collation oid and no other
> > collation-related field would need to be added into it, now
> > or in the future.
> 
> I don't think it would be doable given the single-database-per-backend
> restriction.

By that I understand that CREATE DATABASE is limited to copying a
template database and then not writing anything into it beyond that,
since it's not even connected to it.
I guess there's still the possibility of requiring that the ICU db-wide
collation of the new database does exist in the template database,
and then the CREATE DATABASE would refer to that collation instead of
an independent locale string.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: ICU for global collation

2022-01-11 Thread Daniel Verite
Julien Rouhaud wrote:

> > I guess there's still the possibility of requiring that the ICU db-wide
> > collation of the new database does exist in the template database,
> > and then the CREATE DATABASE would refer to that collation instead of
> > an independent locale string.
> 
> That could work.  However if having the collation in the template database a
> strict requirement the something should also be done for initdb, and it will
> probably be a bigger problem.

If CREATE DATABASE referred to a collation in the template db,
either that collation already exists, or the user would have to add it
to the template db with CREATE COLLATION.
initdb already populates the template databases with a full set of
ICU collations through pg_import_system_collations().
I don't quite see what change you're seeing that would be needed in
initdb.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: pgsql: Add libpq pipeline mode support to pgbench

2021-03-17 Thread Daniel Verite
Fabien COELHO wrote:

> For consistency with the existing \if … \endif, ISTM that it could have 
> been named \batch … \endbatch or \pipeline … \endpipeline?

"start" mirrors "end". To me, the analogy with \if-\endif is not
obvious.
Grammatically \if is meant to introduce the expression after it,
whereas \startpipeline takes no argument.
Functionally \startpipeline can be thought of as "open the valve"
and \endpipeline "close the valve". They're "call-to-action" kinds of
commands, and in that sense quite different from the \if-\endif pair.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: insensitive collations

2021-03-25 Thread Daniel Verite
Jim Finnerty wrote:

> For a UTF8 encoded, case-insensitive (nondeterministic), accent-sensitive
> ICU
> collation, a LIKE predicate can be used with a small transformation of the
> predicate, and the pattern can contain multi-byte characters:
> 
> from:
> 
> SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%';  
> -- ERROR:  nondeterministic collations are not supported for LIKE
> 
> to:
> 
> SELECT * FROM locations WHERE lower(location) COLLATE "C" LIKE
> lower('midi-Pyrené%');

For prefix matching, there's a simpler way with non-deterministic
collations, based on the advice in [1].

The trick is that if an ICU collation is assigned to "location",
whether it's deterministic or not,

SELECT * FROM locations WHERE location LIKE 'midi-Pyrené%';

is equivalent to:

SELECT * FROM locations WHERE location BETWEEN
   'midi-Pyrené' AND 'midi-Pyrené' || E'\uFFFF';

and that will use a btree index if available.

Also, it works with all features of ND-collations and all encodings, not
just case-insensitivity and UTF-8.

Now that doesn't solve LIKE '%midi-Pyrené%', or LIKE '%midi_Pyrené%',
but that trick could be a building block for an algorithm implementing
LIKE with ND-collations in the future.

[1]
https://unicode-org.github.io/icu/userguide/collation/architecture.html#generating-bounds-for-a-sort-key-prefix-matching


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: insensitive collations

2021-03-25 Thread Daniel Verite
Jim Finnerty wrote:

> Currently nondeterministic collations are disabled at the database level. 

Deterministic ICU collations are also disabled.

> The cited reason was because of the lack of LIKE support and because certain
> catalog views use LIKE. 

But the catalogs shouldn't use the default collation of the database.

commit 586b98fdf1aaef4a27744f8b988479aad4bd9a01 provides
some details about this:

Author: Tom Lane 
Date:   Wed Dec 19 17:35:12 2018 -0500

Make type "name" collation-aware.

The "name" comparison operators now all support collations, making them
functionally equivalent to "text" comparisons, except for the different
physical representation of the datatype.  They do, in fact, mostly share
the varstr_cmp and varstr_sortsupport infrastructure, which has been
slightly enlarged to handle the case.

To avoid changes in the default behavior of the datatype, set name's
typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that
by default comparisons to a name value will continue to use strcmp
semantics.  (This would have been the case for system catalog columns
anyway, because of commit 6b0faf723, but doing this makes it true for
user-created name columns as well.  In particular, this avoids
locale-dependent changes in our regression test results.)

In consequence, tweak a couple of places that made assumptions about
collatable base types always having typcollation DEFAULT_COLLATION_OID.
I have not, however, attempted to relax the restriction that user-
defined collatable types must have that.  Hence, "name" doesn't
behave quite like a user-defined type; it acts more like a domain
with COLLATE "C".  (Conceivably, if we ever get rid of the need for
catalog name columns to be fixed-length, "name" could actually become
such a domain over text.  But that'd be a pretty massive undertaking,
and I'm not volunteering.)

Discussion: https://postgr.es/m/15938.1544377...@sss.pgh.pa.us


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Calendar support in localization

2021-03-26 Thread Daniel Verite
Thomas Munro wrote:

> Right, so if this is done by trying to extend Daniel Verite's icu_ext
> extension (link given earlier) and Robert's idea of a fast-castable
> type, I suppose you might want now()::icu_date + '1 month'::internal
> to advance you by one Ethiopic month if you have done SET
> icu_ext.ICU_LC_TIME = 'am_ET@calendar=traditional'.

I've pushed a calendar branch on icu_ext [1] with preliminary support
for non-gregorian calendars through ICU, so far with only format and parse
of timetamptz.
The ICU locale drives both the localization of field names (language) and the
choice of calendar.

It looks like this:

\set fmt 'dd/MMMM/yyyy GGGG HH:mm:ss.SSS zz'

 WITH list(cal) AS ( values 
('gregorian'),
('japanese'),
('buddhist'),
('roc'),
('persian'),
('islamic-civil'),
('islamic'),
('hebrew'),
('chinese'),
('indian'),
('coptic'),
('ethiopic'),
('ethiopic-amete-alem'),
('iso8601'),
('dangi')
),
fmt AS (
 select
  cal,
  icu_format_date(now(),  :'fmt', 'fr@calendar='||cal) as now_str,
  icu_format_date(now()+'1 month'::interval,  :'fmt', 'fr@calendar='||cal) as
plus_1m
  from list
)
SELECT
 cal,
 now_str,
 icu_parse_date(now_str, :'fmt', 'fr@calendar='||cal) as now_parsed,
 plus_1m,
 icu_parse_date(plus_1m, :'fmt', 'fr@calendar='||cal) as plus_1m_parsed
FROM fmt;


-[ RECORD 1 ]--+---
cal| gregorian
now_str| 26/mars/2021 après Jésus-Christ 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 26/avril/2021 après Jésus-Christ 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 2 ]--+---
cal| japanese
now_str| 26/mars/0033 Heisei 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 26/avril/0033 Heisei 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 3 ]--+---
cal| buddhist
now_str| 26/mars/2564 ère bouddhique 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 26/avril/2564 ère bouddhique 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 4 ]--+---
cal| roc
now_str| 26/mars/0110 RdC 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 26/avril/0110 RdC 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 5 ]--+---
cal| persian
now_str| 06/farvardin/1400 Anno Persico 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 06/ordibehešt/1400 Anno Persico 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 6 ]--+---
cal| islamic-civil
now_str| 12/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 7 ]--+---
cal| islamic
now_str| 13/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 8 ]--+---
cal| hebrew
now_str| 13/nissan/5781 Anno Mundi 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 14/iyar/5781 Anno Mundi 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 9 ]--+---
cal| chinese
now_str| 14/èryuè/0038 78 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 15/sānyuè/0038 78 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 10 ]-+---
cal| indian
now_str| 05/chaitra/1943 ère Saka 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 06/vaishākh/1943 ère Saka 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 11 ]-+---
cal| coptic
now_str| 17/barmahât/1737 après Dioclétien 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m| 18/barmoudah/1737 après Dioclétien 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 12 ]-+---
cal| ethiopic
now_str| 17/mägabit/2013 après l’Incarnation 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01

Re: Calendar support in localization

2021-03-29 Thread Daniel Verite
Matthias van de Meent wrote:

> The results for the Japanese locale should be "0003 Reiwa" instead of
> "0033 Heisei", as the era changed in 2019. ICU releases have since
> implemented this and other corrections; this specific change was
> implemented in the batched release of ICU versions on 2019-04-12.

Right. I've run this test on an Ubuntu 18.04 desktop, which comes with
libicu60. The current version for my system is 60.2-3ubuntu3.1.
The Ubuntu maintainers did not pick up the change for the new Japanese
era, presumably because it's not a security fix.
This contrasts with the upstream ICU maintainers, who made an
exceptional effort to backpatch this down to ICU 53
(exceptional in the sense that they don't normally do that for bugfixes).

>> For instance with the ethiopic calendar, the query above displays today as
>> 17/mägabit/2013 and 1 month from now as 18/miyazya/2013,
>> while the correct result is probably 17/miyazya/2013 (?)

> Seeing the results for Japanese locale, you might want to update your
> ICU library, which could fix this potential inconsistency.

I agree it's always best to have the latest ICU version, but in the
context of Postgres, we have to work with the versions that are
typically installed on users' systems. People who have pre-2019
versions will simply be stuck with the previous Japanese era.

Anyway, for the specific problem that the interval datatype cannot be
used seamlessly across all calendars, it's essentially about how days
are mapped into calendars, and it's unrelated to ICU updates AFAIU.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Calendar support in localization

2021-03-30 Thread Daniel Verite
Surafel Temesgen wrote:

> > About intervals, if there were locale-aware functions like
> >  add_interval(timestamptz, interval [, locale]) returns timestamptz
> > or
> >  sub_timestamp(timestamptz, timestamptz [,locale]) returns interval
> > that would use ICU to compute the results according to the locale,
> > wouldn't it be good enough?
> >
> >
> Yes it can be enough for now but there are patches proposed to support the
> system and application time period which are in SQL standard 

To clarify, these function signatures are not meant to oppose
a core vs extension implementation, nor an ICU vs non-ICU
implementation. They're meant to illustrate the case of using
specific functions instead of adding specific data types.
AFAIU, adding data types comes from the idea that since
(non-gregorian-date + interval) doesn't have the same result as
(gregorian-date + interval), we could use a different type for
non-gregorian-date and so a different "+" operator, maybe
even a specific interval type.

For the case of temporal tables, I'm not quite familiar with the
feature, but I notice that the patch says:

+When system versioning is specified two columns are added which
+record the start timestamp and end timestamp of each row verson.
+The data type of these columns will be TIMESTAMP WITH TIME ZONE.

The user doesn't get to choose the data type, so if we'd require to
use specific data types for non-gregorian calendars, that would
seemingly complicate things for this feature. This is consistent
with the remark upthread that the SQL standard assumes the
gregorian calendar.


> what it takes to support calendar locally is input/output function
> and a converter from and to julian calendar and that may not be that
> much hard since most of the world calendar is based on julian or
> gregorian calendar[0]

The conversions from julian dates are not necessarily hard, but the
I/O functions mean having localized names for all days, months, eras
of all calendars in all supported languages. If you're thinking of
implementing this from scratch (without the ICU dependency), where
would these names come from? OTOH if we're using ICU, then why
bother reinventing the julian-to-calendars conversions that ICU
already does?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: insensitive collations

2021-04-03 Thread Daniel Verite
Jim Finnerty wrote:

> SET client_encoding = WIN1252;
> [...]
> postgres=# SELECT * FROM locations WHERE location LIKE 'Franche-Comt__';  --
> the wildcard is applied byte by byte instead of character by character, so
> the 2-byte accented character is matched only by 2 '_'s
>location
> 
> Franche-Comté
> (1 row)

The most plausible explanation is that the client-side text is encoded
in UTF-8, rather than WIN1252 as declared.

If you added length('Franche-Comté') to the above query, I suspect
it would show that the string is one character longer than
expected, and octet_length('Franche-Comté') would be
two bytes longer than expected.

Also dumping the contents of the "location" column with
convert_to() would show that the accents have been
wrongly translated, if the explanation of the encoding snafu is
correct.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Add header support to text format and matching feature

2020-07-25 Thread Daniel Verite
Rémi Lapeyre wrote:

> Here’s a new version that fix all the issues.

Here's a review of v4. 

The patch improves COPY in two ways:

- COPY TO and COPY FROM now accept "HEADER ON" for the TEXT format
(previously it was only for CSV)

- COPY FROM also accepts "HEADER match" to tell that there's a header
and that its contents must match the columns of the destination table.
This works for both the CSV and TEXT formats. The syntax for the
columns is the same as for the data and the match is case-sensitive.

The first improvement, when submitted alone (in 2018 by Simon Muller),
was judged not useful enough, or even hazardous, without any
"match" feature. It was returned with feedback in 2018-10 and
resubmitted by Rémi in 2020-02 with the match feature.

The patches  apply cleanly, "make check" and "make check-world" pass.

In my tests it works fine except for one crash that I can reproduce
on a fresh build and default configuration with:

$ cat >file.txt
i
1

$ psql
postgres=# create table x(i int);
CREATE TABLE
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
PANIC:  ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I suspect the reason is the way that PG_TRY/PG_CATCH is used, see below.

Code comments:


+/*
+ * Represents whether the header must be absent, present or present and
match.
+ */
+typedef enum CopyHeader
+{
+   COPY_HEADER_ABSENT,
+   COPY_HEADER_PRESENT,
+   COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of
COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
boolbinary; /* binary format? */
boolfreeze; /* freeze rows on loading? */
boolcsv_mode;   /* Comma Separated Value
format? */
-   boolheader_line;/* CSV or text header line? */
+   CopyHeader  header_line;/* CSV or text header line? */


After the redefinition into this enum type, there are still a
bunch of references to header_line that treat it like a boolean:

1190:   if (cstate->header_line)
1398:   if (cstate->binary && cstate->header_line)
2119:   if (cstate->header_line)
3635:   if (cstate->cur_lineno == 0 && cstate->header_line)

It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
but maybe it's not good style to count on that.



+   PG_TRY();
+   {
+   if (defGetBoolean(defel))
+   cstate->header_line =
COPY_HEADER_PRESENT;
+   else
+   cstate->header_line =
COPY_HEADER_ABSENT;
+   }
+   PG_CATCH();
+   {
+   char*sval = defGetString(defel);
+
+   if (!cstate->is_copy_from)
+   PG_RE_THROW();
+
+   if (pg_strcasecmp(sval, "match") == 0)
+   cstate->header_line =
COPY_HEADER_MATCH;
+   else
+   ereport(ERROR,
+  
(errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("header requires a
boolean or \"match\"")));
+   }
+   PG_END_TRY();

It seems wrong to use a PG_CATCH block for this. I understand that
it's because defGetBoolean() calls ereport() on non-booleans, but then
it should be split into an error-throwing function and a
non-error-throwing lexical analysis of the boolean, with the above code
calling the latter.
Besides, the comments in elog.h above PG_TRY say that
 "the error recovery code
  can either do PG_RE_THROW to propagate the error outwards, or do a
  (sub)transaction abort. Failure to do so may leave the system in an
  inconsistent state for further processing."
Maybe this is what happens with the repeated uses of "match"
eventually failing with ERRORDATA_STACK_SIZE exceeded.
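
As a sketch of that kind of rewrite (illustrative only, not the patch's
code; a bare HEADER with no argument would still need to be treated as
"true"), the option value could be inspected before falling back to
defGetBoolean(), so that no error needs to be caught at all:

else if (strcmp(defel->defname, "header") == 0)
{
	/* accept "match" only for COPY FROM, before the boolean parsing */
	if (cstate->is_copy_from && defel->arg != NULL &&
		pg_strcasecmp(defGetString(defel), "match") == 0)
		cstate->header_line = COPY_HEADER_MATCH;
	else if (defGetBoolean(defel))
		cstate->header_line = COPY_HEADER_PRESENT;
	else
		cstate->header_line = COPY_HEADER_ABSENT;
}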


-HEADER [ boolean ]
+HEADER { match | true |
false }

This should be enclosed in square brackets because HEADER
with no argument is still accepted.




+  names from the table. On input, the first line is discarded when set
+  to 

EDB builds Postgres 13 with an obsolete ICU version

2020-08-03 Thread Daniel Verite
 Hi,

As a follow-up to bug #16570 [1] and other previous discussions
on the mailing-lists, I'm checking out PG13 beta for Windows
from:
 https://www.enterprisedb.com/postgresql-early-experience
and it ships with the same obsolete ICU 53 that was used
for PG 10,11,12.
Besides not having the latest Unicode features and fixes, ICU 53
ignores the BCP 47 tag syntax in collations used as examples
in the Postgres documentation, which leads to confusion and
false bug reports.
The current version is ICU 67.

I don't know where the suggestion to upgrade it before the
next PG release should be addressed, but maybe some people on
this list do know, or have the leverage to make it happen?

[1]
https://www.postgresql.org/message-id/16570-58cc04e1a6ef3c3f%40postgresql.org

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Re: csv format for psql

2018-03-22 Thread Daniel Verite
David Steele wrote:

> Do you know when you'll have an updated patch that addresses the minor
> issues brought up in review and the concern above?

Here's an update addressing the issues discussed:

- fieldsep and recordsep are used, no more fieldsep_csv
- the command line option is --csv without short option and is equivalent
 to -P format=csv -P fieldsep=','
- pset output formats are reordered alphabetically on display
- a couple more cases in the regression tests


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 10b9795..c984a9c 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -152,6 +152,16 @@ EOF
 
 
 
+  --csv
+  
+  
+  Switches to csv output mode. This is equivalent to \pset format
+  csv followed by \pset fieldsep ','.
+  
+  
+
+
+
   -d dbname
   --dbname=dbname
   
@@ -246,7 +256,7 @@ EOF
   
   
   Use separator as the
-  field separator for unaligned output. This is equivalent to
+  field separator for unaligned and csv outputs. This is equivalent to
   \pset fieldsep or \f.
   
   
@@ -382,7 +392,7 @@ EOF
   
   
   Use separator as the
-  record separator for unaligned output. This is equivalent to
+  record separator for unaligned and csv outputs. This is equivalent to
   \pset recordsep.
   
   
@@ -558,7 +568,7 @@ EOF
   
   
   Set the field separator for unaligned output to a zero byte.  This is
-  equvalent to \pset fieldsep_zero.
+  equivalent to \pset fieldsep_zero.
   
   
 
@@ -1937,9 +1947,9 @@ Tue Oct 26 21:40:57 CEST 1999
 
 
 
-Sets the field separator for unaligned query output. The default
-is the vertical bar (|). It is equivalent to
-\pset fieldsep.
+Sets the field separator for unaligned and csv query outputs. The
+default is the vertical bar (|). It is equivalent
+to \pset fieldsep.
 
 
   
@@ -2546,8 +2556,8 @@ lo_import 152801
   fieldsep
   
   
-  Specifies the field separator to be used in unaligned output
-  format. That way one can create, for example, tab- or
+  Specifies the field separator to be used in unaligned and csv output
+  formats. That way one can create, for example, tab- or
   comma-separated output, which other programs might prefer. To
   set a tab as field separator, type \pset fieldsep
   '\t'. The default field separator is
@@ -2584,9 +2594,13 @@ lo_import 152801
   format
   
   
-  Sets the output format to one of unaligned,
-  aligned, wrapped,
-  html, asciidoc,
+  Sets the output format to one of
+  unaligned,
+  aligned,
+  csv,
+  wrapped,
+  html,
+  asciidoc,
   latex (uses tabular),
   latex-longtable, or
   troff-ms.
@@ -2601,6 +2615,15 @@ lo_import 152801
   format).
   
 
+  csv format writes columns separated
+  by fieldsep, applying the CSV quoting rules
+  described in RFC-4180 and compatible with the CSV format
+  of the COPY command.
+  The header with column names is output unless the
+  tuples_only parameter is on.
+  Title and footers are not printed.
+  
+
   aligned format is the standard, 
human-readable,
   nicely formatted text output;  this is the default.
   
@@ -2747,8 +2770,8 @@ lo_import 152801
   recordsep
   
   
-  Specifies the record (line) separator to use in unaligned
-  output format. The default is a newline character.
+  Specifies the record (line) separator to use in unaligned or
+  csv output formats. The default is a newline character.
   
   
   
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 3560318..1d8cc96 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3603,6 +3603,9 @@ _align2string(enum printFormat in)
case PRINT_TROFF_MS:
return "troff-ms";
break;
+   case PRINT_CSV:
+   return "csv";
+   break;
}
return "unknown";
 }
@@ -3658,25 +3661,27 @@ do_pset(const char *param, const char *value, 
printQueryOpt *popt, bool quiet)
{
if (!value)
;
-   else if (pg_strncasecmp("unaligned", value, vallen) == 0)
-   popt->topt.format = PRINT_UNALIGNED;
else if (pg_strncasecmp("aligned", value, vallen) == 0)
popt->topt.form

Re: Re: csv format for psql

2018-03-23 Thread Daniel Verite
Pavel Stehule wrote:

> It should not be hard. All formats can has '|' like now, and csv can have a
> ',' - then if field separator is not explicit, then default field separator
> is used, else specified field separator is used.
> 
> You can see my idea in attached patch

With that patch, consider this sequence:

postgres=# \pset format csv
Output format is csv.
postgres=# \pset fieldsep
Field separator is "|".
postgres=# select 1 as a,2 as b;
a,b
1,2

Here psql reports that fieldsep is "|" and right away uses something
else in the output. That doesn't look good.

You may object that it's fixable by tweaking the output of \pset,
\pset fieldsep, and \? variables so that it knows that the current
output format is going to use a "hidden" default separator, and
then these commands should display that value instead.
But that'd be somewhat playing whack-a-mole, as the following
sequence would now be possible, with '|' being used  as
the separator instead of the ',' reported just above:

postgres=# \pset format csv
Output format is csv.
postgres=# \pset fieldsep
Field separator is ",".
postgres=# \a
Output format is aligned.
postgres=# select 1 as a,2 as b;
 a | b 
---+---
 1 | 2


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2018-03-23 Thread Daniel Verite
Kyotaro HORIGUCHI wrote:

> A disucssion on psql batch mode was held in another branch of
> this thread. How do we treat that?

There's a batch mode for pgbench in a patch posted in [1],
with \beginbatch and \endbatch commands, but nothing
for psql AFAICS.
psql is more complicated because currently it uses a 
blocking PQexec() call at its core. Craig mentioned psql
integration in [2] and [3].
Also a script can have inter-query dependencies such as in
   insert into table(...) returning id \gset
   update othertable set col= :id where ...;
which is a problem in batch mode, as we don't want
to send the update before the right value for :id is
known. Whether we want to support these dependencies
and how needs discussion.
For instance we might not support them at all, or
create a synchronization command that collects
all results of queries sent so far, or do it implicitly
when a variable is injected into a query...
This looks like substantial work that might be best
done separately from the libpq patch.

[1]
https://www.postgresql.org/message-id/b4e34135-2bd9-4b8a-94ca-27d760da2...@manitou-mail.org
[2]
https://www.postgresql.org/message-id/CAMsr+YGLhaDkjymLuNVQy4MrSKQoA=F1vO=aN8XQf30N=aq...@mail.gmail.com
[3]
https://www.postgresql.org/message-id/CAMsr+YE6BK4iAaQz=ny3xdnblhnnz_4tp-ptjqbnnpszmgo...@mail.gmail.com


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Re: csv format for psql

2018-03-26 Thread Daniel Verite
Pavel Stehule wrote:

> implemented in attached patch

Consider your initial output of \pset, when no option is specified
=
$ ./psql 
psql (11devel)
Type "help" for help.

postgres=# \pset
border                   1
columns                  0
expanded                 off
fieldsep                 not used
fieldsep_zero            ERROR
footer                   on
format                   aligned
linestyle                ascii
null                     ''
numericlocale            off
pager                    1
pager_min_lines          0
recordsep                '\n'
recordsep_zero           off
reset                    ERROR
tableattr
title
tuples_only              off
unicode_border_linestyle single
unicode_column_linestyle single
unicode_header_linestyle single


These entries with ERROR correspond in fact to no error at all,
or we have to pretend that the default state of psql is erroneous,
which doesn't make sense.

Also "reset" is not a variable, it seems to be a command,
so it probably shouldn't be there in the first place.

More generally, I'd think the point of reusing "fieldsep" was to
reuse the concept, not reimplement it, let alone change
bits of behavior of the unaligned mode along the way.

With this patch, again without specifying any option, just looking
at what fieldsep is leads to this:

postgres=# \pset fieldsep
User didn't specified field separator.
Current format doesn't specify default field separator.

If this is the way to "solve" the fact that a user has to do
 \pset fieldsep ','
to get commas in csv mode, then IMV the proposed solution
is clearly worse than the stated problem, and worse than
simply adding fieldsep_csv to be independent from the
unaligned mode.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Re: csv format for psql

2018-03-26 Thread Daniel Verite
Pavel Stehule wrote:

> b) the list of pset options is bloating - every possible new format can
> introduce fieldsep_X option

What new format?
The usefulness of fieldsep does not extend outside of xSV, and it's
no surprise that there has been no other use for a fieldsep-like
variable until now, with the other formats supported in psql.

In fact it's even absurd for any format to use a non-fixed separator
at a place that is key for being parsed unambiguously.

We could even support only the comma and make it non-configurable
based on the fact it's Comma-Separated-Values, not
Whatever-Separated-Values, except that won't do much
to serve the users' interests, as the reality is that
people use various separators.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-03-29 Thread Daniel Verite
David G. Johnston wrote:

> Could someone post how captions, rows-only, and footer pset settings factor
> into this?  Specifically are they fixed to on/off or will they hide/show if
> users request them explicitly?

This is described in the doc with:

+  csv format writes columns separated
+  by fieldsep, applying the CSV quoting rules
+  described in RFC-4180 and compatible with the CSV format
+  of the COPY command.
+  The header with column names is output unless the
+  tuples_only parameter is on.
+  Title and footers are not printed.
+  


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-03-29 Thread Daniel Verite
David G. Johnston wrote:

> Or, really, just make --csv take an optional argument which, if present, is
> the delimiter.

I don't think psql can support optional arguments because
psql --option foo
would be ambiguous, since foo could be the option's value or
the name of a database to connect to.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-03-29 Thread Daniel Verite
Isaac Morland wrote:

> The actual reason I'm posting this is because some of the discussion has
> made me a bit confused: there is already a CSV format defined for the COPY
> command and used by the psql \copy. I just want to check that what is being
> discussed here would use the exact same format as the existing CSV COPY

Please see the start of the thread at
https://www.postgresql.org/message-id/a8de371e-006f-4f92-ab72-2bbe3ee78f03%40manitou-mail.org
where what is proposed is compared to what \copy and COPY
already provide.

About the CSV acronym itself, the COPY documentation
https://www.postgresql.org/docs/current/static/sql-copy.html
already refers to it as "Comma Separated Values", even though
as we know, the delimiter is user-configurable.
There's no difference in psql that would justify a different wording.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-03-29 Thread Daniel Verite
David G. Johnston wrote:

> Unaligned format could grow its own fieldsep if it wanted to but it can
> also just use the default provided fieldsep var whose default value is
> pipe.  If it did grow one it would need to understand "not set" in order to
> preserve existing behavior.  We don't have that problem with csv.

fieldsep is already owned by the unaligned format. No other format
uses fieldsep currently. Why would it need to grow its own?
I don't quite see what you mean here.

For csv, Fabien and Peter expressed the opinion that we shouldn't
create another fieldsep-like variable specifically for it, but instead
reuse fieldsep. That's what my latest patch does.

Now it turns out that sharing fieldsep comes with some problems.

1. since a single variable can't have two different default values,
we either have to accept '|' as a default csv separator, or introduce
some tricks to automagically commute the value.

2. when a user does \pset fieldsep ';' there are doubts
on whether this should affect the current mode only, or both,
or even the other one when the user goes back and forth between
formats. Any proposal where setting one \pset variable changes another
\pset variable as a side effect is inconsistent with what has been
done until now with these variables, and I expect that such
side effects are not exactly sought after in general.

3. the command-line cannot express: use character 'X' for unaligned
and 'Y' for csv for the rest of the session. The current patch provides
'--csv' to select both the csv format and a comma separator, but
when combined with -F, the end result depends on the relative position
of the options, which may feel unintuitive or buggy.
Again we could tweak that, but only by being inconsistent
with how we process other options.

Personally I think the benefit of sharing fieldsep is not worth these
problems, but I'm waiting for the discussion to continue with
more opinions.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-04-07 Thread Daniel Verite
Pavel Stehule wrote:

> so we can to have
> 
> \pset format xxx
> 
> and set of local possibly changed defaults
> 
> \pset csv_fieldsep ,
> \pset csv_tuplesonly on
> \pset unaligned_fieldsep |
> \pset unaligned_tuplesonly off

tuples_only (\t) is a single setting that is currently used by all
formats. It makes sense as it is and I don't quite see what we
would gain by "exploding" it.
There's also "footer" that goes in tandem with "tuples_only",
to switch off the footer while keeping the header and title.
Whatever change is considered to "tuples_only", "footer" must
be considered with it.
For the csv format, tuples_only=off is interpreted as "output the header"
and tuples_only=on as "don't output the header". This is consistent
with what other formats do.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-04-07 Thread Daniel Verite
Peter Eisentraut wrote:

> Another thought: Isn't CSV just the same as unaligned output plus some
> quoting?  Could we add a quote character setting and then define --csv
> to be quote-character = " and fieldsep = , ?

Plus footer set to off. So --csv would be like
\pset format unaligned
\pset fieldsep ','
\pset footer off
\pset unaligned_quote_character '"'

I guess we'd add a \csv command that does these together.

There would still be some discomfort with some of the issues
discussed upthread. For instance
  psql -F ';' --csv
is likely to reset fieldsep to ',' having then a different outcome than
  psql --csv -F';'

And there's the point on how to come back from \csv to another
output, given that it has overwritten "footer" that is used across
all formats, and fieldsep used by unaligned.
So a user might anyway need to figure out the
intricacies behind \csv, if it's not an independent format.

On the whole I'm inclined to resubmit the patch with
fieldsep_csv and some minor changes based on the rest
of the discussion.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-04-07 Thread Daniel Verite
Isaac Morland wrote:

> OK, mostly trying to avoid commenting because I doubt I have much to add.
> But. If I ask for CSV and don't specify any overrides, I expect to get
> "C"omma separated values, not some other character. More specifically, if I
> say --csv I expect to get files that are identical with what I would get if
> I used COPY ... CSV. Actually, COPY ... CSV HEADER, given that psql shows
> column headings. This also implies that I expect the quoting and related
> details that are associated with CSV.

Yes, this is what the current patch does. Only the first version
used the same default delimiter as the unaligned mode, but
nobody liked that part.
As for the header, it's driven by how it's already done for other
formats in psql, with the tuples_only setting, which implies 
that column names are output by default as you mention.
Both settings must be changeable; the exact details of how/when
are what's being discussed.

The output might still differ compared to COPY in that line endings
depend on the client-side OS. There's also the minor issue
of a single \ by itself on a line, which gets quoted by COPY
and not by psql.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-04-07 Thread Daniel Verite
Daniel Verite wrote:

> The output might still differ compared to COPY in that line endings
> depend on the client-side OS. There's also the minor issue
> of a single \ by itself on a line, which gets quoted by COPY
> and not by psql.

I meant \. or backslash followed by period.
This sequence means nothing in particular in csv outside
of COPY, but it means end-of-data for COPY.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Proposal: Adding json logging

2018-04-16 Thread Daniel Verite
David Arnold wrote:

> Not claiming this assumption does imply parsing of a *rolling* set
> of log lines with *previously unkown cardinality*. That's expensive
> on computing resources. I don't have actual numbers, but it doesn't
> seem too far fetched, neither. 
> I filed a question to the author of fluent-bit to that extend which
> you can consult here:
> https://github.com/fluent/fluent-bit/issues/564 Let's see what
> Eduardo has to inform us about this...

fluent-bit does not appear to support CSV, as mentioned in
https://github.com/fluent/fluent-bit/issues/459
which got flagged as an enhancement request some time ago.

In CSV a line break inside a field is easy to process for
a parser, because (per https://tools.ietf.org/html/rfc4180):

  "Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes"

So there is no look-ahead to do. In a character-by-character loop,
when encountering a line break, either the current field started with
a double quote that is not yet closed, in which case the line break is
part of the content, or it did not, in which case the line break ends
the current record.
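
To illustrate (a minimal standalone sketch, not code from fluent-bit
or PostgreSQL), splitting records only requires tracking whether the
parser is currently inside a double-quoted field:

#include <stdio.h>
#include <stdbool.h>

int
main(void)
{
	int		c;
	bool	in_quotes = false;	/* inside a double-quoted field? */

	while ((c = getchar()) != EOF)
	{
		if (c == '"')
			in_quotes = !in_quotes;	/* an escaped "" simply toggles twice */

		if (c == '\n' && !in_quotes)
			puts("<end of record>");	/* line break outside quotes ends the record */
		else
			putchar(c);
	}
	return 0;
}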

What doesn't quite work is parsing CSV with a regex; that's
discussed in some detail here, for instance:
https://softwareengineering.stackexchange.com/questions/166454/can-the-csv-format-be-defined-by-a-regex


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Proposal: Adding json logging

2018-04-17 Thread Daniel Verite
David Arnold wrote:

> Interesting, does that implicitly mean the whole log event would get
> transmitted as a "line" (with CRLF) in CSV.

To me it's implied by the doc at:
https://www.postgresql.org/docs/current/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG

> In the affirmative scenario, this then would work for a true streaming
> aggregator (if CSV was supported).

Assuming a real CSV parser tailing the log, there shouldn't be any trouble
with that.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: [PATCH] vacuumlo: print the number of large objects going to be removed

2019-09-06 Thread Daniel Verite
Michael Paquier wrote:

> Sure.  However do we need to introduce this much complication as a
> goal for this patch though whose goal is just to provide hints about
> the progress of the work done by vacuumlo? 

Yeah, I went off on a tangent when realizing that ~500 lines of C
client-side code in vacuumlo could be turned into ~50 lines of
plpgsql in a block.
That was not meant as an objection to the patch
(besides, I followed the plpgsql approach and was disappointed with the
performance of lo_unlink() in a loop compared to the client-side
equivalent, so I won't bother -hackers with this idea anymore, until I
figure out why it's not faster and if I can do something about it).

One comment about the patch:

+   longto_delete = 0;
...
+   to_delete = strtol(PQcmdTuples(res), NULL, 10);

I believe the maximum number of large objects is almost 2^32, and as a
count above 2^31 may not fit into a signed long, shouldn't we use
an unsigned long instead? This would also apply to the preexisting
"deleted" variable.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Create collation reporting the ICU locale display name

2019-09-11 Thread Daniel Verite
 Hi,

The 'locale' or 'lc_collate/lc_ctype' argument of an ICU collation may
have a complicated syntax, especially with non-deterministic
collations, and input mistakes in these names will not necessarily be
detected as such by ICU.

The "display name" of a locale is a simple way to get human-readable
feedback about the characteristics of that locale.
pg_import_system_collations() already pushes these as comments when
creating collations en masse.

I think it would be nice to have CREATE COLLATION report this
information as feedback in the form of a NOTICE message.
PFA a simple patch implementing that.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/backend/commands/collationcmds.c 
b/src/backend/commands/collationcmds.c
index 919e092483..ad011d149c 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -36,6 +36,10 @@
 #include "utils/syscache.h"
 
 
+#ifdef USE_ICU
+static char *get_icu_locale_comment(const char *localename);
+#endif
+
 typedef struct
 {
char   *localename; /* name of locale, as per "locale -a" */
@@ -232,6 +236,16 @@ DefineCollation(ParseState *pstate, List *names, List 
*parameters, bool if_not_e
if (!OidIsValid(newoid))
return InvalidObjectAddress;
 
+#ifdef USE_ICU
+   if (collprovider == COLLPROVIDER_ICU)
+   {
+   char *display_name = get_icu_locale_comment(collcollate);
+   if (display_name != NULL)
+   ereport(NOTICE,
+   (errmsg("ICU locale: \"%s\"", 
display_name)));
+   }
+#endif
+
/*
 * Check that the locales can be loaded.  NB: pg_newlocale_from_collation
 * is only supposed to be called on non-C-equivalent locales.


Re: Create collation reporting the ICU locale display name

2019-09-12 Thread Daniel Verite
Michael Paquier wrote:

> On Wed, Sep 11, 2019 at 04:53:16PM +0200, Daniel Verite wrote:
> > I think it would be nice to have CREATE COLLATION report this
> > information as feedback in the form of a NOTICE message.
> > PFA a simple patch implementing that.
> 
> Why is that better than the descriptions provided with \dO[S]+ in
> psql?

There is no description for collations created outside of
pg_import_system_collations().

Example:

db=# create collation mycoll(provider=icu, locale='fr-FR-u-ks-level1');
NOTICE:  ICU locale: "French (France, colstrength=primary)"

db=# \x auto

db=# \dO+
List of collations
-[ RECORD 1 ]--+--
Schema | public
Name   | mycoll
Collate| fr-FR-u-ks-level1
Ctype  | fr-FR-u-ks-level1
Provider   | icu
Deterministic? | yes
Description| 

The NOTICE above is with the patch. Otherwise, the "display name"
is never shown nor stored anywhere AFAICS.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Create collation reporting the ICU locale display name

2019-09-13 Thread Daniel Verite
Michael Paquier wrote:

> Or could it make sense to provide a system function which returns a
> collation description for at least an ICU-provided one?  We could make
> use of that in psql for example.

If we prefer having a function over the instant feedback effect of
the NOTICE, the function might look like icu_collation_attributes() [1]
from the icu_ext extension. It returns a set of (attribute,value)
tuples, among which the displayname is one of the values.

An advantage of this approach is that you may execute the
function before creating the collation, instead of creating the
collation, realizing there was something wrong in your
locale/lc_collate argument, dropping the collation and trying again.

Another advantage would be the possibility of localizing the
display name, leaving the localization as a choice to the user.
Currently get_icu_locale_comment() forces "en" as the language because
it wants results in US-ASCII, but a user-callable function could have the
language code as an optional argument. When not being forced, the
language has a default value obtained by ICU from the environment
(so that would be from where the postmaster is started in our case),
and is also settable with uloc_setDefault().

Example with icu_ext functions:

test=> select icu_set_default_locale('es');
 icu_set_default_locale 

 es

test=> select value from icu_collation_attributes('en-US-u-ka-shifted')
   where attribute='displayname';
   value

 inglés (Estados Unidos, alternate=shifted)

This output tends to reveal mistakes with tags, which is why I thought
to expose it as a NOTICE. It addresses the case of a user
who wouldn't suspect an error, so the "in-your-face" effect is
intentional. With the function approach, the user must be
proactive.

An example of a mistake I found myself making is forgetting the '-u-' before
the collation tags, which doesn't error out but is detected relatively
easily with the display name.

-- wrong
test=> select value from icu_collation_attributes('de-DE-ks-level1')  
   where attribute='displayname';
 value
-
 German (Germany, KS_LEVEL1)

-- right
test=> select value from icu_collation_attributes('de-DE-u-ks-level1')  
   where attribute='displayname';
 value 
---
 German (Germany, colstrength=primary)


[1] https://github.com/dverite/icu_ext#icu_collation_attributes


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Create collation reporting the ICU locale display name

2019-09-14 Thread Daniel Verite
Tom Lane wrote:

> I think that's a useful function, but it's a different function from
> the one first proposed, which was to tell you the properties of a
> collation you already installed (which might not be ICU, even).
> Perhaps we should have both.

The pre-create use case would look  like:
 SELECT * FROM describe_collation(locale_string text, collprovider "char")

Post-creation, one could do:
 SELECT (describe_collation(collcollate, collprovider)).*
   FROM pg_catalog.pg_collation WHERE oid = :OID;

Possibly it could exist as SELECT * FROM describe_collation(oid)
but that's essentially the same function.
Or I'm missing something about why we'd need two functions.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Create collation reporting the ICU locale display name

2019-09-14 Thread Daniel Verite
Tom Lane wrote:

> > This output tends to reveal mistakes with tags, which is why I thought
> > to expose it as a NOTICE. It addresses the case of a user
> > who wouldn't suspect an error, so the "in-your-face" effect is
> > intentional. With the function approach, the user must be
> > proactive.
> 
> That argument presupposes (a) manual execution of the creation query,
> and (b) that the user pays close attention to the NOTICE output.
> Unfortunately, I think our past over-use of notices has trained
> people to ignore them.

What about DEBUG1 as the level?
Surely we can draw a line somewhere beyond which the benefit of
getting that information surpasses the annoyance factor that
you're foreseeing?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: ICU for global collation

2019-09-17 Thread Daniel Verite
 Hi,

When trying databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).

That's a problem for the backend code that depends on libc functions
that themselves depend on LC_CTYPE, such as the full text search parser
and dictionaries.

For instance, if you start the instance with a C locale
(LC_ALL=C pg_ctl...), and try to use FTS in an ICU UTF-8 database,
it doesn't work:

template1=# create database "fr-utf8" 
  template 'template0' encoding UTF8
  locale 'fr'
  collation_provider 'icu';

template1=# \c fr-utf8
You are now connected to database "fr-utf8" as user "daniel".

fr-utf8=# show lc_ctype;
 lc_ctype 
--
 fr
(1 row)

fr-utf8=# select to_tsvector('été');
ERROR:  invalid multibyte character for locale
HINT:  The server's LC_CTYPE locale is probably incompatible with the
database encoding.

If I peek into the "real" LC_CTYPE when connected to this database,
I can see it's "C":

fr-utf8=# create extension plperl;
CREATE EXTENSION

fr-utf8=# create function lc_ctype() returns text as '$ENV{LC_CTYPE};'
  language plperl;
CREATE FUNCTION

fr-utf8=# select lc_ctype();
 lc_ctype 
--
 C


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: insensitive collations

2019-03-04 Thread Daniel Verite
Peter Eisentraut wrote:

[v7-0001-Collations-with-nondeterministic-comparison.patch]

+GenericMatchText(const char *s, int slen, const char *p, int plen, Oid
collation)
 {
+  if (collation && !lc_ctype_is_c(collation) && collation !=
DEFAULT_COLLATION_OID)
+  {
+pg_locale_tlocale = pg_newlocale_from_collation(collation);
+
+if (locale && !locale->deterministic)
+  ereport(ERROR,
+  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+   errmsg("nondeterministic collations are not supported for
LIKE")));
+  }

This test gets side-stepped when pattern_fixed_prefix() in selfuncs.c
returns Pattern_Prefix_Exact, and the code optimizes the operation by
converting it to a bytewise equality test, or a bytewise range check
in the index with Pattern_Type_Prefix.

Here's a reproducer:

===
create collation ciai (locale='und-u-ks-level1', deterministic=false,
provider='icu');

create table w(t text collate "C");

insert into w select md5(i::text) from generate_series(1,1) as i;
insert into w values('abc');

create index indexname on w(t );

select t from w where t like 'ABC' collate ciai;
 t 
---
(0 rows)

select t from w where t like 'ABC%' collate ciai;
 t 
---
(0 rows)

===

For the LIKE operator, I think the fix should be that like_fixed_prefix() 
should always return Pattern_Prefix_None for non-deterministic collations.
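
For instance, an untested sketch of such a guard at the top of
like_fixed_prefix() in selfuncs.c, using the patch's
get_collation_isdeterministic() lookup:

	/*
	 * With a nondeterministic collation, no reliable fixed prefix can be
	 * derived: strings that differ bytewise may still compare equal.
	 */
	if (OidIsValid(collation) && !get_collation_isdeterministic(collation))
		return Pattern_Prefix_None;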

For regular expressions, pg_set_regex_collation() is called at some
point when checking for a potential prefix, and since it errors out with
non-deterministic collations, this issue is taken care of already.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: insensitive collations

2019-03-05 Thread Daniel Verite
Peter Eisentraut wrote:

> Older ICU versions (<54) don't support all the locale customization
> options, so many of my new tests in collate.icu.utf8.sql will fail on
> older systems.  What should we do about that?  Have another extra test file?

Maybe stick to the old-style syntax for the regression tests?
The declarations that won't work as expected with older ICU versions
would be:

CREATE COLLATION case_insensitive (provider = icu, locale =
'und-u-ks-level2', deterministic = false);

'und-u-ks-level2' is equivalent to 'und@colStrength=secondary'

CREATE COLLATION ignore_accents (provider = icu, locale =
'und-u-ks-level1-kc-true', deterministic = false);

'und-u-ks-level1-kc-true' => 'und@colStrength=primary;colCaseLevel=yes'


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: insensitive collations

2019-03-07 Thread Daniel Verite
Peter Eisentraut wrote:

> The problem is not the syntax but that the older ICU versions don't
> support the *functionality* of ks-level2 or colStrength=secondary.  If
> you try it, you will simply get a normal case-sensitive behavior.

My bad, I see now that the "old locale extension syntax" was actually
introduced at the same time as the "language tag syntax" in ICU 54:
http://bugs.icu-project.org/trac/ticket/8260

With previous versions, we'd need to call ucol_setAttribute(),
with the attributes and values defined here:
http://icu-project.org/apiref/icu4c/ucol_8h.html
for instance to get colStrength=secondary:
  ucol_setAttribute(coll, UCOL_STRENGTH , UCOL_SECONDARY, &status);
which I've just checked gives the expected result with ICU-4.2.
These attributes are flagged as "Stable: ICU 2.0" up to 
"Stable: ICU 2.8" (for UCOL_NUMERIC_COLLATION ).

So if we really wanted to have these functionalities with pre-54 ICU,
we could but that would mean implementing an interface to pass
to CREATE COLLATION the attributes/values we want to support.
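
For reference, the pre-54 route to colStrength=secondary is roughly
this (a minimal sketch; the base locale is arbitrary and error
handling is abbreviated):

    #include <unicode/ucol.h>

    UErrorCode  status = U_ZERO_ERROR;
    UCollator  *coll = ucol_open("fr", &status);

    if (U_SUCCESS(status))
        ucol_setAttribute(coll, UCOL_STRENGTH, UCOL_SECONDARY, &status);
    /* ... use the collator, then ucol_close(coll) ... */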


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Willing to fix a PQexec() in libpq module

2019-03-19 Thread Daniel Verite
Tom Lane wrote:

> Unfortunately, if the default behavior doesn't change, then there's little
> argument for doing this at all.  The security reasoning behind doing
> anything in this area would be to provide an extra measure of protection
> against SQL-injection attacks on carelessly-written clients, and of course
> it's unlikely that a carelessly-written client would get changed to make
> use of a non-default feature.

A patch introducing an "allow_multiple_queries" GUC to
control this was proposed and eventually rejected for lack of
consensus some time ago (also there were some concerns about
the implementation that might have played against it too):

https://www.postgresql.org/message-id/CALAY4q_eHUx%3D3p1QUOvabibwBvxEWGm-bzORrHA-itB7MBtd5Q%40mail.gmail.com

About the effectiveness of this feature, there's a valid use case in
which it's not the developers who decide to set this GUC, but the DBA
or the organization deploying the application. That applies to
applications that of course do not intentionally use multiple queries
per command.
That would provide a certain level of protection against SQL
injections, without changing the application or libpq or breaking
backward compatibility, since it would be optional.

But both in this thread and the other thread, the reasoning about the
GUC seems to rest on the premise that applications would
need to be updated or developers need to be aware of it,
as if they _had_ to issue SET allow_multiple_queries TO off/on,
rather than being submitted to it, as imposed upon them by
postgresql.conf or the database settings.

Compare this to, say, lo_compat_privileges: an application
typically doesn't get to decide whether it's "on". It's for a
superuser to decide which databases or which users must operate with
this setting to "on".
Why wouldn't that model work for disallowing multiple queries per command?
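
Under that model, deployment would boil down to something like this
(reusing the GUC name from the earlier patch; the database and role
names are made up):

  -- imposed by the DBA, with no change to the application
  ALTER DATABASE appdb SET allow_multiple_queries TO off;
  ALTER ROLE app_user SET allow_multiple_queries TO off;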


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: psql - add SHOW_ALL_RESULTS option

2019-07-24 Thread Daniel Verite
Fabien COELHO wrote:

> >> I'd go further and suggest that there shouldn't be a variable
> >> controlling this. All results that come in should be processed, period.
> >
> > I agree with that.
> 
> I kind of agree as well, but I was pretty sure that someone would complain 
> if the current behavior was changed.

If queries in a compound statement must be kept silent,
they can be converted to CTEs or DO-blocks to produce the
same behavior without having to configure anything in psql.
That cost on users doesn't seem too bad, compared to introducing
a knob in psql, and presumably maintaining it forever.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: psql - add SHOW_ALL_RESULTS option

2019-07-25 Thread Daniel Verite
Fabien COELHO wrote:

> Attached a "do it always version", which does the necessary refactoring. 
> There is seldom new code, it is rather moved around, some functions are 
> created for clarity.

Thanks for the update!
FYI you forgot to remove that bit:

--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3737,7 +3737,7 @@ psql_completion(const char *text, int start, int end)
else if (TailMatchesCS("\\set", MatchAny))
{
if (TailMatchesCS("AUTOCOMMIT|ON_ERROR_STOP|QUIET|"
- "SINGLELINE|SINGLESTEP"))
+
"SINGLELINE|SINGLESTEP|SHOW_ALL_RESULTS"))

Also copydml does not  seem to be exercised with combined
queries,  so do we need this chunk:

--- a/src/test/regress/sql/copydml.sql
+++ b/src/test/regress/sql/copydml.sql
@@ -70,10 +70,10 @@ drop rule qqq on copydml_test;
 create function qqq_trig() returns trigger as $$
 begin
 if tg_op in ('INSERT', 'UPDATE') then
-raise notice '% %', tg_op, new.id;
+raise notice '% % %', tg_when, tg_op, new.id;
 return new;
 else
-raise notice '% %', tg_op, old.id;
+raise notice '% % %', tg_when, tg_op, old.id;
 return old;
 end if;


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: psql - add SHOW_ALL_RESULTS option

2019-07-26 Thread Daniel Verite
Fabien COELHO wrote:

>  sh> /usr/bin/psql
>   psql (12beta2 ...)
>   fabien=# \set FETCH_COUNT 2
>   fabien=# SELECT 1234 \; SELECT 5432 ;
>   fabien=#
> 
>   same thing with pg 11.4, and probably down to every version of postgres
>   since the feature was implemented...
> 
> I think that fixing this should be a separate bug report and patch. I'll 
> try to look at it.

That reminds me that it was already discussed in [1]. I should add the
proposed fix to the next commitfest.


[1]
https://www.postgresql.org/message-id/flat/a0a854b6-563c-4a11-bf1c-d6c6f924004d%40manitou-mail.org


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] proposal: psql command \graw

2018-01-27 Thread Daniel Verite
Pavel Stehule wrote:

> We are able to generate html/tex/markdown formats on client side. CSV is
> more primitive, but much more important data format, so it should not be a
> problem. But I remember some objections related to code duplication.

While experimenting with adding csv as an output format
(similar to unaligned except that it quotes with CSV rules), I found out
that we can already do this:

\o | client-side-program with arguments
or
\o /path/to/file.csv
COPY (select query) TO STDOUT CSV [HEADER] ;
\o

This can be used to route multiple resultsets into multiple programs
or files without the minor drawbacks of \copy (single line
formatting, no variables interpolation), or
psql -c "COPY (query) TO STDOUT.." >file.csv which is a bit rigid
(single redirect).


OTOH I notice that this simpler form with a right-side argument of \g:
 =# COPY (select 1) TO STDOUT CSV HEADER \g /tmp/file.csv
outputs to the console rather than into a file. Not sure why it acts
differently than \o and whether it's intentional.

Anyway I think the above sequence with \o already does everything
that is being proposed as an alternative to \graw, or am I missing
something?


What \o + COPY CSV lacks compared to a csv output format
is the ability to export results from meta-commands in csv,
such as \g in \x mode or \gx, \crosstabview, \d, \l...

The other problem with \o |program + COPY CSV is that it's
not easy to discover that combination. It's more obvious if
we have csv explicitly listed as an output format.

I'll post the patch implementing that in a new thread and we'll see
how it goes.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



csv format for psql

2018-01-30 Thread Daniel Verite
 Hi,


This patch implements csv as an output format in psql
(\pset format csv). It's quite similar to the unaligned format,
except that it applies CSV quoting rules (obviously!) and that
it prints no footer and no title.
As with unaligned, a header with column names is output unless
tuples_only is on. It also supports the fieldsep/fieldsep_zero
and recordsep/recordsep_zero settings.

Most of times, the need for CSV is covered by \copy or COPY with
the CSV option, but there are some cases where it would be more
practical to have it as an output format in psql.

* \copy does not interpolate psql variables and is a single-line
command, so making a query fit these constraints can be cumbersome.
It can be got around by defining a temporary view and
\copy from that view, but that doesn't work in a read-only context
such as when connected to a standby.

* the server-side COPY TO STDOUT can also be used from psql,
typically with psql -c "COPY (query) TO STDOUT CSV" > file.csv,
but that's too simple to extract multiple result sets per script.
COPY is also more rigid than psql in the options to delimit
fields and records.

* copy with csv can't help for the output of meta-commands
such as \gx, \crosstabview, \l, \d ... whereas a CSV format within psql
does work with these.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7ea7edc..ebb3d35 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -246,7 +246,7 @@ EOF
   
   
   Use separator as the
-  field separator for unaligned output. This is equivalent to
+  field separator for unaligned and csv outputs. This is equivalent to
   \pset fieldsep or \f.
   
   
@@ -376,7 +376,7 @@ EOF
   
   
   Use separator as the
-  record separator for unaligned output. This is equivalent to
+  record separator for unaligned and csv outputs. This is equivalent to
   \pset recordsep.
   
   
@@ -551,8 +551,8 @@ EOF
   --field-separator-zero
   
   
-  Set the field separator for unaligned output to a zero byte.  This is
-  equvalent to \pset fieldsep_zero.
+  Set the field separator for unaligned and csv outputs to a zero byte.
+  This is equivalent to \pset fieldsep_zero.
   
   
 
@@ -562,8 +562,9 @@ EOF
   --record-separator-zero
   
   
-  Set the record separator for unaligned output to a zero byte.  This is
-  useful for interfacing, for example, with xargs -0.
+  Set the record separator for unaligned and csv outputs to a zero byte.
+  This is useful for interfacing, for example, with
+  xargs -0.
   This is equivalent to \pset recordsep_zero.
   
   
@@ -1918,9 +1919,9 @@ Tue Oct 26 21:40:57 CEST 1999
 
 
 
-Sets the field separator for unaligned query output. The default
-is the vertical bar (|). It is equivalent to
-\pset fieldsep.
+Sets the field separator for unaligned and csv query outputs. The
+default is the vertical bar (|). It is equivalent
+to \pset fieldsep.
 
 
   
@@ -2527,8 +2528,8 @@ lo_import 152801
   fieldsep
   
   
-  Specifies the field separator to be used in unaligned output
-  format. That way one can create, for example, tab- or
+  Specifies the field separator to be used in unaligned and csv output
+  formats. That way one can create, for example, tab- or
   comma-separated output, which other programs might prefer. To
   set a tab as field separator, type \pset fieldsep
   '\t'. The default field separator is
@@ -2541,8 +2542,8 @@ lo_import 152801
   fieldsep_zero
   
   
-  Sets the field separator to use in unaligned output format to a zero
-  byte.
+  Sets the field separator to use in unaligned or csv output formats to
+  a zero byte.
   
   
   
@@ -2565,9 +2566,13 @@ lo_import 152801
   format
   
   
-  Sets the output format to one of unaligned,
-  aligned, wrapped,
-  html, asciidoc,
+  Sets the output format to one of
+  unaligned,
+  aligned,
+  csv,
+  wrapped,
+  html,
+  asciidoc,
   latex (uses tabular),
   latex-longtable, or
   troff-ms.
@@ -2582,6 +2587,12 @@ lo_import 152801
   format).
   
 
+  csv format is similar to
+  unaligned, except that column contents are
+  enclosed in double quotes and quoted when necessary according to the
+  rules of the CSV format, and that no title or footer are printed.
+  
+
   aligned format is the standard, 
human-readable,
   nicely formatte

Re: csv format for psql

2018-01-31 Thread Daniel Verite
Fabien COELHO wrote:

> > This patch implements csv as an output format in psql
> > (\pset format csv).
> 
> Would you consider registering it in the next CF?

Done here:

https://commitfest.postgresql.org/17/1500/

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-01-31 Thread Daniel Verite
Pavel Stehule wrote:

> This format is too important, so some special short or long option can be
> practical (it will be printed in help)
> 
> some like --csv

I guess -C/--csv could be used, like there already is -H/--html.

> I found one issue - PostgreSQL default field separator is "|". Maybe good
> time to use more common "," ?
> 
> Or when field separator was not explicitly defined, then use "," for CSV,
> and "|" for other. Although it can be little bit messy

Currently there's a strong analogy between the unaligned
mode and csv mode. In particular they use the existing pset
variables fieldsep, fieldsep_zero, recordsep, recordsep_zero
in the same way. If we want to make csv special with regard
to the delimiters, that complicates the user interface.
For instance, if fieldsep was changed automatically by
\pset format csv, it's not obvious if/when we should switch it
back to its previous state, and how the fieldsep switch done
automatically would mix or conflict with other \pset
commands issued by the user.
Or we need to duplicate these variables. Or duplicate
only fieldsep, having a fieldsep_csv, leaving out the
other variables and not being as close to the unaligned
mode.
These options don't appeal to me much compared
to the simplicity of the current patch.

Also, although the comma is the separator defined by the
RFC4180 and the default for COPY CSV, people also use the
semicolon extensively (because it's what Excel does I guess),
which somehow mitigates the importance of comma as the
default value.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: proposal: alternative psql commands quit and exit

2018-02-01 Thread Daniel Verite
Bruce Momjian wrote:

> One open issue is the existing help display is inaccurate on Windows:
> 
> Use \\? for help or press control-C to clear the input buffer.

! #ifndef WIN32
!   puts(_("Use control-D to quit."));
! #else
!   puts(_("Use control-C to quit."));
! #endif

But Control-C exiting on Windows is a bug, isn't it?
Shouldn't we try to fix it to behave like in Unix
rather than documenting it?

Also, the fact that Control-D can quit in the middle of a
multiline query without any confirmation is a usability problem, because
you can always fat-finger a Ctrl+key. By comparison, bash doesn't
accept it and emits the same error as if a script was improperly
terminated. Example:

$ cat '
> [Hit Ctrl+D here] bash: unexpected EOF while looking for matching `''
bash: syntax error: unexpected end of file
$ 

There's also the issue that, in general, communicating different hints
and advice depending on the host operating system is not ideal.
Because people ask "how do I do such and such in psql?",
they do not ask "how do I do such and such in psql in Windows?".


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: proposal: alternative psql commands quit and exit

2018-02-01 Thread Daniel Verite
Bruce Momjian wrote:


> > Also, the fact that Control-D can quit in the middle of a
> > multiline query without any confirmation is a usability problem, because
> > you can always fat-finger a Ctrl+key. By comparison, bash doesn't
> > accept it and emits the same error as if a script was improperly
> > terminated. Example:
> > 
> > $ cat '
> > > [Hit Ctrl+D here] bash: unexpected EOF while looking for matching `''
> > bash: syntax error: unexpected end of file
> > $ 

BTW, I'd missed that Ctrl+D behavior also depends on IGNOREEOF.

From https://www.postgresql.org/docs/current/static/app-psql.html

IGNOREEOF

If set to 1 or less, sending an EOF character (usually Control+D)
to an interactive session of psql will terminate the
application. If set to a larger numeric value, that many
consecutive EOF characters must be typed to make an interactive
session terminate. If the variable is set to a non-numeric value,
it is interpreted as 10. The default is 0.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-02-12 Thread Daniel Verite
Pavel Stehule wrote:

> > I guess -C/--csv could be used, like there already is -H/--html.
> >
> 
> I prefer just long option only. Maybe in future we can use short "C" for
> something else better. There is only few free short commands.

Looking at parse_psql_options(), currently psql has 35 long options
and all of them have a corresponding single-character invocation.
Unless there is a consensus for it, I don't feel like it's time to create
the precedent that some options deserve short forms and others
don't.

Sure, at some point we'll run out of letters, but looking back ten
years, in Jan 2008 psql had 31 options, so the rate of adding new
ones does not look worrying.

Besides I like the fact that -C can be seen as a drop-in replacement for
-A, because in most cases, it's just a safer version of -A, as it
deals with the situation that the separator might be in the contents,
which is the main weakness of -A.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-02-12 Thread Daniel Verite
Pavel Stehule wrote:

> some like "\pset format csv , header_off

If we did that, we'd need to reconsider the interactions
of this with \t on|off and \pset footer on|off
and how to keep things consistent with the unaligned
format.
I feel like it's not worth the trouble.
We can still add this later if users are confused with
the interface, but that interface is already well
established with the unaligned format.

> 2. if we support csv format, then we can introduce \gcsv xxx | gnuplot 

I thought committers rejected this because they didn't want a
new variant of \g.
The csv format makes it possible to not add any new \g-style
metacommand and inject csv into the existing ones.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: ICU for global collation

2019-11-01 Thread Daniel Verite
Peter Eisentraut wrote:

> I looked into this problem.  The way to address this would be adding 
> proper collation support to the text search subsystem.  See the TODO 
> markers in src/backend/tsearch/ts_locale.c for starting points.  These 
> APIs spread out to a lot of places, so it will take some time to finish. 
>  In the meantime, I'm pausing this thread and will set the CF entry as RwF.

Even if the FTS code is improved in that matter, any extension code
with libc functions depending on LC_CTYPE is still going to be
potentially problematic. In particular when it happens to be set
to a different encoding than the database.

Couldn't we simply invent per-database GUC options, as in
ALTER DATABASE myicudb SET libc_lc_ctype TO 'value';
ALTER DATABASE myicudb SET libc_lc_collate TO 'value';

where libc_lc_ctype/libc_lc_collate would specifically set
the values in the LC_CTYPE and LC_COLLATE environment vars
of any backend serving the corresponding database?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: updating unaccent.rules for Arabic letters

2019-11-04 Thread Daniel Verite
kerbrose khaled wrote:

> I would like to update unaccent.rules file to support Arabic letters. so
> could someone help me or tell me how could I add such contribution. I
> attached the file including the modifications, only the last 4 lines.

The Arabic letters are found in the Unicode block U+0600 to U+06FF 
(https://www.fileformat.info/info/unicode/block/arabic/list.htm)
There has been no coverage of this block until now by the unaccent
module. Since Arabic uses several diacritics [1], it would be best to
figure out all the transliterations that should go in and add them in
one go (plus coding that in the Python script).

The canonical way to unaccent is normally to apply a Unicode
transformation: NFC -> NFD and remove the non-spacing marks.

I've tentatively done that with each codepoint in the 0600-06FF block
in SQL with icu_transform in icu_ext [2], and it produces the
attached result, with 60 (!) entries, along with Unicode names for
readability.

Does that make sense to people who know Arabic?

For the record, here's the query:

WITH block(cp) AS (select * FROM generate_series(x'600'::int,x'6ff'::int) AS
cp),
  dest AS (select cp, icu_transform(chr(cp), 'any-NFD;[:nonspacing mark:]
any-remove; any-NFC') AS unaccented FROM block)
SELECT
  chr(cp) as "src",
  icu_transform(chr(cp), 'Name') as "srcName",
  dest.unaccented as "dest",
  icu_transform(dest.unaccented, 'Name') as "destName"
FROM dest
WHERE chr(cp) <> dest.unaccented;


[1] https://en.wikipedia.org/wiki/Arabic_diacritics
[2] https://github.com/dverite/icu_ext#icu_transform


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


unaccent-arabic-block.utf8.output
Description: Binary data


Making psql error out on output failures

2019-12-16 Thread Daniel Verite
  Hi,

When exporting data with psql -c "..." >file or select ... \g file inside
psql,
post-creation output errors are silently ignored.
The problem can be seen easily by creating a small ramdisk and
filling it over capacity:

$ sudo mount -t tmpfs -o rw,size=1M tmpfs /mnt/ramdisk

$ psql -d postgres -At \
  -c "select repeat('abc', 100)" > /mnt/ramdisk/file

$ echo $?
0

$ ls -l /mnt/ramdisk/file
-rw-r--r-- 1 daniel daniel 1048576 Dec 16 15:57 /mnt/ramdisk/file

$ df -h /mnt/ramdisk/
Filesystem  Size  Used Avail Use% Mounted on
tmpfs   1.0M  1.0M 0 100% /mnt/ramdisk

The output that should be 3 MB long is truncated as expected, but
we get no error message and no error code from psql, which
is obviously not nice.

The reason is that PrintQuery() and the code below it in
fe_utils/print.c call puts(), putc(), fprintf(),... without checking their
return values or the result of ferror() on the output stream.
If we made them do that and had the printing bail out at the first error,
that would be a painful change, since there are a lot of such calls:
$ egrep -w '(fprintf|fputs|fputc)' fe_utils/print.c | wc -l
326
and the call sites are in functions that have no error reporting paths
anyway.

To start the discussion, here's a minimal patch that checks ferror() in
PrintQueryTuples() to raise an error when the printing is done
(better late than never).
The error message is generic as opposed to using errno, as 
I don't know whether errno has been clobbered at this point.
OTOH, I think that the error indicator on the output stream is not
cleared by successful writes after some other previous writes have failed.

Are there opinions on whether this should be addressed simply like
in the attached, or a large refactoring of print.c to bail out at the first
error would be better, or other ideas on how to proceed?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 9c0d53689e..993484d092 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -718,6 +718,7 @@ static bool
 PrintQueryTuples(const PGresult *results)
 {
 	printQueryOpt my_popt = pset.popt;
+	bool result = true;
 
 	/* one-shot expanded output requested via \gx */
 	if (pset.g_expanded)
@@ -736,6 +737,12 @@ PrintQueryTuples(const PGresult *results)
 
 		printQuery(results, &my_popt, fout, false, pset.logfile);
 
+		if (ferror(fout))
+		{
+			pg_log_error("Error printing tuples");
+			result = false;
+		}
+
 		if (is_pipe)
 		{
 			pclose(fout);
@@ -745,9 +752,16 @@ PrintQueryTuples(const PGresult *results)
 			fclose(fout);
 	}
 	else
+	{
 		printQuery(results, &my_popt, pset.queryFout, false, pset.logfile);
+		if (ferror(pset.queryFout))
+		{
+			pg_log_error("Error printing tuples");
+			result = false;
+		}
+	}
 
-	return true;
+	return result;
 }
 
 


Re: Unicode normalization SQL functions

2020-01-06 Thread Daniel Verite
Peter Eisentraut wrote:

> Also, there is a way to optimize the "is normalized" test for common 
> cases, described in UTR #15.  For that we'll need an additional data 
> file from Unicode.  In order to simplify that, I would like my patch 
> "Add support for automatically updating Unicode derived files" 
> integrated first.

Would that explain that the NFC/NFKC normalization and "is normalized"
check seem abnormally slow with the current patch, or should
it be regarded independently of the other patch?

For instance, testing 1 short ASCII strings:

postgres=# select count(*) from (select md5(i::text) as t from
generate_series(1,1) as i) s where t is nfc normalized ;
 count 
---
 1
(1 row)

Time: 2573,859 ms (00:02,574)

By comparison, the NFD/NFKD case is faster by two orders of magnitude:

postgres=# select count(*) from (select md5(i::text) as t from
generate_series(1,1) as i) s where t is nfd normalized ;
 count 
---
 1
(1 row)

Time: 29,962 ms

Although NFC/NFKC has a recomposition step that NFD/NFKD
doesn't have, such a difference is surprising.

I've tried an alternative implementation based on ICU's
unorm2_isNormalized()/unorm2_normalize() functions (which I'm
currently adding to the icu_ext extension to be exposed in SQL).
With these, the 4 normal forms are in the 20ms ballpark with the above
test case, without a clear difference between composed and decomposed
forms.
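
For reference, the ICU side of that check is roughly this (a sketch;
the conversion of the UTF-8 input to UChar and the error handling are
left out):

    #include <unicode/unorm2.h>

    UErrorCode  status = U_ZERO_ERROR;
    const UNormalizer2 *norm2 = unorm2_getNFCInstance(&status);
    /* ubuf/ulen: the string previously converted to UTF-16 (UChar) */
    UBool       is_nfc = unorm2_isNormalized(norm2, ubuf, ulen, &status);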

Independently of the performance, I've compared the results
of the ICU implementation vs this patch on large series of strings
with all normal forms and could not find any difference.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2020-01-09 Thread Daniel Verite
Dent John wrote:

> I’ve made a revision of this patch. 

Some comments:

* the commitfest app did not pick up the patch from the mail,
possibly because it's buried in the MIME structure of the mail
(using plain text instead of HTML messages might help with that).
The patch has no status in http://commitfest.cputube.org/
probably because of this too.


* unnest-refcursor-v3.patch needs a slight rebase because this chunk
in the Makefile fails
-   regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \
+   refcursor.o regexp.o regproc.o ri_triggers.o rowtypes.o ruleutils.o \


* I'm under the impression that refcursor_unnest() is functionally
equivalent to a plpgsql implementation like this:

create function unnest(x refcursor) returns setof record as $$
declare
 r record;
begin
  loop
fetch x into r;
exit when not found;
return next r;
  end loop;
end $$ language plpgsql;

but it would differ in performance, because internally a materialization step
could be avoided, but only when the other patch "Allow FunctionScans to
pipeline results"  gets in?
Or are performance benefits expected right away with this patch?

*
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -5941,7 +5941,7 @@ array_fill_internal(ArrayType *dims, ArrayType *lbs,
 
 
 /*
- * UNNEST
+ * UNNEST (array)
  */

This chunk looks unnecessary?

* some user-facing doc would be needed.

* Is it good to overload "unnest" rather than choosing a specific
function name?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2020-01-10 Thread Daniel Verite
Dent John wrote:

> Yes. That’s at least true if unnest(x) is used in the FROM. If it’s used in
> the SELECT, actually it can get the performance benefit right away

At a quick glance, I don't see it called in the select-list  in any
of the regression tests. When trying it, it appears to crash (segfault):

postgres=# begin;
BEGIN

postgres=# declare c cursor for select oid::int as i, relname::text as r from
pg_class;
DECLARE CURSOR

postgres=# select unnest('c'::refcursor);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The build is configured with:
./configure --enable-debug --with-icu --with-perl --enable-depend


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Making psql error out on output failures

2020-01-14 Thread Daniel Verite
David Z wrote:

> $ psql -d postgres  -At -c "select repeat('111', 100)" >
> /mnt/ramdisk/file

The -A option selects the unaligned output format and -t
switches to the "tuples only" mode (no header, no footer).

> Test-2: delete the "file", run the command within psql console,
> $ rm /mnt/ramdisk/file 
> $ psql -d postgres

In this invocation there's no -A and -t, so the beginning of the
output is going to be a left padded column name that is not in the
other output.
The patch is not involved in that difference.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2020-01-14 Thread Daniel Verite
Dent John wrote:

> It’s crashing when it’s checking that the returned tuple matches the
> declared return type in rsinfo->setDesc. Seems rsinfo->setDesc gets
> overwritten. So I think I have a memory management problem.

What is the expected result anyway? A single column with a "record"
type? FWIW I notice that with plpgsql, this is not allowed to happen:

CREATE FUNCTION cursor_unnest(x refcursor) returns setof record
as $$
declare
 r record;
begin
  loop
fetch x into r;
exit when not found;
return next r;
  end loop;
end $$ language plpgsql;

begin;

declare c cursor for select oid::int as i, relname::text as r from pg_class;

select cursor_unnest('c');

ERROR:  set-valued function called in context that cannot accept a set
CONTEXT:  PL/pgSQL function cursor_unnest(refcursor) line 8 at RETURN NEXT


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Making psql error out on output failures

2020-01-16 Thread Daniel Verite
David Zhang wrote:

> If I change the error log message like below, where "%m" is used to pass the
> value of strerror(errno), "could not write to output file:" is copied from
> function "WRITE_ERROR_EXIT". 
> -   pg_log_error("Error printing tuples"); 
> +   pg_log_error("could not write to output file: %m"); 
> then the output message is something like below, which, I believe, is more
> consistent with pg_dump. 

The problem is that errno may not reliably tell us what
the problem was that led to ferror(fout) being nonzero, since it isn't
saved at the point of the error and the output code may have called
many libc functions between the first occurrence of the output error
and when pg_log_error() is called.

The linux manpage on errno warns specifically about this:

NOTES
   A common mistake is to do

   if (somecall() == -1) {
   printf("somecall() failed\n");
   if (errno == ...) { ... }
   }

   where errno no longer needs to have the value it had upon return
   from somecall() (i.e., it may have been changed by the printf(3)).
   If the value of errno should be preserved across a library call,
   it must be saved:


This other bit from the POSIX spec [1] is relevant:

  "The value of errno shall be defined only after a call to a function
  for which it is explicitly stated to be set and until it is changed
  by the next function call or if the application assigns it a value."

To use errno in a way that complies with the above, the psql code
should be refactored. I don't know if having a more precise error
message justifies that refactoring. I've elaborated a bit about that
upthread with the initial submission. Besides, I'm not even
sure that errno is necessarily set on non-POSIX platforms when fputc
or fputs fails.
That's why this patch uses the safer approach to emit a generic
error message.
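
For the record, the compliant pattern that such a refactoring would
have to apply at every write site looks like this (a sketch, not
actual psql code):

	if (fputs(line, fout) == EOF)
	{
		int		saved_errno = errno;	/* save before any other libc call */

		pg_log_error("could not write to output file: %s",
					 strerror(saved_errno));
		return false;
	}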

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/errno.html


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: psql - add SHOW_ALL_RESULTS option

2020-01-17 Thread Daniel Verite
Alvaro Herrera wrote:

> if this patch enables other psql features, it might be a good step
> forward.

Yes. For instance, if stored procedure support gets improved to
produce several result sets, how is psql going to benefit from it
while sticking to the old way (PGresult *r = PQexec(query))
of executing queries that discards N-1 out of N result sets?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: psql - add SHOW_ALL_RESULTS option

2020-01-17 Thread Daniel Verite
Tom Lane wrote:

> I'm not really holding my breath for that to happen, considering
> it would involve fundamental breakage of the wire protocol.
> (For example, extended query protocol assumes that Describe
> Portal only needs to describe one result set.  There might be
> more issues, but that one's bad enough.)

I'm not sure that CALL can be used at all with the extended protocol
today (just like multiple queries per statement, except that for those,
I'm sure).
My interpretation is that the extended protocol deliberately
leaves out the possibility of multiple result sets because it doesn't fit
with how it's designed and if you want to have this, you can just use
the old protocol's Query message. There is no need to break anything
or invent anything but on the contrary to use the older way.

Considering these 3 ways to use libpq to send queries:

1. using old protocol with PQexec: only one resultset.

2. using old protocol with PQsendQuery+looping on PQgetResult:
same as #1 except multiple result sets can be processed

3. using extended protocol: not for multiple result sets, not for copy,
possibly not for other things, but  can use bind parameters, binary format,
pipelining,...

The current patch is about using #2 instead of #1.
There have been patches about doing bits of #3 in some cases
(binary output, maybe parameters too?) and none eventually got in.
ISTM that the current situation is that psql has been stuck at #1 forever,
so it's not fully using the capabilities of the protocol, both old and new.
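
For reference, #2 is essentially this loop (minimal sketch, error
handling omitted):

	/* send the (possibly compound) query once, then drain all results */
	if (PQsendQuery(conn, query))
	{
		PGresult   *res;

		while ((res = PQgetResult(conn)) != NULL)
		{
			/* print or otherwise process each result set */
			PQclear(res);
		}
	}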


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Making psql error out on output failures

2020-01-20 Thread Daniel Verite
David Zhang wrote:

> Yes, I agree with you. For case 2 "select repeat('111', 100) \g 
> /mnt/ramdisk/file", it can be easily fixed with more accurate error 
> message similar to pg_dump, one example could be something like below. 
> But for case 1 "psql -d postgres  -At -c "select repeat('111', 100)" 
> > /mnt/ramdisk/file" , it may require a lot of refactoring work.

I don't quite see why you make that distinction? The relevant bits of
code are common, it's all the code in src/fe_utils/print.c called
from PrintQuery().

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: making the backend's json parser work in frontend code

2020-01-23 Thread Daniel Verite
Robert Haas wrote:

> With the format I proposed, you only have to worry that the
> file name might contain a tab character, because in that format, tab
> is the delimiter

It could be CSV, which has this problem already solved,
is easier to parse than JSON, certainly no less popular,
and is not bound to a specific encoding.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Unicode normalization SQL functions

2020-01-28 Thread Daniel Verite
Peter Eisentraut wrote:

> Here is an updated patch set that now also implements the "quick check" 
> algorithm from UTR #15 for making IS NORMALIZED very fast in many cases, 
> which I had mentioned earlier in the thread.

I found a bug in unicode_is_normalized_quickcheck() which is
triggered when the last codepoint of the string is beyond
U+FFFF, i.e. outside the BMP. On encountering it, it does:
+   if (is_supplementary_codepoint(ch))
+   p++;
When ch is the last codepoint, it makes p point to
the ending zero, but the subsequent p++ done by
the for loop makes it miss the exit and go into over-reading.

But anyway, what's the reason for skipping the codepoint
following a codepoint outside of the BMP?
I've figured that it comes from porting the Java code in UAX#15:

public int quickCheck(String source) {
short lastCanonicalClass = 0;
int result = YES;
for (int i = 0; i < source.length(); ++i) {
int ch = source.codepointAt(i);
if (Character.isSupplementaryCodePoint(ch)) ++i;
short canonicalClass = getCanonicalClass(ch);
if (lastCanonicalClass > canonicalClass && canonicalClass != 0) {
return NO;}
int check = isAllowed(ch);
if (check == NO) return NO;
if (check == MAYBE) result = MAYBE;
lastCanonicalClass = canonicalClass;
}
return result;
}

source.length() is the length in UTF-16 code units, in which a surrogate
pair counts as 2. This would be why it does
   if (Character.isSupplementaryCodePoint(ch)) ++i;
it's meant to skip the 2nd UTF-16 code unit of the pair.
As this does not apply to the 32-bit pg_wchar, I think the two lines above
in the C implementation should just be removed.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Making psql error out on output failures

2020-01-28 Thread Daniel Verite
David Zhang wrote:

> The error "No space left on device" can be logged by fprintf() which is 
> redefined as pg_fprintf() when print_aligned_text() is called

Are you sure? I don't find that redefinition. Besides
print_aligned_text() also calls putc and puts.

> Will that be a better solution if redefine fputs similar to fprintf and 
> log the exact error when first time discovered? 

I think we can assume it's not acceptable to have pg_fprintf()
print anything to stderr, or to set a flag through a global
variable. So even if pg_fprintf() were used for all the printing, no matter
how (through #defines or otherwise), there's still the problem that the
error needs to be propagated up the call chain to be processed by psql.

> The concern is that if we can't provide a more accurate
> information to the end user when error happens, sometimes the
> end user might got even confused.

It's better to have a more informative message, but I'm for
not having the best be the enemy of the good.
The first concern is that at the moment, we have no error
report at all in the case when the output can be opened
but the error happens later along the writes.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Changes to pg_dump/psql following collation "C" in the catalog

2019-04-04 Thread Daniel Verite
Tom Lane wrote:

> "Daniel Verite"  writes:
> > One consequence of using the "C" collation in the catalog versus
> > the db collation is that pg_dump -t with a regexp may not find
> > the same tables as before. It happens when these conditions are
> > all met:
> > - the collation of the database is not "C"
> > - the regexp has locale-dependant parts
> > - the names to match include characters that are sensitive to
> > locale-dependant matching
> 
> Hm, interesting.
> 
> > It seems that to fix that, we could qualify the references to columns such
> > as "relname" and "schema_name" with COLLATE "default" clauses in the
> > queries that use pattern-matching in client-side tools, AFAICS
> > pg_dump and psql.
> 
> Seems reasonable.  I was initially worried that this might interfere with
> query optimization, but some experimentation says that the planner
> successfully derives prefix index clauses anyway (which is correct,
> because matching a fixed regex prefix doesn't depend on locale).
> 
> It might be better to attach the COLLATE clause to the pattern constant
> instead of the column name; that'd be less likely to break if sent to
> an older server.
> 
> > Before going any further with this idea, is there agreement that it's an
> > issue to address and does this look like the best way to do that?
> 
> That is a question worth asking.  We're going to be forcing people to get
> used to this when working directly in SQL, so I don't know if masking it
> in a subset of tools is really a win or not.

I think psql and pg_dump need to adjust, just like the 3rd party tools
will, at least those that support collation-aware search in the catalog.
PFA a patch that implements the slight changes needed.

I'll add an entry for it in the next CF.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index 5c1732a..69ac6f9 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -808,6 +808,8 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
  * to limit the set of objects returned.  The WHERE clauses are appended
  * to the already-partially-constructed query in buf.  Returns whether
  * any clause was added.
+ * The pattern matching uses the collation of the database through explicit
+ * COLLATE "default" clauses.
  *
  * conn: connection query will be sent to (consulted for escaping rules).
  * buf: output parameter.
@@ -971,17 +973,18 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
 appendPQExpBuffer(buf,
   "(%s OPERATOR(pg_catalog.~) ", namevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
+appendPQExpBufferStr(buf, " COLLATE \"default\"");
 appendPQExpBuffer(buf,
   "\nOR %s OPERATOR(pg_catalog.~) ",
   altnamevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
-appendPQExpBufferStr(buf, ")\n");
+appendPQExpBufferStr(buf, " COLLATE \"default\" )\n");
 			}
 			else
 			{
 appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", namevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
-appendPQExpBufferChar(buf, '\n');
+appendPQExpBufferStr(buf, " COLLATE \"default\"\n");
 			}
 		}
 	}
@@ -997,7 +1000,7 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
 			WHEREAND();
 			appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", schemavar);
 			appendStringLiteralConn(buf, schemabuf.data, conn);
-			appendPQExpBufferChar(buf, '\n');
+			appendPQExpBufferStr(buf, " COLLATE \"default\"\n");
 		}
 	}
 	else


Re: Changes to pg_dump/psql following collation "C" in the catalog

2019-04-05 Thread Daniel Verite
Chapman Flack wrote:

> >> "Daniel Verite"  writes:
> >>> One consequence of using the "C" collation in the catalog versus
> >>> the db collation
> 
> As an intrigued Person Following At Home, I was happy when I found
> this little three-message thread had more context in [1] and [2]. :)
> 
> -Chap
> 
> [1] https://postgr.es/m/15938.1544377...@sss.pgh.pa.us
> [2] https://postgr.es/m/5978.1544030...@sss.pgh.pa.us

Yes. The concrete issue that the patch addresses can be illustrated
with this example:

  psql (12devel)
  Type "help" for help.

  postgres=# show lc_collate ;
   lc_collate  
  -
   fr_FR.UTF-8
  (1 row)

  postgres=# create table année_2018();
  CREATE TABLE

  postgres=# \dt '\\w+_\\d+'
  psql: error: Did not find any relation named "\w+_\d+".

In previous versions it would have found the table with the accent
in the name. With 12devel it doesn't, because the match is done with
the collation of the column, now "C", which does not consider the
accented character to be a letter.
This also affects pg_dump with the -t and -n switches that
accept patterns and I think pretty much all \d* commands
that accept patterns too.

The purpose of the fix is for the patterns to give the same results as
before. It's done by simply adding explicit collate clauses to use the
collation of the database for these comparisons.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Changes to pg_dump/psql following collation "C" in the catalog

2019-04-05 Thread Daniel Verite
Tom Lane wrote:

> Hm, if that's as much as we have to touch, I think there's a good
> argument for squeezing it into v12 rather than waiting.  The point
> here is mostly to avoid a behavior change from pre-v12

Yes. I was mentioning the next CF because ISTM that nowadays
non-committers are expected to file patches in there, committers
picking up patches both in the current and next CF based on
their evaluation of priorities.
But if you plan to process this one shortly, a CF entry is probably
superfluous.

> Just looking at the patch, I wonder whether it doesn't need some
> server-version checks.  At the very least this would break with
> pre-9.1 servers, which lack COLLATE altogether.

PFA a new version adding the clause for only 12 and up, since the
previous versions are not concerned, and as you mention, really old
versions would fail otherwise.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index 5c1732a..fe88d73 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -808,6 +808,8 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
  * to limit the set of objects returned.  The WHERE clauses are appended
  * to the already-partially-constructed query in buf.  Returns whether
  * any clause was added.
+ * The pattern matching uses the collation of the database through explicit
+ * COLLATE "default" clauses when needed (server version 12 and above).
  *
  * conn: connection query will be sent to (consulted for escaping rules).
  * buf: output parameter.
@@ -971,16 +973,22 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
 appendPQExpBuffer(buf,
   "(%s OPERATOR(pg_catalog.~) ", namevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
+if (PQserverVersion(conn) >= 12)
+	appendPQExpBufferStr(buf, " COLLATE \"default\"");
 appendPQExpBuffer(buf,
   "\nOR %s OPERATOR(pg_catalog.~) ",
   altnamevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
+if (PQserverVersion(conn) >= 12)
+	appendPQExpBufferStr(buf, " COLLATE \"default\"");
 appendPQExpBufferStr(buf, ")\n");
 			}
 			else
 			{
 appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", namevar);
 appendStringLiteralConn(buf, namebuf.data, conn);
+if (PQserverVersion(conn) >= 12)
+	appendPQExpBufferStr(buf, " COLLATE \"default\"");
 appendPQExpBufferChar(buf, '\n');
 			}
 		}
@@ -997,6 +1005,8 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
 			WHEREAND();
 			appendPQExpBuffer(buf, "%s OPERATOR(pg_catalog.~) ", schemavar);
 			appendStringLiteralConn(buf, schemabuf.data, conn);
+			if (PQserverVersion(conn) >= 12)
+appendPQExpBufferStr(buf, " COLLATE \"default\"");
 			appendPQExpBufferChar(buf, '\n');
 		}
 	}


Re: Changes to pg_dump/psql following collation "C" in the catalog

2019-04-06 Thread Daniel Verite
Tom Lane wrote:

> > PFA a new version adding the clause for only 12 and up, since the
> > previous versions are not concerned, and as you mention, really old
> > versions would fail otherwise.
> 
> Pushed with some fiddling with the comments, and changing the collation
> names to be schema-qualified for paranoia's sake.

Thanks !

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Cleanup/remove/update references to OID column

2019-04-10 Thread Daniel Verite
Justin Pryzby wrote:

> Cleanup/remove/update references to OID column...
> 
> ..in wake of 578b229718e8f.

Just spotted a couple of other references that need updates:

#1. In catalogs.sgml:

 
  attnum
  int2
  
  
   The number of the column.  Ordinary columns are numbered from 1
   up.  System columns, such as oid,
   have (arbitrary) negative numbers.
  
 

oid should be replaced by xmin or some other system column.


#2. In ddl.sgml, when describing ctid:

 
  The physical location of the row version within its table.  Note that
  although the ctid can be used to
  locate the row version very quickly, a row's
  ctid will change if it is
  updated or moved by VACUUM FULL.  Therefore
  ctid is useless as a long-term row
  identifier.  The OID, or even better a user-defined serial
  number, should be used to identify logical rows.
 

"The OID" used to refer to an entry above in that list, now it's not
clear what it refers to.
"serial number" also sounds somewhat obsolete now that IDENTITY is
supported. The last sentence could be changed to:
 "A primary key should be used to identify logical rows".


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: PostgreSQL pollutes the file system

2019-04-13 Thread Daniel Verite
Andreas Karlsson wrote:

> The Debian packagers already use pg_createcluster for their script which 
> wraps initdb, and while pg_initdb is a bit misleading (it creates a 
> cluster rather than a database) it is not that bad.

But that renaming wouldn't achieve anything in relation to the stated goal,
since initdb is not in the $PATH in Debian/Ubuntu systems.
It's part of the version-specific binaries located in
/usr/lib/postgresql/$VERSION/bin, which are not meant to be called
directly, but by the pg_*cluster* scripts that you mention, or pg_wrapper.

Moreover, aside from package-specific issues, initdb can already be
invoked through "pg_ctl initdb" since 2010 and version 9.0, as
mentioned in:
 https://www.postgresql.org/docs/9.0/app-initdb.html

This evolution was the result of discussions pretty much like
the present thread.
9 years later, who bothers to use or recommend the new form?
AFAICS nobody cares.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Cleanup/remove/update references to OID column

2019-04-15 Thread Daniel Verite
Andres Freund wrote:

> Yes, I was planning to commit that soon-ish. There still seemed
> review / newer versions happening, though, so I was thinking of waiting
> for a bit longer.

You might want to apply this trivial one in the same batch:

index 452f307..7cfb67f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -428,7 +428,7 @@ main(int argc, char **argv)
 
InitDumpOptions(&dopt);
 
-   while ((c = getopt_long(argc, argv,
"abBcCd:E:f:F:h:j:n:N:oOp:RsS:t:T:U:vwWxZ:",
+   while ((c = getopt_long(argc, argv,
"abBcCd:E:f:F:h:j:n:N:Op:RsS:t:T:U:vwWxZ:",
long_options,
&optindex)) != -1)
{
switch (c)

"o" in the options list is a leftover. Leaving it in getopt_long() has the 
effect that pg_dump -o fails (per the default case in the switch),
but it's missing the expected error message (pg_dump: invalid option -- 'o')
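
To illustrate (a standalone sketch, not pg_dump's actual error handling):
as long as the letter is present in the optstring, getopt_long() returns
it silently instead of printing its own diagnostic, so the only message
the user could get would have to come from the switch:

#include <getopt.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	int			c;
	static struct option long_options[] = {{NULL, 0, NULL, 0}};

	/* "o" still listed here, but no longer handled in the switch */
	while ((c = getopt_long(argc, argv, "o", long_options, NULL)) != -1)
	{
		switch (c)
		{
			default:
				/* reached for -o, but getopt_long() has printed nothing */
				fprintf(stderr, "Try \"--help\" for more information.\n");
				exit(1);
		}
	}
	return 0;
}

Once "o" is dropped from the optstring, getopt_long() itself prints
"invalid option -- 'o'" and returns '?', which is the behavior one expects.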


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Trouble with FETCH_COUNT and combined queries in psql

2019-04-22 Thread Daniel Verite
  Hi,

When FETCH_COUNT is set, queries combined in a single request don't work
as expected:

 \set FETCH_COUNT 10
 select pg_sleep(2) \; select 1;

No result is displayed, the pg_sleep(2) is not run, and no error
is shown. That's disconcerting.

The sequence that is sent under the hood is:

#1 BEGIN
#2  DECLARE _psql_cursor NO SCROLL CURSOR FOR
select pg_sleep(2) ; select 1;
#3 CLOSE _psql_cursor
#4 ROLLBACK

The root problem is in deciding that a statement can be run
through a cursor if the query text starts with "select" or "values"
(in is_select_command() in common.c), while not knowing about multiple
queries in the buffer, which are not compatible with the cursor approach.
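
For reference, the test amounts to roughly this (a simplified sketch; the
real function also skips leading whitespace, comments and parentheses, and
checks word boundaries):

static bool
is_select_command(const char *query)
{
	return pg_strncasecmp(query, "select", 6) == 0 ||
		pg_strncasecmp(query, "values", 6) == 0;
}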

When sending #2, psql expects the PQexec("DECLARE...") to yield a
PGRES_COMMAND_OK, but it gets a PGRES_TUPLES_OK instead. Given
that, it abandons the cursor, rolls back the transaction (if
it opened it), and clears out the results of the second select
without displaying them.

If there was already a transaction open, the problem is worse because
it doesn't roll back, and we're silently missing an SQL statement that
was possibly meant to change the state of the data, as in:
 BEGIN; SELECT compute_something() \; select get_results(); END;

Does anyone have thoughts about how to fix this?
ATM I don't see a plausible fix that does not involve having the parser
record that it's a multiple-query command and pass that information
down somehow to is_select_command().
Or a more modern approach could be to give up on the
cursor-based method in favor of PQsetSingleRowMode().
That might be too big a change for a bug fix, though.
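
For the record, the libpq side of the single-row-mode approach would look
roughly like this (a minimal sketch with a hypothetical print_all_rows()
helper, ignoring error handling, the compound-statement subtleties and the
actual printing/paging logic):

#include <stdbool.h>
#include <libpq-fe.h>

static bool
print_all_rows(PGconn *conn, const char *query)
{
	PGresult   *res;

	if (!PQsendQuery(conn, query))
		return false;
	if (!PQsetSingleRowMode(conn))
		return false;

	while ((res = PQgetResult(conn)) != NULL)
	{
		switch (PQresultStatus(res))
		{
			case PGRES_SINGLE_TUPLE:
				/* print this row */
				break;
			case PGRES_TUPLES_OK:
				/* zero-row result that ends the current result set */
				break;
			default:
				/* errors, COPY responses, etc. */
				break;
		}
		PQclear(res);
	}
	return true;
}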


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Trouble with FETCH_COUNT and combined queries in psql

2019-04-23 Thread Daniel Verite
Fabien COELHO wrote:

> I added some stuff to extract embedded "\;" for pgbench "\cset", which has 
> been removed though, but it is easy to add back a detection of "\;", and 
> also to detect select. If the position of the last select is known, the 
> cursor can be declared in the right place, which would also solve the 
> problem.

Thanks, I'll extract the necessary bits from your patch.
I don't plan to go as far as injecting a DECLARE CURSOR inside
the query, but rather just forbid the use of the cursor in
the combined-queries case.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Trouble with FETCH_COUNT and combined queries in psql

2019-04-23 Thread Daniel Verite
Tom Lane wrote:

> Keep in mind that a large part of the reason why the \cset patch got
> bounced was exactly that its detection of \; was impossibly ugly
> and broken.  Don't expect another patch using the same logic to
> get looked on more favorably.

Looking at the end of the discussion about \cset, it seems that what
you were against was not so much how the detection was done as how
and why it was used thereafter.

In the case of the present bug, we just need to know whether there
are any \; query separators in the command string.
If yes, then SendQuery() doesn't get to use the cursor technique to
avoid any risk with that command string, despite FETCH_COUNT>0.

PFA a simple POC patch implementing this.

> Having said that, I did like the idea of maybe going over to
> PQsetSingleRowMode instead of using an explicit cursor.  That
> would represent a net decrease of cruftiness here, instead of
> layering more cruft on top of what's already a mighty ugly hack.

It would also work with queries that start with a CTE, and with queries
like UPDATE/INSERT/DELETE ... RETURNING, which the current cursor-based
method cannot handle. But that looks like a project for PG13,
whereas a fix like the attached could be backpatched.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index bd28444..78641e0 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1405,7 +1405,8 @@ SendQuery(const char *query)
 		results = NULL;			/* PQclear(NULL) does nothing */
 	}
 	else if (pset.fetch_count <= 0 || pset.gexec_flag ||
-			 pset.crosstab_flag || !is_select_command(query))
+			 pset.crosstab_flag || pset.num_semicolons > 0 ||
+			 !is_select_command(query))
 	{
 		/* Default fetch-it-all-and-print mode */
 		instr_time	before,
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index 3ae4470..e5af7a2 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -426,6 +426,7 @@ MainLoop(FILE *source)
 /* execute query unless we're in an inactive \if branch */
 if (conditional_active(cond_stack))
 {
+	pset.num_semicolons = psql_scan_get_escaped_semicolons(scan_state);
 	success = SendQuery(query_buf->data);
 	slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR;
 	pset.stmt_lineno = 1;
@@ -502,6 +503,7 @@ MainLoop(FILE *source)
 	/* should not see this in inactive branch */
 	Assert(conditional_active(cond_stack));
 
+	pset.num_semicolons = psql_scan_get_escaped_semicolons(scan_state);
 	success = SendQuery(query_buf->data);
 
 	/* transfer query to previous_buf by pointer-swapping */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 5be5091..2755927 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -110,6 +110,8 @@ typedef struct _psqlSettings
 	char	   *inputfile;		/* file being currently processed, if any */
 	uint64		lineno;			/* also for error reporting */
 	uint64		stmt_lineno;	/* line number inside the current statement */
+	int			num_semicolons;	/* number of query separators (\;) found
+   inside the current statement */
 
 	bool		timing;			/* enable timing of all queries */
 
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 850754e..ee32547 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -694,8 +694,15 @@ other			.
 	 * substitution.  We want these before {self}, also.
 	 */
 
-"\\"[;:]		{
-	/* Force a semi-colon or colon into the query buffer */
+"\\";			{
+	/* Count semicolons in compound commands */
+	cur_state->escaped_semicolons++;
+	/* Force a semicolon into the query buffer */
+	psqlscan_emit(cur_state, yytext + 1, 1);
+}
+
+"\\":			{
+	/* Force a colon into the query buffer */
 	psqlscan_emit(cur_state, yytext + 1, 1);
 }
 
@@ -1066,6 +1073,9 @@ psql_scan(PsqlScanState state,
 	/* Set current output target */
 	state->output_buf = query_buf;
 
+	/* Reset number of escaped semicolons seen */
+	state->escaped_semicolons = 0;
+
 	/* Set input source */
 	if (state->buffer_stack != NULL)
 		yy_switch_to_buffer(state->buffer_stack->buf, state->scanner);
@@ -1210,6 +1220,16 @@ psql_scan_reset(PsqlScanState state)
 }
 
 /*
+ * Return the number of escaped semicolons in the lexed string seen by the
+ * previous psql_scan call.
+ */
+int
+psql_scan_get_escaped_semicolons(PsqlScanState state)
+{
+	return state->escaped_semicolons;
+}
+
+/*
  * Reselect this lexer (psqlscan.l) after using another one.
  *
  * Currently and for foreseeable uses, it's sufficient to reset to INITIAL
diff --git a/src/include/fe_utils/psqlscan.h b/src/include/fe_utils/psqlscan.h
index c3a1d58..395ac78 100644
--- a/src/include/fe_utils/psqlscan.h
+++ b/src/include/fe_utils/psqlscan.h
@@ -83,6 +83,8 @@ extern PsqlScanResult psql_scan(PsqlScanState state,
 
 extern void psql_scan_reset(PsqlScanState state);
 
+extern int

RE: psql - add SHOW_ALL_RESULTS option

2019-05-15 Thread Daniel Verite
Fabien COELHO wrote:

> >> IMHO this new setting should be on by default: few people know about \; so
> >> it would not change anything for most, and I do not see why those who use
> >> it would not be interested by the results of all the queries they asked 
> >> for.
> > I agree with your opinion.
> 
> Ok. I did not yet change the default in the attached version, though.

I'd go further and suggest that there shouldn't be a variable
controlling this. All results that come in should be processed, period.
It's not just about \; either: if the ability of CALL to produce multiple
result sets gets implemented (it was posted as a POC during v11
development), this will be needed too.

> This attached version does:
>  - ensure that warnings appear just before its
>  - add the entry in psql's help
>  - redefine the function boundary so that timing is cleaner
>  - include somehow improved tests

\errverbose seems to no longer work with the patch:

test=> select 1/0;
psql: ERROR:  division by zero

test=> \errverbose
There is no previous error.

as opposed to this output with PG11:

test=> \errverbose
ERROR:  22012: division by zero
LOCATION:  int4div, int.c:820

\errverbose probably has no regression tests because its output
includes these ever-changing line numbers; hence `make check`
cannot be used to find this regression.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: COPY FROM WHEN condition

2018-11-02 Thread Daniel Verite
Pavel Stehule wrote:

> > SELECT x.a, sum(x.b)
> > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> > numeric, c text, d date, e json) )
> > WHERE x.d >= '2018-11-01'
> >
> >
> Without some special feature this example is not extra useful. It is based
> on copy on server that can use only super user with full rights.

And if superuser, one might use the file_fdw foreign data wrapper [1] to get
the same results without the need for the explicit COPY in the query.

BTW, this brings into question whether the [WHEN condition] clause
should be included in the options of file_fdw, as it supports pretty
much all COPY options.

Also, should psql's \copy gain the option too?


[1] https://www.postgresql.org/docs/current/static/file-fdw.html


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-11-02 Thread Daniel Verite
Michael Paquier wrote:

> If you can produce a new  version, please feel free to post it.

Here's a rebased version with a couple of regression tests
added per the discussions during the previous CF.

Now at https://commitfest.postgresql.org/20/1861/

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68..98147ef 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -672,6 +672,10 @@ COPY count
 
   
CSV Format
+   
+CSV
+in COPY
+   
 

 This format option is used for importing and exporting the Comma
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index eb9d93a..7617c5e 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -152,6 +152,20 @@ EOF
 
 
 
+  --csv
+  
+  
+  
+   CSV
+   in psql
+  
+  Switches to CSV output mode. This is equivalent
+  to \pset format csv.
+  
+  
+
+
+
   -d dbname
   --dbname=dbname
   
@@ -2557,6 +2571,19 @@ lo_import 152801
   
 
   
+  fieldsep_csv
+  
+  
+  Specifies the field separator to be used in the
+  CSV format. When the separator appears in a field
+  value, that field is output inside double quotes according to
+  CSV rules. To set a tab as field separator, type
+  \pset fieldsep_csv '\t'. The default is a comma.
+  
+  
+  
+
+  
   fieldsep_zero
   
   
@@ -2584,12 +2611,11 @@ lo_import 152801
   format
   
   
-  Sets the output format to one of unaligned,
-  aligned, wrapped,
-  html, asciidoc,
+  Sets the output format to one of aligned,
+  asciidoc, csv, 
html,
   latex (uses tabular),
-  latex-longtable, or
-  troff-ms.
+  latex-longtable, troff-ms,
+  unaligned, or wrapped.
   Unique abbreviations are allowed.  (That would mean one letter
   is enough.)
   
@@ -2597,14 +2623,27 @@ lo_import 152801
   unaligned format writes all columns of a 
row on one
   line, separated by the currently active field separator. This
   is useful for creating output that might be intended to be read
-  in by other programs (for example, tab-separated or comma-separated
-  format).
+  in by other programs.
   
 
   aligned format is the standard, 
human-readable,
   nicely formatted text output;  this is the default.
   
 
+ csv format writes columns separated by
+ commas, applying the quoting rules described in RFC 4180.
+ Alternative separators can be selected with
+ \pset fieldsep_csv.
+ The output is compatible with the CSV format of the
+ COPY command. The header with column names
+ is output unless the tuples_only parameter is
+ on. Title and footers are not printed.
+ Each row is terminated by the system-dependent end-of-line character,
+ which is typically a single newline (\n) for
+ Unix-like systems or a carriage return and newline sequence
+ (\r\n) for Microsoft Windows.
+ 
+
   wrapped format is like 
aligned but wraps
   wide data values across lines to make the output fit in the target
   column width.  The target width is determined as described under
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 5b4d54a..8d0ad71 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1941,8 +1941,8 @@ exec_command_pset(PsqlScanState scan_state, bool 
active_branch)
 
int i;
static const char *const my_list[] = {
-   "border", "columns", "expanded", "fieldsep", 
"fieldsep_zero",
-   "footer", "format", "linestyle", "null",
+   "border", "columns", "expanded", "fieldsep", 
"fieldsep_csv",
+   "fieldsep_zero", "footer", "format", 
"linestyle", "null",
"numericlocale", "pager", "pager_min_lines",
"recordsep", "recordsep_zero",
"tableattr", "title", "tuples_only",
@@ -3584,6 +3584,9 @@ _align2string(enum printFormat in)
case PRINT_TROFF_MS:
return "troff-ms";
break;
+   case PRINT_CSV:
+   return "csv";
+   break;
}
return "unknown";
 }
@@ -3639,25 +3642,27 @@ do_pset(const char *param, const char *value, 
printQueryOpt *popt, bool quiet)
{
if (!

Re: COPY FROM WHEN condition

2018-11-03 Thread Daniel Verite
David Fetter wrote:

> It also seems like a violation of separation of concerns to couple
> FEBE to grammar, so there'd need to be some way to do those things
> separately, too.

After re-reading psql/copy.c, I withdraw what I said upthread:
it doesn't appear necessary to add anything to support the WHEN
condition with \copy.

\copy does have a dedicated mini-parser, but it doesn't try to
recognize every option: it's only concerned with getting the bits of
information that are needed to perform the client-side work:
- whether it's a copy from or to
- what exact form and value the 'filename' argument has immediately
 after from or to:
 '<file path>' | PROGRAM '<command>' | stdin | stdout | pstdin | pstdout

It doesn't really care what the options are, just where they are
in the buffer, so they can be copied into the COPY SQL statement.

From the code:
 * The documented syntax is:
 *  \copy tablename [(columnlist)] from|to filename [options]
 *  \copy ( query stmt ) to filename [options]

The WHEN clause would be part of the [options], which
are handled as simply as this in parse_slash_copy():

  /* Collect the rest of the line (COPY options) */
  token = strtokx(NULL, "", NULL, NULL,
  0, false, false, pset.encoding);
  if (token)
  result->after_tofrom = pg_strdup(token);

So unless there's something particular in the WHEN clause
expression that could make this strtokx() invocation error out,
this should work directly.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-11-06 Thread Daniel Verite
Michael Paquier wrote:

> Ordering them in alphabetical order is a good idea due to the high
> number of options available, and more would pile up even if this
> separates a bit "aligned" and "unaligned", so I have have separated
> those diffs from the core patch and committed it, leaving the core
> portion of the patch aside for later.

Here's a rebased version following these changes.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68..98147ef 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -672,6 +672,10 @@ COPY count
 
   
CSV Format
+   
+CSV
+in COPY
+   
 

 This format option is used for importing and exporting the Comma
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index a1ca940..2897486 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -152,6 +152,20 @@ EOF
 
 
 
+  --csv
+  
+  
+  
+   CSV
+   in psql
+  
+  Switches to CSV output mode. This is equivalent
+  to \pset format csv.
+  
+  
+
+
+
   -d dbname
   --dbname=dbname
   
@@ -2557,6 +2571,19 @@ lo_import 152801
   
 
   
+  fieldsep_csv
+  
+  
+  Specifies the field separator to be used in the
+  CSV format. When the separator appears in a field
+  value, that field is output inside double quotes according to
+  CSV rules. To set a tab as field separator, type
+  \pset fieldsep_csv '\t'. The default is a comma.
+  
+  
+  
+
+  
   fieldsep_zero
   
   
@@ -2584,11 +2611,16 @@ lo_import 152801
   format
   
   
-  Sets the output format to one of aligned,
-  asciidoc, html,
+  Sets the output format to one of
+  aligned,
+  asciidoc,
+  csv,
+  html,
   latex (uses tabular),
-  latex-longtable, troff-ms,
-  unaligned, or wrapped.
+  latex-longtable,
+  troff-ms,
+  unaligned,
+  or wrapped.
   Unique abbreviations are allowed.  (That would mean one letter
   is enough.)
   
@@ -2596,14 +2628,27 @@ lo_import 152801
   unaligned format writes all columns of a 
row on one
   line, separated by the currently active field separator. This
   is useful for creating output that might be intended to be read
-  in by other programs (for example, tab-separated or comma-separated
-  format).
+  in by other programs.
   
 
   aligned format is the standard, 
human-readable,
   nicely formatted text output;  this is the default.
   
 
+ csv format writes columns separated by
+ commas, applying the quoting rules described in RFC 4180.
+ Alternative separators can be selected with
+ \pset fieldsep_csv.
+ The output is compatible with the CSV format of the
+ COPY command. The header with column names
+ is output unless the tuples_only parameter is
+ on. Title and footers are not printed.
+ Each row is terminated by the system-dependent end-of-line character,
+ which is typically a single newline (\n) for
+ Unix-like systems or a carriage return and newline sequence
+ (\r\n) for Microsoft Windows.
+ 
+
   wrapped format is like 
aligned but wraps
   wide data values across lines to make the output fit in the target
   column width.  The target width is determined as described under
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 0dea54d..ea064ab 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1941,8 +1941,8 @@ exec_command_pset(PsqlScanState scan_state, bool 
active_branch)
 
int i;
static const char *const my_list[] = {
-   "border", "columns", "expanded", "fieldsep", 
"fieldsep_zero",
-   "footer", "format", "linestyle", "null",
+   "border", "columns", "expanded", "fieldsep", 
"fieldsep_csv",
+   "fieldsep_zero", "footer", "format", 
"linestyle", "null",
"numericlocale", "pager", "pager_min_lines",
"recordsep", "recordsep_zero",
"tableattr", "title", "tuples_only",
@@ -3566,6 +3566,9 @@ _align2string(enum printFormat in)
case PRINT_ASCIIDOC:
return "asciidoc";
break;
+   case PRINT_CSV:
+   return "csv";
+   

Doc patch on psql output formats

2018-11-06 Thread Daniel Verite
  Hi,

psql's documentation has this mention about output formats:
 "Unique abbreviations are allowed. (That would mean one letter is enough.)"

but "one letter is enough" is not true since 9.3 that added
"latex-longtable" sharing the same start as "latex", and then 
9.5 added "asciidoc" with the same first letter as "aligned".

When a non-unique abbreviation is used, psql uses the first
match in an arbitrary order defined in do_pset() by
a cascade of pg_strncasecmp().
(The recent commit add9182e reorders them alphabetically,
but it turns out that this doesn't change the relative positions
of "aligned" / "asciidoc" or "latex" / "latex-longtable",
so \pset format a and \pset format l will continue to
select "aligned" and "latex" as before.)

Anyway, "Unique abbreviations are allowed" deserves to
be changed as well.

PFA a doc patch to say simply "Abbreviations are allowed".


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index a1ca940..7dd934f 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2589,8 +2589,7 @@ lo_import 152801
   latex (uses tabular),
   latex-longtable, troff-ms,
   unaligned, or wrapped.
-  Unique abbreviations are allowed.  (That would mean one letter
-  is enough.)
+  Abbreviations are allowed.
   
 
   unaligned format writes all columns of a 
row on one


Re: Doc patch on psql output formats

2018-11-06 Thread Daniel Verite
Tom Lane wrote:

> > When a non-unique abbreviation is used, psql uses the first
> > match in an arbitrary order defined in do_pset() by
> > a cascade of pg_strncasecmp().
> 
> Ugh.  Should we not fix the code so that it complains if there's
> not a unique match?  I would bet that the code was also written
> on the assumption that any abbrevation must be unique.

There's a backward compatibility issue, since a script issuing
  \pset format a
would now fail instead of setting the format to "aligned".
The doc patch is meant to describe the current behavior.

OTOH if we don't fail with non-unique matches, there is the risk that
in the future, a new format matching /^a/ will come alphabetically
before "aligned", and will be picked up instead of "aligned".

In both cases using abbreviations in scripts is not a great
idea. Anyway I will look into the changes required in do_pset to
implement the error on multiple matches, if that's the preferred
behavior.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-11-07 Thread Daniel Verite
Michael Paquier wrote:

> -C could also be useful for other things, say compression.

The patch does not grab any short letter, as the consensus
went that way.

> +   pset.popt.topt.fieldSepCsv = pg_strdup(",");
> Let's store that in a variable instead of hardcoding it.

I guess it could go into a #define in psql/settings.h, along with
these:
#define DEFAULT_FIELD_SEP "|"
#define DEFAULT_RECORD_SEP "\n"

> In the regression tests, "col 9" is wanted with a newline?

Yes, since we support column names with embedded newlines (even
though it's hard to think of a legit use case for that) and CSV fields
support embedded newlines too.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-11-09 Thread Daniel Verite
Michael Paquier wrote:

> +/* check for value being non-empty and with an MB length of 1 */
> +if (*value == '\0' || value[PQmblen(value, pset.encoding)] != '\0')
> 
> It seems to me that this can just be replaced with that:
> if (strlen(value) != 1)

The above is meant to accept a multibyte character as the separator,
in which case strlen(value) would be greater than 1.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Alternative to \copy in psql modelled after \g

2018-11-09 Thread Daniel Verite
  Hi,

Currently \copy cannot span multiple lines (like any meta-command)
and cannot use psql variables, whereas \g can do both.

The POC patch attached implements two meta-commands \copyfrom
and \copyto that are to COPY what \g is to any other query:

- they take the COPY query already var-expanded from the query buffer,
which must mention FROM STDIN or TO STDOUT.

- they accept an argument declaring the local data source or destination,
either a filename or a program (|command args) or empty for stdin/stdout.

By contrast \copy has a specific parser to extract the data source
or dest from its line of arguments, plus whether it's a COPY FROM or TO,
and build a COPY query from that.

Examples of use

1. $ psql -v filename="/path/data-$(date -I).csv"
COPY (SELECT *
   FROM table
   WHERE ...)
TO STDOUT (FORMAT csv) \copyto :filename

2. -- copy only the first 100 lines
COPY table FROM stdin \copyfrom |head -n 100 /data/datafile.txt

3. $ cat script.sql
COPY table1 FROM stdin;  -- copy inline data
data line
data line
\.

-- copy data from psql's stdin
COPY table2 FROM stdin \copyfrom 

# copy both in-script and out-of-script data
$ psql -f script.sql < table2.data

Comments? Does that look useful as an alternative to \copy ?


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 0dea54d..7398dec 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -65,6 +65,8 @@ static backslashResult exec_command_cd(PsqlScanState scan_state, bool active_bra
 const char *cmd);
 static backslashResult exec_command_conninfo(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_copy(PsqlScanState scan_state, bool active_branch);
+static backslashResult exec_command_copyfrom(PQExpBuffer query_buf, PsqlScanState scan_state, bool active_branch);
+static backslashResult exec_command_copyto(PQExpBuffer query_buf, PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_copyright(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_crosstabview(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_d(PsqlScanState scan_state, bool active_branch,
@@ -305,6 +307,10 @@ exec_command(const char *cmd,
 		status = exec_command_conninfo(scan_state, active_branch);
 	else if (pg_strcasecmp(cmd, "copy") == 0)
 		status = exec_command_copy(scan_state, active_branch);
+	else if (pg_strcasecmp(cmd, "copyfrom") == 0)
+		status = exec_command_copyfrom(query_buf, scan_state, active_branch);
+	else if (pg_strcasecmp(cmd, "copyto") == 0)
+		status = exec_command_copyto(query_buf, scan_state, active_branch);
 	else if (strcmp(cmd, "copyright") == 0)
 		status = exec_command_copyright(scan_state, active_branch);
 	else if (strcmp(cmd, "crosstabview") == 0)
@@ -634,6 +640,53 @@ exec_command_copy(PsqlScanState scan_state, bool active_branch)
 }
 
 /*
+ * \copyfrom -- run a COPY FROM command
+ * The COPY query is obtained from the query buffer
+ * The argument is 'filename' as in \copy
+ */
+static backslashResult
+exec_command_copyfrom(PQExpBuffer query_buf, PsqlScanState scan_state, bool active_branch)
+{
+	bool		success = true;
+
+	if (active_branch)
+	{
+		char	   *opt = psql_scan_slash_option(scan_state,
+ OT_FILEPIPE, NULL, false);
+
+		success = do_copy_query(query_buf->data, opt, copy_from);
+		resetPQExpBuffer(query_buf);
+		free(opt);
+	}
+	else
+		ignore_slash_filepipe(scan_state);
+
+	return success ? PSQL_CMD_SKIP_LINE : PSQL_CMD_ERROR;
+}
+/*
+ * \copyto -- run a COPY TO command
+ */
+static backslashResult
+exec_command_copyto(PQExpBuffer query_buf, PsqlScanState scan_state, bool active_branch)
+{
+	bool		success = true;
+
+	if (active_branch)
+	{
+		char	   *opt = psql_scan_slash_option(scan_state,
+ OT_FILEPIPE, NULL, false);
+
+		success = do_copy_query(query_buf->data, opt, copy_to);
+		resetPQExpBuffer(query_buf);
+		free(opt);
+	}
+	else
+		ignore_slash_filepipe(scan_state);
+
+	return success ? PSQL_CMD_SKIP_LINE : PSQL_CMD_ERROR;
+}
+
+/*
  * \copyright -- print copyright notice
  */
 static backslashResult
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index 555c633..95c5c5e 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -260,28 +260,11 @@ error:
 }
 
 
-/*
- * Execute a \copy command (frontend copy). We have to open a file (or execute
- * a command), then submit a COPY query to the backend and either feed it data
- * from the file or route its response into the file.
- */
-bool
-do_copy(const char *args)
+static
+FILE*
+open_copy_stream(struct copy_options *options)
 {
-	PQExpBufferData query;
-	FILE	   *copystream;
-	struct copy_options *options;
-	bool		success;
-
-	/* parse options */
-	options = parse_slash_copy(args);
-
-	if (!options)
-		return false;
-
-	/* prepare to read or write the target file */
-	if (options->file && !

Re: csv format for psql

2018-11-09 Thread Daniel Verite
Michael Paquier wrote:

> Still what's the point except complicating the code?  We don't care
> about anything fancy in the backend-side ProcessCopyOptions() when
> checking cstate->delim, and having some consistency looks like a good
> thing to me.

The backend has its reasons that don't apply to the psql output
format, mostly import performance according to [1].
It's not that nobody wants a delimiter outside of US-ASCII,
as people do ask for that sometimes:

https://www.postgresql.org/message-id/f02ulk%242r3u%241%40news.hub.org
https://github.com/greenplum-db/gpdb/issues/1246

> However there is no option to specify
> an escape character, no option to specify a quote character, and it is
> not possible to force quotes for all values.  Those are huge advantages
> as any output can be made compatible with other CSV variants.  Isn't
> what is presented too limited? 

The guidelines that the patch has been following are those of RFC 4180 [2],
with two exceptions: the field separator, which we can define,
and the end of lines, which are OS-dependent instead of the fixed CRLF
that the IETF seems to see as the norm.

The only reference to escaping in the RFC is:
   "If double-quotes are used to enclose fields, then a double-quote
   appearing inside a field must be escaped by preceding it with
   another double quote"

The problem with using non-standard QUOTE or ESCAPE is that it's a
violation of the format that goes further than choosing a separator
different than comma, which is already a pain point.
We can always add these options later if there is demand. I suspect it
will never happen.

I looked at the 2004 archives from when CSV was added to COPY (that's
around commit 862b20b38, in case anyone cares to look), but
I couldn't find a discussion of these options; all I could find is that
they were present from the start.

But again COPY is concerned with importing the data that preexists,
even if it's weird, whereas a psql output formats are not.


[1] https://www.postgresql.org/message-id/4C9D2BC5.1080006%40optonline.net
[2] https://tools.ietf.org/html/rfc4180


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Alternative to \copy in psql modelled after \g

2018-11-09 Thread Daniel Verite
David G. Johnston wrote:

> Do I understand correctly that you are proposing a slightly less
> verbose alternative of:
> 
> \o :filename
> COPY TO STDOUT
> \o

Not strictly the same because of this:

\o or \out [ filename ]

  “Query results” includes all tables, command responses, and notices
  obtained from the database server, as well as output of various
  backslash commands that query the database (such as \d); but not
  error messages.

If for instance psql received a notification between the two \o it would
end up in the file, which it wouldn't with \copyto.
The same is true for  SELECT... \g file as opposed to \o file

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Alternative to \copy in psql modelled after \g

2018-11-09 Thread Daniel Verite
David G. Johnston wrote:

> POLA, for me at least, would be for \g [filename] to do exactly what
> you are describing with the \copyto feature.

I admit that if we could improve \g to handle COPY, it would be more
elegant than the current proposal adding two meta-commands.

But the copy-workflow and non-copy-workflow are different, and in
order to know which one to start, \g would need to analyze the query
to determine whether it's a COPY FROM, COPY TO or something else.
psql parses queries syntactically, but not semantically AFAIK, and I
suspect we don't want to start doing that, as it breaks a separation
of concerns.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-11-11 Thread Daniel Verite
Michael Paquier wrote:

> - The experience is confusing, as the psql format uses different options
> than the backend to do the same things:
> -- tuples_only instead of HEADER.
> -- fieldsep_csv instead of DELIMITER
> -- null is an equivalent of the one with the same name, which is
> actually consistent.
> -- encoding is also an equivalent of ENCODING.

That conveniently ignores the fact that "\pset format somename"
as a command doesn't take any additional option, contrary to COPY.
We can't do \pset format csv (delimiter=';')
If we chose "delimiter" as an option name, it would have to exist
among 20 other names in the \pset namespace, where it would be
too vague, whereas "fieldsep_csv" makes it clear what it applies to.
"tuples_only" is preexisting, and I don't see how the comment that it's
not called "header" could be actionable.

Overall, you seem to posit that we should mimic the entire
COPY TO interface to implement 'psql --csv'.
But the starting point is that 'psql --csv' is just a slightly
different (safer) variant of 'psql -At', which is not concerned
at all with being consistent with COPY.
The model of the csv output format is the unaligned output format;
it's just not COPY.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Doc patch on psql output formats

2018-11-14 Thread Daniel Verite
Tom Lane wrote:

> > but "one letter is enough" is not true since 9.3 that added
> > "latex-longtable" sharing the same start as "latex", and then 
> > 9.5 added "asciidoc" with the same first letter as "aligned".
> 
> Yeah, that text has clearly outstayed its welcome.
> 
> > When a non-unique abbreviation is used, psql uses the first
> > match in an arbitrary order defined in do_pset() by
> > a cascade of pg_strncasecmp().
> 
> Ugh.  Should we not fix the code so that it complains if there's
> not a unique match?  I would bet that the code was also written
> on the assumption that any abbrevation must be unique.

Here's a patch making "\pset format" reject ambiguous abbreviations.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index a1ca940..6e6d0f4 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2589,8 +2589,7 @@ lo_import 152801
   latex (uses tabular),
   latex-longtable, troff-ms,
   unaligned, or wrapped.
-  Unique abbreviations are allowed.  (That would mean one letter
-  is enough.)
+  Unique abbreviations are allowed.
   
 
   unaligned format writes all columns of a 
row on one
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 0dea54d..1f23aaf 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3637,28 +3637,52 @@ do_pset(const char *param, const char *value, 
printQueryOpt *popt, bool quiet)
/* set format */
if (strcmp(param, "format") == 0)
{
+   static const struct fmt
+   {
+   const char *name;
+   enum printFormat number;
+   }   formats[] =
+   {
+   {"aligned", PRINT_ALIGNED},
+   {"asciidoc", PRINT_ASCIIDOC},
+   {"html", PRINT_HTML},
+   {"latex", PRINT_LATEX},
+   {"latex-longtable", PRINT_LATEX_LONGTABLE},
+   {"troff-ms", PRINT_TROFF_MS},
+   {"unaligned", PRINT_UNALIGNED},
+   {"wrapped", PRINT_WRAPPED}
+   };
+
if (!value)
;
-   else if (pg_strncasecmp("aligned", value, vallen) == 0)
-   popt->topt.format = PRINT_ALIGNED;
-   else if (pg_strncasecmp("asciidoc", value, vallen) == 0)
-   popt->topt.format = PRINT_ASCIIDOC;
-   else if (pg_strncasecmp("html", value, vallen) == 0)
-   popt->topt.format = PRINT_HTML;
-   else if (pg_strncasecmp("latex", value, vallen) == 0)
-   popt->topt.format = PRINT_LATEX;
-   else if (pg_strncasecmp("latex-longtable", value, vallen) == 0)
-   popt->topt.format = PRINT_LATEX_LONGTABLE;
-   else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
-   popt->topt.format = PRINT_TROFF_MS;
-   else if (pg_strncasecmp("unaligned", value, vallen) == 0)
-   popt->topt.format = PRINT_UNALIGNED;
-   else if (pg_strncasecmp("wrapped", value, vallen) == 0)
-   popt->topt.format = PRINT_WRAPPED;
else
{
-   psql_error("\\pset: allowed formats are aligned, 
asciidoc, html, latex, latex-longtable, troff-ms, unaligned, wrapped\n");
-   return false;
+   int match_count = 0;
+   int first_match = 0;
+
+   /*
+* Count the number of left-anchored matches. Exactly 
one match
+* must be found, otherwise error out.
+*/
+   for (int i = 0; i < lengthof(formats); i++)
+   {
+   if (pg_strncasecmp(formats[i].name, value, 
vallen) == 0)
+   {
+   if (++match_count > 1)
+   {
+   psql_error("\\pset: ambiguous 
abbreviation: \"%s\"\n", value);
+   return false;
+   }
+   first_match = i;
+   }
+   }
+   if (match_count == 0)
+   {
+   psql_error("\\pset: allowed formats are 
aligned, asciidoc, html, latex, latex-longtable, troff-ms, unaligned, 
wrapped\n");
+   return false;
+   }
+  

Re: Doc patch on psql output formats

2018-11-15 Thread Daniel Verite
Tom Lane wrote:

> Pushed.  (I simplified the code a bit by using just one state variable,
> and also made the error message more verbose.)

Thanks!


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-11-26 Thread Daniel Verite
Tom Lane wrote:

> And, in fact, right now *none* of psql's table output formats is both
> unambiguous and reasonably simple/popular to use.  So the astonishing
> thing about this patch, IMO, is that we didn't do it a decade ago.

Yeah, that's what motivated this submission in the first place.

> I feel that if we allow multi-byte characters here, we might as well
> take the training wheels off and just say you can use any separator
> string you want, as long as it doesn't contain double quote, \r, or \n.

The reason behind disallowing multiple characters was the
likelihood of them being mistakes. For instance, this example came
up at some point in the discussion:
  \pset fieldsep_csv ,,
To me the probability that a user has fat-fingered this is pretty high,
and this would silently produce a really bogus file.

Another kind of mistake comes from the difficulty of properly
quoting on the command line:
psql -- csv -P fieldsep_csv='\t'
would be interpreted as a two-character separator despite being
obviously not the user's intention.

About disallowing characters beyond US-ASCII, I can't find a similar
justification. COPY does not allow them, but it's justified (in the
archives) by the fear of being slower when importing, which is not a
concern here.

> We could avoid this self-inflicted confusion by choosing a different
> parameter name.  I'd be good with "csv_fieldsep" or "csvfieldsep".

+1

> Or we could kill both issues by hard-wiring the separator as ','.

Ideally people would understand that they can use -A for any delimiter
but no quoting, or --csv with strict quoting, in which case a fixed
delimiter is fine: since it's going to be safely quoted, its presence in
the data is not a problem. But I'm not too confident that everyone
would understand it that way, even if it's well explained in the doc.

When one is told "please produce CSV files with semi-colons as
separators", it's simpler to just produce that rather than arguing
that these requirements are probably ill-advised.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Doc patch on psql output formats

2018-11-26 Thread Daniel Verite
Tom Lane wrote:

> As of HEAD, it's impossible to select latex output format
> at all:
> 
> regression=# \pset format latex
> \pset: ambiguous abbreviation "latex" matches both "latex" and
> "latex-longtable"

Oops!

> We could fix that by adding a special case to accept an exact match
> immediately.

Personally I would favor that one, but to me the problem is that
allowing abbreviations doesn't really work well in the long run,
that is, if new format names keep appearing in the future.

In interactive mode, tab-completion seems good enough (maybe \pset was
not tab-completed in the past, when abbreviations were initially
implemented?).

In non-interactive mode, anything abbreviated may clash in the future
with another format, so there's a forward-compatibility hazard that
should motivate avoiding abbreviations in scripts as well.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


