Re: CREATE DATABASE command for non-libc providers

2025-06-10 Thread Daniel Verite
Jeff Davis wrote: > Even if it's not a collatable type, it should use the database > collation rather than going straight to libc. Again, is that something > that can ever be fixed or are we just stuck with libc semantics for > full text search permanently, even if you initialize the clust

Re: CREATE DATABASE command for non-libc providers

2025-06-06 Thread Daniel Verite
Jeff Davis wrote: > I have attached a patch 0001 that > fixes a misleading hint, but it's still not great. +1 for the patch > When using ICU or the builtin provider, it still requires coming up > with some valid locale name for LC_COLLATE and LC_CTYPE No, since the following invocation

Re: Add Pipelining support in psql

2025-04-05 Thread Daniel Verite
Anthonin Bonnefoy wrote: > 0002: Allows ';' to send a query using extended protocol when within a > pipeline by using PQsendQueryParams It's a nice improvement! > with 0 parameters. It is not > possible to send parameters with extended protocol this way and > everything will be propagate

Re: Add partial :-variable expansion to psql \copy

2025-04-01 Thread Daniel Verite
Christoph Berg wrote: > Perhaps this form could be improved by changing `\copy (select) to file` > to something like `select \gcopy (to file)`. That might make :expansion > in the "select" part easier to handle. In this direction (COPY TO), it was already taken care of by commit 6d3ede5f1

Re: Add Pipelining support in psql

2025-03-19 Thread Daniel Verite
Michael Paquier wrote: > Perhaps an \extended command that behaves outside a pipeline makes > sense to force the use of queries without parameters to use the > extended mode, but I cannot get much excited about the concept knowing > all the meta-commands we have now (not talking about the

Re: Add Pipelining support in psql

2025-03-07 Thread Daniel Verite
Jelte Fennema-Nio wrote: > As an example you can copy paste this tiny script: > > \startpipeline > select pg_sleep(5) \bind \g > \endpipeline > > And then it will show these "extra argument ... ignored" warnings > > \startpipeline: extra argument "select" ignored > \startpipeline: extra

Re: Add Pipelining support in psql

2025-03-05 Thread Daniel Verite
Anthonin Bonnefoy wrote: > So if I understand correctly, you want to automatically convert a > simple query into an extended query when we're within a pipeline. That > would be doable with: > > --- a/src/bin/psql/common.c > +++ b/src/bin/psql/common.c > @@ -1668,7 +1668,16 @@ ExecQueryAnd

Re: Add Pipelining support in psql

2025-03-04 Thread Daniel Verite
Anthonin Bonnefoy wrote: > Another possible option would be to directly send the command without > requiring an additional meta-command, like "SELECT 1 \bind". However, > this would make it more painful to introduce new parameters, plus it > makes the \bind and \bind_named inconsistent as

Re: Add Pipelining support in psql

2025-02-28 Thread Daniel Verite
Anthonin Bonnefoy wrote: > > What is the reasoning here behind this restriction? \gx is a wrapper > > of \g with expanded mode on, but it is also possible to call \g with > > expanded=on, bypassing this restriction. > > The issue is that \gx enables expanded mode for the duration of the

Re: pgbench client-side performance issue on large scripts

2025-02-26 Thread Daniel Verite
Tom Lane wrote: > > I got nerd-sniped by this question and spent some time looking into > > it. Thank you for the patch! LGTM. Best regards, -- Daniel Vérité https://postgresql.verite.pro/

Re: pgbench client-side performance issue on large scripts

2025-02-25 Thread Daniel Verite
Tom Lane wrote: > > I see "pgbench -f 50k-select.sql" taking about 5.8 secs of CPU time, > > out of a total time of 6.7 secs. When run with perf, this profile shows up: > > You ran only a single execution of a 50K-line script? This test > case feels a little bit artificial. Having said

pgbench client-side performance issue on large scripts

2025-02-24 Thread Daniel Verite
Hi, On large scripts, pgbench happens to consume a lot of CPU time. For instance, with a script consisting of 50000 lines of "SELECT 1;" I see "pgbench -f 50k-select.sql" taking about 5.8 secs of CPU time, out of a total time of 6.7 secs. When run with perf, this profile shows up: 81,10% pgbench pgben
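
To reproduce the measurement, the test script can be generated with a few lines of C. This is a minimal sketch: only the file name 50k-select.sql and the statement come from the message, and the helper name write_select_script is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write `nlines` copies of "SELECT 1;" to `out`,
 * one statement per line, like the 50k-select.sql test script.
 * Returns 0 on success, -1 on a write error. */
int write_select_script(FILE *out, long nlines)
{
    for (long i = 0; i < nlines; i++) {
        if (fputs("SELECT 1;\n", out) == EOF)
            return -1;
    }
    return fflush(out) == EOF ? -1 : 0;
}
```

Calling it with `fopen("50k-select.sql", "w")` and `nlines = 50000` produces the file that is then passed to `pgbench -f 50k-select.sql`.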

Re: Inconsistent string comparison using modified ICU collations

2025-01-23 Thread Daniel Verite
Oleg Tselebrovskiy wrote: > I've discovered a bug with string comparison using modified ICU > collations > Using a direct comparison and sorting values gives different results > > The easiest way to reproduce is the following: > > postgres=# create collation "en-US-u-kr-latn-dig

Re: UUID v7

2024-12-16 Thread Daniel Verite
Andrey M. Borodin wrote: > I've addressed all items, except formatting a table... Sorry for not following up sooner. To illustrate my point upthread that was left unaddressed, let's say I have a server with an incorrect date in the future. A session generates a UUID postgres=# select

Re: UUID v7

2024-11-30 Thread Daniel Verite
Andrey M. Borodin wrote: > I'm sending amendments addressing your review as a separate step in patch > set. Step 1 of this patch set is identical to v39. Some comments about the implementation of monotonicity: +/* + * Get the current timestamp with nanosecond precision for UUID generati
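
For readers following the monotonicity discussion, here is a minimal sketch of the general technique, not the patch's code. It assumes the RFC 9562 UUIDv7 layout and the approach in its section 6.2 of replacing the leftmost random bits (the 12-bit rand_a field) with a sub-millisecond timestamp fraction, combined with a guard that never lets the timestamp go backwards. The clock reading is passed in as a parameter so the logic is testable; all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-session generator state. */
typedef struct {
    uint64_t last_ns;   /* last timestamp used, ns since the Unix epoch */
} uuidv7_state;

/* 1 ms / 4096 is about 244.1 ns, so a 245 ns step always increases the
 * 12-bit sub-millisecond fraction (or rolls over into the next ms). */
#define UUIDV7_MIN_STEP_NS 245

/* Build a UUIDv7 in out[16] from now_ns and 8 random bytes. If now_ns is
 * not after the previously used timestamp (same tick, or clock moved
 * back), reuse the previous timestamp plus a minimal step so that
 * successive UUIDs compare in generation order. */
void uuidv7_generate(uuidv7_state *st, uint64_t now_ns,
                     const uint8_t rnd[8], uint8_t out[16])
{
    if (now_ns <= st->last_ns)
        now_ns = st->last_ns + UUIDV7_MIN_STEP_NS;
    st->last_ns = now_ns;

    uint64_t ms = now_ns / 1000000;
    /* Fraction of the current millisecond scaled to 12 bits (0..4095). */
    uint64_t frac = (now_ns % 1000000) * 4096 / 1000000;

    for (int i = 0; i < 6; i++)                 /* 48-bit big-endian ms */
        out[i] = (uint8_t)(ms >> (40 - 8 * i));
    out[6] = (uint8_t)(0x70 | (frac >> 8));      /* version 7 + high nibble */
    out[7] = (uint8_t)(frac & 0xFF);
    memcpy(out + 8, rnd, 8);
    out[8] = (uint8_t)((out[8] & 0x3F) | 0x80);  /* variant bits 10 */
}
```

One consequence of the never-go-backwards guard: if the clock was ahead and is later corrected, this generator keeps deriving timestamps from the last (future) value until the real clock passes it.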

Re: New "single" COPY format

2024-11-08 Thread Daniel Verite
Aleksander Alekseev wrote: > IMO it should be 'text' we already have with special options e.g. > DELIMITER AS NULL ESCAPE AS NULL. If there are no escape characters > and column delimiters (and no NULLs designations, and what else I > forgot) then your text file just contains one tuple per

Re: New "raw" COPY format

2024-10-16 Thread Daniel Verite
Joel Jacobson wrote: > However, I thinking rejecting such column data seems like the > better alternative, to ensure data exported with COPY TO > can always be imported back using COPY FROM, > for the same format. On the other hand, that might prevent cases where we want to export, for i
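
Assuming "such column data" refers to values containing a line terminator, the round-trip condition for a one-value-per-line format without quoting or escaping reduces to a simple check on the COPY TO side. Minimal sketch; the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical check for a one-value-per-line "raw" format with no
 * quoting or escaping: a value survives COPY TO followed by COPY FROM
 * only if it contains no CR or LF, since the reader splits lines there. */
bool raw_value_round_trips(const char *value)
{
    return strpbrk(value, "\r\n") == NULL;
}
```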

Re: Should CSV parsing be stricter about mid-field quotes?

2024-10-10 Thread Daniel Verite
Joel Jacobson wrote: > - No Headers or Metadata: It's not clear why it's necessary to disable the HEADER option for this format? > The format does not support header rows or end-of-data markers; > every line is treated as data. With COPY FROM STDIN followed by inline data in a script,

Re: Fixing backslash dot for COPY FROM...CSV

2024-10-01 Thread Daniel Verite
Tom Lane wrote: > > [ v6-0001-Support-backslash-dot-on-a-line-by-itself-as-vali.patch ] > > I did some more work on the docs and comments, and pushed that. Thanks! > Returning to my upthread thought that > > >>> I think we should fix it so that \. that's not alone on a line > >>> throw

Re: Fixing backslash dot for COPY FROM...CSV

2024-09-30 Thread Daniel Verite
Artur Zakirov wrote: > I've tested the patch and it seems it works as expected. Thanks for looking at this! > It seems it isn't necessary to handle "\." within > "CopyAttributeOutCSV()" (file "src/backend/commands/copyto.c") > anymore. It's still useful to produce CSV data that can be s
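
The writer-side handling in question amounts to quoting a value that is exactly \. so that a reader which stops at an unquoted \. line cannot mistake it for end-of-data. A minimal sketch for single-column CSV rows, similar in spirit to but not copied from CopyAttributeOutCSV(); the function name is hypothetical and the delimiter is assumed to be a comma.

```c
#include <stdio.h>
#include <string.h>

/* Write one value as a single-column CSV line to `out`.
 * Quotes are added when the value contains the delimiter, a quote or a
 * line break; when it is exactly "\." (so it is not taken for the
 * end-of-data marker); or when it is empty (to keep it distinct from
 * NULL, an unquoted empty field in PostgreSQL's CSV default).
 * Embedded double quotes are escaped by doubling them. */
void csv_write_single(FILE *out, const char *value)
{
    int need_quotes = value[0] == '\0'
        || strpbrk(value, ",\"\r\n") != NULL
        || strcmp(value, "\\.") == 0;

    if (!need_quotes) {
        fprintf(out, "%s\n", value);
        return;
    }
    fputc('"', out);
    for (const char *p = value; *p; p++) {
        if (*p == '"')
            fputc('"', out);
        fputc(*p, out);
    }
    fputs("\"\n", out);
}
```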

Re: Opinion poll: Sending an automated email to a thread when it gets added to the commitfest

2024-08-17 Thread Daniel Verite
Jelte Fennema-Nio wrote: > I'd like to send an automatic mail to a thread whenever it gets added > to a commitfest Instead of sending a specific mail, what about automatically adding a mail header like: X-CommitFest-Entry: to every outgoing mail

Re: Support LIKE with nondeterministic collations

2024-08-01 Thread Daniel Verite
Jeff Davis wrote: > > col LIKE 'smith%' collate "nd" > > > > is equivalent to: > > > > col >= 'smith' collate "nd" AND col < U&'smith\' collate "nd" > > That logic seems to assume something about the collation. If you have a > collation that orders strings by their sha256 hash,
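
Jeff's objection is that the range rewrite depends on the ordering. For plain bytewise ordering (the C collation) it does hold: strings starting with a prefix are exactly those between the prefix and its "successor", obtained by dropping trailing 0xFF bytes and incrementing the last byte. A minimal sketch with hypothetical names; it is not PostgreSQL's planner code and says nothing about other collations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Bytewise comparison of two byte strings, shorter prefix sorting first. */
static int bytes_cmp(const unsigned char *a, size_t alen,
                     const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int c = memcmp(a, b, n);
    if (c != 0)
        return c;
    return (alen > blen) - (alen < blen);
}

/* Smallest byte string greater than every string starting with `prefix`.
 * Returns false when none exists (empty or all-0xFF prefix).
 * `upper` must have room for plen bytes. */
static bool prefix_upper_bound(const unsigned char *prefix, size_t plen,
                               unsigned char *upper, size_t *ulen)
{
    while (plen > 0 && prefix[plen - 1] == 0xFF)
        plen--;
    if (plen == 0)
        return false;
    memcpy(upper, prefix, plen);
    upper[plen - 1]++;
    *ulen = plen;
    return true;
}

/* Under bytewise ordering, "s LIKE 'prefix%'" is equivalent to
 * prefix <= s AND s < upper_bound(prefix). */
bool in_prefix_range(const unsigned char *s, size_t slen,
                     const unsigned char *prefix, size_t plen)
{
    if (bytes_cmp(s, slen, prefix, plen) < 0)
        return false;                    /* below the lower bound */

    unsigned char *upper = malloc(plen > 0 ? plen : 1);
    size_t ulen;
    bool result;
    if (upper == NULL)
        abort();                         /* out of memory: not handled */
    if (!prefix_upper_bound(prefix, plen, upper, &ulen))
        result = true;                   /* no upper bound needed */
    else
        result = bytes_cmp(s, slen, upper, ulen) < 0;
    free(upper);
    return result;
}
```

With a collation that orders strings by their hash, as in Jeff's example, no such bound can be derived, which is why the rewrite cannot be applied to arbitrary collations.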

Re: Fixing backslash dot for COPY FROM...CSV

2024-07-31 Thread Daniel Verite
Sutou Kouhei wrote: > BTW, here is a diff after pgindent: PFA a v5 with the cosmetic changes applied. > I also confirmed that the updated server and non-updated > psql compatibility problem (the end-of-data "\." is > inserted). It seems that it's difficult to solve without > introducing

Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-23 Thread Daniel Verite
Tom Lane wrote: > > I don't see how we can get by without some kind of versioning here. > > It's probably too late to do that for v17, > > Why? If we agree that that's the way forward, we could certainly > stick some collversion other than "1" into pg_c_utf8's pg_collation > entry. Ther

Re: Built-in CTYPE provider

2024-07-18 Thread Daniel Verite
Noah Misch wrote: > If I'm counting the votes right, you and Tom have voted that the feature's > current state is okay, and I and Laurenz have voted that it's not okay. I > still hope more people will vote, to avoid dealing with the tie. Daniel, > Peter, and Jeremy, you're all listed as

Re: Built-in CTYPE provider

2024-07-05 Thread Daniel Verite
Noah Misch wrote: > > I don't think the builtin locale provider is any different in this respect > > from the other providers: The locale data might change and there is a > > version mechanism to track that. We don't prevent pg_upgrade in scenarios > > like that for other providers. > >

Re: First draft of PG 17 release notes

2024-05-17 Thread Daniel Verite
Bruce Momjian wrote: > I have committed the first draft of the PG 17 release notes; you can > see the results here: > > https://momjian.us/pgsql_docs/release-17.html About the changes in collations: Create a "builtin" collation provider similar to libc's C locale (Jeff Davis

Re: First draft of PG 17 release notes

2024-05-10 Thread Daniel Verite
Bruce Momjian wrote: > have committed the first draft of the PG 17 release notes; you can > see the results here: > > https://momjian.us/pgsql_docs/release-17.html In the psql items, I'd suggest mentioning https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=90f5178

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Daniel Verite
Peter Eisentraut wrote: > However, off the top of my head, this definition has three flaws: (1) > It would make the single-character wildcard effectively an > any-number-of-characters wildcard, but only in some circumstances, which > could be confusing, (2) it would be difficult to com

Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Daniel Verite
Peter Eisentraut wrote: > Yes, certainly, and there is also no indexing support (other than for > exact matches). The ICU docs have this note about prefix matching: https://unicode-org.github.io/icu/userguide/collation/architecture.html#generating-bounds-for-a-sort-key-prefix-matching

Re: Support LIKE with nondeterministic collations

2024-04-30 Thread Daniel Verite
Peter Eisentraut wrote: > This patch adds support for using LIKE with nondeterministic > collations. So you can do things such as > > col LIKE 'foo%' COLLATE case_insensitive Nice! > The pattern is partitioned into substrings at wildcard characters > (so 'foo%bar' is partitioned i
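
The partitioning step quoted above can be sketched as follows: runs of literal characters become substrings and each wildcard becomes its own token. This is a minimal sketch assuming backslash as the escape character; it does no collation-aware matching, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TOK_LITERAL, TOK_PERCENT, TOK_UNDERSCORE } like_tok_kind;

typedef struct {
    like_tok_kind kind;
    char text[64];   /* unescaped literal text; empty for wildcards */
} like_tok;

/* Split `pat` into literal substrings and wildcards, so that 'foo%bar'
 * becomes 'foo', '%', 'bar'. A backslash makes the next character
 * literal. Returns the number of tokens written (at most max_toks).
 * Literals longer than 63 bytes are truncated in this sketch. */
size_t like_partition(const char *pat, like_tok *toks, size_t max_toks)
{
    size_t n = 0, len = 0;
    int in_lit = 0;

    for (const char *p = pat; *p && n < max_toks; p++) {
        if (*p == '%' || *p == '_') {
            if (in_lit) {            /* close the pending literal */
                n++;
                in_lit = 0;
            }
            if (n == max_toks)
                break;
            toks[n].kind = (*p == '%') ? TOK_PERCENT : TOK_UNDERSCORE;
            toks[n].text[0] = '\0';
            n++;
            continue;
        }
        if (*p == '\\' && p[1] != '\0')
            p++;                     /* escaped character is literal */
        if (!in_lit) {
            toks[n].kind = TOK_LITERAL;
            len = 0;
            in_lit = 1;
        }
        if (len < sizeof toks[n].text - 1)
            toks[n].text[len++] = *p;
        toks[n].text[len] = '\0';
    }
    if (in_lit)
        n++;
    return n;
}
```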

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Daniel Verite
Alexander Lakhin wrote: > >> Now that ExecQueryUsingCursor() is gone, it's not clear, what does > >> the following comment mean:? > >> * We must turn off gexec_flag to avoid infinite recursion. Note that > >> * this allows ExecQueryUsingCursor to be applied to the individual >

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-08 Thread Daniel Verite
Tom Lane wrote: > I've reconsidered after realizing that implementing FETCH_COUNT > atop traditional single-row mode would require either merging > single-row results into a bigger PGresult or persuading psql's > results-printing code to accept an array of PGresults not just > one. Either

Re: Fixing backslash dot for COPY FROM...CSV

2024-04-06 Thread Daniel Verite
Tom Lane wrote: > This is sufficiently weird that I'm starting to come around to > Daniel's original proposal that we just drop the server's recognition > of \. altogether (which would allow removal of some dozens of lines of > complicated and now known-buggy code) FWIW my plan was to not

Re: Fixing backslash dot for COPY FROM...CSV

2024-04-05 Thread Daniel Verite
Tom Lane wrote: > Not sure what to do here. One idea is to install just the psql-side > fix, which should break nothing now that version-2 protocol is dead, > and then wait a few years before introducing the server-side change. > That seems kind of sad though. Wouldn't backpatching solve

Re: Fixing backslash dot for COPY FROM...CSV

2024-04-05 Thread Daniel Verite
Tom Lane wrote: > I've looked over this patch and I generally agree that this is a > reasonable solution. Thanks for reviewing this! > I'm also wondering why the patch adds a test for > "PQprotocolVersion(conn) >= 3" in handleCopyIn. I've removed this in the attached update. > I concu

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-02 Thread Daniel Verite
Tom Lane wrote: > > I should say that I've noticed significant latency improvements with > > FETCH_COUNT retrieving large resultsets, such that it would benefit > > non-interactive use cases. > > Do you have a theory for why that is? It's pretty counterintuitive > that it would help at a

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-02 Thread Daniel Verite
Tom Lane wrote: > I do not buy that psql's FETCH_COUNT mode is a sufficient reason > to add it. FETCH_COUNT mode is not something you'd use > non-interactively I should say that I've noticed significant latency improvements with FETCH_COUNT retrieving large resultsets, such that it would

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-01 Thread Daniel Verite
Laurenz Albe wrote: > Here is the code review for patch number 2: > +static void > +CloseGOutput(FILE *gfile_fout, bool is_pipe) > > It makes sense to factor out this code. > But shouldn't these functions have a prototype at the beginning of the file? Looking at the other static functio

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-04-01 Thread Daniel Verite
Laurenz Albe wrote: > I had a look at patch 0001 (0002 will follow). Thanks for reviewing this! I've implemented the suggested doc changes. A patch update will follow with the next part of the review. > > --- a/src/interfaces/libpq/fe-exec.c > > +++ b/src/interfaces/libpq/fe-exec.c > >

Re: Built-in CTYPE provider

2024-03-27 Thread Daniel Verite
Jeff Davis wrote: > The tests include initcap('123abc') which is '123abc' in the PG_C_UTF8 > collation vs '123Abc' in PG_UNICODE_FAST. > > The reason for the latter behavior is that the Unicode Default Case > Conversion algorithm for toTitlecase() advances to the next Cased > character be
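
The two behaviors can be mimicked on ASCII input with two title-casing rules. Minimal sketch with hypothetical names; a "word" is simplified here to a run of ASCII letters and digits, whereas Unicode uses its word-break rules and the Cased property.

```c
#include <ctype.h>
#include <string.h>

/* Rule 1: uppercase a character if the previous one is not alphanumeric,
 * lowercase it otherwise. "123abc" stays "123abc": 'a' follows a digit. */
void initcap_prev_alnum(const char *in, char *out)
{
    int prev_alnum = 0;
    for (; *in; in++, out++) {
        unsigned char c = (unsigned char)*in;
        *out = (char)(prev_alnum ? tolower(c) : toupper(c));
        prev_alnum = isalnum(c) != 0;
    }
    *out = '\0';
}

/* Rule 2: within each word, skip ahead to the first cased character
 * (a letter), titlecase it, and lowercase the rest of the word.
 * "123abc" becomes "123Abc". */
void initcap_first_cased(const char *in, char *out)
{
    int seen_cased = 0;   /* already titlecased a letter in this word? */
    for (; *in; in++, out++) {
        unsigned char c = (unsigned char)*in;
        if (!isalnum(c)) {
            seen_cased = 0;            /* word boundary */
            *out = (char)c;
        } else if (isalpha(c) && !seen_cased) {
            *out = (char)toupper(c);   /* first cased char of the word */
            seen_cased = 1;
        } else {
            *out = (char)tolower(c);
        }
    }
    *out = '\0';
}
```

Both rules agree on ordinary words ("hello WORLD" gives "Hello World"); they only diverge when a word starts with a non-cased character.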

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-02-12 Thread Daniel Verite
Jakub Wartak wrote: > when I run with default pager (more or less): > \set FETCH_COUNT 1000 > WITH data AS (SELECT generate_series(1, 2000) as Total) select > repeat('a',100) || data.Total || repeat('b', 800) as total_pat from > data; > -- it enters pager, a skip couple of pages and t

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-01-30 Thread Daniel Verite
vignesh C wrote: > patching file src/interfaces/libpq/exports.txt > Hunk #1 FAILED at 191. > 1 out of 1 hunk FAILED -- saving rejects to file > src/interfaces/libpq/exports.txt.rej > > Please post an updated version for the same. PFA a rebased version.

Re: Fixing backslash dot for COPY FROM...CSV

2024-01-24 Thread Daniel Verite
Robert Haas wrote: > Those links unfortunately seem not to be entirely specific to this > issue. Other, related things seem to be discussed there, and it's not > obvious that everyone agrees on what to do, or really that anyone > agrees on what to do. The best link that I found for this ex

Re: Built-in CTYPE provider

2024-01-18 Thread Daniel Verite
Peter Eisentraut wrote: > > If the Postgres default was bytewise sorting+locale-agnostic > > ctype functions directly derived from Unicode data files, > > as opposed to libc/$LANG at initdb time, the main > > annoyance would be that "ORDER BY textcol" would no > > longer be the human-favor

Re: Fixing backslash dot for COPY FROM...CSV

2024-01-16 Thread Daniel Verite
Robert Haas wrote: > Part of my hesitancy, I suppose, is that I don't > understand why we even have this strange convention of making \. > terminate the input in the first place -- I mean, why wouldn't that be > done in some kind of out-of-band way, rather than including a special > marker

Re: Built-in CTYPE provider

2024-01-15 Thread Daniel Verite
Jeff Davis wrote: > New version attached. [v16] Concerning the target category_test, it produces failures with versions of ICU with Unicode < 15. The first one I see with Ubuntu 22.04 (ICU 70.1) is: category_test: Postgres Unicode version:15.1 category_test: ICU Unicode version:

Re: Built-in CTYPE provider

2024-01-12 Thread Daniel Verite
Jeff Davis wrote: > > Jeremy also raised a problem with old versions of psql connecting to > > a > > new server: the \l and \dO won't work. Not sure exactly what to do > > there, but I could work around it by adding a new field rather than > > renaming (though that's not ideal). > > I did

Re: Built-in CTYPE provider

2024-01-10 Thread Daniel Verite
Jeff Davis wrote: > Attached a more complete version that fixes a few bugs [v15 patch] When selecting the builtin provider with initdb, I'm getting the following setup: $ bin/initdb --locale=C.UTF-8 --locale-provider=builtin -D/tmp/pgdata The database cluster will be initialized wit

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-01-02 Thread Daniel Verite
Hi, PFA a rebased version. From cd0fe1d517a0e31e031fbbea1e603a715c77ea97 Mon Sep 17 00:00:00 2001 From: Daniel Vérité Date: Tue, 2 Jan 2024 14:15:48 +0100 Subject: [PATCH v5 1/2] Imple

Re: Fixing backslash dot for COPY FROM...CSV

2023-12-31 Thread Daniel Verite
Hi, The CI patch tester fails on this patch, because it has a label at the end of a C block, which I'm learning is a C23 feature that happens to be supported by gcc 11 [1], but is not portable. PFA an update fixing this, plus removing an obsolete chunk in the COPY documentation that v2 left out

Re: Built-in CTYPE provider

2023-12-22 Thread Daniel Verite
Robert Haas wrote: > For someone who is currently defaulting to es_ES.utf8 or fr_FR.utf8, > a change to C.utf8 would be a much bigger problem, I would > think. Their alphabet isn't in code point order, and so things would > be alphabetized wrongly. > That might be OK if they don't care ab
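
Robert's point about code point order can be seen by sorting UTF-8 strings bytewise: `strcmp` compares bytes as unsigned char, and for UTF-8 this coincides with code point order. Minimal sketch; the function names and sample words are illustrative only.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: bytewise comparison of two C strings. */
static int cmp_bytewise(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort UTF-8 strings in code point order. */
void sort_code_point_order(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, cmp_bytewise);
}
```

Sorting { "été", "zèbre", "Zoo", "abc" } this way yields Zoo, abc, zèbre, été: uppercase letters come before all lowercase ones, and accented letters come after z, which is not how a French or Spanish reader would alphabetize.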

Re: Fixing backslash dot for COPY FROM...CSV

2023-12-21 Thread Daniel Verite
vignesh C wrote: > Thanks for the updated patch, any reason why this is handled only in csv. > postgres=# copy test1 from '/home/vignesh/postgres/inst/bin/copy1.out'; > COPY 1 > postgres=# select * from test1; > c1 > --- > line1 > (1 row) I believe it's safer to not change anything t

Re: Built-in CTYPE provider

2023-12-20 Thread Daniel Verite
Jeff Davis wrote: > But there are a lot of users for whom neither of those things are true, > and it makes zero sense to order all of the text indexes in the > database according to any one particular locale. I think these users > would prioritize stability and performance for the databas

Re: Fixing backslash dot for COPY FROM...CSV

2023-12-19 Thread Daniel Verite
vignesh C wrote: > I noticed that these tests are passing without applying patch too: > +insert into copytest2(test) values('line1'), ('\.'), ('line2'); > +copy (select test from copytest2 order by test collate "C") to :'filename' > csv; > +-- get the data back in with copy > +truncate co

Fixing backslash dot for COPY FROM...CSV

2023-12-18 Thread Daniel Verite
Hi, PFA a patch that attempts to fix the bug that \. on a line by itself is handled incorrectly by COPY FROM ... CSV. This issue has been discussed several times previously, for instance in [1] and [2], and mentioned in the doc for \copy in commit 42d3125. There's one case that works today: whe
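
The core difficulty is that in CSV a line consisting of \. may be data inside a quoted multi-line field, so an end-of-data check must know whether the reader is inside quotes at the start of the line. A minimal reader-side sketch, assuming default CSV quoting (double quote, doubled to escape); it is not the logic of the patch, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return how many bytes of `buf` are CSV data: the offset of the first
 * line consisting only of "\." that starts outside a quoted field, or
 * `len` if there is no such end-of-data marker. Quote state is tracked
 * by parity, so an escaped quote ("") toggles twice and cancels out. */
size_t csv_data_length(const char *buf, size_t len)
{
    bool in_quotes = false;
    size_t i = 0;

    while (i < len) {
        size_t start = i;
        bool starts_in_quotes = in_quotes;

        /* Scan one physical line, tracking quote parity. */
        while (i < len && buf[i] != '\n') {
            if (buf[i] == '"')
                in_quotes = !in_quotes;
            i++;
        }
        size_t n = i - start;
        if (n > 0 && buf[start + n - 1] == '\r')
            n--;                          /* tolerate CRLF line endings */
        if (!starts_in_quotes && n == 2 && memcmp(buf + start, "\\.", 2) == 0)
            return start;                 /* unquoted end-of-data marker */
        if (i < len)
            i++;                          /* skip the newline */
    }
    return len;
}
```

In the text format this tracking is unnecessary, since \. alone on a line cannot appear as data there.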

Re: Built-in CTYPE provider

2023-12-13 Thread Daniel Verite
Jeff Davis wrote: > While "full" case mapping sounds more complex, there are actually > very few cases to consider and they are covered in another (small) > data file. That data file covers ~100 code points that convert to > multiple code points when the case changes (e.g. "ß" -> "SS"), 7
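
A tiny illustration of why full case mapping can change the number of characters: uppercasing "ß" (U+00DF, UTF-8 bytes C3 9F) yields the two characters "SS"; in Unicode this mapping comes from SpecialCasing.txt. The sketch handles ASCII letters plus that single special case only, and its name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Uppercase a UTF-8 string, applying full case mapping for one special
 * case only: U+00DF (bytes C3 9F) becomes "SS", so the output may have
 * more characters than the input. ASCII a-z are uppercased; all other
 * bytes are copied unchanged. Returns the output length in bytes, or
 * (size_t)-1 if `cap` is too small. */
size_t upper_full_sharp_s(const char *in, char *out, size_t cap)
{
    size_t o = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (p[0] == 0xC3 && p[1] == 0x9F) {
            if (o + 2 >= cap)
                return (size_t)-1;
            out[o++] = 'S';
            out[o++] = 'S';
            p++;                     /* consumed both bytes of U+00DF */
        } else {
            if (o + 1 >= cap)
                return (size_t)-1;
            out[o++] = (*p >= 'a' && *p <= 'z') ? (char)(*p - 'a' + 'A')
                                                : (char)*p;
        }
    }
    out[o] = '\0';
    return o;
}
```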

Re: Emitting JSON to file using COPY TO

2023-12-08 Thread Daniel Verite
Dave Cramer wrote: > > This argument for leaving 3 as the column count makes sense to me. I > > agree this content is not meant to facilitate interpreting the contents at > > a protocol level. > > > > I'd disagree. From my POV if the data comes back as a JSON Array this is > one object a

Re: Emitting JSON to file using COPY TO

2023-12-08 Thread Daniel Verite
Joe Conway wrote: > copyto_json.007.diff When the source has json fields with non-significant line feeds, the COPY output has these line feeds too, which makes the output incompatible with rule #2 at https://jsonlines.org ("2. Each Line is a Valid JSON Value"). create table j(f json);
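
Since JSON forbids raw control characters inside strings, any literal CR or LF in a valid json value can only be insignificant whitespace, so replacing it with a space keeps the value intact while putting it on one line. Minimal sketch of one possible approach, assuming valid JSON input; it is not what the patch does, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy a valid JSON text to `out` (at least strlen(in) + 1 bytes),
 * replacing raw CR and LF with spaces. In valid JSON these only occur as
 * whitespace between tokens (inside strings they must be escaped as \n
 * or \r), so the value is unchanged but fits on a single line, as rule #2
 * of JSON Lines requires. */
void json_one_line(const char *in, char *out)
{
    for (; *in; in++, out++)
        *out = (*in == '\n' || *in == '\r') ? ' ' : *in;
    *out = '\0';
}
```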

Re: Emitting JSON to file using COPY TO

2023-12-07 Thread Daniel Verite
Joe Conway wrote: > The attached should fix the CopyOut response to say one column. I.e. it > ought to look something like: Spending more time with the doc I came to the opinion that in this bit of the protocol, in CopyOutResponse (B) ... Int16 The number of columns in the data to be cop
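
For reference, the message under discussion is laid out as Byte1('H'), an Int32 length that counts itself but not the type byte, an Int8 overall format (0 = text, 1 = binary), an Int16 column count N, and N Int16 per-column format codes, all big-endian. A minimal encoder sketch, so the "one column" variant is easy to picture; the helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a CopyOutResponse message with all columns in text format.
 * Returns the total message size, or 0 if `cap` is too small. */
size_t encode_copy_out_response(uint8_t *buf, size_t cap, uint16_t ncols)
{
    uint32_t len = 4 + 1 + 2 + 2u * ncols;   /* length field counts itself */
    size_t total = 1 + (size_t)len;          /* plus the 'H' type byte */
    if (cap < total)
        return 0;

    size_t o = 0;
    buf[o++] = 'H';
    buf[o++] = (uint8_t)(len >> 24);
    buf[o++] = (uint8_t)(len >> 16);
    buf[o++] = (uint8_t)(len >> 8);
    buf[o++] = (uint8_t)len;
    buf[o++] = 0;                            /* overall format: text */
    buf[o++] = (uint8_t)(ncols >> 8);
    buf[o++] = (uint8_t)ncols;
    for (uint16_t i = 0; i < ncols; i++) {
        buf[o++] = 0;                        /* column format code: text */
        buf[o++] = 0;
    }
    return total;
}
```

With one column the message is 10 bytes: 'H', length 9, format 0, count 1, and a single zero format code.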

Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Daniel Verite
Andrew Dunstan wrote: > IMNSHO, we should produce either a single JSON > document (the ARRAY case) or a series of JSON documents, one per row > (the LINES case). "COPY Operations" in the doc says: " The backend sends a CopyOutResponse message to the frontend, followed by zero or mor

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-06 Thread Daniel Verite
Sutou Kouhei wrote: > * 2022-04: Apache Arrow [2] > * 2018-02: Apache Avro, Apache Parquet and Apache ORC [3] > > (FYI: I want to add support for Apache Arrow.) > > There were discussions how to add support for more formats. [3][4] > In these discussions, we got a consensus about making

Re: EXCLUDE COLLATE in CREATE/ALTER TABLE document

2023-12-01 Thread Daniel Verite
shihao zhong wrote: > Thanks for your comments, a new version is attached. In this hunk: @@ -1097,8 +1097,8 @@ WITH ( MODULUS numeric_literal, REM method index_method. The operators are required to be commutative. Each exclude_element - can optionally specify an

Re: proposal: change behavior on collation version mismatch

2023-11-28 Thread Daniel Verite
Jeremy Schneider wrote: > 1) "collation changes are uncommon" (which is relatively correct) > 2) "most users would rather have ease-of-use than 100% safety, since > it's uncommon" > > And I think this led to the current behavior of issuing a warning rather > than an error, There's a tech

Re: [ psql - review request ] review request for \d+ tablename, \d+ indexname indenting

2023-11-22 Thread Daniel Verite
Shlok Kyal wrote: > > The error was corrected and a new diff file was created. > > The diff file was created based on 16 RC1. > > We confirmed that 5 places where errors occurred when performing > > make check were changed to ok. Reviewing the patch, I see these two problems in the curren

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-11-20 Thread Daniel Verite
Hi, Here's a new version to improve the performance of FETCH_COUNT and extend the cases when it can be used. Patch 0001 adds a new mode in libpq to allow the app to retrieve larger chunks of results than the single row of the row-by-row mode. The maximum number of rows per PGresult is set by the

Re: Does UCS_BASIC have the right CTYPE?

2023-10-26 Thread Daniel Verite
Peter Eisentraut wrote: > > That seems to suggest the standard answer should be 'Á' regardless of > > any COLLATE clause (though I could be misreading). I'm a bit confused > > by that... what's the standard-compatible way to specify the locale for > > UPPER()/LOWER()? If there is none, the

Re: Pre-proposal: unicode normalized text

2023-10-17 Thread Daniel Verite
Jeff Davis wrote: > I believe the patch has utility as-is, but I've been brainstorming a > few more ideas that could build on it: > > * Add a per-database option to enforce only storing assigned unicode > code points. There's a problem in the fact that the set of assigned code points is

Re: EBCDIC sorting as a use case for ICU rules

2023-08-30 Thread Daniel Verite
Peter Eisentraut wrote: > Committed with some editing. I moved the existing rules example from > the CREATE COLLATION page into the new section you created, so we have a > simple example followed by the complex example. OK, thanks for pushing this!

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-07-07 Thread Daniel Verite
Tom Lane wrote: > This gives me several "-Wincompatible-pointer-types" warnings > [...] > I think what you probably ought to do to avoid all that is to change > the arguments of PrintQueryResult and nearby routines to be "const > PGresult *result" not just "PGresult *result". The const-ne

Re: pg_collation.collversion for C.UTF-8

2023-06-21 Thread Daniel Verite
Thomas Munro wrote: > What could we do that would be helpful here, without affecting users > of the "true" C.UTF-8 for the rest of time? This is a Debian (+ > downstream distro) only problem as far as we know so far, and only > for Debian 11 and older. It seems to include RedHat-based di

EBCDIC sorting as a use case for ICU rules

2023-06-21 Thread Daniel Verite
Hi, In the "Order changes in PG16 since ICU introduction" discussion, one sub-thread [1] was about having a credible use case for tailoring collations with custom rules, a new feature in v16. At a conference this week I was asked whether ICU could sort like EBCDIC [2]. It turns out it has
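
The most visible difference with ASCII-based ordering is the relative order of character classes: in EBCDIC (for instance code page 037), space < lowercase letters < uppercase letters < digits, whereas in ASCII digits < uppercase < lowercase. A minimal C sketch of a comparator that maps letters, digits and space to their code page 037 values; other characters are simply sorted after those by ASCII value, and the names are hypothetical.

```c
#include <string.h>

/* Code page 037 value of an ASCII letter, digit or space. Letters fall
 * into three non-contiguous blocks in EBCDIC. Other characters are not
 * handled by this sketch and are placed after everything (0x100 + c). */
static unsigned ebcdic_of(unsigned char c)
{
    if (c == ' ') return 0x40;
    if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
    if (c >= 'j' && c <= 'r') return 0x91 + (c - 'j');
    if (c >= 's' && c <= 'z') return 0xA2 + (c - 's');
    if (c >= 'A' && c <= 'I') return 0xC1 + (c - 'A');
    if (c >= 'J' && c <= 'R') return 0xD1 + (c - 'J');
    if (c >= 'S' && c <= 'Z') return 0xE2 + (c - 'S');
    if (c >= '0' && c <= '9') return 0xF0 + (c - '0');
    return 0x100 + c;
}

/* Compare two strings as their EBCDIC encodings would compare. */
int ebcdic_cmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p && *q) {
        unsigned x = ebcdic_of(*p++), y = ebcdic_of(*q++);
        if (x != y)
            return x < y ? -1 : 1;
    }
    return (*p != 0) - (*q != 0);   /* shorter string sorts first */
}
```

With this comparator a1 < b < A1 < 11, whereas strcmp puts 11 first. ICU tailoring rules express the same ordering declaratively instead of through a byte mapping.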

Re: Order changes in PG16 since ICU introduction

2023-06-12 Thread Daniel Verite
Jeff Davis wrote: > I guess where I'm confused is: why would a user actually want their > database collation to be C.UTF-8? It's slower than C, our > implementation doesn't properly version it (as you pointed out), and > the semantics don't seem great ('Z' < 'a'). Because when LC_CTYPE=C,

Re: Order changes in PG16 since ICU introduction

2023-06-09 Thread Daniel Verite
Jeff Davis wrote: > I implemented a compromise where initdb will > change C.UTF-8 to the built-in provider This handling of C.UTF-8 would be perceived by users as simply broken. With the v10 patches: $ initdb --locale=C.UTF-8 initdb: using locale provider "builtin" for ICU locale "C.UTF-

Re: Inconsistent results with libc sorting on Windows

2023-06-09 Thread Daniel Verite
Juan José Santamaría Flecha wrote: > Just to make sure we are all seeing the same problem, does the attached > patch fix your test? The problem of the random changes in sorting disappears for all libc locales in pg_collation, so this is very promising. However it persists for the default

Re: Order changes in PG16 since ICU introduction

2023-06-08 Thread Daniel Verite
Jeff Davis wrote: > As I replied in that subthread, that creates a worse problem: if you > only change the provider when the locale is C, then what about when the > locale is *not* C? > > export LANG=en_US.UTF-8 > initdb -D data --locale=fr_FR.UTF-8 > ... >provider:icu >ICU

Re: Order changes in PG16 since ICU introduction

2023-06-08 Thread Daniel Verite
Tatsuo Ishii wrote: > >> Yes it's a special case but when doing initdb --locale=C, a user does > >> not need or want an ICU locale. They want the same thing than what v15 > >> does with the same arguments: a template0 database with > >> datlocprovider='c', datcollate='C', datctype='C', dat

Re: Order changes in PG16 since ICU introduction

2023-06-07 Thread Daniel Verite
Jeff Davis wrote: > The locale "C" is a special case, documented as a non-locale. So, if > LOCALE/--locale apply to ICU, then either ICU needs to handle locale > "C" in the expected way (v8 patch series); or when we see locale "C" we > need to somehow change the provider into something tha

Re: pg_collation.collversion for C.UTF-8

2023-06-07 Thread Daniel Verite
I wrote: > Consider matching '\d' in a regexp. With C.UTF-8 (glibc-2.35), we > only match ASCII characters 0-9, or 10 codepoints. With > "en-US-u-va-posix-x-icu" we match 660 codepoints comprising all the > digit characters in all languages, plus a bunch of variants for > mathematical sym

Re: pg_collation.collversion for C.UTF-8

2023-06-07 Thread Daniel Verite
Jeff Davis wrote: > What about ICU? How should provider=icu locale=C.UTF-8 behave? We > could: > > a. Just pass it to the provider and see what happens (older versions of > ICU would interpret it as en-US-u-va-posix; newer versions would give > the root locale). > > b. Consistently inter

Re: Inconsistent results with libc sorting on Windows

2023-06-07 Thread Daniel Verite
Thomas Munro wrote: > > > Also, it does not occur at all if parallel scan is disabled. > > > > Could this be a clue that it is failing to be transitive? > > That vaguely rang a bell for me... and then I remembered this thread: > > https://www.postgresql.org/message-id/flat/2019120606340
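
The transitivity hypothesis can be tested outside the database by brute-forcing a comparator over a sample of strings, looking for a pair whose comparison results do not have opposite signs, or a triple with a < b and b < c but not a < c. Minimal sketch with hypothetical names; to test the Windows libc collation one would plug in a wrapper around strcoll() after setlocale(), which is not shown. A deliberately intransitive comparator is included to check that violations are detected.

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const char *, const char *);

static int sign(int v) { return (v > 0) - (v < 0); }

/* Return 1 if `cmp` is antisymmetric and transitive over the n sample
 * strings, 0 at the first violation. O(n^3): meant for small samples. */
int comparator_is_consistent(cmp_fn cmp, const char *const *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            if (sign(cmp(s[i], s[j])) != -sign(cmp(s[j], s[i])))
                return 0;                    /* not antisymmetric */
            for (size_t k = 0; k < n; k++)
                if (cmp(s[i], s[j]) < 0 && cmp(s[j], s[k]) < 0
                    && cmp(s[i], s[k]) >= 0)
                    return 0;                /* not transitive */
        }
    return 1;
}

/* Deliberately intransitive comparator: r < p < s < r, on the first
 * character. Inputs must start with 'r', 'p' or 's'. */
int rps_cmp(const char *a, const char *b)
{
    static const char order[] = "rps";
    int x = (int)(strchr(order, a[0]) - order);
    int y = (int)(strchr(order, b[0]) - order);
    if (x == y)
        return 0;
    return ((y - x + 3) % 3 == 1) ? -1 : 1;
}
```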

Re: Order changes in PG16 since ICU introduction

2023-06-06 Thread Daniel Verite
Jeff Davis wrote: > New patch series attached. I plan to commit 0001 and 0002 soon, unless > there are objections. > > 0001 causes the "C" and "POSIX" locales to be treated with > memcmp/pg_ascii semantics in ICU, just like in libc. We also > considered a new "none" provider, but it's mor

Inconsistent results with libc sorting on Windows

2023-06-05 Thread Daniel Verite
Hi, While trying pg16beta1 libc collations on Windows, I noticed that UTF-8 text sometimes sorts differently across invocations with the same locales, which is wrong since these collations are deterministic. The OS is Windows 10 Home, version 10.0.19045 Build 19045, self-built 16beta1 with VS Com

Re: pg_collation.collversion for C.UTF-8

2023-06-05 Thread Daniel Verite
Jeff Davis wrote: > > For libc: this change may affect any user who happened to have > > LANG=C.UTF-8 in their environment at initdb time, which is probably a > > lot of users, and some buildfarm members. However, the average risk > > seems to be much lower, because we've gone a long tim

Simplify pg_collation.collversion for Windows libc

2023-06-05 Thread Daniel Verite
Hi, Currently the libc collation version for Windows has two components coming from the NLSVERSIONINFOEX structure [1]: dwNLSVersion and dwDefinedVersion. So we get version numbers looking like this (with 16 beta1): postgres=# select collversion,count(*) from pg_collation group by collversion; c

Re: Order changes in PG16 since ICU introduction

2023-05-26 Thread Daniel Verite
Jeff Davis wrote: > > #1 > > > > postgres=# create database test1 locale='fr_FR.UTF-8'; > > NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8" > > ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of > > I don't see a problem here. If you specify LOCALE to

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-22 Thread Daniel Verite
Kirk Wolak wrote: > We do NOT do "CSV", we mimic pg_dump. pg_dump uses the text format (as opposed to csv), where \. on a line by itself cannot appear in the data, so there's no problem. The problem is limited to the csv format.

Re: Order changes in PG16 since ICU introduction

2023-05-22 Thread Daniel Verite
Jeff Davis wrote: > If we special case locale=C, but do nothing for locale=fr_FR, then I'm > not sure we've solved the problem. Andrew Gierth raised the issue here, > which he called "maximally confusing": > > https://postgr.es/m/874jp9f5jo@news-spur.riddles.org.uk > > That's why I f

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-22 Thread Daniel Verite
Joel Jacobson wrote: > Is there a valid reason why \. is needed for COPY FROM filename? > It seems to me it would only be necessary for the COPY FROM STDIN case, > since files have a natural end-of-file and a known file size. Looking at CopyReadLineText() over at [1], I don't see a reason
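The observation that a file has a natural end can be sketched as a reader that stops at whichever comes first: the end of the input, or a line holding only \. (the end-of-data marker). This is a hypothetical illustration, not CopyReadLineText(). The name count_copy_data_lines is invented. The sketch assumes the text format with "\n" line endings, and it ignores CSV quoted fields that span lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: count the data lines in a COPY text stream, stopping
 * either at the end of the buffer (the natural end of a file) or at a line
 * consisting solely of "\." (the end-of-data marker).  Assumes "\n" line
 * endings; a final line without a trailing newline still counts.
 */
size_t
count_copy_data_lines(const char *buf)
{
    size_t      count = 0;
    const char *line = buf;

    assert(buf != NULL);
    while (*line != '\0')
    {
        const char *nl = strchr(line, '\n');
        size_t      len = nl ? (size_t) (nl - line) : strlen(line);

        /* A line containing only backslash-period ends the data. */
        if (len == 2 && line[0] == '\\' && line[1] == '.')
            break;
        count++;
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return count;
}
```

With a file, the first exit (end of buffer) is always available, so the marker is redundant. With STDIN, the marker is one way for the stream to signal its end.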

Re: Order changes in PG16 since ICU introduction

2023-05-19 Thread Daniel Verite
Jeff Davis wrote: > 2) Automatically change the provider to libc when locale=C. > > Almost works, but it's not clear how we handle the case "provider=icu > lc_collate='fr_FR.utf8' locale=C". > > If we change it to "provider=libc lc_collate=C", we've overridden the > specified lc_collate.

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-19 Thread Daniel Verite
Joel Jacobson wrote: > I understand its necessity for STDIN, given that the end of input needs to > be explicitly defined. > However, for files, we have a known file size and the end-of-file can be > detected without the need for special markers. > > Also, is the difference in how server-

Re: Should CSV parsing be stricter about mid-field quotes?

2023-05-18 Thread Daniel Verite
Joel Jacobson wrote: > I've been using that trick myself many times in the past, but thanks to this > deep-dive into this topic, it looks to me like TEXT would be a better format > fit when dealing with unquoted TSV files, or? > > OTOH, one would then need to inspect the TSV file doesn't

Re: Order changes in PG16 since ICU introduction

2023-04-27 Thread Daniel Verite
Jeff Davis wrote: > Attached are a few small patches: > > 0001: don't convert C to en-US-u-va-posix > 0002: handle locale C the same regardless of the provider, as you > suggest above > 0003: make LOCALE (or --locale) apply to everything including ICU Testing this briefly I noticed

Re: Add standard collation UNICODE

2023-04-27 Thread Daniel Verite
Peter Eisentraut wrote: > COLLATE UNICODE > > instead of > > COLLATE "und-x-icu" > > or whatever it is, is pretty useful. > > So, attached is a small patch to add this. This collation has an empty pg_collation.collversion column, instead of being set to the same value as "un

Re: Order changes in PG16 since ICU introduction

2023-04-25 Thread Daniel Verite
Jeff Davis wrote: > > (I'm not sure whether those operations can get redirected to ICU > > today > > or whether they still always go to libc, but we'll surely want to fix > > it eventually if the latter is still true.) > > Those operations do get redirected to ICU today. FTR the full te

Re: pg_collation.collversion for C.UTF-8

2023-04-22 Thread Daniel Verite
Thomas Munro wrote: > It looks like for technical reasons > inside glibc, that couldn't be done before 2.35: > > https://sourceware.org/bugzilla/show_bug.cgi?id=17318 > > That strengthens my opinion that C.UTF-8 (the real C.UTF-8 supplied > by the glibc project) isn't supposed to be vers

pg_collation.collversion for C.UTF-8

2023-04-18 Thread Daniel Verite
Hi, get_collation_actual_version() in pg_locale.c currently excludes C.UTF-8 (and more generally C.*) from versioning, which leaves pg_collation.collversion empty for these collations. char * get_collation_actual_version(char collprovider, const char *collcollate) { if (collpr

Re: TAP tests for psql \g piped into program

2023-03-29 Thread Daniel Verite
Peter Eisentraut wrote: > So for your patch, I would just do the path adjustment ad hoc in-line. > It's just one additional line. Here's the patch updated that way. diff --git a/src/bin/psql/t/001_bas

Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-03-01 Thread Daniel Verite
I wrote: > Here's a POC patch implementing row-by-row fetching. PFA an updated patch. diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index f907f5d4e8..ad5e8a5de9 100644 --- a/src/bin/psql/common.

Re: Allow tailoring of ICU locales with custom rules

2023-02-20 Thread Daniel Verite
Peter Eisentraut wrote: [patch v5] Two quick comments: - pg_dump support needs to be added for CREATE COLLATION / DATABASE - there doesn't seem to be a way to add rules to template1. If someone wants to have ICU rules and initial contents in their new databases, I think they need to crea
