Re: constant crashing

2024-04-15 Thread Francisco Olarte
Jack:

On Sun, 14 Apr 2024 at 22:18, jack  wrote:
> The CSV files are being produced by another system, a Windows app on a 
> Windows machine. I then copy them to a USB key and copy them onto the ubuntu 
> machine. The data is then imported via the COPY command.
>
> COPY master (field01,field02..fieldX) FROM '/data/file.text' DELIMITER E'\t'
> The fields are tab delimited.

Then call them tab-delimited, not CSV. It may be important. You have a
tendency to send your interpretation of the information, which hampers
the people trying to help you.
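The distinction matters in practice: the same line parses very differently depending on which dialect the reader assumes. A minimal illustration in Python (the sample line is made up):

```python
import csv
import io

# A made-up tab-delimited line whose fields contain embedded commas.
line = "Smith, John\t123 Main St, Apt 4\tToronto\n"

# Read as tab-delimited: three intact fields, commas preserved.
tab_fields = next(csv.reader(io.StringIO(line), delimiter="\t"))

# Read as CSV: the commas inside the fields become field separators.
csv_fields = next(csv.reader(io.StringIO(line)))

print(tab_fields[0])  # Smith, John
print(csv_fields[0])  # Smith
```

So a tool told "this is CSV" silently splits the data in the wrong places, which is why the exact format name matters when asking for help.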

As an example, only about three messages ago did we see the "signal
11" in the server logs, which indicates a probable bug. Signal 11 is
normally due to stray pointers or corrupted memory ( although faulty
RAM can lead to it too ), which hints at a bug, not a resource
problem.

> But importing the data works. I can get all the data into a single table 
> without any problems. The issue is only when I start to update the single 
> table. And that is why I started using smaller temporary tables for each CSV 
> file, to do the updates in the smaller tables before I move them all to a 
> single large table.
> After all the data is loaded and updated, I run php programs on the large 
> table to generate reports. All of which works well EXCEPT for performing the 
> updates on the data. And I do not want to use perl or any outside tool. I 
> want it all done in SQL because I am required to document all my steps so that 
> someone else can take over, so everything needs to be as simple as possible.

I suggested perl ( or any similar thing; php should be equally fine )
as it normally makes your importing / debugging much easier and lighter.

On the documentation side, you can have the data and some
transformations in a big SQL file, but having some steps in a php file
keeps them documented too. But if you want to do text processing
in SQL, go ahead; the only problems are going to be making it harder
to debug and probably harder to document. Now that people know about
the signal 11, the problem can probably be tracked down.

Francisco Olarte.




efficiency random values / sequential ID values in indexes

2024-04-15 Thread Sanjay Minni
Hi

Is there any appreciable difference between using random values and
sequential values in indexes?

In a multi-tenanted application there is a choice: either the single ID
field's value is totally random (a UUID), or the numbers are created with
a prefix of the tenant. Since all access will be within a tenant only,
will it make any performance difference to use purely random values vs
tenant-prefixed sequential values?

thanks
Sanjay


Re: Failing streaming replication on PostgreSQL 14

2024-04-15 Thread Ron Johnson
On Mon, Apr 15, 2024 at 2:53 AM Nicolas Seinlet  wrote:

> Hello everyone,
>
> Since I moved some clusters from PostgreSQL 12 to 14, I noticed random
> failures in streaming replication. I say "random" mostly because I haven't
> got the source of the issue.
>
> I'm using the Ubuntu/cyphered ZFS/PostgreSQL combination. I'm using Ubuntu
> LTS (20.04 22.04) and provided ZFS/PostgreSQL with LTS (PostgreSQL 12 on
> Ubuntu 20.04 and 14 on 22.04).
>
> The streaming replication of PostgreSQL is configured with
> `primary_conninfo 'host=main_server port=5432 user=replicant
> password=a_very_secure_password sslmode=require
> application_name=replication_postgresql_app' ` , no replication slot nor
> restore command, and the wal is configured with `full_page_writes = off
> wal_init_zero = off wal_recycle = off`
>
> If this works like a charm on PostgreSQL 12, it's sometimes failing with
> PostgreSQL 14. As we also changed the OS, maybe the issue lies somewhere
> else.
>
> When the issue is detected, the WAL on the primary is correct. A piece of
> the WAL is wrong on the secondary. Only some bytes. Some bytes later, the
> wal is again correct. Stopping PostgreSQL on the secondary, removing the
> wrong WAL file, and restarting PostgreSQL solves the issue.
>
> We've added another secondary and noticed the issue can appear on one of
> the secondaries, not both at the same time.
>
> What can I do to detect the origin of this issue?
>

1. Minor version number?
2. Using replication_slots?
3. Error message(s)?


Re: efficiency random values / sequential ID values in indexes

2024-04-15 Thread Ron Johnson
On Mon, Apr 15, 2024 at 6:05 AM Sanjay Minni  wrote:

> Hi
>
> Is there any appreciable difference between using random values and
> sequential values in indexes?
>
> In a multi-tenanted application there is a choice: either the single ID
> field's value is totally random (a UUID), or the numbers are created with
> a prefix of the tenant. Since all access will be within a tenant only,
> will it make any performance difference to use purely random values vs
> tenant-prefixed sequential values?
>

Two benefits of the tenant-prefixed approach:

1. In a non-partitioned table, it gives your index "locality of data": all
of customer X's record pointers are in *this* subtree.  That makes buffers
more efficient when a customer runs reports.  Bonus points if you then
regularly CLUSTER the table on that index.
2. It makes table partitioning by tenant much easier.  That also
enhances locality of data.

Just make sure that the ID field is BIGINT...
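One common way to get that locality is to pack a tenant number and a per-tenant sequence into a single BIGINT. A hedged sketch of the idea (the 20/42-bit split and the function name are assumptions for illustration, not anything from this thread):

```python
TENANT_BITS = 20   # up to ~1M tenants (assumed split)
SEQ_BITS = 42      # up to ~4.4 trillion rows per tenant

def make_id(tenant: int, seq: int) -> int:
    """Compose a 62-bit ID (fits a signed BIGINT): tenant number
    in the high bits, per-tenant sequence in the low bits."""
    assert 0 <= tenant < (1 << TENANT_BITS)
    assert 0 <= seq < (1 << SEQ_BITS)
    return (tenant << SEQ_BITS) | seq

# All of tenant 7's IDs sort together, in insertion order, so a
# btree on this column keeps each tenant's entries adjacent.
ids = [make_id(7, s) for s in range(3)]
print(ids == sorted(ids))  # True
```

The same packing can be done in SQL with a sequence per tenant or a shared sequence; the point is only that IDs sharing a tenant prefix land in the same region of the index.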


Re: Failing streaming replication on PostgreSQL 14

2024-04-15 Thread Nicolas Seinlet

On Monday, April 15th, 2024 at 14:36, Ron Johnson wrote:

> On Mon, Apr 15, 2024 at 2:53 AM Nicolas Seinlet  wrote:
> > Hello everyone,
> >
> > Since I moved some clusters from PostgreSQL 12 to 14, I noticed random
> > failures in streaming replication. I say "random" mostly because I haven't
> > got the source of the issue.
> >
> > I'm using the Ubuntu/cyphered ZFS/PostgreSQL combination. I'm using Ubuntu
> > LTS (20.04 22.04) and provided ZFS/PostgreSQL with LTS (PostgreSQL 12 on
> > Ubuntu 20.04 and 14 on 22.04).
> >
> > The streaming replication of PostgreSQL is configured with
> > `primary_conninfo 'host=main_server port=5432 user=replicant
> > password=a_very_secure_password sslmode=require
> > application_name=replication_postgresql_app' ` , no replication slot nor
> > restore command, and the wal is configured with `full_page_writes = off
> > wal_init_zero = off wal_recycle = off`
> >
> > If this works like a charm on PostgreSQL 12, it's sometimes failing with
> > PostgreSQL 14. As we also changed the OS, maybe the issue lies somewhere
> > else.
> >
> > When the issue is detected, the WAL on the primary is correct. A piece of
> > the WAL is wrong on the secondary. Only some bytes. Some bytes later, the
> > wal is again correct. Stopping PostgreSQL on the secondary, removing the
> > wrong WAL file, and restarting PostgreSQL solves the issue.
> >
> > We've added another secondary and noticed the issue can appear on one of
> > the secondaries, not both at the same time.
> >
> > What can I do to detect the origin of this issue?
>
> 1. Minor version number?
> 2. Using replication_slots?
> 3. Error message(s)?

Hi,


1.  PostgreSQL 14.11
2.  No, no replication slot nor restore command. As we understand it, a 
replication slot is a mechanism to keep, on the primary side, everything 
the secondary needs to recover. Would it also make the primary confirm 
that the secondary received the correct WAL file?
3.  incorrect resource manager data checksum

Looking at the WAL files with xxd gives the following diff:

The bad one:
006c9160: 0a6e 7514 5030 2e31 0e35 016c 0f07 0009 2f62 6568 6100 7669 6f72 3a6e 6f72 be6d  .nu.P0.1.5.l/beha.vior:nor.m
006c9180: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ................................
006c91a0: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000  ................................
006c91c0: 437a 4263 7500 7273 6f72 3a70 6f69 0302 4503 9023 3237 3665 3720 323b 223e 5527  CzBcu.rsor:poi..E..#276e7 2;">U'

The good one contains the same 1st and 4th lines, but the 2nd and 3rd lines 
contain the correct values, as if a packet was missed.
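To narrow down where the two copies of a segment diverge without eyeballing xxd output, a small byte-range diff can help. This is not PostgreSQL-specific tooling, just a sketch that reports the differing offset ranges between two equal-length byte buffers:

```python
def diff_ranges(a: bytes, b: bytes):
    """Return (start, end) offsets of maximal runs where a and b differ."""
    assert len(a) == len(b)
    ranges, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                      # a differing run begins
        elif x == y and start is not None:
            ranges.append((start, i))      # the run ends at i
            start = None
    if start is not None:
        ranges.append((start, len(a)))
    return ranges

# Toy example: two 16-byte buffers differing in one 4-byte run.
good = bytes(range(16))
bad = bytearray(good)
bad[6:10] = b"\xff\xff\xff\xff"
print(diff_ranges(bytes(bad), good))  # [(6, 10)]
```

Run against the good and bad WAL files (read with `open(path, "rb").read()`), the reported ranges would show whether the corruption is a single contiguous window, consistent with the missed-packet theory.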

Thanks for helping,

Nicolas.




re: constant crashing hardware issue and thank you

2024-04-15 Thread jack
It seems that the hardware may in fact be the issue.

I ran the exact same process overnight on a much slower machine (4-core 
Xeon E5-1620 v4, 32 GB RAM) and it worked fine. It did not crash, and completed all 
the updates on 113 million records successfully.

I will be bringing the i9 in for a checkup to find the problem.

Thank you all for your help with this issue.

No error message/code for commands after libpq pipeline abortion

2024-04-15 Thread Jan Behrens
Hello,

I am writing a Lua client library for PostgreSQL using libpq that
supports pipelining in an async/effect-based context.

I understand that when an error occurs, all subsequent commands also
fail (where PQgetResult returns PGRES_PIPELINE_ABORTED).

However, only the first command has an error message
via PQresultErrorMessage and an error code via
PQresultErrorField(..., PG_DIAG_SQLSTATE) set.

All other, subsequently obtained results lack an error message or an
SQLSTATE. I would expect something like "ERROR:  command failed due to
previous failure in pipeline", but I don't get anything like that.

Why is that? Is it intentional? Is it documented somewhere?

Further question: How do I deal with it with regard to the user of my
library? Should I make up my own error message? Which error code could
I set? I didn't find any appropriate error code in Appendix A of the
documentation.

https://www.postgresql.org/docs/16/errcodes-appendix.html

My current approach is to just set "*" as an error code, which doesn't
feel good:

https://github.com/JanBeh/neumond/blob/9889c5fc19dbb23dff483e4a1cb2e4ef3ab74085/pgeff.c#L332-L345

Maybe I'm just overlooking something?
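Lacking a dedicated SQLSTATE, one option for a wrapper library is to remember the first real error in the pipeline and attach its SQLSTATE, plus a synthesized message, to every subsequent aborted result. A sketch of that bookkeeping, independent of libpq (the `Result` type and statuses here are made up to model what `PQgetResult` hands back, not a real API):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Result:
    status: str                      # "ok", "error", or "pipeline_aborted"
    sqlstate: Optional[str] = None
    message: Optional[str] = None

def annotate_aborted(results: List[Result]) -> List[Result]:
    """Copy the first failure's SQLSTATE onto later aborted results,
    with a synthesized message, so callers always see a cause."""
    first_err, out = None, []
    for r in results:
        if r.status == "error" and first_err is None:
            first_err = r
        if r.status == "pipeline_aborted" and first_err is not None:
            r = Result("pipeline_aborted", first_err.sqlstate,
                       "command skipped due to earlier pipeline failure: "
                       + (first_err.message or ""))
        out.append(r)
    return out

results = annotate_aborted([
    Result("ok"),
    Result("error", "42P01", 'relation "t" does not exist'),
    Result("pipeline_aborted"),
])
print(results[2].sqlstate)  # 42P01
```

This keeps the user-facing invariant "every failed result carries a code" without having to invent a code that PostgreSQL itself never emits.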

Kind regards,
Jan Behrens




Re: (When) can a single SQL statement return multiple result sets?

2024-04-15 Thread Peter Eisentraut

On 11.04.24 01:02, Tom Lane wrote:
>> And if not, why can't I write a stored procedure
>> or function that returns multiple result sets?
>
> [ shrug... ] Lack of round tuits, perhaps.  We don't have any
> mechanism today whereby a stored procedure could say "please ship
> this resultset off to the client, but I want to continue afterwards".
> But you can do that in other RDBMSes and probably somebody will be
> motivated to make it possible in Postgres.

The development of this feature was the subject of this thread:
https://www.postgresql.org/message-id/flat/6e747f98-835f-2e05-cde5-86ee444a7...@2ndquadrant.com

But it has not yet concluded successfully.





Re: (When) can a single SQL statement return multiple result sets?

2024-04-15 Thread Jan Behrens
On Wed, 10 Apr 2024 19:02:48 -0400
Tom Lane  wrote:

> Jan Behrens  writes:
> > While writing a PostgreSQL client library for Lua supporting
> > Pipelining (using PQsendQueryParams), I have been wondering if there
> > are any single SQL commands that return multiple result sets.
> 
> Right now, I don't think so.  I believe the current protocol design
> intends to support that, and I think this may trace back to some
> ancient idea at Berkeley that if you select from an inheritance
> hierarchy where the child tables aren't all alike, you should be
> able to see all the child data, which'd require changing tuple
> descriptors midstream.  But our current interpretation of SQL
> SELECT forbids that.

I thought multiple result sets are supported for commands like PQexec,
where "multiple queries sent in a single PQexec call" are explicitly
supported, and which then return multiple result sets. This, however,
doesn't apply to pipelining because PQexec is not available in
pipeline mode.

> 
> > Here, "DELETE FROM magic" returns multiple result sets, even though it
> > is only a single SQL statement.
> 
> Right, so it's kind of a case that you have to support.  We're not
> likely to rip out rules anytime soon, even if they're a bit
> deprecated.

As it seems to be a corner case that rarely occurs in practice, I was
considering simply not supporting it in my client library. I don't
know which SQL error code I could return in that case, though.
Maybe "0A000" (feature_not_supported) or
"21000" (cardinality_violation). Not sure if either of those is a good
choice. Any better ideas?

> 
> > The case outlined above seems to be a somewhat special case. I haven't
> > found any other way to return multiple results (other than sending
> > several semicolon-separated statements, which is not supported by
> > PQsendQueryParams). So is there any (other) case where I reasonably
> > should expect several result sets returned by PQgetResult (before
> > PQgetResult returns NULL)? Wouldn't it make sense to disallow such
> > behavior altogether?
> 
> No.  For one thing, there's too much overlap between what you're
> suggesting and pipelined queries.

Which question was "no" the answer to? I'm not sure I understand.

> 
>   regards, tom lane
> 

Regards,
Jan Behrens




Re: Failing streaming replication on PostgreSQL 14

2024-04-15 Thread Alvaro Herrera
On 2024-Apr-15, Nicolas Seinlet wrote:

> I'm using the Ubuntu/cyphered ZFS/PostgreSQL combination. I'm using
> Ubuntu LTS (20.04 22.04) and provided ZFS/PostgreSQL with LTS
> (PostgreSQL 12 on Ubuntu 20.04 and 14 on 22.04).

What exactly is "cyphered ZFS"?  Can you reproduce the problem with some
other filesystem?  If it's something very unusual, it might well be a
bug in the filesystem.

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)




Re: (When) can a single SQL statement return multiple result sets?

2024-04-15 Thread Merlin Moncure
On Mon, Apr 15, 2024 at 10:24 AM Jan Behrens 
wrote:

> On Wed, 10 Apr 2024 19:02:48 -0400
> Tom Lane  wrote:
> >
> > > Here, "DELETE FROM magic" returns multiple result sets, even though it
> > > is only a single SQL statement.
> >
> > Right, so it's kind of a case that you have to support.  We're not
> > likely to rip out rules anytime soon, even if they're a bit
> > deprecated.
>
> As it seems to be a corner case that rarely occurs in practice, I was
> considering to simply not support this case in my client library. I
> don't know which SQL error code I could return in that case though.
> Maybe "0A000" (feature_not_supported) or
> "21000" (cardinality_violation). Not sure if either of those is a good
> choice. Any better idea?


If you are asking whether "rules" can be ignored or error-walled in terms of
your library design, I'd say yes.  100% yes.

The main caveat would then be the proposed multi-resultset stored-procedure
feature, which might break the 'one result per semicolon' assumption you
might be chasing, as it has some basis in the standard; so I'd balance
risk/reward against that feature if I were you.

merlin