After DB upgrade from PG13 to PG15 showing error

2024-08-27 Thread Ram Pratap Maurya
Dear Team,

We have upgraded our PostgreSQL database from version 13 to version 15.
We are facing an issue in PG15: we have not enabled any AUTOVACUUM-related
parameters, but autovacuum is still running on the PG15 database.

The following process is showing:

2313192 | 02:10:01.283176 || myLava  | active | autovacuum: 
VACUUM ANALYZE public.tstock_movement (to prevent wraparound)

One more issue we are facing after the upgrade: a lot of alerts are appearing
in the DB log file. Please advise whether this is a bug in PostgreSQL 15.

2024-08-26 00:00:36.783 IST [702937] WARNING:  oldest xmin is far in the past
2024-08-26 00:00:36.783 IST [702937] HINT:  Close open transactions soon to 
avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, 
or drop stale replication slots.
2024-08-26 00:00:36.784 IST [702937] WARNING:  oldest xmin is far in the past
2024-08-26 00:00:36.784 IST [702937] HINT:  Close open transactions soon to 
avoid wraparound problems.
You might also need to commit or roll back old prepared transactions, 
or drop stale replication slots.

DB OS: RHEL 8.


Regards,
Ram Pratap.


Re: After DB upgrade from PG13 to PG15 showing error

2024-08-27 Thread Kashif Zeeshan
Hi

On Tue, Aug 27, 2024 at 1:50 PM Ram Pratap Maurya <
ram.mau...@lavainternational.in> wrote:

> Dear Team,
>
>
>
> We have upgraded our PostgreSQL database from version 13 to version 15.
>
> We are facing an issue in PG15: we have not enabled any AUTOVACUUM-related
> parameters, but autovacuum is still running on the PG15 database.
>
>
>
> The following process is showing:
>
>
>
> *2313192 | 02:10:01.283176 || myLava  | active |
> autovacuum: VACUUM ANALYZE public.tstock_movement (to prevent wraparound)*
>
>
>
> One more issue we are facing after the upgrade: a lot of alerts are
> appearing in the DB log file. Please advise whether this is a bug in
> PostgreSQL 15.
>
>
>
> *2024-08-26 00:00:36.783 IST [702937] WARNING:  oldest xmin is far in the
> past*
>
> *2024-08-26 00:00:36.783 IST [702937] HINT:  Close open transactions soon
> to avoid wraparound problems.*
>
> *You might also need to commit or roll back old prepared
> transactions, or drop stale replication slots.*
>
> *2024-08-26 00:00:36.784 IST [702937] WARNING:  oldest xmin is far in the
> past*
>
> *2024-08-26 00:00:36.784 IST [702937] HINT:  Close open transactions soon
> to avoid wraparound problems.*
>
> *You might also need to commit or roll back old prepared
> transactions, or drop stale replication slots.*
>

These are warnings, not errors. Can you please share the error you are
getting?

Thanks
Kashif Zeeshan

>
>
> *DB OS: RHEL 8.*
>
>
>
>
>
> Regards,
>
> Ram Pratap.
>


Analytics on PostgreSQL Advisory

2024-08-27 Thread Shyam Duraiswami
Hi,

We’re developing tools for Verifiable Analytics on shielded sensitive data in 
data ecosystems. While we’re still in the early stages and not selling anything 
yet, our goal is to build something truly valuable.

We’d love to hear how you use Postgres databases with sensitive data to:

• Perform analytics internally or with external partners while maintaining 
strong privacy.

• Integrate strong privacy features (queries on shielded data) into your 
products.

• Collaborate securely with untrusted partners or networks.

If you’re involved in any of these activities, we’d love to chat and learn more 
about your processes. Please feel free to choose a time that works for you.

https://calendly.com/shyamdurai/45?back=1&month=2024-08

Best

Shyam

Shyam Duraiswami
Co Founder and CEO at Provably ( http://www.provably.ai )

Linkedin ( https://www.linkedin.com/in/shyam-durai/ )

Setup Meeting with Shyam ( https://calendly.com/shyamdurai/ )


Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

2024-08-27 Thread David Rowley
On Tue, 27 Aug 2024 at 18:00, Justin Clift  wrote:
> As a general thought, seeing that this might be an actual problem
> should some kind of automated testing be added that checks for
> performance regressions like this?

We normally try to catch these sorts of things with regression tests.
Of course, that requires having a test that would catch a particular
problem, which we don't seem to have for this particular case.  A
performance test would also require testing a particular scenario, so
I don't see why that's better.  A regression test is better suited as
there's no middle ground between pass and fail.
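
(Editorial illustration: in the pg_regress style, a plan-shape test with
COSTS OFF has exactly that pass/fail property -- the expected output either
matches or it doesn't. A sketch, using the reproducer from elsewhere in this
thread:)

```
CREATE TABLE t (a name, b int);

-- With COSTS OFF the output is stable across machines, so any change in
-- the chosen plan shape shows up as a plain regression-test failure.
EXPLAIN (COSTS OFF)
SELECT * FROM (SELECT a::varchar, b FROM (SELECT DISTINCT a, b FROM t) st) t
RIGHT JOIN t t2 ON t.b = t2.b WHERE t.a = 'test';
```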

David




Re: Using PQsocketPoll() for PIPELINE mode

2024-08-27 Thread Dominique Devienne
On Wed, Aug 14, 2024 at 2:50 PM Dominique Devienne  wrote:
> Hi. I've now used successfully the new PQsocketPoll() API
> in the context of waiting for notifications, using beta2 and 3.
>
> But now I'm looking into using it in the context of PIPELINE mode.
> Where I suppose both forRead and forWrite are 1, but the return
> code only indicates whether the condition is met. The doc says nothing
> about OR or AND semantics when both forRead and forWrite are true.
>
> Perhaps it's deemed obvious from the use of select() or poll()?
> Or is one supposed to call it once with forRead=forWrite=1 and
> a timeout, then call it again twice with just one forFlag set and
> a 0 timeout, to know the "details" about which "side" is ready?
>
> Or hasn't this use case been considered for PQsocketPoll(),
> and thus the current return code isn't as precise as it could be?
>
> Thanks for any clarifications, --DD

Hi. No answers. Was it wrong timing (vacations) or
is something wrong with my questions? Thanks, --DD




Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

2024-08-27 Thread David Rowley
On Tue, 27 Aug 2024 at 14:03, Tom Lane  wrote:
> Yeah, I got that same result by bisecting.  It seems like it's
> somehow related to the cast to information_schema.sql_identifier:
> we are able to get rid of that normally but seem to fail to do so
> in this query.

In case it saves you a bit of time, I stripped as much of the
unrelated stuff out as I could and got:

create table t (a name, b int);
explain select * from (select a::varchar,b from (select distinct a,b from t) st) t
  right join t t2 on t.b=t2.b where t.a='test';

getting rid of the cast or swapping to INNER JOIN rather than RIGHT
JOIN means that qual_is_pushdown_safe() gets a Var rather than a
PlaceHolderVar.

David




Re: Using PQsocketPoll() for PIPELINE mode

2024-08-27 Thread Dominique Devienne
On Tue, Aug 27, 2024 at 12:23 PM Dominique Devienne  wrote:
> On Wed, Aug 14, 2024 at 2:50 PM Dominique Devienne  
> wrote:
> > Hi. I've now used successfully the new PQsocketPoll() API
> > in the context of waiting for notifications, using beta2 and 3.
> >
> > But now I'm looking into using it in the context of PIPELINE mode.
> > Where I suppose both forRead and forWrite are 1, but the return
> > code only indicates whether the condition is met. The doc says nothing
> > about OR or AND semantics when both forRead and forWrite are true.
> >
> > Perhaps it's deemed obvious from the use of select() or poll()?
> > Or is one supposed to call it once with forRead=forWrite=1 and
> > a timeout, then call it again twice with just one forFlag set and
> > a 0 timeout, to know the "details" about which "side" is ready?
> >
> > Or hasn't this use case been considered for PQsocketPoll(),
> > and thus the current return code isn't as precise as it could be?
> >
> > Thanks for any clarifications, --DD
>
> Hi. No answers. Was it wrong timing (vacations) or
> is something wrong with my questions? Thanks, --DD

Looking at https://doxygen.postgresql.org/libpq__pipeline_8c_source.html,
it does indeed seem like `select()` can report ready-for-read and
ready-for-write independently, and thus that the current signature of
PQsocketPoll() is not ideal for use in pipeline mode, which would be a
pity for a new API.

I get that the original thinking behind PQsocketPoll() covered neither
notifications nor pipeline mode, but shouldn't it? That same example should
ideally be writable in terms of PQsocketPoll(), using only a few syscalls, no?

Once again, this is late, although my original questions are now 2 weeks old.
After all, PQsocketPoll() has not been released yet officially. Thanks, --DD

```
fd_set input_mask;
fd_set output_mask;
int sock = PQsocket(conn); [...]

/* Watch the socket for both readability and writability
   (the FD_ZERO/FD_SET setup was elided in the original snippet) */
FD_ZERO(&input_mask);
FD_SET(sock, &input_mask);
FD_ZERO(&output_mask);
FD_SET(sock, &output_mask);

if (select(sock + 1, &input_mask, &output_mask, NULL, NULL) < 0) { /* handle error */ }

/* Process any results, so we keep the server's output buffer free
   flowing and it can continue to process input */
if (FD_ISSET(sock, &input_mask)) {
    PQconsumeInput(conn); [...]
}

/* Write more rows and/or the end pipeline message, if needed */
if (FD_ISSET(sock, &output_mask)) {
    PQflush(conn); [...]
}
```




Re: Using PQsocketPoll() for PIPELINE mode

2024-08-27 Thread Greg Sabino Mullane
On Tue, Aug 27, 2024 at 9:20 AM Dominique Devienne 
wrote:

> Once again, this is late, although my original questions are now 2 weeks
> old.
> After all, PQsocketPoll() has not been released yet officially. Thanks,
> --DD
>

As this is so new, you might have better luck on -hackers than here. I've
not poked at it enough to give an answer yet.

Cheers,
Greg


Re: After DB upgrade from PG13 to PG15 showing error

2024-08-27 Thread Adrian Klaver

On 8/27/24 01:50, Ram Pratap Maurya wrote:

Dear Team,

We have upgraded our PostgreSQL database from version 13 to version 15.

We are facing an issue in PG15: we have not enabled any AUTOVACUUM-related
parameters, but autovacuum is still running on the PG15 database.


You need to provide more detail about the above.

What parameters are you trying to change?

Do you get an error?

Did you reload/restart the server after the changes?

What user did you do the changes as?
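
One quick way to answer the first two questions from the server's side (an
editorial aside, not part of the original reply) is to inspect the live
autovacuum settings and where each value comes from:

```
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE '%autovacuum%';
```

Note that autovacuum is on by default, and anti-wraparound vacuums (the
"(to prevent wraparound)" entry shown earlier) are launched even when
autovacuum is nominally disabled, precisely to protect the cluster.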



*2024-08-26 00:00:36.783 IST [702937] WARNING:  oldest xmin is far in 
the past*


*2024-08-26 00:00:36.783 IST [702937] HINT:  Close open transactions 
soon to avoid wraparound problems.*


*    You might also need to commit or roll back old prepared 
transactions, or drop stale replication slots.*


I would say the above is pointing you at possible solutions.

To understand why this is important:

https://www.postgresql.org/docs/16/routine-vacuuming.html

Read the following sections:

25.1.5. Preventing Transaction ID Wraparound Failures

25.1.6. The Autovacuum Daemon
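
To see what is holding back the oldest xmin (a sketch of standard catalog
queries, not part of the original reply), each of the three causes named in
the HINT can be checked directly:

```
-- Long-running or idle-in-transaction sessions holding back xmin
SELECT pid, state, xact_start, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Old prepared transactions
SELECT gid, prepared FROM pg_prepared_xacts ORDER BY prepared;

-- Stale replication slots pinning xmin
SELECT slot_name, active, xmin, catalog_xmin FROM pg_replication_slots;
```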



Regards,

Ram Pratap.



--
Adrian Klaver
adrian.kla...@aklaver.com





Strange behaviors with ranges

2024-08-27 Thread Jean-Christophe Boggio

Hello,

I have 2 very confusing behaviors when using ranges.

It all started with this query:

    WITH rangespaliers AS (
    SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers JOIN tmp_limitcontrats USING(idcontrat)
--  SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers WHERE idcontrat=1003
    )
    ,rangespaliers2 AS (
    select *
    FROM rangespaliers
    WHERE rangep <> NUMRANGE(NULL, NULL) -- ERROR IS HERE
    )
    select * from rangespaliers2;

When I run this query, I get the error "Range lower bound must be less 
than or equal to range upper bound".


(a) If I comment out the line marked "ERROR IS HERE", I don't have an 
error (but I'm missing the filter of course).


(b) Also, if I uncomment line 3 and comment out line 2, I get the correct
behavior. The very strange thing is that tmp_limitcontrats has only one row,
which contains "idcontrat=1003".


Now, in that table "paliers", the line for idcontrat=1003 has value NULL 
for both qtep1 and qtep2. So the final behavior should be an empty 
result set.


You can reproduce the problem using the attached file:

CREATE TABLE paliers (
    idpalier integer NOT NULL,
    idcontrat integer NOT NULL,
    isdefault boolean NOT NULL,
    name character varying(30),
    qtep1 integer,
    qtep2 integer,
    qtep3 integer,
    qtep4 integer,
    qtep5 integer,
    qtep6 integer,
    qtep7 integer,
    qtep8 integer,
    qtep9 integer,
    qtep10 integer,
    qpp1 double precision,
    qpp2 double precision,
    qpp3 double precision,
    qpp4 double precision,
    qpp5 double precision,
    qpp6 double precision,
    qpp7 double precision,
    qpp8 double precision,
    qpp9 double precision,
    qpp10 double precision,
    idpalier_clonedfrom integer,
    assessonamounts boolean DEFAULT false,
    amountp1 numeric(15,2),
    amountp2 numeric(15,2),
    amountp3 numeric(15,2),
    amountp4 numeric(15,2),
    amountp5 numeric(15,2),
    amountp6 numeric(15,2),
    amountp7 numeric(15,2),
    amountp8 numeric(15,2),
    amountp9 numeric(15,2),
    tauxmini numeric(5,2)
);

copy paliers(idpalier,idcontrat,isdefault,name,qtep1,qtep2,qtep3,qtep4,qtep5,qtep6,qtep7,qtep8,qtep9,qtep10,qpp1,qpp2,qpp3,qpp4,qpp5,qpp6,qpp7,qpp8,qpp9,qpp10,idpalier_clonedfrom,assessonamounts,amountp1,amountp2,amountp3,amountp4,amountp5,amountp6,amountp7,amountp8,amountp9,tauxmini)
from '/tmp/paliers.csv'
delimiter ','
csv header;

DROP TABLE IF EXISTS tmp_limitcontrats;
CREATE TABLE tmp_limitcontrats AS
    SELECT 1003 AS idcontrat;

WITH rangespaliers AS (
  SELECT numrange(qtep1+1,qtep2) as rangep
  FROM paliers JOIN tmp_limitcontrats USING(idcontrat)
)
, rangespaliers2 AS (
  SELECT rangep, numrange(null,null)
  FROM rangespaliers
  WHERE rangep <> NUMRANGE(null,null)
)
select * from rangespaliers2;

This fails on PG 16.4 and 15.7

Thanks a lot for your enlightenment.

idpalier,idcontrat,isdefault,name,qtep1,qtep2,qtep3,qtep4,qtep5,qtep6,qtep7,qtep8,qtep9,qtep10,qpp1,qpp2,qpp3,qpp4,qpp5,qpp6,qpp7,qpp8,qpp9,qpp10,idpalier_clonedfrom,assessonamounts,amountp1,amountp2,amountp3,amountp4,amountp5,amountp6,amountp7,amountp8,amountp9,tauxmini
268,217,t,default,,,1,,,f,,
395,573,t,default,5,10,12,13,15,f,,
697,916,t,default,5,10,16,17,18656,f,,
698,916,f,Digital,10,,18,20,,f,,
701,918,t,default,,,3,,612,f,,
702,919,t,default,5,10,11,12,14697,f,,
704,920,t,default,5,10,11,12,14702,f,,
705,921,t,default,,,2,,43,f,,
709,925,t,default,,,2,,568,f,,1.00
710,926,t,default,,,6.5,,668,f,,
711,927,t,default,,,7,,710,f,,
712,928,t,default,,,2,,711,f,,
713,929,t,default,,,10,,711,f,,
714,930,t,default,,,6,,713,f,,
715,931,t,default,,,3,,714,f,,
716,932,t,default,,,8,,713,f,,
717,933,t,default,,,1,,716,f,,
718,934,t,default,,,1,,717,f,,
719,935,t,default,,,1.5,,713,f,,
720,936,t,default,,,15,,713,f,,
721,937,t,default,,,1,,719,f,,
722,938,t,default,,,9,,716,f,,
723,939,t,default,,,2,,722,f,,
724,940,t,default,,,8,,722,f,,
725,941,t,default,,,1,,724,f,,
726,942,t,default,,,1,,725,f,,
727,943,t,default,,,8,,722,f,,
728,944,t,default,,,3,,727,f,,
729,945,t,default,,,8,,722,f,,
730,946,t,default,,,1,,729,f,,
731,947,t,default,,,1,,730,f,,
737,953,t,default,,,12,,668,f

Re: Strange behaviors with ranges

2024-08-27 Thread Adrian Klaver




On 8/27/24 10:29 AM, Jean-Christophe Boggio wrote:

Hello,

I have 2 very confusing behaviors when using ranges.

It all started with this query:

    WITH rangespaliers AS (
    SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers JOIN tmp_limitcontrats USING(idcontrat)
--  SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers WHERE idcontrat=1003
    )
    ,rangespaliers2 AS (
    select *
    FROM rangespaliers
    WHERE rangep <> NUMRANGE(NULL, NULL) -- ERROR IS HERE
    )
    select * from rangespaliers2;

When I run this query, I get the error "Range lower bound must be less 
than or equal to range upper bound".


(a) If I comment out the line marked "ERROR IS HERE", I don't have an 
error (but I'm missing the filter of course).


(b) Also, if I uncomment line 3 and comment out line 2, I get the correct
behavior. The very strange thing is that tmp_limitcontrats has only one row,
which contains "idcontrat=1003".


What does:

SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers WHERE idcontrat=1003


return?




This fails on PG 16.4 and 15.7

Thanks a lot for your enlightenment.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Strange behaviors with ranges

2024-08-27 Thread Jean-Christophe Boggio




What does:

SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers WHERE idcontrat=1003


return?


It returns:

(,)

(as expected)






Re: Strange behaviors with ranges

2024-08-27 Thread Torsten Förtsch
I guess this query comes back non-empty:

SELECT * FROM paliers JOIN tmp_limitcontrats USING(idcontrat) WHERE
qtep1 >= qtep2

Somewhere, this would then lead to an expression like numrange(3,2).

Check out idpalier=805
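
For reference (an editorial one-liner, not from the thread), the constructor
fails on its own with exactly the reported error:

```
SELECT numrange(3, 2);
-- ERROR:  range lower bound must be less than or equal to range upper bound
```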


On Tue, Aug 27, 2024 at 7:37 PM Adrian Klaver 
wrote:

>
>
> On 8/27/24 10:29 AM, Jean-Christophe Boggio wrote:
> > Hello,
> >
> > I have 2 very confusing behaviors when using ranges.
> >
> > It all started with this query:
> >
> >  WITH rangespaliers AS (
> >  SELECT numrange( qtep1+1   , qtep2,  '[]') AS rangep FROM
> > paliers JOIN tmp_limitcontrats USING(idcontrat)
> > --SELECT numrange( qtep1+1   , qtep2,  '[]') AS rangep FROM
> > paliers WHERE idcontrat=1003
> >  )
> >  ,rangespaliers2 AS (
> >  select *
> >  FROM rangespaliers
> >  WHERE rangep <> NUMRANGE(NULL, NULL) -- ERROR IS HERE
> >  )
> >  select * from rangespaliers2;
> >
> > When I run this query, I get the error "Range lower bound must be less
> > than or equal to range upper bound".
> >
> > (a) If I comment out the line marked "ERROR IS HERE", I don't have an
> > error (but I'm missing the filter of course).
> >
> > (b) Also, if I uncomment line 3 and comment out line 2, I get the
> > correct behavior. Very strange thing is that tmp_limitcontrats has only
> > one row which contains "idcontrat=1003".
>
> What does:
>
> SELECT numrange( qtep1+1   , qtep2,  '[]') AS rangep FROM paliers WHERE
> idcontrat=1003
>
> return?
>
>
> >
> > This fails on PG 16.4 and 15.7
> >
> > Thanks a lot for your enlightenment.
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>


Re: Strange behaviors with ranges

2024-08-27 Thread Paul Jungwirth

On 8/27/24 10:29, Jean-Christophe Boggio wrote:

I have 2 very confusing behaviors when using ranges.

It all started with this query:

    WITH rangespaliers AS (
    SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers JOIN tmp_limitcontrats USING(idcontrat)
--  SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers WHERE idcontrat=1003
    )
    ,rangespaliers2 AS (
    select *
    FROM rangespaliers
    WHERE rangep <> NUMRANGE(NULL, NULL) -- ERROR IS HERE
    )
    select * from rangespaliers2;

When I run this query, I get the error "Range lower bound must be less than or equal to range upper 
bound".


(a) If I comment out the line marked "ERROR IS HERE", I don't have an error (but I'm missing the 
filter of course).


(b) Also, if I uncomment line 3 and comment out line 2, I get the correct behavior. The very
strange thing is that tmp_limitcontrats has only one row, which contains "idcontrat=1003".


The issue is the order-of-operations used by the planner. If I put EXPLAIN on 
your last query, I see:

 Hash Join  (cost=16.64..109.90 rows=2410 width=64)
   Hash Cond: (tmp_limitcontrats.idcontrat = paliers.idcontrat)
   ->  Seq Scan on tmp_limitcontrats  (cost=0.00..35.50 rows=2550 width=4)
   ->  Hash  (cost=14.27..14.27 rows=189 width=12)
         ->  Seq Scan on paliers  (cost=0.00..14.27 rows=189 width=12)
               Filter: (numrange(((qtep1 + 1))::numeric, (qtep2)::numeric) <> '(,)'::numrange)

So we are applying that filter to every row in paliers, not just the one with 
idcontrat = 1003.

Indeed this simplified version also fails:

SELECT numrange(qtep1+1,qtep2) as rangep FROM paliers;
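
One plan-order-independent workaround (an illustrative sketch, not something
proposed in the thread) is to guard the range constructor itself, so invalid
bounds never reach numrange() no matter where the planner evaluates it:

```
WITH rangespaliers AS (
  SELECT CASE
           WHEN qtep1 IS NULL OR qtep2 IS NULL OR qtep1 + 1 <= qtep2
           THEN numrange(qtep1 + 1, qtep2, '[]')
         END AS rangep   -- rows with invalid bounds yield NULL instead of erroring
  FROM paliers
  JOIN tmp_limitcontrats USING (idcontrat)
)
SELECT * FROM rangespaliers WHERE rangep <> numrange(NULL, NULL);
```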

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: Strange behaviors with ranges

2024-08-27 Thread Jean-Christophe BOGGIO

On 27/08/2024 at 19:51, Torsten Förtsch wrote:

I guess this query comes back non-empty:

SELECT * FROM paliers JOIN tmp_limitcontrats USING(idcontrat) WHERE 
qtep1 >= qtep2


Yes, it comes back empty if I restrict tmp_limitcontrats to idcontrat=1003.

Otherwise, you are right: there are irregular data, but not data I'm
concerned with in this particular case.




Re: Strange behaviors with ranges

2024-08-27 Thread Adrian Klaver




On 8/27/24 11:13 AM, Jean-Christophe BOGGIO wrote:

On 27/08/2024 at 19:51, Torsten Förtsch wrote:

I guess this query comes back non-empty:

SELECT * FROM paliers JOIN tmp_limitcontrats USING(idcontrat) WHERE 
qtep1 >= qtep2


Yes, it comes back empty if I restrict tmp_limitcontrats to idcontrat=1003.

Otherwise, you are right: there are irregular data, but not data I'm
concerned with in this particular case.




?:
SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers JOIN tmp_limitcontrats USING(idcontrat) WHERE qtep1+1 < qtep2


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Strange behaviors with ranges

2024-08-27 Thread Adrian Klaver




On 8/27/24 11:16 AM, Adrian Klaver wrote:



On 8/27/24 11:13 AM, Jean-Christophe BOGGIO wrote:

On 27/08/2024 at 19:51, Torsten Förtsch wrote:

I guess this query comes back non-empty:

SELECT * FROM paliers JOIN tmp_limitcontrats USING(idcontrat) WHERE 
qtep1 >= qtep2


Yes, it comes back empty if I restrict tmp_limitcontrats to idcontrat=1003.

Otherwise, you are right: there are irregular data, but not data I'm
concerned with in this particular case.




?:
SELECT numrange( qtep1+1, qtep2, '[]') AS rangep FROM paliers JOIN tmp_limitcontrats USING(idcontrat) WHERE qtep1+1 < qtep2




Actually that should be:

qtep1+1 <= qtep2

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Strange behaviors with ranges

2024-08-27 Thread Jean-Christophe BOGGIO

Paul,

On 27/08/2024 at 20:11, Paul Jungwirth wrote:
The issue is the order-of-operations used by the planner. If I put 
EXPLAIN on your last query, I see:


 Hash Join  (cost=16.64..109.90 rows=2410 width=64)
   Hash Cond: (tmp_limitcontrats.idcontrat = paliers.idcontrat)
   ->  Seq Scan on tmp_limitcontrats  (cost=0.00..35.50 rows=2550 width=4)
   ->  Hash  (cost=14.27..14.27 rows=189 width=12)
         ->  Seq Scan on paliers  (cost=0.00..14.27 rows=189 width=12)
               Filter: (numrange(((qtep1 + 1))::numeric, (qtep2)::numeric) <> '(,)'::numrange)


So we are applying that filter to every row in paliers, not just the 
one with idcontrat = 1003.


I understand; that makes perfect sense. Thanks for the explanation. Have a
nice day,







tsvector limitations - why and how

2024-08-27 Thread Stanislav Kozlovski
Hey!

I just spent some time extensively documenting a draft e-mail I planned to send 
over here to report what I thought was a bug.

I was playing around with tsvector and full text search for a pet project of 
mine, ingesting books' content.

I was aware of the documented limitations of FTS and tried to ensure I
didn't hit any - but what I missed was that the maximum allowed lexeme
position is 16383, and every position above that silently gets set to 16383.
I was searching for a phrase (two words) at the end of the book and couldn't
find it. After debugging, I realized that my phrase's lexemes were being set
to position 16383, which was inaccurate.

I acknowledge that even with this limitation the lexeme is still stored and
remains searchable, and I commend the authors for the elegant approach of
still providing some functionality beyond the space-saving limits.

The problem I had is that it breaks FOLLOWED BY queries, essentially stopping 
you from being able to match on phrases (more than one word) on large text.

I have a few questions here:


1. Why is this still the case?

The earliest I could see this limitation in place was PG 8.3, released 16
years ago (2008). Hardware, and in particular memory and storage, have come
a long way since. Do these limits still make sense?

I saw a thread from 6 years ago (2018) which asked a similar question, but
it was focused on hitting the 1MB limit of the tsvector. There, a member
suggested that you'd probably need a whole new type (e.g. bigtsvector) to
make this work. Somebody else shared a project called tsvector2, which was a
beefed-up version. There was also a patch that attempted to remove the 1MB
limit, but the contributor let it stall and it never got merged. I'm not
aware whether that would have fixed the positional limitation.


2. Are there any other drawbacks that I'm missing?

My understanding of the 16383 position limit is the following:

  * Lexemes beyond the 16383 position mark will still match search queries,
    because they'll be represented as 'lexeme:16383'.
  * Phrase searches for lexemes beyond the 16383 position mark won't ever
    find anything.

Is there anything else I'm missing?
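
For instance (a quick illustrative check, not from the original mail; the
filler text and counts are arbitrary), the clamping is easy to observe:

```
-- Both lexemes land past position 16383, so both are clamped to 16383 and
-- the FOLLOWED BY match fails even though the phrase is present:
SELECT to_tsvector('english', repeat('filler ', 20000) || 'red balloon')
       @@ to_tsquery('english', 'red <-> balloon');   -- false

-- The same phrase early in the text matches as expected:
SELECT to_tsvector('english', 'red balloon')
       @@ to_tsquery('english', 'red <-> balloon');   -- true
```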

If not - I'd be happy to contribute this piece of text to the docs,
particularly in the limitations part, as I believe it would help people
reading them.


3. What is the preferred/conventional way to store large text?

If I want to store a whole book's content - like PostgreSQL: Up and Running
(2012) - I'd need to store it over 30 rows. (It's 300 pages long; 300-page
books average about 82,500 words, and English words average about 4-6.5
characters, meaning a tsvector will hold the positions of no more than
[2520-3277] words.)

30 rows sounds fair, but there are two complications:

  * Splitting the chapters in a logical way is tricky.
  * FOLLOWED BY queries between the rows will not work. This is fine when
    the text is separated logically, e.g. by chapter, but not if it's
    separated within a chapter. The example book has 10 chapters, so it'd
    average around 3 rows a chapter (see the sketch below).
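
(An editorial sketch of the chunking layout described above; the table and
column names are hypothetical:)

```
-- One row per chunk, each comfortably under the position/size limits.
CREATE TABLE book_chunks (
    book_id  int,
    chunk_no int,
    body     text,
    tsv      tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
    PRIMARY KEY (book_id, chunk_no)
);
CREATE INDEX ON book_chunks USING gin (tsv);

-- Phrase search works within a chunk; matches spanning chunk boundaries
-- are the remaining blind spot noted above.
SELECT DISTINCT book_id
FROM book_chunks
WHERE tsv @@ to_tsquery('english', 'red <-> balloon');
```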

Are there any unofficial larger tsvector types? Am I doing anything wrong?

Thanks for reading this. I'd appreciate any response.

Best,
Stanislav


Re: tsvector limitations - why and how

2024-08-27 Thread Tom Lane
Stanislav Kozlovski  writes:
> I was aware of the documented limitations of FTS and tried to ensure I
> didn't hit any - but what I missed was that the maximum allowed lexeme
> position is 16383, and every position above that silently gets set to
> 16383. I was searching for a phrase (two words) at the end of the book and
> couldn't find it. After debugging, I realized that my phrase's lexemes
> were being set to position 16383, which was inaccurate.
> ...
> The problem I had is that it breaks FOLLOWED BY queries, essentially stopping 
> you from being able to match on phrases (more than one word) on large text.

Yeah.  FOLLOWED BY didn't exist when the tsvector storage
representation was designed, so the possible inaccuracy of the
lexeme positions wasn't such a big deal.

> Why is this still the case?

Because nobody's done the significant amount of work needed to make
it better.  I think an acceptable patch would have to support both
the current tsvector representation and a "big" version that's able
to handle anything up to the 1GB varlena limit.  (If you were hoping
for documents bigger than that, you'd be needing a couple more
orders of magnitude worth of work.)  We might also find that there
are performance bottlenecks that'd have to be improved, but even just
making the code cope with two representations would be a big patch.

There has been some cursory talk about this, I think, but I don't
believe anyone's actually worked on it since the 2017 patch you
mentioned.  I'm not sure if that patch is worth using as the basis
for a fresh try: it looks like it had some performance issues, and
AFAICS it didn't really improve the lexeme-position limit.

(Wanders away wondering if the expanded-datum infrastructure could
be exploited here...)

regards, tom lane




Ghost data from failed FDW transactions?

2024-08-27 Thread Jacob Biesinger
Hi there!

We have a setup where, for compliance reasons, we hoist a portion of data
from several "tenant" databases into a "root" / common / untenanted DB.
Through the magic of postgres_fdw, row triggers, and distributed
transactions, we automatically hoist the needed columns into the untenanted
DB whenever there are changes in any of the tenant DBs. The root DB is
never written to outside of these triggers, and acts as a sort of
cross-tenant index.

I'm scratching my head at a few rows in the root DB, where it seems the
corresponding tenant transaction rolled back, but the root DB transaction
committed -- there is no row in the tenant but the root DOES have one. I
don't have a smoking gun just yet, but this is not the first time we've
seen this issue.

Before I jump into particulars, does this sound like expected behavior? We
run SERIALIZABLE txn level everywhere (set at the cluster level).


Thanks so much in advance for any insights here!


Here's my setup:

## info + cluster-level flags
GCP cloudsql
postgres version: 15.7
default_transaction_isolation: serializable


## The tenant DB:

CREATE TABLE "devices" (
    "orgId" TEXT NOT NULL,
    "patientId" TEXT NOT NULL,
    "deviceId" TEXT NOT NULL,
    "data" JSONB NOT NULL,
    PRIMARY KEY ("orgId", "patientId", "deviceId")
);

## The root DB:

CREATE TABLE IF NOT EXISTS "devices" (
    "orgId" TEXT NOT NULL,
    "patientId" TEXT NOT NULL,
    "deviceId" TEXT NOT NULL,
    "data" JSONB NOT NULL,
    "serialNumber" TEXT NOT NULL GENERATED ALWAYS AS ("data"->>'serialNumber') STORED,
    "status" TEXT NOT NULL GENERATED ALWAYS AS ("data"->>'status') STORED,
    PRIMARY KEY ("orgId", "patientId", "deviceId"),

    CONSTRAINT "deviceIdMatches" CHECK ("data"->>'id' = "deviceId"),
    CONSTRAINT "patientIdMatches" CHECK ("data"->>'patientId' = "patientId"),

    -- Prevent duplicate serial numbers that are simultaneously "active"
    EXCLUDE ( "serialNumber" WITH = ) WHERE ("status" = 'active')
);


## FDW connection from tenant DB to the root:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER IF NOT EXISTS "fdw_server__root" FOREIGN DATA WRAPPER postgres_fdw OPTIONS (
    host '${instance-ip-address}',
    dbname '${root-db}',
    updatable 'true',
    truncatable 'false',
    keep_connections 'off'
);

CREATE USER MAPPING FOR "${remote-user}" SERVER "fdw_server__root" OPTIONS (
    user '${remote-user}', password '$${PGPASSWORD}'
);

CREATE SCHEMA IF NOT EXISTS "rootDb";

GRANT USAGE ON SCHEMA "rootDb" TO "${user-name}";
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA "rootDb" TO
"${user-name}";
ALTER DEFAULT PRIVILEGES IN SCHEMA "rootDb" GRANT SELECT, INSERT, UPDATE,
DELETE ON TABLES TO "${user-name}";


IMPORT FOREIGN SCHEMA "public" LIMIT TO (devices)
FROM SERVER "fdw_server__root"
INTO "rootDb";


## Trigger setup on the tenant DB, hoisting rows when modified into the
root table:

-- Set up a trigger which hoists tenant devices into the rootDb
CREATE OR REPLACE FUNCTION hoist_devices() RETURNS TRIGGER
AS $hoist_devices$
BEGIN
    IF (TG_OP IN ('UPDATE', 'DELETE')) THEN
        DELETE FROM "rootDb"."devices"
        WHERE "orgId" = OLD."orgId"
          AND "patientId" = OLD."patientId"
          AND "deviceId" = OLD."deviceId";
    END IF;

    IF (TG_OP IN ('INSERT', 'UPDATE')) THEN
        INSERT INTO "rootDb"."devices" ("orgId", "patientId", "deviceId", "data")
        SELECT NEW."orgId", NEW."patientId", NEW."deviceId", NEW."data";
    END IF;

    RETURN NEW;
END;
$hoist_devices$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER hoist_devices_insert_update_delete
AFTER INSERT OR UPDATE OR DELETE ON "devices"
FOR EACH ROW
EXECUTE FUNCTION hoist_devices();



## A particular endpoint attempts insertions like:

BEGIN;

INSERT INTO "devices" ("orgId", "patientId", "deviceId", "data")
VALUES (
'org1',
'patient1',
'device1',
'{"id": "device1", "patientId": "patient1", "serialNumber": "12345",
"status": "active" }'
);

COMMIT;
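
A minimal consistency check (an editorial sketch built from the schemas
above, not part of the original report) that can be run from a tenant DB to
list root rows with no tenant counterpart:

```
-- Root rows ("ghosts") that have no matching row in this tenant DB.
-- 'org1' is a placeholder for the orgId(s) owned by this tenant.
SELECT r."orgId", r."patientId", r."deviceId"
FROM "rootDb"."devices" AS r
LEFT JOIN "devices" AS t USING ("orgId", "patientId", "deviceId")
WHERE r."orgId" = 'org1'
  AND t."deviceId" IS NULL;
```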





--
Jake Biesinger


Pgbackrest specifying the default DB necessary/correct way ?

2024-08-27 Thread KK CHN
List,

I have configured pgBackRest on a live DB server and a repo server
(EPAS 16, RHEL 9.4, and pgBackRest 2.52.1).

On the DB server I have:
##
[root@db1 ~]# cd
[root@db1 ~]# cat /etc/pgbackrest/pgbackrest.conf
[Demo_Repo]
pg1-path=/data/edb/as16/data
pg1-port=5444
pg1-user=enterprisedb
pg-version-force=16
pg1-database=edb        ## see Query 1 below

[global]
repo1-host=10.255.0.40
repo1-host-user=postgres
archive-async=y
spool-path=/var/spool/pgbackrest
log-level-console=info
log-level-file=debug
delta=y

[global:archive-get]
process-max=2

[global:archive-push]
process-max=4
##

## Query 1:

In this DB server I have databases other than the default "edb" database.
I am not sure whether specifying the line pg1-database=edb is necessary.

(I am just learning and exploring pgBackRest; I found some reference
configurations online, so I am using them like this.)

With pg1-database=edb specified like this, will it prevent the other
databases on this server from being backed up? If yes, how can I overcome
this?

I want all databases on this server to be backed up to the remote
repository.


It is a production server, and I can't use trial and error here to
understand how it works.


Please shed some light on this.


Thanks,
Krishane


Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.

2024-08-27 Thread Justin Clift

On 2024-08-27 20:14, David Rowley wrote:
On Tue, 27 Aug 2024 at 18:00, Justin Clift  
wrote:

As a general thought, seeing that this might be an actual problem
should some kind of automated testing be added that checks for
performance regressions like this?


We normally try to catch these sorts of things with regression tests.
Of course, that requires having a test that would catch a particular
problem, which we don't seem to have for this particular case.  A
performance test would also require testing a particular scenario, so
I don't see why that's better.  A regression test is better suited as
there's no middle ground between pass and fail.


Yeah, that's the kind of thing I was thinking.

Any idea who normally does those, and whether it would be reasonable to add
test(s) for the information_schema tables?

Regards and best wishes,

Justin Clift