Re: oldest WAL files not removed

2021-09-02 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 08:21:37 +0200,  wrote in 
> Hy Kyotaro,
> 
> Thanks for this explanation. I joined the files to be complete.

Thanks, it is informative.  It looks like this if sorted in the file-name order.

01/09/2021  12:15    16 777 216  0001008E0061.deleted
01/09/2021  16:35    16 777 216  0001008E0086.deleted
01/09/2021  16:35    16 777 216  0001008E0087
30/08/2021  20:40    16 777 216  0001008E0088
...
31/08/2021  17:58    16 777 216  0001008E00CF

> If you looked at the files only in timestamp order, then with high odds the
> "oldest" file is a recycled file to be used in the future, and the "newest"
> file is the one currently being written.  If so, the reason that the
> oldest-in-timestamp file is still there is that it is still waiting to be used.
> Even if you removed the to-be-used-in-future files, such files would grow
> back to the same extent, according to the setting of min_wal_size.

The reason that 0001008E0088 is still there is that it is still
waiting to be used.  The files 0001008E0061/86.deleted
have already been removed from postgres' point of view but remain because
something else is still using them. If they persist too long, they can be
removed manually if needed.

The files 88 to CF look like preallocated, or recycled, files. Since
there are 76 files, it seems like min_wal_size is set to 1GB or so. If
you don't need that many files preallocated in your pg_wal directory,
consider reducing min_wal_size.  But note that the preallocated files
won't be gone right away just by doing that.
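(A minimal sketch of that change, assuming a target of 512MB; min_wal_size
can be changed with a configuration reload, and the already-recycled segments
shrink back only gradually over later checkpoints:)

SHOW min_wal_size;
ALTER SYSTEM SET min_wal_size = '512MB';
SELECT pg_reload_conf();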

If you really want to delete those files right away, the preallocated
files are theoretically removable.  You can get the name of the file
currently being written with the following query, then move the files whose
names are larger than the current file (in file-name order) somewhere else,
and remove them only after making sure the system can restart.

=# select pg_walfile_name(pg_current_wal_lsn());
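On PostgreSQL 10 and later you can also list the segments that sort after the
current one directly from SQL (assuming a single timeline; these are the
candidate files described above):

SELECT name, size, modification
  FROM pg_ls_waldir()
 WHERE name > pg_walfile_name(pg_current_wal_lsn())
 ORDER BY name;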

If the system is active, the current file may advance, so be careful
not to remove files that contain real WAL data. This is why I said
"In any case, no WAL files ought to be manually removed."

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




No xmin in pg_database

2021-09-02 Thread Alexander Kass
Hi! I'm Alexander from DataGrip.
We actively use xmin's from pg_catalog tables to incrementally
synchronize our database model.
We have a small number of users that do not have xmin in pg_database
(we've asked them to try `select xmin from pg_database` and got
`column xmin does not exist`).
There is an old ticket https://youtrack.jetbrains.com/issue/DBE-7588
(at that time there was no real need for that xmin, so we just removed it)
Now xmins are back and so are the problems.
The most recent report is about version `PostgreSQL 12.4 (Ubuntu
12.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit`

How can it happen that there is no xmin in pg_database?
Is this normal behaviour, and is it configurable? Or is it a kind
of data corruption?
Can that happen to other tables?

I haven't been able to find answers myself, so I'm asking for help :)
Thanks in advance




Re: No xmin in pg_database

2021-09-02 Thread Laurenz Albe
On Thu, 2021-09-02 at 08:50 +0300, Alexander Kass wrote:
> Hi! I'm Alexander from DataGrip.
> We actively use xmin's from pg_catalog tables to incrementally
> synchronize our database model.
> We have a small number of users that do not have xmin in pg_database
> (we've asked them to try `select xmin from pg_database` and got
> `column xmin does not exist`).
> There is an old ticket https://youtrack.jetbrains.com/issue/DBE-7588
> (at that time there was no real need in that xmin so we just removed
> it)
> Now xmins are back and so are the problems.
> The most recent report is about version `PostgreSQL 12.4 (Ubuntu
> 12.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit`
> 
> How can that happen that there is no xmin in pg_database?
> Is it a normal behaviour and that is configurable? Or is this a kind
> of data corruption?
> Can that happen to other tables?
> 
> I haven't been able to find answers myself, so I'm asking for help :)
> Thanks in advance

All PostgreSQL tables have "xmin", and all catalog tables do as well.

If you use a non-standard table access method, a table might not
have "xmin".  But that cannot apply to catalog tables.

Perhaps that was not really PostgreSQL, but some fork where the
persistence layer was modified?

I am not sure if it is a good idea to rely on "xmin" at all.  These
numbers are recycled when transaction IDs wrap around, and you could
have two entries with the same "xmin" that have a totally different
meaning, because one of the rows is frozen and the other isn't.
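For illustration, the kind of check involved looks like this; note that for a
frozen row the stored xmin no longer reflects when the row was created:

SELECT datname, xmin, age(xmin) FROM pg_database;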

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: No xmin in pg_database

2021-09-02 Thread Tom Lane
Laurenz Albe  writes:
> On Thu, 2021-09-02 at 08:50 +0300, Alexander Kass wrote:
>> We have a small number of users that do not have xmin in pg_database
>> (we've asked them to try `select xmin from pg_database` and got
>> `column xmin does not exist`).

> All PostgreSQL tables have "xmin", and all catalog tables do as well.

Indeed.  This seems to be evidence of corruption in the pg_attribute
catalog.  If you're really lucky, reindexing pg_attribute might fix
it, though I wonder what other problems there are.  (It's odd though
that identical corruption would happen to different installations.)
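(For reference, that repair attempt is a one-liner, best run as superuser and
after taking a backup:)

REINDEX TABLE pg_catalog.pg_attribute;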

regards, tom lane




Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread balasubramanian c r
Hi Team

We have encountered a problem in our testing environment.
I have a scenario where I am running a similarity match for an address.
I have created a table with 1603423 records.

We are using the pg_similarity extension on PostgreSQL version 13.

And I have created a GIN index, since I am using the pg_similarity library's
jaccard similarity method.
When I run EXPLAIN ANALYZE:
EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj
nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address
where complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad
201017' order by qsim DESC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5856.13..5860.14 rows=1603 width=93) (actual time=12101.194..12101.197 rows=6 loops=1)
   Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)) DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on address  (cost=172.43..5770.80 rows=1603 width=93) (actual time=3516.233..12101.172 rows=6 loops=1)
         Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
         Rows Removed by Index Recheck: 1039186
         Heap Blocks: exact=58018
         ->  Bitmap Index Scan on address_complete_address_idx1  (cost=0.00..172.02 rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
               Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
 Planning Time: 0.141 ms
 Execution Time: 12101.245 ms
(11 rows)
It took 12 seconds.
The following are my postgresql.conf settings:
shared_buffers 4GB
work_mem 256MB
maintenance_work_mem 512MB
autovacuum_work_mem 20MB
My index definition is: "address_complete_address_idx1" gin
(complete_address gin_similarity_ops)

It is using the index correctly. But why it took 12 seconds to process, I
really don't understand.

Please help.

Thanks
C.R.Bala


Re: No xmin in pg_database

2021-09-02 Thread Laurenz Albe
On Thu, 2021-09-02 at 12:10 +0300, Alexander Kass wrote:
> As for xmin usage, we have a working scheme. We fetch objects based on
> dbage(xid), starting from the oldest uncommitted transaction of
> previous synchronization.
> Do you think it does not work?

I don't know what exactly you are doing, so I cannot be certain.

age(xmin) will not necessarily tell you how many transactions
ago the row was created...

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: datfrozenxid not dropping after vacuum

2021-09-02 Thread Matthew Tice
Interestingly enough, I hopped on the database system this morning and
found the `datfrozenxid` dropped back down below
`autovacuum_freeze_max_age` around 0200 local time (roughly 18 hours
after the fact).

Looking through the Postgresql logs I don't see anything standing out
at that time.

I still plan on patching to 10.17 tonight.

Matt

On Wed, Sep 1, 2021 at 4:01 PM Matthew Tice  wrote:
>
> Hi Alvaro, thanks for the quick reply.
>
> I'm scheduled to do my patching maintenance at the end of this month -
> but at this point I don't think I'm going to make it.
>
> Other than patching, is there a work around?  For example, in #2 above:
> >The fix for 2) is simpler,
> >simply always remove both the shared and local init files.
>
> I'm not familiar with the differences between 'shared' and 'local'
> init files (I'd imagine I referenced a 'local' file in my original
> post)?
>
>
>
> Thanks!
>
> Matt
>
> On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera  wrote:
> >
> > On 2021-Sep-01, Matthew Tice wrote:
> >
> > [ problem table is pg_database ]
> >
> > > My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> > > standby databases have been patched to 10.17.
> >
> > Hmm, I think there was a bug in the early 10.x versions where advancing
> > the xid age of shared tables would not work correctly for some reason ...
> > Ah yes, this was fixed in 10.5, a mere three years ago:
> >
> > Author: Andres Freund 
> > Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
> > Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
> > Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
> > Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
> > Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
> > Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700
> >
> > Fix bugs in vacuum of shared rels, by keeping their relcache entries 
> > current.
> >
> > When vacuum processes a relation it uses the corresponding relcache
> > entry's relfrozenxid / relminmxid as a cutoff for when to remove
> > tuples etc. Unfortunately for nailed relations (i.e. critical system
> > catalogs) bugs could frequently lead to the corresponding relcache
> > entry being stale.
> >
> > This set of bugs could cause actual data corruption as vacuum would
> > potentially not remove the correct row versions, potentially reviving
> > them at a later point.  After 699bf7d05c some corruptions in this vein
> > were prevented, but the additional error checks could also trigger
> > spuriously. Examples of such errors are:
> >   ERROR: found xmin ... from before relfrozenxid ...
> > and
> >   ERROR: found multixact ... from before relminmxid ...
> > To be caused by this bug the errors have to occur on system catalog
> > tables.
> >
> > The two bugs are:
> >
> > 1) Invalidations for nailed relations were ignored, based on the
> >theory that the relcache entry for such tables doesn't
> >change. Which is largely true, except for fields like relfrozenxid
> >etc.  This means that changes to relations vacuumed in other
> >sessions weren't picked up by already existing sessions.  Luckily
> >autovacuum doesn't have particularly longrunning sessions.
> >
> > 2) For shared *and* nailed relations, the shared relcache init file
> >was never invalidated while running.  That means that for such
> >tables (e.g. pg_authid, pg_database) it's not just already existing
> >sessions that are affected, but even new connections are as well.
> >That explains why the reports usually were about pg_authid et. al.
> >
> > To fix 1), revalidate the rd_rel portion of a relcache entry when
> > invalid. This implies a bit of extra complexity to deal with
> > bootstrapping, but it's not too bad.  The fix for 2) is simpler,
> > simply always remove both the shared and local init files.
> >
> > Author: Andres Freund
> > Reviewed-By: Alvaro Herrera
> > Discussion:
> > 
> > https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de
> > 
> > https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bpgg+_gdmxe25tvuy4s...@mail.gmail.com
> > 
> > https://postgr.es/m/cakmfjucqbuodrfxpdx39wha3vjyxwerg_zdvxzncr6+5wog...@mail.gmail.com
> > 
> > https://postgr.es/m/cagewt-ujgpmlq09gxcufmzazsgjc98vxhefbf-tppb0fb13...@mail.gmail.com
> > Backpatch: 9.3-
> >
> >
> > --
> > Álvaro Herrera   39°49'30"S 73°17'W
> > "El número de instalaciones de UNIX se ha elevado a 10,
> > y se espera que este número aumente" (UPM, 1972)




jsonpath duplication result

2021-09-02 Thread Charles Paperman
Hi, 

I am looking at the documentation and how jsonpath works, and I came across the
following example:

(On PostgreSQL 12.4, package for Debian 12.4-3):

SELECT jsonb_path_query(j, '$.**.a.**.b') FROM (SELECT
'{"a":{"a":{"b":{"c":3}}}}'::jsonb as j) as T;

The result:

 jsonb_path_query 
------------------
 {"c": 3}
 {"c": 3}
(2 lignes)


From what I understand, it matches the last `b` once for every `a` before it.
For instance, adding one more 'a' to the tree:

 SELECT jsonb_path_query(j, '$.**.a.**.b') FROM (SELECT
'{"a":{"a":{"a":{"b":{"c":3}}}}}'::jsonb as j) as T;
 jsonb_path_query 
------------------
 {"c": 3}
 {"c": 3}
 {"c": 3}

This could be a fine semantic for jsonpath queries (one could call it the BAG
semantic), however it implies that in some situations the number of duplicated
outputs can become huge.
Actually, in the worst case it can be exponential in the depth of the tree.

One more example (for fun):

SELECT Count(*) FROM (SELECT jsonb_path_query(j,
'$.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.a.**.b') FROM
(SELECT
'{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"a":{"b":{"c":3}}}}}}}}}}}}}}}}}}}}}}}}}}}}}'::jsonb
 as j) as T) as T2;

Takes 24s and returns 20058300 rows.

This semantic for jsonpath is not so classical (other implementations choose to
return each selected element only once, in the SET-semantics spirit),
and differs from the semantics of xpath, which, as shown here, follows the SET
semantics:

SELECT xpath('//a//a//b', '<a><a><a><b>1</b></a></a></a>');

Returns: 

    xpath    
-------------
 {<b>1</b>}


We can of course get the appropriate semantics by adding DISTINCT, but the
initial list of rows is computed nevertheless (an O(n) algorithm exists, so it
is avoidable).
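For instance, reusing the first example above:

SELECT DISTINCT v
  FROM jsonb_path_query('{"a":{"a":{"b":{"c":3}}}}'::jsonb, '$.**.a.**.b') AS v;

This returns {"c": 3} once, though both duplicates are still generated before
DISTINCT removes them.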

I also wonder if it is problematic to have simple small queries like this that
can be designed to make the database work endlessly?

Best, 

Charles Paperman
Associate Professor (Maître de Conférences), Université de Lille





Re: datfrozenxid not dropping after vacuum

2021-09-02 Thread Alvaro Herrera
On 2021-Sep-01, Matthew Tice wrote:

> Hi Alvaro, thanks for the quick reply.

Hi. Glad to hear that your problem is now behind you.

> I'm scheduled to do my patching maintenance at the end of this month -
> but at this point I don't think I'm going to make it.
> 
> Other than patching, is there a work around?

Hm, in my previous reply I had written a suggestion to vacuum
pg_database in the offending database after deleting the
pg_internal.init file, but evidently I edited it out before sending.
(Probably because I wasn't sure if you need to delete file, connect,
vacuum, or rather connect, delete file, vacuum.)

> For example, in #2 above:
> >The fix for 2) is simpler,
> >simply always remove both the shared and local init files.
> 
> I'm not familiar with the differences between 'shared' and 'local'
> init files (I'd imagine I referenced a 'local' file in my original
> post)?

The global file is in the global/ subdirectory of the data directory,
and the "local" ones are each in the corresponding database directory:

cd $PGDATA
$ find . -name pg_internal.init
./base/12758/pg_internal.init
./base/46212/pg_internal.init
./global/pg_internal.init

$ psql -c "select oid, datname from pg_database"
  oid  |  datname   
---+
 12757 | postgres
 12758 | alvherre
 1 | template1
 12756 | template0
 46212 | regression
(5 filas)

So in the above there are cache files for databases regression and
alvherre, plus the global one.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)




Re: oldest WAL files not removed

2021-09-02 Thread Ninad Shah
Dear Oliver,

Kindly do not remove any WAL files from pg_wal. You should consider checking
the following settings in the postgresql.conf file.

1) wal_keep_segments
- This setting retains at least this number of WAL files. e.g. if this
parameter is set to 256, the last 256 files will not be deleted.

2) min_wal_size and max_wal_size
- Files beyond the max_wal_size limit can be removed. e.g. if max_wal_size is
set to 1GB, the last few files that collectively amount to 1GB will be retained.

In any case, if files are still preserved despite being eligible for removal,
kindly execute the "CHECKPOINT" command or wait for the next checkpoint to
occur.
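A quick way to review the current values from psql (note: in PostgreSQL 13 and
later, wal_keep_segments was replaced by wal_keep_size):

SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('wal_keep_segments', 'min_wal_size', 'max_wal_size',
                'checkpoint_timeout');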


Regards,
Ninad Shah


On Wed, 1 Sept 2021 at 16:19,  wrote:

> Hi,
>
>
>
> Looking at WAL folder after a crash, I noticed that new files after
> restarting overwrite the more recent files before the crash and not the
> oldest, which was what I expected.
>
> Is that normal?  I got only one file marked .deleted. Does that happen
> when a WAL file has been completely applied to the database, and if so, why
> aren't all the oldest files marked .deleted after restarting?
>
>
>
> Example :
>
> Crash occurs Aug 31 22:03 which is the more recent Wal file, the oldest
> file is Aug 30 17:20 (and 105 files between those two)
>
> After restarting Aug 30 17:20 is still there, Aug 31 22:03 disappeared,
> one new file is Sep 1 12:15 marked .deleted (restarting date) and one new
> Sep 1 12:36 which I guess is normal. Right now, I see an new wal file and
> the previous one marked .deleted which is ok.
>
>
>
> Why are the oldest wal files still there ?? Can I remove them ?
>
>
>
> Hope I’m clear enough and thanks for explanations,
>
>
>
> Olivier
>
>
>


Re: Issue with a query while running on a remote host

2021-09-02 Thread Ninad Shah
Hi David/Karsten,

Thank you for your response. This helped me.

This thread can be closed.


Regards,
Ninad Shah

On Tue, 31 Aug 2021 at 13:26, David G. Johnston 
wrote:

> On Tuesday, August 31, 2021, Ninad Shah  wrote:
>
>> Hi Karsten,
>>
>> I apologize for the delayed response.
>>
>> There is no script-related transfer happening here. It creates an issue
>> while using "bash@" inside a column.
>>
>>>
>>>
> That wasn’t what was meant.  Ignore the “why” for the moment, the theory
> is something in the network or OS sees that string of data and fires off a
> rule that causes the data to be filtered.  Period.  The comment about “bash
> script” was just saying that whatever the “something” is might be guessing
> that the text sequence “bash@“ has something to do with bash scripts.  It
> was just a hint.  But regardless of why the false positive exists the
> theory is that there is one happening in the environment externally to any
> PostgreSQL related software.
>
> David J.
>
>


Re: jsonpath duplication result

2021-09-02 Thread David G. Johnston
On Thu, Sep 2, 2021 at 6:06 AM Charles Paperman 
wrote:

> I also wonder if it is problematic to have simple small query like that
> that can be design to make the database work endlessly?
>
>
No, because the "simple small" part of that doesn't matter.  A user with
access has various ways to effect a denial-of-service on the server.  If you
don't trust someone not to do this kind of thing intentionally, don't
provide them a login.  Accidental issues of this nature should be accounted
for in other ways - like testing.  And the process-oriented nature of the
system helps too, since CPU starvation becomes difficult (so memory ends up
being the main concern).

The pathological case shown here is not all that concerning to me either.

That the implementation choice could have been different, and match other
implementations, is interesting to me.  But changing this kind of behavior
is usually not an option.  Though adding new features to accomplish the new
behavior is something to consider.

David J.


Re: No xmin in pg_database

2021-09-02 Thread Alexander Kass
From the version string I can suggest that it is vanilla postgres from
The PostgreSQL Global Development Group (PGDG).
But we will ask the user.

I've checked the PG source code; it checks the existence of system attrs (like
xmin) in some cache. Maybe that really is corruption.
Does postgres itself rely on the xmins of system tables?

//offtop
As for xmin usage, we have a working scheme. We fetch objects based on
dbage(xid), starting from the oldest uncommitted transaction of
previous synchronization.
Do you think it does not work?

On Thu, Sep 2, 2021 at 11:53 AM Laurenz Albe  wrote:
>
> On Thu, 2021-09-02 at 08:50 +0300, Alexander Kass wrote:
> > Hi! I'm Alexander from DataGrip.
> > We actively use xmin's from pg_catalog tables to incrementally
> > synchronize our database model.
> > We have a small number of users that do not have xmin in pg_database
> > (we've asked them to try `select xmin from pg_database` and got
> > `column xmin does not exist`).
> > There is an old ticket https://youtrack.jetbrains.com/issue/DBE-7588
> > (at that time there was no real need in that xmin so we just removed
> > it)
> > Now xmins are back and so are the problems.
> > The most recent report is about version `PostgreSQL 12.4 (Ubuntu
> > 12.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> > 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit`
> >
> > How can that happen that there is no xmin in pg_database?
> > Is it a normal behaviour and that is configurable? Or is this a kind
> > of data corruption?
> > Can that happen to other tables?
> >
> > I haven't been able to find answers myself, so I'm asking for help :)
> > Thanks in advance
>
> All PostgreSQL tables have "xmin", and all catalog tables do as well.
>
> If you use a non-standard table access method, a table might not
> have "xmin".  But that cannot apply to catalog tables.
>
> Perhaps that was not really PostgreSQL, but some fork where the
> persistence layer was modified?
>
> I am not sure if it is a good idea to rely on "xmin" at all.  These
> numbers are recycled when transaction IDs wrap around, and you could
> have two entries with the same "xmin" that have a totally different
> meaning, because one of the rows is frozen and the other isn't.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>




memory consumption of memory for postgres db container

2021-09-02 Thread M Tarkeshwar Rao
Hi all,

We did a stability test for our product, where we continuously inserted data
into postgres tables. We did this for three days continuously.
The memory utilization of our Postgres container kept increasing.

[attached image: graph of the Postgres container's memory utilization]

Can you please provide us a clue as to why it is not constant and keeps
increasing?

Regards
Tarkeshwar


Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Ninad Shah
Hi Bala,

Are your statistics updated? Also, have you used the gin operator
class (gin_similarity_ops) correctly?

It is fetching just 6 records out of a million, hence it should not go for a
bitmap index scan, as a bitmap index scan loads a complete index and then
accesses the relevant pages from the table later via a bitmap heap scan.
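(If you want to compare quickly, bitmap scans can be discouraged for the
current session only; this is a planner toggle for testing, not a permanent
fix, reusing the query from the original post:)

SET enable_bitmapscan = off;

EXPLAIN ANALYZE
SELECT complete_address,
       jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017') AS qsim
  FROM address
 WHERE complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'
 ORDER BY qsim DESC;

RESET enable_bitmapscan;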


Regards,
Ninad Shah


On Thu, 2 Sept 2021 at 16:39, balasubramanian c r 
wrote:

> Hi Team
>
> We have encountered a problem in our testing environment.
> I have a scenario where I am running a similarity match for an address
> I have created a table with following number of records
> 1603423
>
> We are using pg_similarity extension in postgresql version is 13.
>
> And I have created GIN index (since i am using pg_similarity) library
> jaccard similarity method
> when I run the Explain analyze
> EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj
> nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address
> where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad
> 201017' order by qsim DESC;
> QUERY PLAN
>
> Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual
> time=12101.194..12101.197 rows=6 loops=1)
>
> Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity
> up ghaziabad 201017'::text)) DESC
> Sort Method: quicksort Memory: 25kB
> -> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93)
> (actual time=3516.233..12101.172 rows=6 loops=1)
> Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity
> up ghaziabad 201017'::text)
> Rows Removed by Index Recheck: 1039186
> Heap Blocks: exact=58018
> -> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02
> rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
> Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up
> ghaziabad 201017'::text)
> Planning Time: 0.141 ms
> Execution Time: 12101.245 ms
> (11 rows)
> it took 12 seconds
> following are my pgconf file settings
> shared buffer as 4GB
> work_mem 256 MB
> maintenence_work_mem 512MB
> autovacuum_work_mem 20MB
> My index definition is this "address_complete_address_idx1" gin
> (complete_address gin_similarity_ops)
>
> It is taking the index correctly. But why it took 12 seconds to process I
> really don't understand.
>
> Please help.
>
> Thanks
> C.R.Bala
>


Re: calling store procedure / insert statement never complete

2021-09-02 Thread Ninad Shah
Kindly consider checking the tcp_keepalive settings in PostgreSQL as well as
the OS kernel. By default, the idle time is 2 hours (7200 seconds).
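For reference, the server-side keepalive GUCs can be inspected and, if needed,
lowered from SQL (a setting of 0 means the OS default is used):

SELECT name, setting, unit
  FROM pg_settings
 WHERE name LIKE 'tcp_keepalives%';

-- e.g. for the current session, start sending keepalives after 5 idle minutes:
SET tcp_keepalives_idle = 300;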


Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 11:43, Trang Le  wrote:

> Hi Mladen,
>
> I missed previous emails, so I am creating a new one.
>
> I think the problem is coming from the front-end side. When I run the stored
> procedure, the job in pg_locks is done, the data is inserted, and the log table
> shows that the job completed with the number of inserts. However, the stored
> procedure on the front-end side still shows as running until the server closes
> the connection. The same problem occurs when I run the query statement only,
> inserting about 190k rows: when I check the target table the data is inserted,
> but the front-end side still waits until the server closes the connection.
>
> I use pgAdmin4 5.3.
>
> Error: Server closed the connection unexpectedly. This probably means the
> server terminated abnormally before or while processing the request.
>
> Could you please double check?
>
> Thanks and regards,
> Trang
>


Re: memory consumption of memory for postgres db container

2021-09-02 Thread Ninad Shah
No need to worry. This isn't really a concern.

No operating system releases memory immediately after use; it is kept cached
once the program finishes. This actually saves time, as cached memory blocks
can be reallocated in less time than loading new blocks.
However, if it's required to release memory, it can be done with a few
commands.


Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 20:05, M Tarkeshwar Rao <
m.tarkeshwar@ericsson.com> wrote:

> Hi all,
>
>
>
> We did a stability test for our product, where we continuously inserting
> data into postgres tables. We did this for three days continuously.
>
> Memory utilization of our Postgres container continuously increasing.
>
>
>
>
>
> Can you please provide us the clue why it is not constant and continuously
> increasing?
>
>
>
> Regards
>
> Tarkeshwar
>


RE: memory consumption of memory for postgres db container

2021-09-02 Thread M Tarkeshwar Rao
Can you please suggest few commands?

Regards
Tarkeshwar

From: Ninad Shah 
Sent: Thursday, September 2, 2021 8:30 PM
To: M Tarkeshwar Rao 
Cc: pgsql-gene...@postgresql.org
Subject: Re: memory consumption of memory for postgres db container

No need to worry. This isn't really a concern.

No operating system releases memory upon using it, but caches it once the 
program finishes. This actually saves time as memory blocks can be allocated in 
less time than loading new blocks.
However, if it's required to release memory, it can be performed via a few 
commands.


Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 20:05, M Tarkeshwar Rao <m.tarkeshwar@ericsson.com> wrote:
Hi all,

We did a stability test for our product, where we continuously inserting data 
into postgres tables. We did this for three days continuously.
Memory utilization of our Postgres container continuously increasing.

[attached image: graph of the Postgres container's memory utilization]

Can you please provide us the clue why it is not constant and continuously 
increasing?

Regards
Tarkeshwar


Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Michael Lewis
This is showing many false positives from the index scan that get removed
when the actual values are examined. With such a long search parameter,
that does not seem surprising. I would expect a search on "raj nagar
ghaziabad 201017" or something like that to yield far fewer results from
the index scan. I don't know GIN indexes super well, but I would guess that
including words that are very common will yield false positives that get
filtered out later.

>


Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread Vano Beridze
Hello,

I've got 2 VMs with Postgresql 9.5 cluster with streaming replication.
VMs  have Ubuntu 16.04.
I would like to upgrade Ubuntu and Postgresql to newer versions.
Ubuntu 16.04 supports upgrading to 18.04.
What is the safest way to upgrade Postgresql cluster along with it?
The database is not big and I can afford to completely shut down the cluster
during the upgrade.

What would you suggest?

Kind regards,
Vano


Re: jsonpath duplication result

2021-09-02 Thread Charles Paperman
On 09/02 06:55, David G. Johnston wrote:
> On Thu, Sep 2, 2021 at 6:06 AM Charles Paperman 
> wrote:
> 
> I also wonder if it is problematic to have simple small query like that
> that can be design to make the database work endlessly?
> 
> 
> 
> No, because the "simple small" part of that doesn't matter.  A user with 
> access
> has various ways to affect denial-of-service on the server.  If you don't 
> trust
> someone from doing this kind of thing intentionally don't provide them a
> login.  Accidental issues of this nature should be accounted for in other ways
> - like testing.  And the process-oriented nature of the system helps too since
> CPU starving becomes difficulty (so memory ends up being the main concern).

I see. Thanks for the clarification.
 
> The pathological case shown here is not all that concerning to me either. 
> That the implementation choice could have been different, and match other
> implementations, is interesting to me.  But changing this kind of behavior is
> usually not an option.  Though adding new features to accomplish the new
> behavior is something to consider.

I looked into https://goessner.net/articles/JsonPath/ (which seems to be the
initial implementation that others take as the standard), and it seems that it
follows the BAG semantics (the one of PostgreSQL).
On the SET semantics side, without looking too much I found jq (which is
somehow a standard as well) and jsonpath-python (which claims to follow
Goessner's but actually doesn't). My guess is that depending on specific
details of the implementation, you can fall into one semantics or the other
without thinking about it. It is not something tested in the test suites of
the packages I looked at or in the initial Goessner examples.

I can see a reason to prefer the BAG semantics: Goessner provides in the API a
way to get all the paths and not just extract the final node. Selecting the
nodes could then be a way to find other kinds of information that live within
the path.
I think that providing the path as a json function in postgresql is definitely
a feature that should not be too hard to add and which could be handy.

On the other hand, the BAG semantics means that any descendant predicate will
increase the complexity (and possibly the size of the result set).

Looking at the code of `jsonpath_exec` I can see that adding the SET semantics
as a new feature might not be obvious ...
 




Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread balasubramanian c r
HI Ninad

Thanks for your reply.
If a bitmap index scan should not be used, do I need to disable it for the
time being and carry out the test?

The documentation for pg_similarity shows that an index can be created on a
text column using GIN with gin_similarity_ops.
The index is created the same way:
CREATE INDEX on address using GIN(complete_address gin_similarity_ops);

AFAIK I have not seen any operator classes other than the GIN ones in the
pg_similarity extension.

Thanks
C.R.Bala

On Thu, Sep 2, 2021 at 8:17 PM Ninad Shah  wrote:

> Hi Bala,
>
> Are your statistics updated? Also, have you used the gin
> operator(gin_similarity_ops) correctly?
>
> It is fetching just 6 records out of a million, hence, it should not go
> for bitmap index scan. As bitmap index scan loads a complete index, and
> access relevant pages from the table later by bitmap heap scan.
>
>
> Regards,
> Ninad Shah
>
>
> On Thu, 2 Sept 2021 at 16:39, balasubramanian c r 
> wrote:
>
>> Hi Team
>>
>> We have encountered a problem in our testing environment.
>> I have a scenario where I am running a similarity match for an address
>> I have created a table with following number of records
>> 1603423
>>
>> We are using pg_similarity extension in postgresql version is 13.
>>
>> And I have created GIN index (since i am using pg_similarity) library
>> jaccard similarity method
>> when I run the Explain analyze
>> EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj
>> nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address
>> where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad
>> 201017' order by qsim DESC;
>> QUERY PLAN
>>
>> Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual
>> time=12101.194..12101.197 rows=6 loops=1)
>>
>> Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity
>> up ghaziabad 201017'::text)) DESC
>> Sort Method: quicksort Memory: 25kB
>> -> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93)
>> (actual time=3516.233..12101.172 rows=6 loops=1)
>> Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity
>> up ghaziabad 201017'::text)
>> Rows Removed by Index Recheck: 1039186
>> Heap Blocks: exact=58018
>> -> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02
>> rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
>> Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up
>> ghaziabad 201017'::text)
>> Planning Time: 0.141 ms
>> Execution Time: 12101.245 ms
>> (11 rows)
>> it took 12 seconds
>> following are my pgconf file settings
>> shared buffer as 4GB
>> work_mem 256 MB
>> maintenence_work_mem 512MB
>> autovacuum_work_mem 20MB
>> My index definition is this "address_complete_address_idx1" gin
>> (complete_address gin_similarity_ops)
>>
>> It is taking the index correctly. But why it took 12 seconds to process I
>> really don't understand.
>>
>> Please help.
>>
>> Thanks
>> C.R.Bala
>>
>


Re: Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread Adrian Klaver

On 9/2/21 10:38 AM, Vano Beridze wrote:

Hello,

I've got 2 VMs with Postgresql 9.5 cluster with streaming replication.
VMs  have Ubuntu 16.04.
I would like to upgrade Ubuntu and Postgresql to newer versions.
Ubuntu 16.04 supports upgrading to 18.04.
What is the safest way to upgrade Postgresql cluster along with it?


What repo are you getting Postgres from, Ubuntu or PGDG?

The database is not big and I can afford completely shutdown the cluster 
during the upgrade.


That is going to happen whether you want it or not; 16.04 -> 18.04 is 
going to involve a reboot.




What would you suggest?

Kind regards,
Vano



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread Nikolay Samokhvalov
Logical way – dump/restore.

Bringing PGDATA physically may lead to corrupted indexes due to glibc
version changes. 16.04 -> 18.04 shouldn't cause it, but it may. You can
check btree indexes with amcheck and GIN indexes with a patched version of it
(some backporting would be needed). You can find examples here:
https://gitlab.com/-/snippets/2001962

But back to the question, if you do dump/restore, indexes will be freshly
created, on new machine - therefore there is no question about corruption
for them. I'd choose this path in your situation.

On Thu, Sep 2, 2021 at 10:38 AM Vano Beridze  wrote:

> Hello,
>
> I've got 2 VMs with Postgresql 9.5 cluster with streaming replication.
> VMs  have Ubuntu 16.04.
> I would like to upgrade Ubuntu and Postgresql to newer versions.
> Ubuntu 16.04 supports upgrading to 18.04.
> What is the safest way to upgrade Postgresql cluster along with it?
> The database is not big and I can afford completely shutdown the cluster
> during the upgrade.
>
> What would you suggest?
>
> Kind regards,
> Vano
>


Re: Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread Vano Beridze
> What repo are you getting Postgres from, Ubuntu or PGDG?
Ubuntu

On Thu, Sep 2, 2021 at 10:13 PM Adrian Klaver 
wrote:

> On 9/2/21 10:38 AM, Vano Beridze wrote:
> > Hello,
> >
> > I've got 2 VMs with Postgresql 9.5 cluster with streaming replication.
> > VMs  have Ubuntu 16.04.
> > I would like to upgrade Ubuntu and Postgresql to newer versions.
> > Ubuntu 16.04 supports upgrading to 18.04.
> > What is the safest way to upgrade Postgresql cluster along with it?
>
> What repo are you getting Postgres from, Ubuntu or PGDG?
>
> > The database is not big and I can afford completely shutdown the cluster
> > during the upgrade.
>
> That is going to happen whether you want it or not 16.04 -> 18.04 is
> going to involve a reboot.
>
> >
> > What would you suggest?
> >
> > Kind regards,
> > Vano
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread Adrian Klaver

On 9/2/21 11:34 AM, Vano Beridze wrote:

 > What repo are you getting Postgres from, Ubuntu or PGDG?
Ubuntu


So you are going to get version 10.

Is that what you want?

If you want to move to something newer then this may be the time to move 
to the PGDG repos, as they allow you to install any of the currently 
supported versions of Postgres.





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread Vano Beridze
I just want to upgrade to an OS and Postgres version that are supported.
If I just stop the service on both VMs and upgrade Ubuntu, then start
Postgres, will it work, or should I go with a clean install and dump/restore?

On Thu, Sep 2, 2021 at 10:41 PM Adrian Klaver 
wrote:

> On 9/2/21 11:34 AM, Vano Beridze wrote:
> >  > What repo are you getting Postgres from, Ubuntu or PGDG?
> > Ubuntu
>
> So you are going to get version 10.
>
> Is that what you want?
>
> If you want to move to something newer then this maybe the time to move
> to the PGDG repos, as they allow you to install any of the current
> supported versions of Postgres.
>
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread David G. Johnston
On Thursday, September 2, 2021, Vano Beridze  wrote:

> I just want to upgrade to the OS and Postgres that is supported.
> If I just stop the service on both VMs and upgrade Ubuntu, then start
> Postgres, will it work or I should go with clean install and dump/restore?
>
>>
>>
Personally I find the concept of upgrading an O/S to be distasteful,
especially when using VMs.  Build out the new environment, migrate the
data, then destroy the old one.  In any case I would hope upgrading to
18.04 wouldn’t automatically upgrade PostgreSQL.

For me, the two main questions are whether to use pg_dump or pg_upgrade,
and whether to upgrade PostgreSQL before or after upgrading the OS.
Upgrading both simultaneously isn’t a valid choice, even if you could
figure out a way to actually make that possible.  Either way a
pg_basebackup and restoring that is probably the simplest way to get the
database itself moved from one server to the other.

David J.


Re: memory consumption of memory for postgres db container

2021-09-02 Thread Ninad Shah
You may check with "free -m" to see how much memory is cached.

If you are using RHEL, you may clear the cache using the commands below.

sync; echo 1 > /proc/sys/vm/drop_caches; sync

Though this is going to release some memory from the cache, the cache may fill
up again.


Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 20:41, M Tarkeshwar Rao <
m.tarkeshwar@ericsson.com> wrote:

> Can you please suggest few commands?
>
>
>
> Regards
>
> Tarkeshwar
>
>
>
> *From:* Ninad Shah 
> *Sent:* Thursday, September 2, 2021 8:30 PM
> *To:* M Tarkeshwar Rao 
> *Cc:* pgsql-gene...@postgresql.org
> *Subject:* Re: memory consumption of memory for postgres db container
>
>
>
> No need to worry. This isn't really a concern.
>
>
>
> No operating system releases memory upon using it, but caches it once the
> program finishes. This actually saves time as memory blocks can be
> allocated in less time than loading new blocks.
>
> However, if it's required to release memory, it can be performed via a few
> commands.
>
>
>
>
>
> Regards,
>
> Ninad Shah
>
>
>
> On Thu, 2 Sept 2021 at 20:05, M Tarkeshwar Rao <
> m.tarkeshwar@ericsson.com> wrote:
>
> Hi all,
>
>
>
> We did a stability test for our product, where we continuously inserting
> data into postgres tables. We did this for three days continuously.
>
> Memory utilization of our Postgres container continuously increasing.
>
>
>
>
>
> Can you please provide us the clue why it is not constant and continuously
> increasing?
>
>
>
> Regards
>
> Tarkeshwar
>
>


Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Ninad Shah
Try the pg_trgm extension. It has a rich set of operators.


Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 23:39, balasubramanian c r 
wrote:

> HI Ninad
>
> Thanks for your reply.
> If bitmap index should not be used. Do i need to disable it for the time
> being and carry out the test.
>
> The documentation in pg_similarity shows that index can be created on text
> column using gin with gin_similarity_ops.
> The same way the index is created like
> CREATE INDEX on address using GIN(complete_address gin_similarity_ops);
>
> AFAIK I have not seen any other operators other than gin in the
> pg_smilarity extension.
>
> Thanks
> C.R.Bala
>
> On Thu, Sep 2, 2021 at 8:17 PM Ninad Shah 
> wrote:
>
>> Hi Bala,
>>
>> Are your statistics updated? Also, have you used the gin
>> operator(gin_similarity_ops) correctly?
>>
>> It is fetching just 6 records out of a million, hence, it should not go
>> for bitmap index scan. As bitmap index scan loads a complete index, and
>> access relevant pages from the table later by bitmap heap scan.
>>
>>
>> Regards,
>> Ninad Shah
>>
>>
>> On Thu, 2 Sept 2021 at 16:39, balasubramanian c r 
>> wrote:
>>
>>> Hi Team
>>>
>>> We have encountered a problem in our testing environment.
>>> I have a scenario where I am running a similarity match for an address
>>> I have created a table with following number of records
>>> 1603423
>>>
>>> We are using pg_similarity extension in postgresql version is 13.
>>>
>>> And I have created GIN index (since i am using pg_similarity) library
>>> jaccard similarity method
>>> when I run the Explain analyze
>>> EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj
>>> nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address
>>> where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad
>>> 201017' order by qsim DESC;
>>> QUERY PLAN
>>>
>>> Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual
>>> time=12101.194..12101.197 rows=6 loops=1)
>>>
>>> Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara
>>> integrity up ghaziabad 201017'::text)) DESC
>>> Sort Method: quicksort Memory: 25kB
>>> -> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93)
>>> (actual time=3516.233..12101.172 rows=6 loops=1)
>>> Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity
>>> up ghaziabad 201017'::text)
>>> Rows Removed by Index Recheck: 1039186
>>> Heap Blocks: exact=58018
>>> -> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02
>>> rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
>>> Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity
>>> up ghaziabad 201017'::text)
>>> Planning Time: 0.141 ms
>>> Execution Time: 12101.245 ms
>>> (11 rows)
>>> it took 12 seconds
>>> following are my pgconf file settings
>>> shared buffer as 4GB
>>> work_mem 256 MB
>>> maintenence_work_mem 512MB
>>> autovacuum_work_mem 20MB
>>> My index definition is this "address_complete_address_idx1" gin
>>> (complete_address gin_similarity_ops)
>>>
>>> It is taking the index correctly. But why it took 12 seconds to process
>>> I really don't understand.
>>>
>>> Please help.
>>>
>>> Thanks
>>> C.R.Bala
>>>
>>


Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Ninad Shah
I see an issue with the operator. GIN index is capable of performing index
scan and index-only scan.


Regards,
Ninad Shah

On Thu, 2 Sept 2021 at 20:57, Michael Lewis  wrote:

> This is showing many false positives from the index scan that get removed
> when the actual values are examined. With such a long search parameter,
> that does not seem surprising. I would expect a search on "raj nagar
> ghaziabad 201017" or something like that to yield far fewer results from
> the index scan. I don't know GIN indexes super well, but I would guess that
> including words that are very common will yield false positives that get
> filtered out later.
>
>>


Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Tom Lane
Michael Lewis  writes:
> This is showing many false positives from the index scan that get removed
> when the actual values are examined. With such a long search parameter,
> that does not seem surprising. I would expect a search on "raj nagar
> ghaziabad 201017" or something like that to yield far fewer results from
> the index scan. I don't know GIN indexes super well, but I would guess that
> including words that are very common will yield false positives that get
> filtered out later.

Yeah, the huge "Rows Removed" number shows that this index is very
poorly adapted to the query.  I don't think the problem is with GIN
per se, but with a poor choice of how to use it.  The given example
looks like what the OP really wants to do is full text search.
If so, a GIN index should be fine as long as you put tsvector/tsquery
filtering in front of it.  If that's not a good characterization of
the goal, it'd help to tell us what the goal is.  (Just saying "I
want to use jaccard similarity" sounds a lot like a man whose only
tool is a hammer, therefore his problem must be a nail, despite
evidence to the contrary.)
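As an illustration only (not a drop-in fix), a sketch of that approach using
the table and jaccard() call from the original post; the index name and the
shortened search terms are made up:

CREATE INDEX address_fts_idx
    ON address USING gin (to_tsvector('simple', complete_address));

SELECT complete_address,
       jaccard(complete_address,
               'raj nagar ext near ajnara integrity up ghaziabad 201017') AS qsim
  FROM address
 WHERE to_tsvector('simple', complete_address)
       @@ plainto_tsquery('simple', 'ajnara integrity ghaziabad 201017')
 ORDER BY qsim DESC
 LIMIT 20;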

regards, tom lane




Re: Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread Adrian Klaver

On 9/2/21 11:55 AM, Vano Beridze wrote:

I just want to upgrade to the OS and Postgres that is supported.
If I just stop the service on both VMs and upgrade Ubuntu, then start 
Postgres, will it work or I should go with clean install and dump/restore?


Prudent behavior before you do any upgrading would be to:

1) Take a pg_dumpall of the cluster just before shutting it down.

2) Copy that dump off the VMs for extra safety.

Then you have a choice of:

1) On the newly upgraded OS there will be a new version 10 cluster created 
that has only the template0, template1 and postgres databases. Restore with 
psql -f dump.sql postgres, per the example here: 
https://www.postgresql.org/docs/10/app-pg-dumpall.html


2) Or use the pg_upgrade procedure shown here:

https://askubuntu.com/questions/1126847/will-a-release-upgrade-take-care-of-postgresql-upgrade-automatically

I would say if the database(s) are not that big I would go with 1).



On Thu, Sep 2, 2021 at 10:41 PM Adrian Klaver > wrote:


On 9/2/21 11:34 AM, Vano Beridze wrote:
 >  > What repo are you getting Postgres from, Ubuntu or PGDG?
 > Ubuntu

So you are going to get version 10.

Is that what you want?

If you want to move to something newer then this maybe the time to move
to the PGDG repos, as they allow you to install any of the current
supported versions of Postgres.




-- 
Adrian Klaver

adrian.kla...@aklaver.com 




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Upgrade 9.5 cluster on Ubuntu 16.04

2021-09-02 Thread Nikolay Samokhvalov
Forgot to mention: the same concerns about possible index corruption are
relevant for the pg_upgrade option too (because it doesn't rebuild
indexes). So, I'd definitely choose dump/restore if the database is small.
In the case of pg_upgrade, I would rebuild all the indexes during the
maintenance window or just some of them (which ones – amcheck can help to
answer, for 9.5 it should be taken from here:
https://github.com/petergeoghegan/amcheck).
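A minimal btree-check sketch (on 9.5 use the external build linked above; from
PostgreSQL 10 on, amcheck ships with contrib). The query below checks every
btree index; narrow the WHERE clause as needed:

CREATE EXTENSION IF NOT EXISTS amcheck;

SELECT c.relname, bt_index_check(c.oid)
  FROM pg_class c
  JOIN pg_am am ON am.oid = c.relam
 WHERE am.amname = 'btree' AND c.relkind = 'i';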

On Thu, Sep 2, 2021 at 11:17 AM Nikolay Samokhvalov 
wrote:

> Logical way – dump/restore.
>
> Bringing PGDATA physically may lead to corrupted indexes due to glibc
> version changes. 16.04 -> 18.04 shouldn't cause it, but it may. You can
> check btree index with amcheck and GIN indexes with a patched version of it
> (some backporting would be needed). You can find examples here:
> https://gitlab.com/-/snippets/2001962
>
> But back to the question, if you do dump/restore, indexes will be freshly
> created, on new machine - therefore there is no question about corruption
> for them. I'd choose this path in your situation.
>
> On Thu, Sep 2, 2021 at 10:38 AM Vano Beridze  wrote:
>
>> Hello,
>>
>> I've got 2 VMs with Postgresql 9.5 cluster with streaming replication.
>> VMs  have Ubuntu 16.04.
>> I would like to upgrade Ubuntu and Postgresql to newer versions.
>> Ubuntu 16.04 supports upgrading to 18.04.
>> What is the safest way to upgrade Postgresql cluster along with it?
>> The database is not big and I can afford completely shutdown the cluster
>> during the upgrade.
>>
>> What would you suggest?
>>
>> Kind regards,
>> Vano
>>
>


Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread balasubramanian c r
HI Tom/Ninad
My bad, I didn't explain my use case properly.
The use case is to find the best string similarity for a given address
against the list of addresses in the table.
Initially I tried the similarity function provided by the pg_trgm extension,
but the similarity scores were not satisfactory.
Later I explored the pg_similarity extension, which has an exhaustive list of
functions with index support, primarily via GIN.
After trying multiple functions like levenshtein, cosine, qgram, jaro, jaro
winkler and jaccard, the jaccard similarity function provided the most
accurate results.

Hence we decided to use that for our analysis.

Now with a small amount of data the query response time is fine. But with
1.6 million records it is taking a long time.
Even the pg_similarity documentation provides an example of how to create an
index using GIN, and we followed exactly the same process.
But still, when the data is huge, we don't know why it takes so long to scan
through the records.

Thanks
C.R.Bala


On Fri, Sep 3, 2021 at 1:14 AM Tom Lane  wrote:

> Michael Lewis  writes:
> > This is showing many false positives from the index scan that get removed
> > when the actual values are examined. With such a long search parameter,
> > that does not seem surprising. I would expect a search on "raj nagar
> > ghaziabad 201017" or something like that to yield far fewer results from
> > the index scan. I don't know GIN indexes super well, but I would guess
> that
> > including words that are very common will yield false positives that get
> > filtered out later.
>
> Yeah, the huge "Rows Removed" number shows that this index is very
> poorly adapted to the query.  I don't think the problem is with GIN
> per se, but with a poor choice of how to use it.  The given example
> looks like what the OP really wants to do is full text search.
> If so, a GIN index should be fine as long as you put tsvector/tsquery
> filtering in front of it.  If that's not a good characterization of
> the goal, it'd help to tell us what the goal is.  (Just saying "I
> want to use jaccard similarity" sounds a lot like a man whose only
> tool is a hammer, therefore his problem must be a nail, despite
> evidence to the contrary.)
>
> regards, tom lane
>


Query takes around 15 to 20 min over 20Lakh rows

2021-09-02 Thread Shubham Mittal
Hi ,

Please help in optimizing this query. I need to generate reports daily using
this query. It takes almost 15 to 20 minutes to execute due to the joins.
Here common_details is a jsonb column.

SELECT T.order_id,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'srType'::text          AS product,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'mobileNumber'::text    AS msisdn,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'alternateNumber'::text AS alternate_number,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'circle'::text          AS parent_circle,
       T.circle,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'srNumber'::text        AS complaint_number,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'caseType'::text        AS complaint_type,
       (((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'status'::text)        AS status,
       T.status                                                                                    AS task_status,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'subType'::text         AS SUBTYPE,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'subSubType'::text      AS subsubtype,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'source'::text          AS source,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'custType'::text        AS customer_type,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'custClass'::text       AS customer_class,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'custValue'::text       AS customer_value,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'accountNumber'::text   AS account_number,
       To_char((((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'slaDt'::text)::timestamp, 'DD/MM/ HH24:MI:SS') AS sladt,
       To_char((((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'srDt'::text)::timestamp, 'DD/MM/ HH24:MI:SS')  AS sr_date,
       CASE Lower(T.status)
         WHEN 'reopen' THEN NULL
         ELSE To_char((totalTimeJoin.modified_date), 'DD/MM/ HH24:MI:SS')
       END                                                                                         AS resolutiondatetime,
       To_char(reopenJoin.modified_date, 'DD/MM/ HH24:MI:SS')                                      AS reopen_date,
       T.dynamic_data ->> 'resolution_code'::text                                                  AS rc,
       T.dynamic_data ->> 'fault_found_code'::text                                                 AS ffc,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'servingCellId'::text   AS serving_cell_id,
       Coalesce((((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'servingSiteId'::text),
                (((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'producthandsetType'::text)) AS servingsiteid,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'customerLat'::text     AS customer_lat,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'customerLng'::text     AS customer_long,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'thanksCustomer'::text  AS thanks_flag,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'custValue'::text       AS black_flag,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'caseType'::text        AS sr_ftr,
       T.dynamic_data ->> 'dsl_connection'::text                                                   AS dsl,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'appInstalled'::text    AS app_installed,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'voiceMOU'::text        AS voice_mou,
       ((T.common_details -> 'commonDetails'::text) -> 'bundle'::text) ->> 'dataConsumed'::text    AS data_mou,
       ( T.common_details -> 'commonD

Re: Query takes around 15 to 20 min over 20Lakh rows

2021-09-02 Thread Michael Lewis
What is T and how many rows are in there? How many rows in task_history?
What indexes exist? Are you confident you want 2 million rows in that
result set? What version is this on? What pg_settings have been changed
from defaults?
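
One way to answer that last question, if it helps, is something along these lines:

SELECT name, setting, source
FROM   pg_settings
WHERE  source NOT IN ('default', 'override');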


Re: Query takes around 15 to 20 min over 20Lakh rows

2021-09-02 Thread David G. Johnston
On Thu, Sep 2, 2021 at 3:16 PM Shubham Mittal 
wrote:

> Hi ,
>
> *Please help in optimizing this query. I need to actually generate reports
> daily using this query.. It takes almost 15 to 20 min to execute this query
> due to joins.. *
> *Here common_details is a jsonB column.*
>
> SELECT T.order_id,
>( ( T.common_details -> 'commonDetails' :: text ) -> 'bundle' ::
>  text )
>->> 'srType' :: text
> AS
>product,
>( ( T.common_details -> 'commonDetails' :: text ) -> 'bundle' ::
>  text )
>->> 'mobileNumber' :: text
> AS
>msisdn,
>

Use jsonb_populate_recordset (or one of its siblings) to get rid of as many
of these key-based value extraction operations as possible and build a
table from the contents of the jsonb.

Possibly into a temporary table to which you add indexes.
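
For illustration, a rough sketch of that approach (the base table name and the
handful of bundle keys below are only placeholders taken from the query above,
and it assumes slaDt parses as a timestamp):

CREATE TEMP TABLE bundle_flat AS
SELECT t.order_id,
       t.status AS task_status,
       b.*
FROM   task t   -- placeholder name for the table aliased as T in the query
       CROSS JOIN LATERAL jsonb_to_record(
           (t.common_details -> 'commonDetails') -> 'bundle'
       ) AS b("srType"       text,
              "mobileNumber" text,
              "srNumber"     text,
              "caseType"     text,
              "slaDt"        timestamp);

CREATE INDEX ON bundle_flat ("srNumber");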

David J.


Re: Query takes around 15 to 20 min over 20Lakh rows

2021-09-02 Thread Tom Lane
"David G. Johnston"  writes:
> On Thu, Sep 2, 2021 at 3:16 PM Shubham Mittal 
> wrote:
>> *Please help in optimizing this query. I need to actually generate reports
>> daily using this query.. It takes almost 15 to 20 min to execute this query
>> due to joins.. *

> Use jsonb_populate_recordset (or one of its siblings) to get rid of as many
> of these key-based value extraction operations as possible and build a
> table from the contents of the jsonb.

While those duplicative extractions sure look inefficient, it's not
clear from the (lack of) given facts whether that's the main cost,
or whether the complicated FROM clause is producing a bad plan.
I'd suggest first looking at EXPLAIN ANALYZE output to verify which
plan step(s) are slow.  If it's the final output step that's expensive,
then yes the next step is to optimize the extractions.
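
For example, something along the lines of

EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ;   -- the full report query pasted here

and then look for the plan nodes with the largest actual time.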

Otherwise, see

https://wiki.postgresql.org/wiki/Slow_Query_Questions

regards, tom lane




gen_random_uuid key collision

2021-09-02 Thread jesusthefrog
Hello, I'm hoping someone might be able to shed a little light on a strange
situation I encountered recently.

I work with a postgres instance which has dozens (probably hundreds) of
tables which each have a column defined as "uuid primary key default
gen_random_uuid()".

Most of the time this is fine, but one specific table has recently started
repeatedly having inserts fail because of a unique constraint violation
involving the primary key. The table only has ~10,000 rows, but I'll
sometimes see two or three collisions in a single day. No other table (even
those with many, many more rows) exhibits this issue.

We're running postgres 12, so I believe the gen_random_uuid function is
provided by the pgcrypto extension, but either way it'll be the same for
that entire database instance, so I can't explain why only one table would
be having problems if it were due to a bug in the function. Also, since I
believe it just uses openssl (which we have linked) to generate random
bytes, the chance of a bug should be very low.
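
(For scale, a back-of-envelope figure: a v4 UUID carries 122 random bits, so a
birthday-bound estimate of the chance of any collision among ~10,000 values is
roughly n^2 / 2^123, e.g.

SELECT 1e4::float8 ^ 2 / 2::float8 ^ 123 AS approx_collision_probability;
-- about 9.4e-30

so genuine random collisions should be effectively impossible at this table size.)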

Anyone have any thoughts on this?

-- 
-BEGIN GEEK CODE BLOCK-
  Version: 3.12
  GIT d- s+ a- C L+++ S++ B+ P++> E++ W+++
  N o? K- !w O- M- V? PS++ PE- Y+ PGP t+ 5+++
  X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
--END GEEK CODE BLOCK--


Re: gen_random_uuid key collision

2021-09-02 Thread Adrian Klaver

On 9/2/21 4:25 PM, jesusthefrog wrote:
Hello, I'm hoping someone might be able to shed a little light on a 
strange situation I encountered recently.




Most of the time this is fine, but one specific table has recently 
started repeatedly having inserts fail because of a unique constraint 
violation involving the primary key. The table only has ~10,000 rows, 
but I'll sometimes see two or three collisions in a single day. No other 
table (even those with many, many more rows) exhibit this issue.


What is the table schema as returned by \d <table_name> in psql?


Anyone have any thoughts on this?



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: gen_random_uuid key collision

2021-09-02 Thread jesusthefrog
On Thu, Sep 2, 2021 at 7:35 PM Adrian Klaver 
wrote:

> What is the table schema as returned by \d <table_name> in psql?
>
>
The tables are in various schemas; that one is in one called
"access_control", but we always set the search path explicitly to (in this
case) "access_control, public".
Anyway, if it were a problem with finding the function, I would be seeing a
different error. The function successfully runs; it is just (apparently)
occasionally generating the same UUID multiple times.

-- 
-BEGIN GEEK CODE BLOCK-
  Version: 3.12
  GIT d- s+ a- C L+++ S++ B+ P++> E++ W+++
  N o? K- !w O- M- V? PS++ PE- Y+ PGP t+ 5+++
  X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
--END GEEK CODE BLOCK--


Re: gen_random_uuid key collision

2021-09-02 Thread Tom Lane
jesusthefrog  writes:
> I work with a postgres instance which has dozens (probably hundreds) of
> tables which each have a column defined as "uuid primary key default
> gen_random_uuid()".

> Most of the time this is fine, but one specific table has recently started
> repeatedly having inserts fail because of a unique constraint violation
> involving the primary key. The table only has ~10,000 rows, but I'll
> sometimes see two or three collisions in a single day. No other table (even
> those with many, many more rows) exhibit this issue.

That is pretty weird, all right.  The only idea that comes to mind
immediately is that maybe that table's pkey index is corrupt and needs
to be reindexed.  This isn't a great theory, because I don't see why
a corrupt index would lead to bogus unique-constraint errors rather
than missed ones.  But at least it squares with the observation that
only that table is having issues.

BTW, are you *entirely* certain that your application never inserts
non-default values into that column?

regards, tom lane




Re: gen_random_uuid key collision

2021-09-02 Thread Adrian Klaver

On 9/2/21 4:38 PM, jesusthefrog wrote:
On Thu, Sep 2, 2021 at 7:35 PM Adrian Klaver wrote:


What is the table schema as returned by \d <table_name> in psql?


The tables are in various schemas; that one is in one called 


But only one is generating errors. Schema refers to an object's 
definition as well as a namespace. So what does:


\d <table_name>

return?

"access_control", but we always set the search path explicitly to (in 
this case) "access_control, public".
Anyway, if it were a problem with finding the function, I would be 
seeing a different error. The function successfully runs; it is just 
(apparently) occasionally generating the same UUID multiple times.


--
-BEGIN GEEK CODE BLOCK-
   Version: 3.12
   GIT d- s+ a- C L+++ S++ B+ P++> E++ W+++
   N o? K- !w O- M- V? PS++ PE- Y+ PGP t+ 5+++
   X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
--END GEEK CODE BLOCK--



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: gen_random_uuid key collision

2021-09-02 Thread jesusthefrog
On Thu, Sep 2, 2021 at 7:47 PM Tom Lane  wrote:

>
> BTW, are you *entirely* certain that your application never inserts
> non-default values into that column?
>
> regards, tom lane
>

Yes, I double checked that we never attempt to bind a value for that
column. I'll have a go at just rebuilding the pkey index and see if that
helps.
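
Probably something along these lines (the index name below is made up; REINDEX
... CONCURRENTLY is available on 12, so it should not block writes):

REINDEX INDEX CONCURRENTLY access_control.my_table_pkey;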

-- 
-BEGIN GEEK CODE BLOCK-
  Version: 3.12
  GIT d- s+ a- C L+++ S++ B+ P++> E++ W+++
  N o? K- !w O- M- V? PS++ PE- Y+ PGP t+ 5+++
  X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
--END GEEK CODE BLOCK--


Re: gen_random_uuid key collision

2021-09-02 Thread jesusthefrog
On Thu, Sep 2, 2021 at 8:05 PM Adrian Klaver 
wrote:

> But only one is generating errors. Schema refers to an object's
> definition as well as a namespace. So what does:
>
> \d <table_name>
>
> return?
>

I see what you mean. I don't have access to the instance at the moment so
I'd have to take a look tomorrow.
What, specifically, would be interesting in that output? When I looked at
it this morning, I didn't see anything which looked out of the ordinary to
me.

-- 
-BEGIN GEEK CODE BLOCK-
  Version: 3.12
  GIT d- s+ a- C L+++ S++ B+ P++> E++ W+++
  N o? K- !w O- M- V? PS++ PE- Y+ PGP t+ 5+++
  X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
--END GEEK CODE BLOCK--


Re: gen_random_uuid key collision

2021-09-02 Thread Adrian Klaver

On 9/2/21 5:38 PM, jesusthefrog wrote:


On Thu, Sep 2, 2021 at 8:05 PM Adrian Klaver wrote:


But only one is generating errors. Schema refers to an object's
definition as well as a namespace. So what does:

\d <table_name>

return?


I see what you mean. I don't have access to the instance at the moment 
so I'd have to take a look tomorrow.
What, specifically, would be interesting in that output? When I looked 


I have no idea, probably more about eliminating possibilities than anything.

at it this morning, I didn't see anything which looked out of the 
ordinary to me.


--
-BEGIN GEEK CODE BLOCK-
   Version: 3.12
   GIT d- s+ a- C L+++ S++ B+ P++> E++ W+++
   N o? K- !w O- M- V? PS++ PE- Y+ PGP t+ 5+++
   X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
--END GEEK CODE BLOCK--



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: gen_random_uuid key collision

2021-09-02 Thread Peter Geoghegan
On Thu, Sep 2, 2021 at 4:48 PM Tom Lane  wrote:
> That is pretty weird, all right.  The only idea that comes to mind
> immediately is that maybe that table's pkey index is corrupt and needs
> to be reindexed.  This isn't a great theory, because I don't see why
> a corrupt index would lead to bogus unique-constraint errors rather
> than missed ones.  But at least it squares with the observation that
> only that table is having issues.

This is easy enough to check using the contrib/amcheck extension.

jesusthefrog could try this, and report back what they see:

CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('my_uuid_index', true);

If that doesn't show any errors, then there is a chance that this will:

SELECT bt_index_parent_check('my_uuid_index', true);

Note that the parent variant takes a disruptive lock that will block
write DML. You might prefer to just use the first query if this is
running in a production environment.

--
Peter Geoghegan




Re: gen_random_uuid key collision

2021-09-02 Thread Mark Dilger



> On Sep 2, 2021, at 4:25 PM, jesusthefrog  wrote:
> 
> Anyone have any thoughts on this?

I agree with Peter's suggestion upthread to run amcheck on the index, but if 
that comes back with no corruption, can you verify that there are no rules or 
triggers that might cause multiple copies of the rows to be inserted?  
Likewise, can you verify that you have no replication subscriptions that could 
be putting duplicates into the table?
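
A few quick checks along those lines (the table name is just a placeholder):

-- user-defined triggers on the table
SELECT tgname
FROM   pg_trigger
WHERE  tgrelid = 'access_control.my_table'::regclass
  AND  NOT tgisinternal;

-- rules defined on the table
SELECT rulename
FROM   pg_rules
WHERE  schemaname = 'access_control' AND tablename = 'my_table';

-- logical replication subscriptions on this instance
SELECT subname, subenabled FROM pg_subscription;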

Another idea that seems unlikely given your lack of trouble with other tables 
is that you might check whether you have any functions that reset the seed for 
your random generator.  I haven't looked specifically at your uuid generator, 
and I don't know if it gets nondeterministic randomness from /dev/random or 
similar, but deterministic random generators can be made to produce the same 
sequence again if the seed it reset.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







is there any memory leak with postgres version(12.x) - https://stackoverflow.com/questions/3849543/force-postgresql-to-release-allocated-memory

2021-09-02 Thread M Tarkeshwar Rao
Hi All,

Is there any memory leak in Postgres version 12.7 or any other 12.x?
It is mentioned in 
https://stackoverflow.com/questions/3849543/force-postgresql-to-release-allocated-memory.

Regards
Tarkeshwar


Re: memory consumption of memory for postgres db container

2021-09-02 Thread Stephan Knauss

On 02.09.2021 16:35, M Tarkeshwar Rao wrote:


We did a stability test for our product, where we continuously 
inserting data into postgres tables. We did this for three days 
continuously.


Memory utilization of our Postgres container continuously increasing.

I am not certain whether it is related, but you might want to check. I 
had a similar issue where the anonymous memory was sort of leaking with 
a rate of 2 GiB per hour.


it was resolved by doing a "set jit=off". You can see the full thread 
including some hints on how to check where memory is allocated here:


https://www.postgresql.org/message-id/flat/9f73e655-14b8-feaf-bd66-c0f506224b9e%40stephans-server.de
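
In case it is useful, the workaround itself is just a standard GUC, e.g.:

-- per session
SET jit = off;

-- or instance-wide
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();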

Stephan