Re: How to do faster DML

2024-02-04 Thread Peter J. Holzer
On 2024-02-04 02:14:20 +0530, Lok P wrote:
> However , as we have ~5billion rows in the base table and out of that , we 
> were
> expecting almost half i.e. ~2billion would be duplicates.

That's way more than I expected from your original description. And it
of course raises the question whether it's a good idea to just throw
away all that data or if you need to keep that in a normalized way.

> And you said, doing the inserts using the "order by CTID Offset"
> approach must cause one full sequential scan of the whole table for
> loading each chunk/10M of rows and that would take a long time I
> believe.
> 
> I am still trying to understand the other approach which you suggested. Not
> able to understand "you can select where index_col > last order by index_col 
> limit 10M," .
> However, to get the max ID value of the last 10M loaded rows in target, do you
> say that having an PK index created on that target table column(ID) will
> help,

Yes. Getting the maximum value from an index is a very fast operation.
You just have to traverse down the right edge of the tree (or you may
even be able to access the right-most leaf page directly).

>  and we can save the max (ID) value subsequently in another table to fetch
> and keep loading from the source table (as ID>Max_ID stored in temp table)?

Another table or a variable in a script (personally, if I need to do
something repeatedly, I usually write a script in the scripting language
I feel most comfortable in - which has been Python for the last 7 or 8
years, Perl before that), which gives you variables, loops, conditionals
and - above all - repeatability.
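
A minimal psql sketch of that keyset approach, with placeholder names
(tab1 as the source, tab1_new as the target with a unique index on id);
duplicate handling such as ON CONFLICT DO NOTHING is left out:

-- grab the highest id already loaded; a cheap lookup via the index
select coalesce(max(id), -1) as last_max_id from tab1_new \gset

-- load the next chunk, keyed on that value instead of an OFFSET
insert into tab1_new
select * from tab1
where id > :last_max_id
order by id
limit 10000000;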

> OR
> Would it be better to do it in one shot only , but by setting a higher value 
> of
> some parameters like "maintenance_work_mem" or "max_parallel_workers"?

Hard to say. Normally, processing in fewer, bigger chunks is faster. But
RAM is much faster than disk (even with SSDs), so it might be faster to
make work_mem as large as you can and then use a chunk size which just
fits inside work_mem. Of course finding that sweet spot takes
experimentation, hence time, and it may make little sense to experiment
for 20 hours just to save 40 minutes.
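
If you go that route, the settings can be raised for just the loading
session, e.g. (the values here are purely illustrative, not recommendations):

set work_mem = '2GB';                       -- per sort/hash node in this session
set maintenance_work_mem = '16GB';          -- helps index builds
set max_parallel_maintenance_workers = 4;   -- parallel index-build workers (PG 11+)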

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: How to do faster DML

2024-02-04 Thread Alban Hertroys


> On 3 Feb 2024, at 13:20, Lok P  wrote:
> 
> Hello All,
> A non partitioned table having ~4.8 billion rows in it and having data size 
> as ~1.4TB , row size as ~313 bytes having ~127 columns in it. This has got 
> approx ~1billion+ duplicate rows inserted in it and we want to get the 
> duplicate data removed for this table and create a PK/unique constraint back 
> so as to not have the duplicate values in future. We are struggling to do the 
> same.

~4.8 billion rows of which ~1 billion are duplicates… Wait a minute…

Did you verify that your ID column is larger than 32-bits?
Because if that’s a 32 bit integer, the range of values it can hold is about 
4.3 billion, after which it wraps around.

With ~4.8 billion rows, that would result in about ~0.5 billion repeated ID 
values, giving you the reported ~1 billion duplicate IDs.

If that’s the case, your duplicates obviously aren’t really duplicates and you 
require a different type of solution.
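
Two quick checks (assuming the table and column are literally named tab1 and id):

select pg_typeof(id) from tab1 limit 1;        -- integer = 32 bit, bigint = 64 bit
select min(id), max(id), count(*) from tab1;   -- does the value range even fit the row count?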


> Teammates suggested doing this using CTAS method, i.e. create a new table 
> with the unique record set and then drop the main table. Something as below
> 
> create table TAB1_New
> as
> SELECT  * from TAB1 A
> where ID in
>   (select min(ID) from TAB1
>   group by ID having count(ID)>=1 );
> 
> But for the above to work faster , they mentioned to have an index created on 
> the column using which the duplicate check will be performed i.e ID column. 
> So, creating the index itself took ~2hrs+ and the index size now shows as 
> ~116GB.
> 
> Create index idx1 on TAB1(ID)

Are your duplicates exact duplicates? Or is there an order of preference among 
them?
And if so, what really makes those rows unique?

That matters for solutions on how to deduplicate these rows.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: How to do faster DML

2024-02-04 Thread Francisco Olarte
Lok:

On Sat, 3 Feb 2024 at 21:44, Lok P  wrote:

> On Sun, Feb 4, 2024 at 12:50 AM Francisco Olarte  
> wrote:

> However , as we have ~5billion rows in the base table and out of that , we 
> were expecting almost half i.e. ~2billion would be duplicates. And you said, 
> doing the inserts using the "order by CTID Offset" approach must cause one 
> full sequential scan of the whole table for loading each chunk/10M of rows 
> and that would take a long time I believe.

I did not say "MUST CAUSE". In fact I said I believe it would not.

What I am gonna say ( now ) is test it. Make 1k, 10k, 100k, 1M tables
in a scratch database, explain and test your things there w/ & w/o
index etc. Not all of it is needed, but testing 100k & 1M in 1k batches
could expose quadratic behaviour you might otherwise miss. Explain would
show you unexpected sorts or scans.
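
For instance, a scratch setup along these lines (sizes and names purely
illustrative):

create table scratch_1m as
select g as id, md5(g::text) as payload
from generate_series(1, 1000000) g;

create index on scratch_1m (id);
analyze scratch_1m;

-- check the plan for one 1k batch; watch for unexpected sorts or seq scans
explain (analyze, buffers)
select * from scratch_1m where id > 999000 order by id limit 1000;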

> I am still trying to understand the other approach which you suggested. Not 
> able to understand "you can select where index_col > last order by index_col 
> limit 10M," .
> However, to get the max ID value of the last 10M loaded rows in target, do 
> you say that having an PK index created on that target table column(ID) will 
> help, and we can save the max (ID) value subsequently in another table to 
> fetch and keep loading from the source table (as ID>Max_ID stored in temp 
> table)?

I am a programmer by trade. When faced with problems like these,
unless I find a trivial solution, I tend to make auxiliary programs, as
it is much easier for me to write a thousand lines of Perl ( or
Python/C/C++/Java ) or a couple hundred lines of plpgsql ( if I want to
avoid roundtrips ) than to debug complicated SQL-only workflows.

For your kind of problem I would make a program to read the rows and
insert them. As an example, let's say you have a non-unique index on ID
and are going to use the on-conflict-do-nothing route ( I do believe
this would be slow due to the need of having an index on the target
table to support it, but let's assume it is ok ). To do that I may
just do a loop, starting with last_id = -1 ( some id less than any other
id ), selecting a chunk of rows with id >= last_id ordered by id and
inserting them. After doing that I may notice that I do not need the
index if the sort order is right, drop the index and the on-conflict
clause, and just do, for every row, if ( id > last_id ) insert, before
storing last_id = id.
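
A minimal plpgsql sketch of that loop, with placeholder names (tab1 as the
source, tab1_new as an identically-structured target with a unique index on
id to back ON CONFLICT); COMMIT inside DO needs PostgreSQL 11+ and a
top-level call, not an explicit transaction block:

do $$
declare
    last_id bigint := -1;   -- anything smaller than the smallest real id
    n_read  bigint;
begin
    loop
        with chunk as (
            select * from tab1
            where id > last_id      -- '>' skips further source copies of the
            order by id             -- boundary id; one copy is already loaded
            limit 10000000
        ), ins as (
            -- data-modifying CTEs always execute, even if not referenced
            insert into tab1_new
            select * from chunk
            on conflict (id) do nothing
        )
        select max(id), count(*) into last_id, n_read from chunk;

        exit when n_read = 0;
        raise notice 'loaded rows up to id %', last_id;
        commit;   -- release locks and start a fresh transaction per chunk
    end loop;
end $$;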

Anyway, not knowing the real table characteristics and current usage
patterns I cannot recommend anything concrete.

> Would it be better to do it in one shot only , but by setting a higher value 
> of some parameters like "maintenance_work_mem" or "max_parallel_workers"?

It depends on a lot of unknown ( to us ) things.

Francisco Olarte.




Re: How to do faster DML

2024-02-04 Thread Marcos Pegoraro
>
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000
> offset 20_000_000;
>

You can use min/max values grouping them by 10_000_000 records, so you
don't need that offset, then generate commands and run them.

select format('insert into mytable2 select * from mytable1 where i between
%s and %s;', min(i), max(i)) from mytable1 group by i/10_000_000;
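
In psql the generated commands can be run directly with \gexec, for example
(same illustrative names as above; plain digits are used because the
10_000_000 underscore form needs PostgreSQL 16+):

select format(
    'insert into mytable2 select * from mytable1 where i between %s and %s;',
    min(i), max(i))
from mytable1
group by i / 10000000 \gexec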


Re: How to do faster DML

2024-02-04 Thread Dennis White
I'm surprised no one has mentioned that perhaps it's a good idea to partition
this table while adding the pk. By your own statements the table is
difficult to work with as is. Without partitioning the table, row inserts
would need to walk the pk index and could be a factor. If this is a static
table then perhaps that's ok, but if not...

Anyway, I don't recall what type the ID was or how it's set, but I suggest
you seriously investigate using it to partition this table into manageable
smaller tables.
Best of luck.

On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro  wrote:

> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000
>> offset 20_000_000;
>>
>
> You can use min/max values grouping them by 10_000_000 records, so you
> don´t need that offset, then generate commands and run them.
>
> select format('insert into mytable2 select * from mytable1 where i between
> %s and %s;', max(i), min(i)) from t group by i/10_000_000;
>


Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 8:14 PM Dennis White  wrote:

> I'm surprised no one has mentioned perhaps it's a good idea to partition
> this table while adding the pk. By your own statements the table is
> difficult to work with as is. Without partitioning the table, row inserts
> would need to walk the pk index and could be a factor. If this is static
> table then perhaps that's ok but if not...
>
> Anyway I don't recall what type the ID was or how it's set but i suggest
> you seriously investigate using it to partition this table into manageable
> smaller tables.
> Best of luck.
>
> On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro  wrote:
>
>> insert into mytable2 select * from mytable1 order by ctid limit
>>> 10_000_000 offset 20_000_000;
>>>
>>
>> You can use min/max values grouping them by 10_000_000 records, so you
>> don´t need that offset, then generate commands and run them.
>>
>> select format('insert into mytable2 select * from mytable1 where i
>> between %s and %s;', max(i), min(i)) from t group by i/10_000_000;
>>
>
Thank you so much.

You are correct. It has indeed been difficult to operate on this table. Every
read query is struggling, so partitioning is something we must consider. And
we are hoping that Postgres will be able to handle this scale with a proper
partitioning and indexing strategy.

I have a few related questions.

1) Even after partitioning the target table, to speed up the data load on
this table, is there an option to disable the primary and foreign keys and
re-enable them after the data load finishes? Will that be a good idea, or
will it be cumbersome/resource intensive to re-enable the constraints after
persisting all the data in the table?

2) I understand there is theoretically no limitation on the number or size
of partitions a table can have in Postgres. But I want to know from the
experts here, from their real-life experience, whether there is anything we
should take care of before deciding the partitioning strategy, so that we
stay within any practical soft limits (both with respect to size and number
of partitions).
Note: Currently this table will be around ~1TB in size and will hold
approx ~3 billion rows (post removal of duplicates). But as per business need
it may grow up to ~45 billion rows in future.

3) As the size of the table (or each partition) is going to be very large and
this will be a read-intensive application, compressing the historical
partitions would help us save storage space and would also help read
query performance. So, can you please throw some light on the compression
strategy we should follow here (considering a daily range-partitioned
table based on transaction_date as the partition key)?

Regards
Lok


Re: How to do faster DML

2024-02-04 Thread Greg Sabino Mullane
Partitioning will depend a lot on how you access the data. Is it always
using that transaction_date column in the where clause? Can you share the
table definition via \d? We've talked about this table quite a bit, but not
actually seen what it looks like. We are flying blind a little bit. You
mentioned your queries are slow, but can you share an explain analyze on
one of these slow queries?

45 billion rows is quite manageable. How many rows are in each day? You may
want to do a larger bucket than 24 hours per partition.
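
For instance, a weekly bucket would look roughly like this (hypothetical,
trimmed-down DDL; only transaction_date is taken from this thread):

create table tab1_part (
    id bigint not null,
    transaction_date date not null
    -- ... the remaining ~125 columns
) partition by range (transaction_date);

create table tab1_2024_w05 partition of tab1_part
    for values from ('2024-01-29') to ('2024-02-05');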

> 1)Even after partitioning the target table , to speed up the data load on
> this table , Is there an option to disable the primary and foreign keys and
> re-enable them post data load finishes. Will that be a good idea
>

No.

> 3)As the size of the table or each partition is going to be very large and
> this will be a read intensive application, compressing the historical
> partition will help us save the storage space and will also help the read
> queries performance


I am not sure what this means. If you are not going to need the data
anymore, dump the data to deep storage and drop the partition.

Cheers,
Greg


>


Re: How to do faster DML

2024-02-04 Thread Ron Johnson
On Sun, Feb 4, 2024 at 10:30 AM Lok P  wrote:

> On Sun, Feb 4, 2024 at 8:14 PM Dennis White 
> wrote:
>
>> I'm surprised no one has mentioned perhaps it's a good idea to partition
>> this table while adding the pk. By your own statements the table is
>> difficult to work with as is. Without partitioning the table, row inserts
>> would need to walk the pk index and could be a factor. If this is static
>> table then perhaps that's ok but if not...
>>
>> Anyway I don't recall what type the ID was or how it's set but i suggest
>> you seriously investigate using it to partition this table into manageable
>> smaller tables.
>> Best of luck.
>>
>> On Sun, Feb 4, 2024, 8:00 AM Marcos Pegoraro  wrote:
>>
>>> insert into mytable2 select * from mytable1 order by ctid limit
 10_000_000 offset 20_000_000;

>>>
>>> You can use min/max values grouping them by 10_000_000 records, so you
>>> don´t need that offset, then generate commands and run them.
>>>
>>> select format('insert into mytable2 select * from mytable1 where i
>>> between %s and %s;', max(i), min(i)) from t group by i/10_000_000;
>>>
>>
> Thank you so much.
>
> You are correct. It was seemingly difficult to operate on this table.
> Every read query is struggling and so partitioning is something we must
> have to think of. And hoping that, postgres will be able to handle this
> scale, with proper partitioning and indexing strategy.
>
> I have a few related questions.
>
> 1)Even after partitioning the target table , to speed up the data load on
> this table , Is there an option to disable the primary and foreign keys and
> re-enable them post data load finishes. Will that be a good idea or will it
> be cumbersome/resource intensive to re-enable the constraints , after
> persisting all the data in the table?
>

1. Load the children before attaching them to the parent.
2. Create the child indices, PK and FKs before attaching to the parent.
3. Do step 2 in multiple parallel jobs.  (cron is your friend.)
4. Attach the children to the "naked" (no PK, no FK, no indices) parent.
5. Use the ONLY keyword on CREATE INDEX and ALTER TABLE to add the indices,
PK and FK to the parent *after* step 4.
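
A minimal sketch of steps 2, 4 and 5 for one child, assuming a
range-partitioned parent keyed on transaction_date (all names here are
illustrative; note that any unique index on the parent must include the
partition key):

-- step 2: index the already-loaded child
create unique index tab1_2024_02_04_pk
    on tab1_2024_02_04 (id, transaction_date);

-- step 4: attach the child to the still "naked" parent
-- (a matching CHECK constraint on the child beforehand avoids a validation scan)
alter table tab1_part
    attach partition tab1_2024_02_04
    for values from ('2024-02-04') to ('2024-02-05');

-- step 5: create the parent-level index with ONLY (it starts out invalid),
-- then attach each child's index; the parent index becomes valid once every
-- partition's index is attached
create unique index tab1_part_pk
    on only tab1_part (id, transaction_date);
alter index tab1_part_pk attach partition tab1_2024_02_04_pk;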


> 2)I understand there is no limitation theoretically on the number or size
> of partitions a table can have in postgres. But I want to know from experts
> here, from their real life experience, if there exists any such thing which
> we should take care of before deciding the partitioning strategy, so as to
> have the soft limit (both with respect to size and number of partitions)
> obeyed.
> Note:- Currently this table will be around ~1TB in size and will hold
> Approx ~3billion rows(post removal of duplicates). But as per business need
> it may grow up to ~45 billion rows in future.
>

I'd keep it under 200 partitions, but my experience might be outdated.


> 3)As the size of the table or each partition is going to be very large and
> this will be a read intensive application,
>

By PK?  If so, partition by PK.


> compressing the historical partition will help us save the storage space
> and will also help the read queries performance. So, Can you please throw
> some light on the compression strategy which we should follow here
> (considering a daily range partition table based on transaction_date as
> partition key)?
>
> Regards
> Lok
>
>
>
>


Re: Yum Update nothing provides libarmadillo.so.12()(64bit) needed by gdal36-libs-3.6.4-6PGDG.rhel9.x86_64 from pgdg-common

2024-02-04 Thread Marcelo Marques
BUG #18328: yum update nothing provides libarmadillo.so.12()(64bit) needed
by gdal36-libs-3.6.4-6PGDG.rhel9.x86



On Fri, Feb 2, 2024 at 11:05 AM Marcelo Marques 
wrote:

> *PROBLEM*
>
> *yum update nothing provides libarmadillo.so.12()(64bit) needed by
> gdal36-libs-3.6.4-6PGDG.rhel9.x86_64 from pgdg-common  *
>
> [root@rhel9-pg14 ~]# yum update --exclude=postgis33_14
> --exclude=postgresql14*
> Updating Subscription Management repositories.
> EPEL_9                                                     26 kB/s | 2.3 kB  00:00
> Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)   50 kB/s | 4.5 kB  00:00
> Red Hat CodeReady Linux Builder for RHEL 9 x86_64 (RPMs)   49 kB/s | 4.5 kB  00:00
> Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)      40 kB/s | 4.1 kB  00:00
> Error:
>  Problem: cannot install the best update candidate for package
> gdal36-libs-3.6.4-5PGDG.rhel9.x86_64
>  - nothing provides libarmadillo.so.12()(64bit) needed by
> gdal36-libs-3.6.4-6PGDG.rhel9.x86_64 from pgdg-common
> (try to add '--skip-broken' to skip uninstallable packages or '--nobest'
> to use not only best candidate packages)
> [root@rhel9-pg14 ~]#
>
> *NOTE:*  --skip-broken still returns the same error message, and --nobest
> just skips the gdal36-libs update
>
> *REMARKS*
> similar issue https://postgrespro.com/list/thread-id/2679095
>
> *"Pushed GDAL 3.6.4-6 RPMS to the repos built against armadillo 12 for RHEL 
> 9, 8 and Fedora."*
>
> *RHEL 9.3 DETAILS*
>
> [root@rhel9-pg14 ~]# cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.3 (Plow)
> [root@rhel9-pg14 ~]#
>
> [root@rhel9-pg14 ~]# rpm -qa | grep postgres | sort
> postgresql14-14.10-1PGDG.rhel9.x86_64
> postgresql14-contrib-14.10-1PGDG.rhel9.x86_64
> postgresql14-libs-14.10-1PGDG.rhel9.x86_64
> postgresql14-server-14.10-1PGDG.rhel9.x86_64
>
> [root@rhel9-pg14 ~]# rpm -qa | grep postgis | sort
> postgis34_14-3.4.0-1PGDG.rhel9.x86_64
> [root@rhel9-pg14 ~]#
>
> [root@rhel9-pg14 ~]# subscription-manager repos --list-enabled
> +--+
> Available Repositories in /etc/yum.repos.d/redhat.repo
> +--+
> Repo ID:   rhel-9-for-x86_64-appstream-rpms
> Repo Name: Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs)
> Repo URL:
> https://red-inf-cap-p01.esri.com/pulp/content/ESRI/LEP_Patch/cv_patch/content/dist/rhel9/$releasever/x86_64/appstream/os
> Enabled:   1
>
> Repo ID:   ESRI_EPEL_9_EPEL_9
> Repo Name: EPEL_9
> Repo URL:
> https://red-inf-cap-p01.esri.com/pulp/content/ESRI/LEP_Patch/cv_patch/custom/EPEL_9/EPEL_9
> Enabled:   1
>
> Repo ID:   rhel-9-for-x86_64-baseos-rpms
> Repo Name: Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs)
> Repo URL:
> https://red-inf-cap-p01.esri.com/pulp/content/ESRI/LEP_Patch/cv_patch/content/dist/rhel9/$releasever/x86_64/baseos/os
> Enabled:   1
>
> Repo ID:   codeready-builder-for-rhel-9-x86_64-rpms
> Repo Name: Red Hat CodeReady Linux Builder for RHEL 9 x86_64 (RPMs)
> Repo URL:
> https://red-inf-cap-p01.esri.com/pulp/content/ESRI/LEP_Patch/cv_patch/content/dist/rhel9/$releasever/x86_64/codeready-builder/os
> Enabled:   1
>
> [root@rhel9-pg14 ~]# ls -la /etc/yum.repos.d/
> total 52
> drwxr-xr-x.   2 root root85 Jan 31 16:13 .
> drwxr-xr-x. 136 root root  8192 Feb  2 10:33 ..
> -rw-r--r--.   1 root root 13067 Dec 25 15:11 pgdg-redhat-all.repo
> -rw-r--r--.   1 root root 14013 Oct 17 04:10 pgdg-redhat-all.repo.old
> -rw-r--r--.   1 root root  4657 Jan 31 16:09 redhat.repo
> [root@rhel9-pg14 ~]#
>
> Thanks,
> Marcelo Marques
> Principal Product Engineer, Esri, www.esri.com
>
>


Re: Yum Update nothing provides libarmadillo.so.12()(64bit) needed by gdal36-libs-3.6.4-6PGDG.rhel9.x86_64 from pgdg-common

2024-02-04 Thread Adrian Klaver

On 2/4/24 10:42, Marcelo Marques wrote:


  BUG #18328: yum update nothing provides libarmadillo.so.12()(64bit)
  needed by gdal36-libs-3.6.4-6PGDG.rhel9.x86
  



The RPM packaging is separate from the core code. The bug list you filed 
the above report to is not for packaging issues.


For that you need to visit:

https://redmine.postgresql.org/projects/pgrpms/

You will need to set up a community account first to access the Redmine 
issue tracker.




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 9:18 PM Greg Sabino Mullane 
wrote:

> Partitioning will depend a lot on how you access the data. Is it always
> using that transaction_date column in the where clause? Can you share the
> table definition via \d? We've talked about this table quite a bit, but not
> actually seen what it looks like. We are flying blind a little bit. You
> mentioned your queries are slow, but can you share an explain analyze on
> one of these slow queries?
>
> 45 billion rows is quite manageable. How many rows are in each day? You
> may want to do a larger bucket than 24 hours per partition.
>
> 1)Even after partitioning the target table , to speed up the data load on
>> this table , Is there an option to disable the primary and foreign keys and
>> re-enable them post data load finishes. Will that be a good idea
>>
>
> No.
>
> 3)As the size of the table or each partition is going to be very large and
>> this will be a read intensive application, compressing the historical
>> partition will help us save the storage space and will also help the read
>> queries performance
>
>
> I am not sure what this means. If you are not going to need the data
> anymore, dump the data to deep storage and drop the partition.
>
> Cheers,
> Greg
>
>

Thank you.

The table has ~127 columns of different data types - a combination of
numeric, varchar, date etc. It currently has a size of ~1TB, holding
~3 billion rows, and the row size is ~300 bytes.

Currently the volume is lower, but in future the daily transactions
inserted into this table will be a max of ~500 million rows/day. And the
plan is to persist at least ~3 months of transaction data, which will be
around 45 billion rows in total. And if all works well, we may need to
persist ~6 months' worth of data in this database in future, and that
will be ~90 billion rows.

This table will always be queried with the transaction_date column as one of
the filter criteria. But the querying/search criteria can span from a day
to a month's worth of transaction-date data.

When you said "*You may want to do a larger bucket than 24 hours per
partition.*", do you mean partitioning by week or so? Currently, as
per the math - 1TB of storage for ~3 billion rows - the daily range
partition size (to hold ~500 million transactions/day) will be around
~100-150GB. Won't that be too much data for a single partition to operate
on, and won't making the buckets coarser (say weekly) make the
partitions even bulkier?

What I meant was: we will definitely need the data for querying purposes by
the users, but just to keep the storage space under control (and to help the
read queries), we were thinking of having the historical partitions
compressed. For that, is there any compression strategy we should follow on
Postgres?

With regard to loading data into the table faster, I am wondering why you
said '*NO*' to the approach of loading the data first and enabling/creating
the primary key and foreign key constraints later. Because this table is a
child table and the parent already has data in it, loading data into this
table in the presence of the PK and FK makes it too slow, as it tries to
validate those for each set of rows. So we were thinking that doing it at a
later stage, in one shot, would be a faster approach. Please suggest.

I will try to collect some of the SELECT queries and post the explain
analyze output. Currently we are trying to get rid of the duplicates.

Regards
Lok

>


Re: How to do faster DML

2024-02-04 Thread Lok P
On Sun, Feb 4, 2024 at 9:25 PM Ron Johnson  wrote:

> 1. Load the children before attaching them to the parent.
> 2. Create the child indices, PK and FKs before attaching to the parent.
> 3. Do step 2 in multiple parallel jobs.  (cron is your friend.)
> 4. Attach the children to the "naked" (no PK, no FK, no indices) parent.
> 5. Use the ONLY keyword on CREATE INDEX and ALTER TABLE to add the
> indices, PK and FK to the parent *after* step 4.
>
>>
>>
>>
>>
Thank You.

Can you please help me understand these steps a bit more precisely?

Say we have a parent table that already has data in it and also a primary
key defined. We will load the child table first, after dropping its primary
key and foreign keys, so that the data load will be faster.

Then we will create the primary key index on the child table. When you said
"using multiple parallel jobs", do you mean creating the PK index on each
partition separately from different sessions, rather than creating it on the
table with a single statement - or is there some other, faster way to create
the PK index?

Now the last step is attaching the PK and FK of the above child table to
the parent. This will validate each and every row for uniqueness and also
ensure the foreign key value is present in the parent table. Won't this
step take a long time?


Question on partitioning

2024-02-04 Thread veem v
Hello All,
In PostgreSQL, is it possible to partition an existing non-partitioned table
that already has data, indexes and constraints defined in it, without
manually moving the data around, so as to make it faster?
Similarly, is it possible to merge multiple partitions into one partition,
or to split a single partition into multiple partitions?
I would appreciate any reference doc or sample syntax for the same.

Thanks and Regards
Veem


select from composite type

2024-02-04 Thread Lorusso Domenico
Hello guys,
I'm trying to find out the equivalent behaviour of unnest, when I've got
just a composite type.

The statement is something like this (but more complex):

 _sqlUpdate text=$$
with s as (
select * from ($1)
)
update myView as q set
(attribute_fullname, modify_user_id)
=(s.attribute_fullname, $2)
where  s.product_code=q.product_code
and s.attribute_uid=q.attribute_uid
$$;

And I use this command.
execute _sqlUpdate using  attribute_list[_i], _modify_user_id;

attribute_list is  an array of composite type (with 20 fields).

If I use unnest with the entire array the SQL works, but I can't find a
way to treat a single record as a row of a table.


Any suggestions?

Bye and thanks!

Domenico


Re: How to do faster DML

2024-02-04 Thread Ron Johnson
On Sat, Feb 3, 2024 at 11:09 AM Lok P  wrote:
 [snip]

> show maintenance_work_mem; - 4155MB
> show work_mem; - 8MB
> show shared_buffers ; -22029684
>

Those are pretty small values.  What are your server specs?


Re: Question on partitioning

2024-02-04 Thread Ron Johnson
On Sun, Feb 4, 2024 at 4:40 PM veem v  wrote:

> Hello All,
> In postgresql, Is it possible to partition an existing nonpartitioned
> table having data already residing in it and indexes and constraints
> defined in it, without the need of manually moving the data around, to make
> it faster? Similarly merging multiple partitions to one partition or
> splitting a single partition into multiple partitions?
>

That depends on how you partition the table, and what the PK is.  It works
great when the PK is constantly increasing (think "sequence" or
"timestamp") and you partition by PK ranges; not so great when you want to
partition by a different column.


Re: select from composite type

2024-02-04 Thread Adrian Klaver

On 2/4/24 14:50, Lorusso Domenico wrote:

Hello guys,
I'm trying to find out the equivalent behaviour of unnest, when I've got 
just a composite type.


This is the statement is something like that (but more complex):

  _sqlUpdate text=$$
with s as (
select * from ($1)
)
update myView as q set
(attribute_fullname, modify_user_id)
=(s.attribute_fullname, $2)
where  s.product_code=q.product_code
and s.attribute_uid=q.attribute_uid
$$;

And I use this command.
execute _sqlUpdate using  attribute_list[_i], _modify_user_id;


Is the above in a plpgsql function?



attribute_list is  an array of composite type (with 20 fields).


I am trying to wrap my head around "array of composite type". Please 
provide an example.





If I use the unnest with the entire array the sql works, but I can't 
find a way to treat the single record as a row of  a table.



Any suggestions?

Bye and thanks'

Domenico


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: select from composite type

2024-02-04 Thread David G. Johnston
On Sun, Feb 4, 2024 at 5:39 PM Adrian Klaver 
wrote:

>
> >
> > attribute_list is  an array of composite type (with 20 fields).
>
> I am trying to wrap my head around "array of composite type". Please
> provide an example.
>

ARRAY[ (1,2)::point, (3,4)::point ]::point[]

The main problem is that writing "from($1)" in any query makes no
sense; you cannot parameterize a FROM clause directly like that.  You have
to put the value somewhere an expression is directly allowed.
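
For example, something along these lines might work for the statement above,
assuming the array's element type is named attr_t (a hypothetical name):
cast the parameter and expand it with (...).* inside a CTE, i.e. a place
where an expression is allowed:

with s as (
    select ($1::attr_t).*          -- expand the composite parameter into columns
)
update myView as q set
    (attribute_fullname, modify_user_id)
    = (s.attribute_fullname, $2)
from s
where s.product_code = q.product_code
  and s.attribute_uid = q.attribute_uid;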

David J.


bug/confusion with pg_log_standby_snapshot()

2024-02-04 Thread Pete O'Such
Is pg_log_standby_snapshot() expected to cause a WAL segment to be emitted
in an otherwise idle system?  In my lab setup the primary did not emit one,
despite my invoking pg_log_standby_snapshot() on it, even after several
times the archive_timeout value had passed since calling that function.

The setup was all Postgres 16.1, with a primary replicating to standby via
log-shipping, and the standby (trying to) have a logical replication
publication subscribed by a third Postgres server.

In two separate trial runs, long waits happened on an idle system after
invoking pg_log_standby_snapshot().  The function did not result in a new
WAL segment being shipped to the standby, so on the third server the CREATE
SUBSCRIPTION command hung until I eventually restarted the primary.  At
that point CREATE SUBSCRIPTION completed and everything began working.

The documentation says:
> If the primary is idle, creating a logical slot on standby may
> take noticeable time. This can be sped up by calling the
> pg_log_standby_snapshot function on the primary.
https://www.postgresql.org/docs/current/logicaldecoding-explanation.html#LOGICALDECODING-REPLICATION-SLOTS

Using the function wasn't enough. Should I have done more to trigger WAL
emission?  If so, directly stating that in the documentation would have
helped me, and maybe others.

Thanks, Pete