Re: ANALYZE on partitioned tables vs on individual partitions

2024-08-07 Thread Michael Harris
Many thanks David for the comprehensive response.

> I think the complaint was about no autovacuum on the partitioned
> table, not the partitions.

Yes, exactly.

One other piece of information: these tables contain a lot of columns, of which
only 4 are normally used for WHERE clauses or joins. The table I was
experimenting with has 150 columns, 156026832 rows and occupies 166GB.

I found that running an ANALYZE specifying only those 4 columns only took
5 minutes, compared to the 30 minutes for the whole table.
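
For reference, the column-limited form is just ANALYZE with a column list; a
minimal sketch with hypothetical table and column names:

    ANALYZE my_big_table (col_a, col_b, col_c, col_d);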

That was a bit of a surprise as I imagined actually reading the table would take
most of the time and would be the same regardless of the number of columns
being analyzed, but I guess that is wrong.

Regards, Mike


On Wed, 7 Aug 2024 at 15:23, David Rowley  wrote:
>
> On Wed, 7 Aug 2024 at 16:44, Christophe Pettus  wrote:
> > Child partitions should be autovacuumed and autoanalyzed just like any 
> > other table; they are not prohibited from autovacuum in any way by default. 
> >  It's probably a good idea to investigate why they are not being picked up 
> > by autovacuum.  If they are created by a bulk load process, it's not a bad 
> > idea to do a VACUUM ANALYZE on them once the bulk load is complete.
>
> I think the complaint was about no autovacuum on the partitioned
> table, not the partitions.  This is expected as we don't track the
> counters (in particular n_mod_since_analyze) shown in
> pg_stat_all_tables at the partitioned table level, so the trigger
> points that normally cause autovacuum to analyze or vacuum a table
> just won't be triggered for a partitioned table.  For VACUUM, that's
> fine as, as you mentioned, no rows are stored. But for analyze, that
> does present a problem.
>
> To name the aspects of planning that rely on statistics of the
> partitioned table, basically anything above the Append or MergeAppend
> which joins the partitioned results together. So that doesn't include
> the scans of each partition and any quals that are pushed down to the
> scan level as those are able to use the partition level statistics.
> However, it does include things like joins, group by, distinct as
> those require n_distinct estimates for the partitioned table. It's not
> all bad though as the row estimates for each individual partition will
> be totalled up through the Append / MergeAppend simply by adding up
> the row estimates for each Append / MergeAppend child plan. So, it's
> really only an estimation problem for any node that comes after a join
> node or a group by node as the output rows for those nodes will depend
> on a good n_distinct estimate for the partitioned table.
>
> Partition-wise joins and aggregates do change things a bit as those
> features do permit moving those operations below the Append / Merge
> Append, in which case the statistics for the individual partition can
> be used.
>
> You could consider manually setting the n_distinct_inherited estimates
> for the columns that you join on or group by in the partitioned table.
> You might find that you're able to choose a suitable value for that if
> you review the documentation for that setting. In particular, please
> review what is mentioned about using negative numbers for that
> setting. You may be able to choose a value that scales correctly with
> the row estimate that doesn't get outdated as you add more rows to the
> partitions. You'll need to determine that based on the data you're
> storing.
>
> David
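
For reference, the n_distinct_inherited setting mentioned above is set per
column on the partitioned table; a minimal sketch with hypothetical names (a
negative value means "this fraction of the row count", so it keeps scaling as
rows are added):

    ALTER TABLE measurements
      ALTER COLUMN device_id SET (n_distinct_inherited = -0.05);
    -- takes effect the next time the partitioned table is analyzed
    ANALYZE measurements;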




Re: data checksums

2024-08-07 Thread Daniel Gustafsson
> On 6 Aug 2024, at 18:29, Christophe Pettus  wrote:
>> On Aug 6, 2024, at 08:11, bruno vieira da silva  wrote:

>> the pg doc 
>> mentions a considerable performance penalty; how considerable is it?
> 
> That line is probably somewhat out of date at this point.  We haven't seen a 
> significant slowdown in enabling them on any modern hardware.  I always turn 
> them on, except on the type of filesystems/hardware mentioned above.

The last in-depth analysis of data checksums (and hint bits) overhead that I
can remember is from 2019:

  https://www.postgresql.org/message-id/20190330192543.GH4719%40development

Hardware advances in the last five years may very well have made these findings
irrelevant however.

--
Daniel Gustafsson





Re: ANALYZE on partitioned tables vs on individual partitions

2024-08-07 Thread David Rowley
On Wed, 7 Aug 2024 at 19:20, Michael Harris  wrote:
> I found that running an ANALYZE specifying only those 4 columns only took
> 5 minutes, compared to the 30 minutes for the whole table.
>
> That was a bit of a surprise as I imagined actually reading the table would 
> take
> most of the time and would be the same regardless of the number of columns
> being analyzed, but I guess that is wrong.

ANALYZE does do sampling of the data in the table.  It would only read
all of the rows for fairly small tables.  The docs in [1] mention
this:

"For large tables, ANALYZE takes a random sample of the table
contents, rather than examining every row. This allows even very large
tables to be analyzed in a small amount of time."

I think the reason it's taking so long is not the ANALYZE of the
partitioned table itself, which gathers statistics for the partitioned
table by sampling rows from each partition proportionately to the
partition's size; it's more likely because each partition is also
analysed and the statistics for each of those are updated. There is no
"ANALYZE ONLY" command similar to "FROM ONLY" in SELECT queries.

You could probably do some sampling of the pg_stat_progress_analyze
view to figure out what's taking the most time. If you find that the
majority of the time is spent analysing the partitions and not the
partitioned table then maybe we should expand ANALYZE to add the ONLY
option...
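
A minimal sketch of such a sampling query, using columns listed in the
progress reporting docs in [2]:

    SELECT pid, relid::regclass AS rel, phase,
           sample_blks_scanned, sample_blks_total,
           child_tables_done, child_tables_total
    FROM pg_stat_progress_analyze;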

David

[1] https://www.postgresql.org/docs/current/sql-analyze.html
[2] https://www.postgresql.org/docs/current/progress-reporting.html




Column type modification in big tables

2024-08-07 Thread Lok P
Hello all,
We have a Postgres table which is range partitioned on a timestamp column,
with a total size of ~3TB holding ~5 billion rows spanning ~150 daily
partitions, and there are ~140+ columns in the table. This table is also a
child of another partitioned table, and partition creation is handled
through the pg_partman extension.

We have a requirement to modify the existing column lengths as below. Doing
it directly through a single ALTER command will probably scan and rewrite
the whole table, which may take hours of run time.

So we are trying to understand from experts what the best possible way is
to tackle such changes in a Postgres database, and whether there are any
side effects to expect, considering that this table is a child of another
one and also depends on the pg_partman extension.

two of the columns from varchar(20) to numeric(3)
one of the columns from varchar(20) to varchar(2)
one of the columns from Number(10,2) to Numeric(8,2)

Regards
Lok


Re: Column type modification in big tables

2024-08-07 Thread sud
On Wed, Aug 7, 2024 at 4:39 PM Lok P  wrote:

> Hello all,
> We have a postgres table which is a range partitions on a timestamp column
> having total size ~3TB holding a total ~5billion rows spanning across ~150
> daily partitions and there are ~140+columns in the table. Also this table
> is a child to another partition table. And we have partition creation
> handled through pg_partman extension on this table.
>
> We have a requirement of modifying the existing column lengths as below .
> So doing it directly through a single alter command will probably scan and
> rewrite the whole table which may take hours of run time.
>
> So trying to understand from experts what is the best possible way to
> tackle such changes in postgres database? And if any side effects we may
> see considering this table being child to another one and also dependency
> with pg_partman extension.
>
> two of the columns from varchar(20) to numeric(3)
> one of the columns from varchar(20) to varchar(2)
> one of the columns from Number(10,2) to Numeric(8,2)
>
>
>

Others may correct me, but I think if you don't have FKs defined on these
columns you can do as below.


-- Alter table to add the new columns, which will be very fast (within
-- seconds) as it just adds them to the data dictionary.

ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3),
                 ADD COLUMN new_column2 varchar(2);


-- Back-populate the data partition-wise and commit, if it's really needed

UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
                      new_column2 = CAST(old_column2 AS varchar(2));
commit;
UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
                      new_column2 = CAST(old_column2 AS varchar(2));
commit;
UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
                      new_column2 = CAST(old_column2 AS varchar(2));
commit;
.


-- Alter table to drop the old columns, which will be very fast (within
-- seconds) as it just drops them from the data dictionary.
ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2;


Re: Column type modification in big tables

2024-08-07 Thread Lok P
On Wed, Aug 7, 2024 at 4:51 PM sud  wrote:

>
>
> Others may correct but i think, If you don't have the FK defined on these
> columns you can do below.
>
>
> --Alter table add column which will be very fast within seconds as it will
> just add it to the data dictionary.
>
> ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3),   new_column2
> varchar2(3);
>
>
> *-- Back populate the data partition wise and commit, if it's really
> needed*
>
> UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
> new_column2 = CAST(old_column2 AS varchar2(3)) ;
> commit;
> UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
> new_column2 = CAST(old_column2 AS varchar2(3)) ;
> commit;
> UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
> new_column2 = CAST(old_column2 AS varchar2(3)) ;
> commit;
> .
>
>
> *--Alter table drop old columns which will be very fast within seconds as
> it will just drop it from the data dictionary.*
> ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2;
>



Thank you so much.

I understand this will be the fastest possible way to achieve the column
modification.

But regarding the dropped columns, which will still be sitting in the table
and consuming storage space: is it fine to leave them as is, or will
autovacuum remove those column values behind the scenes? And in any case,
once those partitions are purged, the dropped columns will be purged along
with them. Is this understanding correct?

Also, will this have any impact on the partition maintenance currently done
by pg_partman, given that the template table is now internally different
(though not from the outside)? Will it cause conflicts because of those
dropped columns in the main table?


Re: data checksums

2024-08-07 Thread Ron Johnson
On Wed, Aug 7, 2024 at 3:41 AM Daniel Gustafsson  wrote:

> > On 6 Aug 2024, at 18:29, Christophe Pettus  wrote:
> >> On Aug 6, 2024, at 08:11, bruno vieira da silva 
> wrote:
>
> >> the pg doc
> >> mentions a considerable performance penalty; how considerable is it?
> >
> > That line is probably somewhat out of date at this point.  We haven't
> seen a significant slowdown in enabling them on any modern hardware.  I
> always turn them on, except on the type of filesystems/hardware mentioned
> above.
>
> The last in-depth analysis of data checksums (and hint bits) overhead that
> I
> can remember is from 2019:
>
>
> https://www.postgresql.org/message-id/20190330192543.GH4719%40development


A quote from that post:
"I have not investigated the exact reasons, but my hypothesis it's about
the amount of WAL generated during the initial CREATE INDEX (because it
probably ends up setting the hint bits), which puts additional pressure
on the storage."

Presuming that hypothesis is true: how often do "you" run CREATE INDEX (or
VACUUM FULL or CLUSTER)?  I certainly don't run them very often.

-- 
Death to America, and butter sauce!
Iraq lobster...


Re: Column type modification in big tables

2024-08-07 Thread sud
On Wed, Aug 7, 2024 at 5:00 PM Lok P  wrote:

>
>
> On Wed, Aug 7, 2024 at 4:51 PM sud  wrote:
>
>>
>>
>> Others may correct but i think, If you don't have the FK defined on these
>> columns you can do below.
>>
>>
>> --Alter table add column which will be very fast within seconds as it
>> will just add it to the data dictionary.
>>
>> ALTER TABLE tab1 ADD COLUMN new_column1 NUMERIC(3),   new_column2
>> varchar2(3);
>>
>>
>> *-- Back populate the data partition wise and commit, if it's really
>> needed*
>>
>> UPDATE tab1_part1 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>> new_column2 = CAST(old_column2 AS varchar2(3)) ;
>> commit;
>> UPDATE tab1_part2 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>> new_column2 = CAST(old_column2 AS varchar2(3)) ;
>> commit;
>> UPDATE tab1_part3 SET new_column1 = CAST(old_column1 AS NUMERIC(3)),
>> new_column2 = CAST(old_column2 AS varchar2(3)) ;
>> commit;
>> .
>>
>>
>> *--Alter table drop old columns which will be very fast within seconds as
>> it will just drop it from the data dictionary.*
>> ALTER TABLE your_table DROP COLUMN old_column1, DROP COLUMN old_column2;
>>
>
>
>
> Thank you so much.
>
> I understand this will be the fastest possible way to achieve the column
> modification.
>
> But talking about the dropped column which will be sitting in the table
> and consuming storage space, Is it fine to leave as is or auto vacuum will
> remove the column values behind the scene and also anyway , once those
> partitions will be purged they will be by default purged. Is this
> understanding correct?
>
>  And also will this have any impact on the partition maintenance which is
> currently done by pg_partman as because the template table is now different
> internally(not from outside though). Will it cause conflict because of
> those dropped columns from the main table?
>

I think leaving the table as is after dropping the columns will be fine for
you, because your regular partition maintenance/drop will slowly purge the
historical partitions and eventually they will be removed. But if you
update those new columns with the old column values, then autovacuum should
also take care of removing the old row versions (which are actually dead).

Not sure if pg_partman will cause any issue, given that the table now has
the column data types/lengths changed. Others may confirm.


Re: Standard of data storage and transformation

2024-08-07 Thread yudhi s
On Wed, Aug 7, 2024 at 3:13 AM Ron Johnson  wrote:

> On Tue, Aug 6, 2024 at 5:07 PM yudhi s 
> wrote:
>
>> Hi All,
>> We are having a use case in which we are having transaction data for
>> multiple customers in one of the Postgres databases(version 15.4) and we
>> are consuming it from multiple sources(batch file processing, kafka event
>> processing etc). It's currently stored in normalized form postgres
>> with constraints, indexes, partitions defined. This postgres database is
>> holding the transaction data for around a month or so. There are use cases
>> of running online transaction search reports which will be mostly real time
>> reporting and also some daily transaction batch reports based on customers
>> and also month end reports for customers. In target state it will hold
>> Approx. ~400 million transactions/day which can be billions of rows across
>> multiple related parent/child tables.
>>
>> There is another requirement to send these customer transaction data to
>> an olap system which is in a snowflake database and there it will be
>> persisted for many years. The lag between the data in postgres/oltp and in
>> snowflake will be ~1hr. And any reporting api can query postgres for <1
>> month worth of transaction data and if it needs to scan for >1month worth
>> of transaction data, it will point to the snowflake database.
>>
>> Now the question which we are wondering is , should we send the data as
>> is in normalized table form to snowflake and then there we
>> transform/flatten the data to support the reporting use case or should we
>> first flatten or transform the data in postgres itself and make it as
>> another structure( for example creating materialized views on top of base
>> table) and only then move that data to the snowflake? What is the
>> appropriate standard and downside if we do anything different.
>>
>
> Some thoughts:
> 0) How big are the records?
> 1) Materialized views add disk space overhead.
> 2) Materialized views are for when you query the same static data over and
> over again.
> 3) IIUC, you'll be moving the data from PG to Snowflake just once.
> 4) Writing an MV to disk and then reading it only once doubles the IO
> requirements.
> 5) Thus, my first thought would be to extract the data from PG using a
> denormalizing "plain" view.
> 5a) If you can't make that Fast Enough, then obviously you must pull the
> normalized data from PG and denorm it elsewhere.  You know your situation
> better than us.
> 6) Indices will be critical: not too many, but not too few.
> 7) Obviously consider partitioning, but note that too many partitions can
> make query planning MUCH slower.
> 7a) 31 days cut into hours means 744 partitions.  That's a LOT.
> 7b) Partitioning every THREE hours means only 248 child tables.  A lot,
> but much more manageable.
> 7c) That might well kill reporting performance, though, if it's for
> example one customer across many partitions.
> 8) You (hopefully) know what kind of queries will be run.  Maybe partition
> by customer (or whatever) range and THEN by an hour range.
> 8a) You'd have to simultaneously run multiple extract jobs (one for each
> "customer" range), but that might not be too onerous, since then each hour
> partition would be smaller.
> 9) Testing.  Nothing beats full-scale testing.
>
>
>
The table has ~100+ columns, but I think the denormalized or flattened
table we are planning to create will still have a large number of columns,
as it will be based on columns from multiple tables joined together.
However, the plan was to have the cooked data ready so as not to do the
additional work in the target or downstream. So I was thinking of a
physical model for persisting the transformed data (using an MV) rather
than a simple view, which would have performance overhead.

What is happening is that after the data moves to Snowflake, people try to
create their own versions of the transformed table on top of these
normalized tables (which we call refiners) and then query those from the UI
or for reporting. Some people say we should avoid those downstream refiners
and do the transformation here in the source/Postgres. Also, the plan was
to move the data from Postgres once every hour.
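
A minimal sketch of the two options being discussed, with hypothetical table
and column names:

    -- Option A: a denormalizing plain view, read once per hourly extract
    CREATE VIEW txn_flat_v AS
    SELECT t.txn_id, t.txn_ts, t.amount, c.customer_id, c.customer_name
    FROM transactions t
    JOIN customers c ON c.customer_id = t.customer_id;

    -- Option B: a materialized view, persisted and refreshed before each
    -- extract (which doubles the I/O, as noted in point 4 above)
    CREATE MATERIALIZED VIEW txn_flat_mv AS
    SELECT t.txn_id, t.txn_ts, t.amount, c.customer_id, c.customer_name
    FROM transactions t
    JOIN customers c ON c.customer_id = t.customer_id;

    REFRESH MATERIALIZED VIEW txn_flat_mv;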


Re: Windows installation problem at post-install step

2024-08-07 Thread Thomas Munro
Thanks.  The log didn't offer any more clues, and my colleague David R
has Windows and knows how to work its debugger so we sat down together
and chased this down (thanks David!).

1.  It is indeed calling abort(), but it's not a PANIC or Assert() in
PostgreSQL, it's an assertion inside Windows' own setlocale():

minkernel\crts\ucrt\src\appcrt\convert\mbstowcs.cpp(245) : Assertion
failed: (pwcs == nullptr && sizeInWords == 0) || (pwcs != nullptr &&
sizeInWords > 0)

2.  It is indeed confused about the encoding of the string
"Turkish_Türkiye.1254" itself, and works fine if you use "tr-TR".

3.  It doesn't happen on 15, because 16 added a key ingredient:

commit bf03cfd162176d543da79f9398131abc251ddbb9
Author: Peter Eisentraut 
Date:   Tue Jan 3 14:21:40 2023 +0100

Windows support in pg_import_system_collations

That causes it to spin through a bunch of system locales and switch to
them, and the first one is "aa".  After it calls:

setlocale(2, "aa");

... then the next call to restore the previous locale is something like:

setlocale(2, "Turkish_T\252rkiye.1254");

(That \252 == 0xfc probably depends on your system's default
encoding.)  It doesn't like that name anymore, and aborts.  A minimal
program with just those two lines shows that.

It appears that after switching to "aa", it interprets the string
passed to the next call to setlocale() as some other encoding
(probably UTF-8, I dunno).  I don't know why it doesn't fail and
return NULL, but there is a more general point that it's a bit bonkers
to use non-ASCII byte sequences in the library calls that are used to
control how non-ASCII byte sequences are interpreted.  Maybe it can be
done if you're careful, but in particular a naive save-and-restore
sequence just won't work.

I guess a save-and-restore done with wsetlocale() could fix that.  But
I decline to work on that, we need less Windows kludgery in the tree,
not more.  I think a better answer is "don't do that".

Really, we *have* to chase all these non-BCP-47 locales out of the
installer (I hope you can work on that?), out of PostgreSQL (testers
wanted[1]), and out of the world's existing clusters (maybe with
Dave's pg_upgrade idea, someone would need to write a patch, or maybe
someone could write a stand-alone locale migration program that just
connects to a cluster and (using some authoritative source, that's the
key bit to research) and replaces bad old names with nice new ones).

[1] 
https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zdbb+sxds59pv...@mail.gmail.com




Re: ANALYZE on partitioned tables vs on individual partitions

2024-08-07 Thread Michael Harris
> You could probably do some sampling of the pg_stat_progress_analyze
> view to figure out what's taking the most time.

I did another run, sampling the pg_stat_progress_analyze every 30s.

For the first 4 minutes it was working on the partitioned table.

After that it began analyzing all the partitions, which took approx 29 minutes.

I think you are correct - an ONLY option for ANALYZE would be a huge
benefit. In my use case, the autovacuum processes are keeping the partitions
analyzed, so there would seem to be little benefit to including them in the
manual table-level ANALYZE.

Regards
Mike

On Wed, 7 Aug 2024 at 18:09, David Rowley  wrote:
>
> On Wed, 7 Aug 2024 at 19:20, Michael Harris  wrote:
> > I found that running an ANALYZE specifying only those 4 columns only took
> > 5 minutes, compared to the 30 minutes for the whole table.
> >
> > That was a bit of a surprise as I imagined actually reading the table would 
> > take
> > most of the time and would be the same regardless of the number of columns
> > being analyzed, but I guess that is wrong.
>
> ANALYZE does do sampling of the data in the table.  It would only read
> all of the rows for fairly small tables.  The docs in [1] mention
> this:
>
> "For large tables, ANALYZE takes a random sample of the table
> contents, rather than examining every row. This allows even very large
> tables to be analyzed in a small amount of time."
>
> I think the reason it's taking so long is not the ANALYZE of the
> partitioned table itself, which gathers statistics for the partitioned
> table by sampling rows from each partition proportionately to the
> partition's size; it's more likely because each partition is also
> analysed and the statistics for each of those are updated. There is no
> "ANALYZE ONLY" command similar to "FROM ONLY" in SELECT queries.
>
> You could probably do some sampling of the pg_stat_progress_analyze
> view to figure out what's taking the most time. If you find that the
> majority of the time is spent analysing the partitions and not the
> partitioned table then maybe we should expand ANALYZE to add the ONLY
> option...
>
> David
>
> [1] https://www.postgresql.org/docs/current/sql-analyze.html
> [2] https://www.postgresql.org/docs/current/progress-reporting.html




How can I get a query-based subtotal in a select using group by rollup ?

2024-08-07 Thread dfgpostgres
pg 15.3 on linux

I don't even know if the title makes sense, I think it's better explained
with an example

This is the table...

dvdb=# \d dispatch_tracker
  Table "regr.dispatch_tracker"
   Column|   Type   | Collation | Nullable |
Default
-+--+---+--+-
 regression_name | character varying|   |  |
 domain_name | character varying|   |  |
 dispatch_status | character varying|   |  |
 total_tests | integer  |   |  |
 tests_running   | integer  |   |  |
 tests_completed | integer  |   |  |
 tests_passed| integer  |   |  |
 tests_failed| integer  |   |  |

And this query

select
  d.domain_name,
  d.total_tests,
  d.tests_completed,
  d.tests_passed,
  d.tests_failed,
  round(d.tests_completed*100.0/d.total_tests) as perc_tot_compl,
  round(d.tests_passed*100.0/d.tests_completed) as perc_compl_pass,
  round(d.tests_failed*100.0/d.tests_completed) as perc_compl_fail,
  round(d.tests_passed*100.0/d.total_tests) as perc_pass
from
  (select
     domain_name,
     sum(total_tests) as total_tests,
     sum(tests_completed) as tests_completed,
     sum(tests_passed) as tests_passed,
     sum(tests_failed) as tests_failed
   from
     regr.dispatch_tracker rt
   where
     rt.regression_name='2024_08_02_10_32_53_soundwave__er_common_regression__CL2017473_z1_soundwave_adm'
   group by rollup(rt.domain_name) order by rt.domain_name ASC NULLS LAST) d;

... generates something like this...

 domain_name | total_tests | tests_completed | tests_passed | tests_failed | perc_tot_compl | perc_compl_pass | perc_compl_fail | perc_pass
-------------+-------------+-----------------+--------------+--------------+----------------+-----------------+-----------------+-----------
 ACP         |          87 |              82 |            1 |           81 |             94 |               1 |              99 |         1
 CDP         |          28 |              27 |            0 |           27 |             96 |               0 |             100 |         0
 COH         |         102 |              67 |            0 |           67 |             66 |               0 |             100 |         0
 DCN         |         181 |             180 |            5 |          175 |             99 |               3 |              97 |         3
 DFD         |         458 |             292 |           25 |          267 |             64 |               9 |              91 |         5
 DFT         |        1302 |             830 |            0 |          830 |             64 |               0 |             100 |         0
 GDP         |         413 |             308 |           29 |          279 |             75 |               9 |              91 |         7
 GFX         |          96 |              72 |            1 |           71 |             75 |               1 |              99 |         1
 INT         |           9 |               2 |            0 |            2 |             22 |               0 |             100 |         0
 IPU         |          24 |                 |              |              |                |                 |                 |
 IPU_SANITY  |           2 |                 |              |              |                |                 |                 |
 OSS         |          43 |              43 |            0 |           43 |            100 |               0 |             100 |         0
 PWR         |         535 |             207 |            1 |          206 |             39 |               0 |             100 |         0
 SEC         |         172 |             128 |            3 |          125 |             74 |               2 |              98 |         2
 UMSCH       |          16 |                 |              |              |                |                 |                 |
 VPE         |         130 |             125 |            1 |          124 |             96 |               1 |              99 |         1
             |        3598 |            2363 |           66 |         2297 |             66 |               3 |              97 |         2
(17 rows)

Now, I want to add a new column that's a subset of the "total_tests"
value.  Specifically, I want to add a column called "dispatched" which is
just the number of total_tests which have column "dispatch_status" = 'Y'

This blows up, but shows the intent...

select
  d.domain_name,
  d.total_tests,
  d.tests_completed,
  d.tests_passed,
  d.tests_failed,
  d.dispatched,
  round(d.tests_completed*100.0/d.total_tests) as perc_tot_compl,
  round(d.tests_passed*100.0/d.tests_completed) as perc_compl_pass,
  round(d.tests_failed*100.0/d.tests_completed) as perc_compl_fail,
  round(d.tests_passed*100.0

Re: How can I get a query-based subtotal in a select using group by rollup ?

2024-08-07 Thread David G. Johnston
On Wednesday, August 7, 2024, dfgpostgres  wrote:

>
>   (select
> domain_name,
> sum(total_tests) as total_tests,
> sum(tests_completed) as tests_completed,
> sum(tests_passed) as tests_passed,
> sum(tests_failed) as tests_failed,
> (select count(*) from dispatch_tracker where
> regression_name=rt.regression_name and domain_name=rt.domain_name and
> dispatch_status='Y') as dispatched
>   from
> regr.dispatch_tracker rt where  rt.regression_name='2024_08_
> 02_10_32_53_soundwave__er_common_regression__CL2017473_z1_soundwave_adm'
>   group by rollup(rt.domain_name) order by rt.domain_name ASC NULLS LAST)
> d;
>

Either add regression_name to the group by, as the error hint suggests, or,
since you are already grouping implicitly by it (by virtue of the where
clause filter) and by domain_name, just count the number of rows with
dispatch_status = 'Y' in each group:  count(*) filter (where dispatch_status = 'Y')

The option that avoids the subquery is arguably better.  Though I’d
probably still include the regression_name in the output anyway - why hide
what you are filtering on.
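
A minimal sketch of that FILTER form applied to the subquery from the
original post (regression_name literal elided):

    select
      domain_name,
      sum(total_tests) as total_tests,
      sum(tests_completed) as tests_completed,
      sum(tests_passed) as tests_passed,
      sum(tests_failed) as tests_failed,
      count(*) filter (where dispatch_status = 'Y') as dispatched
    from regr.dispatch_tracker rt
    where rt.regression_name = '...'
    group by rollup(rt.domain_name)
    order by rt.domain_name asc nulls last;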

David J.


Re: Windows installation problem at post-install step

2024-08-07 Thread Sandeep Thakkar
On Thu, Aug 8, 2024 at 6:10 AM Thomas Munro  wrote:

> Thanks.  The log didn't offer any more clues, and my colleague David R
> has Windows and knows how to work its debugger so we sat down together
> and chased this down (thanks David!).
>
> 1.  It is indeed calling abort(), but it's not a PANIC or Assert() in
> PostgreSQL, it's an assertion inside Windows' own setlocale():
>
> minkernel\crts\ucrt\src\appcrt\convert\mbstowcs.cpp(245) : Assertion
> failed: (pwcs == nullptr && sizeInWords == 0) || (pwcs != nullptr &&
> sizeInWords > 0)
>
> 2.  It is indeed confused about the encoding of the string
> "Turkish_Türkiye.1254" itself, and works fine if you use "tr-TR".
>
> 3.  It doesn't happen on 15, because 16 added a key ingredient:
>
> commit bf03cfd162176d543da79f9398131abc251ddbb9
> Author: Peter Eisentraut 
> Date:   Tue Jan 3 14:21:40 2023 +0100
>
> Windows support in pg_import_system_collations
>
> That causes it to spin through a bunch of system locales and switch to
> them, and the first one is "aa".  After it calls:
>
> setlocale(2, "aa");
>
> ... then the next call to restore the previous locale is something like:
>
> setlocale(2, "Turkish_T\252rkiye.1254");
>
> (That \252 == 0xfc probably depends on your system's default
> encoding.)  It doesn't like that name anymore, and aborts.  A minimal
> program with just those two lines shows that.
>
> It appears that after switching to "aa", it interprets the string
> passed to the next call to setlocale() as some other encoding
> (probably UTF-8, I dunno).  I don't know why it doesn't fail and
> return NULL, but there is a more general point that it's a bit bonkers
> to use non-ASCII byte sequences in the library calls that are used to
> control how non-ASCII byte sequences are interpreted.  Maybe it can be
> done if you're careful, but in particular a naive save-and-restore
> sequence just won't work.
>
> I guess a save-and-restore done with wsetlocale() could fix that.  But
> I decline to work on that, we need less Windows kludgery in the tree,
> not more.  I think a better answer is "don't do that".
>
> Really, we *have* to chase all these non-BCP-47 locales out of the
> installer (I hope you can work on that?),


Yeah, it seems getlocales.cpp needs to be changed to achieve that.
I'll look into it.

out of PostgreSQL (testers
> wanted[1]), and out of the world's existing clusters (maybe with
> Dave's pg_upgrade idea, someone would need to write a patch, or maybe
> someone could write a stand-alone locale migration program that just
> connects to a cluster and (using some authoritative source, that's the
> key bit to research) and replaces bad old names with nice new ones).
>
> [1]
> https://www.postgresql.org/message-id/flat/CA+hUKGJ=XThErgAQRoqfCy1bKPxXVuF0=2zdbb+sxds59pv...@mail.gmail.com
>


-- 
Sandeep Thakkar