When to store data that could be derived
Hi all

As I understand it, a general rule of thumb is that you should never create a physical column if the data could be derived from existing columns. A possible reason for breaking this rule is for performance reasons.

I have a situation where I am considering breaking the rule, but I am not experienced enough in SQL to know if my reason is valid. I would appreciate it if someone could glance at my 'before' and 'after' scenarios and see if, from a 'gut-feel' point of view, I should proceed.

I have a VIEW constructed as follows -

CREATE VIEW view_name AS
    [select statement 1]
    UNION ALL
    [select statement 2]
    etc.

This is one of the select statements. I will give the 'after' scenario first -

SELECT 'arec' AS tran_type, a.row_id AS tran_row_id, a.tran_number AS tran_number,
    a.cust_row_id AS cust_row_id,
    CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_date WHEN a.tran_type = 'cb_rec' THEN w.tran_date END AS tran_date,
    CASE WHEN a.tran_type = 'ar_rec' THEN y.text WHEN a.tran_type = 'cb_rec' THEN w.text END AS text,
    0 - a.arec_cust AS amount_cust,
    0 - a.arec_local AS amount_local
FROM ar_rec_subtran a
LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id
WHERE CASE WHEN a.tran_type = 'ar_rec' THEN y.posted WHEN a.tran_type = 'cb_rec' THEN w.posted END = '1'

The two columns a.arec_cust and a.arec_local *can* be derived from other columns, and in fact that is how it is working at the moment, so here is the 'before' scenario -

SELECT 'arec' AS tran_type, a.row_id AS tran_row_id, a.tran_number AS tran_number,
    a.cust_row_id AS cust_row_id,
    CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_date WHEN a.tran_type = 'cb_rec' THEN w.tran_date END AS tran_date,
    CASE WHEN a.tran_type = 'ar_rec' THEN y.text WHEN a.tran_type = 'cb_rec' THEN w.text END AS text,
    ROUND(0 - (ROUND(a.arec_amount / CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate END * a.cust_exch_rate, u.scale)), 2) AS amount_cust,
    ROUND(0 - (ROUND(a.arec_amount / CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate END, s.scale)), 2) AS amount_local
FROM ar_rec_subtran a
LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id
LEFT JOIN ar_customers v ON v.row_id = a.cust_row_id
LEFT JOIN adm_currencies u ON u.row_id = v.currency_id
LEFT JOIN adm_params t ON t.row_id = 1
LEFT JOIN adm_currencies s ON s.row_id = t.local_curr_id
WHERE CASE WHEN a.tran_type = 'ar_rec' THEN y.posted WHEN a.tran_type = 'cb_rec' THEN w.posted END = '1'

As you can see, complexity has increased and there are four additional JOINs. I am expecting the VIEW to be used extensively for query purposes, and my gut-feel says that the second one is likely to lead to performance problems in a system with a lot of data and a lot of users.

I am not looking for an answer - I know that I should create dummy data and run some timing tests. I was just wondering if someone more experienced would wince when they look at the second SELECT, or if they would shrug and think that it looks fine.

Any input will be appreciated.

Frank Millman
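If the 'after' route is taken, the usual concern is keeping the stored columns consistent with their source values. Below is a minimal sketch of one way to do that with a BEFORE trigger. The column names arec_cust, arec_local, arec_amount and cust_exch_rate come from the query above, but the tran_exch_rate column on the same row is an assumption (in the real schema the rate lives on the joined transaction tables and would have to be looked up), so treat this as an illustration of the pattern rather than a drop-in definition.

-- Sketch only: keep the stored derived columns in step with their sources.
CREATE OR REPLACE FUNCTION ar_rec_subtran_set_amounts() RETURNS trigger AS $$
BEGIN
    -- NEW.tran_exch_rate is a hypothetical column; in the real schema the
    -- rate would be fetched from ar_tran_rec / cb_tran_rec instead.
    NEW.arec_cust  := round(NEW.arec_amount / NEW.tran_exch_rate * NEW.cust_exch_rate, 2);
    NEW.arec_local := round(NEW.arec_amount / NEW.tran_exch_rate, 2);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ar_rec_subtran_amounts
    BEFORE INSERT OR UPDATE ON ar_rec_subtran
    FOR EACH ROW EXECUTE FUNCTION ar_rec_subtran_set_amounts();
    -- EXECUTE FUNCTION needs PostgreSQL 11+; use EXECUTE PROCEDURE on older versions

The point of doing it in a trigger (or equivalently in the application layer, as Frank later describes) is that the derived values are written once, at posting time, and the reporting view no longer needs the extra joins.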
Re: When to store data that could be derived
On 2019-03-24 9:25 AM, Ron wrote: On 3/24/19 1:42 AM, Frank wrote: Hi all As I understand it, a general rule of thumb is that you should never create a physical column if the data could be derived from existing columns. A possible reason for breaking this rule is for performance reasons. I have a situation where I am considering breaking the rule, but I am not experienced enough in SQL to know if my reason is valid. I would appreciate it if someone could glance at my 'before' and 'after' scenarios and see if, from a 'gut-feel' point of view, I should proceed. [snip] Sure the second query joins a lot of tables, but is pretty straightforward. What REALLY worries me is whether or not the query optimiser would look at the WHERE CASE, run away screaming and then make it use sequential scans. Thus, even query #1 would be slow. I had not realised that. I hope someone else chimes in on this. Is this a historical data set that's never updated, or current data that's constantly added to? It is the latter - current data constantly added to. Frank
Re: When to store data that could be derived
On 2019-03-24 11:11 AM, Tony Shelver wrote:
Not the answer you are looking for, but... I'd suggest trying to create a non-trivial set of dummy data to test your assumptions before deciding on a route. It's saved my (professional) life a few times over the years when dealing with untested designs and new (to us) technology.

Some years ago we were implementing an identity management system for a large US bank, with SQL Server as the store, with a planned integration to the id / access / permissions of some 300+ systems, targeting 30k plus users. Requirements kept changing as we added new systems to the initial mix, which the Id management package couldn't handle out of the box, so we had to implement a custom design. We had to choose between 2 database designs, one being fully 'normalized' (considering that everything was an 'object') and one where we made some assumptions and fixed some table structures in the interest of performance. Eventually we spent a few days adding non-trivial amounts of test data to the proposed designs and it quickly became very apparent that option 1 was unworkable once we got beyond 10 systems or so.

Good advice - much appreciated.

Frank
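For anyone wanting to follow that advice, a minimal sketch of bulk-generating dummy rows with generate_series(). The table and column names here are illustrative only, not Frank's actual schema, and the distributions are deliberately crude:

-- Generate ~22000 invoices spread over 12 customers and ~6 months
INSERT INTO ar_tran_inv (cust_row_id, tran_date, amount, posted)
SELECT (g % 12) + 1,                          -- 12 customers
       date '2015-03-01' + (g % 180),         -- ~6 months of dates
       round((random() * 1000)::numeric, 2),  -- arbitrary amounts
       true
FROM generate_series(1, 22000) AS g;

ANALYZE ar_tran_inv;   -- refresh planner statistics after the bulk load

Running ANALYZE (or letting autovacuum catch up) after loading matters: timing tests against freshly loaded tables with stale statistics can give misleading plans.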
Re: When to store data that could be derived
On 2019-03-24 2:41 PM, Peter J. Holzer wrote: On 2019-03-24 10:05:02 +0200, Frank wrote: Many thanks to Peter et al for their valuable insights. I have learned a lot. > So the important part here is not whether data is added, but whether > data is changed. Sure, new transactions are added all the time. But is > it expected that the data used to derive amount_cust and amount_local > (e.g. the exchange rate) is changed retroactively, and if it is should > the computed amount change? (I'm a bit worried about the join with the > customers table here - what happens when a customer moves their > headquarters to a country with a different currency?) I think I have got both of those covered. I store the exchange rates in physical columns on the transaction, so the compute expressions will always return the same values. I have separate tables for 'organisations' (O) and for 'customers' (C). C has a foreign key reference to O, and most static data such as addresses and contact details are stored on O. So if a customer moved, I would create a new C record with the new currency, and flag the old C record as inactive. They can happily co-exist, so receipts can be processed against the old C record until it is paid up. I have started doing some volume tests, and at this stage, for the kind of volumes I am concerned about, it looks as if performance is a non-issue. I generated about 22000 invoices and 22000 receipts, over 12 customers and 6 months. Invoices and receipts are stored in separate tables, and a VIEW presents them as a single table. Using the VIEW, I selected all transactions for a given customer for a given month. It returned 620 rows and (on my slow desktop computer) it took 20ms. I can live with that. I will generate some higher volumes overnight, and see if it makes a big difference. If you do not hear from me, you can consider it 'problem solved' :-) Again, thanks to all. Frank
Re: When to store data that could be derived
On 2019-03-25 4:06 PM, Ron wrote: On 3/25/19 8:15 AM, Frank wrote: It would be interesting to see what the query planner tries to do with this: WHERE CASE WHEN a.tran_type = 'ar_rec' THEN y.posted WHEN a.tran_type = 'cb_rec' THEN w.posted END = '1' I have attached the schema showing the full VIEW definition, and the result of the following EXPLAIN - EXPLAIN SELECT * FROM ccc.ar_trans WHERE cust_row_id = 4 AND tran_date BETWEEN '2015-06-01' AND '2015-06-30'. Because I have used 'WHERE tran_date' in the query, and tran_date is also derived from a CASE expression, I imagine that will also add some complication. I am running PostgreSQL 11.1 on Fedora 29. Frank CREATE OR REPLACE VIEW ccc.ar_trans AS SELECT 'ar_inv'::text AS tran_type, a.row_id AS tran_row_id, a.tran_number, a.cust_row_id, a.tran_date, a.text, a.inv_net_cust + a.inv_tax_cust AS amount_cust, a.inv_net_local + a.inv_tax_local AS amount_local FROM ccc.ar_tran_inv a WHERE a.posted = true UNION ALL SELECT 'ar_crn'::text AS tran_type, a.row_id AS tran_row_id, a.tran_number, a.cust_row_id, a.tran_date, a.text, 0::numeric - (a.crn_net_cust + a.crn_tax_cust) AS amount_cust, 0::numeric - (a.crn_net_local + a.crn_tax_local) AS amount_local FROM ccc.ar_tran_crn a WHERE a.posted = true UNION ALL SELECT 'arec'::text AS tran_type, a.row_id AS tran_row_id, a.tran_number, a.cust_row_id, CASE WHEN a.tran_type::text = 'ar_rec'::text THEN y.tran_date WHEN a.tran_type::text = 'cb_rec'::text THEN w.tran_date ELSE NULL::date END AS tran_date, CASE WHEN a.tran_type::text = 'ar_rec'::text THEN y.text WHEN a.tran_type::text = 'cb_rec'::text THEN w.text ELSE NULL::character varying END AS text, 0::numeric - a.arec_cust AS amount_cust, round(0::numeric - round(a.arec_amount / CASE WHEN a.tran_type::text = 'ar_rec'::text THEN y.tran_exch_rate WHEN a.tran_type::text = 'cb_rec'::text THEN w.tran_exch_rate ELSE NULL::numeric END, u.scale), 2) AS amount_local FROM ccc.ar_rec_subtran a LEFT JOIN ccc.ar_tran_rec_det z ON z.row_id = a.tran_det_row_id LEFT JOIN ccc.ar_tran_rec y ON y.row_id = z.tran_row_id LEFT JOIN ccc.cb_tran_rec_det x ON x.row_id = a.tran_det_row_id LEFT JOIN ccc.cb_tran_rec w ON w.row_id = x.tran_row_id LEFT JOIN ccc.adm_params v ON v.row_id = 1 LEFT JOIN ccc.adm_currencies u ON u.row_id = v.local_curr_id WHERE CASE WHEN a.tran_type::text = 'ar_rec'::text THEN y.posted WHEN a.tran_type::text = 'cb_rec'::text THEN w.posted ELSE NULL::boolean END = true UNION ALL SELECT 'ar_disc'::text AS tran_type, a.row_id AS tran_row_id, a.tran_number, a.cust_row_id, a.tran_date, a.text, 0::numeric - (a.disc_net_cust + a.disc_tax_cust) AS amount_cust, 0::numeric - (a.disc_net_local + a.disc_tax_local) AS amount_local FROM ccc.ar_tran_disc a WHERE a.posted = true EXPLAIN SELECT * FROM ccc.ar_trans WHERE cust_row_id = 4 AND tran_date BETWEEN '2015-06-01' AND '2015-06-30'. QUERY PLAN --- Append (cost=0.00..2545.77 rows=404 width=132) -> Seq Scan on ar_tran_inv a (cost=0.00..780.70 rows=300 width=125) Filter: (posted AND (tran_date >= '2015-06-01'::date) AND (tran_date <= '2015-06-30'::date) AND (cust_row_id = 4)) -> Seq Scan on ar_tran_crn a_1 (cost=0.00..14.91 rows=1 width=172) Filter: (posted AND (tran_date >= '2015-06-01'::date) AND (tran_date &l
Re: When to store data that could be derived
On 2019-03-25 5:11 PM, Frank wrote: On 2019-03-25 4:06 PM, Ron wrote: On 3/25/19 8:15 AM, Frank wrote: It would be interesting to see what the query planner tries to do with this: WHERE CASE WHEN a.tran_type = 'ar_rec' THEN y.posted WHEN a.tran_type = 'cb_rec' THEN w.posted END = '1' I have attached the schema showing the full VIEW definition, and the result of the following EXPLAIN - EXPLAIN SELECT * FROM ccc.ar_trans WHERE cust_row_id = 4 AND tran_date BETWEEN '2015-06-01' AND '2015-06-30'. Because I have used 'WHERE tran_date' in the query, and tran_date is also derived from a CASE expression, I imagine that will also add some complication. I am running PostgreSQL 11.1 on Fedora 29. Frank On reflection, I have not been consistent with my use of indexes, and I think that will affect the query plan. There are at least two issues - 1. The first table defined in the VIEW is ccc.ar_tran_inv. It has the following index - "arinv_cust_date" btree (cust_row_id NULLS FIRST, tran_date NULLS FIRST) WHERE deleted_id = 0 I have not used "WHERE deleted_id = 0" when constructing the VIEW, but I have used "WHERE posted = '1'". I don't think the index can be used with this setup. 2. The complicated table in the VIEW is ccc.ar_rec_subtran. Various columns such as tran_date and posted are retrieved via CASE expressions from two underlying tables. Those tables have certain indexes defined, but I cannot see how they can be utilised from my current setup. I think I should spend some time tidying this up before you try to make sense of the query plan. Any tips on how to improve it will be appreciated. Frank
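On point 1, the general rule is that a partial index can only be chosen when the query's (or view's) WHERE clause implies the index predicate. A sketch of one way to line them up, with an assumed index name and the column list described above; the alternative is to add deleted_id = 0 to the view's WHERE clause so the existing arinv_cust_date index qualifies:

-- Partial index whose predicate matches what the view branch filters on
CREATE INDEX arinv_cust_date_posted
    ON ccc.ar_tran_inv (cust_row_id, tran_date)
    WHERE posted;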
Re: When to store data that could be derived
On 2019-03-25 5:44 PM, Frank wrote: On reflection, I have not been consistent with my use of indexes, and I think that will affect the query plan. There are at least two issues - 1. The first table defined in the VIEW is ccc.ar_tran_inv. It has the following index - "arinv_cust_date" btree (cust_row_id NULLS FIRST, tran_date NULLS FIRST) WHERE deleted_id = 0 I have not used "WHERE deleted_id = 0" when constructing the VIEW, but I have used "WHERE posted = '1'". I don't think the index can be used with this setup. 2. The complicated table in the VIEW is ccc.ar_rec_subtran. Various columns such as tran_date and posted are retrieved via CASE expressions from two underlying tables. Those tables have certain indexes defined, but I cannot see how they can be utilised from my current setup. I think I should spend some time tidying this up before you try to make sense of the query plan. Any tips on how to improve it will be appreciated. I have spent a lot of time testing various permutations and trying to understand them using EXPLAIN. My original concern was the overhead of calculating derived data. I now realise that it is more important to get the indexes right, as that has a much bigger impact on performance. The VIEW that I have been using for testing is actually masking the problem. The view combines four tables, three of which are straightforward, easy to index, and fast. The fourth one is complex, difficult to index, and comparatively slow. So I forgot about the VIEW and concentrated on the complex table. I now understand the caveats I received earlier in this thread. It seems impossible to make use of the indexes on the JOINed tables in the following query. I did create an index on cust_row_id in the main table, and it made a difference with a simple query, but by the time I added the JOINs, the improvement was hardly noticeable. This is the query that I used - EXPLAIN SELECT COUNT(*) FROM ccc.ar_rec_subtran a LEFT JOIN ccc.ar_tran_rec_det z ON z.row_id = a.tran_det_row_id LEFT JOIN ccc.ar_tran_rec y ON y.row_id = z.tran_row_id LEFT JOIN ccc.cb_tran_rec_det x ON x.row_id = a.tran_det_row_id LEFT JOIN ccc.cb_tran_rec w ON w.row_id = x.tran_row_id WHERE a.cust_row_id = 4 AND CASE WHEN a.tran_type = 'ar_rec' THEN y.tran_date WHEN a.tran_type = 'cb_rec' THEN w.tran_date END BETWEEN '2015-05-01' AND '2015-05-31' AND CASE WHEN a.tran_type = 'ar_rec' THEN y.posted WHEN a.tran_type = 'cb_rec' THEN w.posted END = true AND a.deleted_id = 0; Attached is the EXPLAIN for this one. I also ran the same query with the following WHERE clause - WHERE a.cust_row_id = 4 AND ((a.tran_type = 'ar_rec' AND y.tran_date BETWEEN '2015-05-01' AND '2015-05-31') OR (a.tran_type = 'cb_rec' AND w.tran_date BETWEEN '2015-05-01' AND '2015-05-31')) AND ((a.tran_type = 'ar_rec' AND y.posted = true) OR (a.tran_type = 'cb_rec' AND w.posted = true)) AND a.deleted_id = 0; The timings were virtually identical, so I have not attached that EXPLAIN. Frank QUERY PLAN - Aggregate (cost=1514.75..1514.76 rows=1 width=8) -> Hash Left Join (cost=895.41..1514.49 rows=102 width=0) Hash Cond: (a.tran_det_row_id = x.row_id) Filter: (CASE WHEN ((a.tran_type)::text = 'ar_rec'::text) THEN y.posted WHEN ((a.tran_type)::text = 'cb_rec'::text) THEN w.posted ELSE NULL::boolean END AND (CASE WHEN ((a.tran_type)::text = 'ar_rec'::text) THEN y.tran_date WHEN ((a.tran_type)::text = 'cb_rec'::text) THEN w.tran_date ELSE NULL::date END >= '2015-05-01'::date) AND (CASE WHEN ((a.tran_type)::text = 'ar_rec'::t
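One rewrite that is often suggested for this shape of query is to split the CASE-driven branch into one sub-select per tran_type, so that each branch has plain join and date conditions the planner can match against indexes on y.tran_date / w.tran_date. A sketch, using the table and column names from the query above and assuming row_id is unique in the *_det tables (otherwise the results could differ from the LEFT JOIN version):

SELECT COUNT(*)
FROM (
    SELECT a.row_id
    FROM ccc.ar_rec_subtran a
    JOIN ccc.ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
    JOIN ccc.ar_tran_rec y ON y.row_id = z.tran_row_id
    WHERE a.tran_type = 'ar_rec'
      AND a.cust_row_id = 4
      AND a.deleted_id = 0
      AND y.posted
      AND y.tran_date BETWEEN '2015-05-01' AND '2015-05-31'
    UNION ALL
    SELECT a.row_id
    FROM ccc.ar_rec_subtran a
    JOIN ccc.cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
    JOIN ccc.cb_tran_rec w ON w.row_id = x.tran_row_id
    WHERE a.tran_type = 'cb_rec'
      AND a.cust_row_id = 4
      AND a.deleted_id = 0
      AND w.posted
      AND w.tran_date BETWEEN '2015-05-01' AND '2015-05-31'
) AS t;

Because rows with any other tran_type are excluded by the CASE in the original query anyway, the two-branch form returns the same rows while giving the planner simple, indexable predicates in each branch.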
Slow SELECT
Hi all I have a SELECT that runs over 5 times slower on PostgreSQL compared with Sql Server and sqlite3. I am trying to understand why. I have a table that looks like this (simplified) - CREATE TABLE my_table ( row_id SERIAL PRIMARY KEY, deleted_id INT DEFAULT 0, fld_1 INT REFERENCES table_1(row_id), fld_2 INT REFERENCES table_2(row_id), fld_3 INT REFERENCES table_3(row_id), fld_4 INT REFERENCES table_4(row_id), tran_date DATE, tran_total DEC(21,2) ); CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3, fld_4, tran_date) WHERE deleted_id = 0; The table sizes are - my_table : 167 rows table_1 : 21 rows table_2 : 11 rows table_3 : 3 rows table_4 : 16 rows Therefore for each tran_date in my_table there are potentially 21x11x3x16 = 11088 rows. Most will be null. I want to select the row_id for the last tran_date for each of those potential groups. This is my select - SELECT ( SELECT a.row_id FROM my_table a WHERE a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id AND a.deleted_id = 0 ORDER BY a.tran_date DESC LIMIT 1 ) FROM table_1 b, table_2 c, table_3 d, table_4 e Out of 11088 rows selected, 103 are not null. On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on sqlite3, and 0.31 sec on PostgreSQL. I have looked at the EXPLAIN, but I don't really know what to look for. I can supply it if that would help. Thanks for any advice. Frank Millman
Re: Slow SELECT
On 2020-05-26 9:32 AM, Olivier Gautherot wrote: Hi Frank, On Tue, May 26, 2020 at 9:23 AM Frank Millman <mailto:fr...@chagford.com>> wrote: Hi all I have a SELECT that runs over 5 times slower on PostgreSQL compared with Sql Server and sqlite3. I am trying to understand why. I have a table that looks like this (simplified) - CREATE TABLE my_table ( row_id SERIAL PRIMARY KEY, deleted_id INT DEFAULT 0, fld_1 INT REFERENCES table_1(row_id), fld_2 INT REFERENCES table_2(row_id), fld_3 INT REFERENCES table_3(row_id), fld_4 INT REFERENCES table_4(row_id), tran_date DATE, tran_total DEC(21,2) ); CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3, fld_4, tran_date) WHERE deleted_id = 0; The table sizes are - my_table : 167 rows table_1 : 21 rows table_2 : 11 rows table_3 : 3 rows table_4 : 16 rows Therefore for each tran_date in my_table there are potentially 21x11x3x16 = 11088 rows. Most will be null. I want to select the row_id for the last tran_date for each of those potential groups. This is my select - SELECT ( SELECT a.row_id FROM my_table a WHERE a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id AND a.deleted_id = 0 ORDER BY a.tran_date DESC LIMIT 1 ) FROM table_1 b, table_2 c, table_3 d, table_4 e Out of 11088 rows selected, 103 are not null. On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on sqlite3, and 0.31 sec on PostgreSQL. SQL Server does a good job at caching data in memory. PostgreSQL does too on consecutive calls to the same table. What execution time do you get if you issue the query a second time? My first guess would be to add an index on my_table.tran_date and check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table. I have looked at the EXPLAIN, but I don't really know what to look for. I can supply it if that would help. Thanks for any advice. Thanks Olivier. Unfortunately that did not help. I was already running the query twice and only timing the second one. I added the index on tran_date. The timing is the same, and EXPLAIN shows that it is using a SEQUENTIAL SCAN. Here is the EXPLAIN - Nested Loop (cost=0.00..64155.70 rows=11088 width=4) -> Nested Loop (cost=0.00..10.36 rows=528 width=12) -> Nested Loop (cost=0.00..2.56 rows=33 width=8) -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11 width=4) -> Materialize (cost=0.00..1.04 rows=3 width=4) -> Seq Scan on table_3 d (cost=0.00..1.03 rows=3 width=4) -> Materialize (cost=0.00..1.24 rows=16 width=4) -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16 width=4) -> Materialize (cost=0.00..1.31 rows=21 width=4) -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4) SubPlan 1 -> Limit (cost=5.77..5.77 rows=1 width=8) -> Sort (cost=5.77..5.77 rows=1 width=8) Sort Key: a.tran_date DESC -> Seq Scan on my_table a (cost=0.00..5.76 rows=1 width=8) Filter: ((fld_1 = b.row_id) AND (fld_2 = c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND (deleted_id = 0)) Frank
Re: Slow SELECT
On 2020-05-26 11:10 AM, Charles Clavadetscher wrote: Hello On 2020-05-26 10:38, Frank Millman wrote: On 2020-05-26 9:32 AM, Olivier Gautherot wrote: Hi Frank, On Tue, May 26, 2020 at 9:23 AM Frank Millman <mailto:fr...@chagford.com>> wrote: Hi all I have a SELECT that runs over 5 times slower on PostgreSQL compared with Sql Server and sqlite3. I am trying to understand why. I have a table that looks like this (simplified) - CREATE TABLE my_table ( row_id SERIAL PRIMARY KEY, deleted_id INT DEFAULT 0, fld_1 INT REFERENCES table_1(row_id), fld_2 INT REFERENCES table_2(row_id), fld_3 INT REFERENCES table_3(row_id), fld_4 INT REFERENCES table_4(row_id), tran_date DATE, tran_total DEC(21,2) ); CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3, fld_4, tran_date) WHERE deleted_id = 0; The table sizes are - my_table : 167 rows table_1 : 21 rows table_2 : 11 rows table_3 : 3 rows table_4 : 16 rows Therefore for each tran_date in my_table there are potentially 21x11x3x16 = 11088 rows. Most will be null. I want to select the row_id for the last tran_date for each of those potential groups. This is my select - SELECT ( SELECT a.row_id FROM my_table a WHERE a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id AND a.deleted_id = 0 ORDER BY a.tran_date DESC LIMIT 1 ) FROM table_1 b, table_2 c, table_3 d, table_4 e Out of 11088 rows selected, 103 are not null. On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on sqlite3, and 0.31 sec on PostgreSQL. SQL Server does a good job at caching data in memory. PostgreSQL does too on consecutive calls to the same table. What execution time do you get if you issue the query a second time? My first guess would be to add an index on my_table.tran_date and check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table. I have looked at the EXPLAIN, but I don't really know what to look for. I can supply it if that would help. Thanks for any advice. Thanks Olivier. Unfortunately that did not help. I was already running the query twice and only timing the second one. I added the index on tran_date. The timing is the same, and EXPLAIN shows that it is using a SEQUENTIAL SCAN. Here is the EXPLAIN - Nested Loop (cost=0.00..64155.70 rows=11088 width=4) -> Nested Loop (cost=0.00..10.36 rows=528 width=12) -> Nested Loop (cost=0.00..2.56 rows=33 width=8) -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11 width=4) -> Materialize (cost=0.00..1.04 rows=3 width=4) -> Seq Scan on table_3 d (cost=0.00..1.03 rows=3 width=4) -> Materialize (cost=0.00..1.24 rows=16 width=4) -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16 width=4) -> Materialize (cost=0.00..1.31 rows=21 width=4) -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4) SubPlan 1 -> Limit (cost=5.77..5.77 rows=1 width=8) -> Sort (cost=5.77..5.77 rows=1 width=8) Sort Key: a.tran_date DESC -> Seq Scan on my_table a (cost=0.00..5.76 rows=1 width=8) Filter: ((fld_1 = b.row_id) AND (fld_2 = c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND (deleted_id = 0)) Frank If I see it correct, the query runs sequential scans on all tables, i.e. table_1 to table_4. Do you have an index on the referenced keys (row_id) in table_1 to table_4? It happens often that referenced keys are not indexed, leading to poor execution plans. The referenced keys are all defined as SERIAL PRIMARY KEY in their own tables, so I presume that that are all indexed automatically. 
On the other hand, there are not many rows in those tables, so the planner may decide not to use the index in that case. Frank
Re: Slow SELECT
On 2020-05-26 12:02 PM, Christian Ramseyer wrote: Hi On 26.05.20 09:22, Frank Millman wrote: I have looked at the EXPLAIN, but I don't really know what to look for. I can supply it if that would help. My favorite approach to tuning Postgres queries is: 1. Run EXPLAIN ANALYZE 2. Copy/Paste the output into the fantastic https://explain.depesz.com/ This will turn the somewhat hard-to-understand explain output into a nice colored structure. If it's not obvious from the orange-reddish boxes where the slowness comes from, please post the link here and somebody will certainly have some advice. Thanks, Christian. I will definitely look into that. Frank
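For reference, the output that site expects is produced with something like the following. Plain EXPLAIN only shows the planner's estimates; ANALYZE actually executes the query and reports real row counts and timings, and BUFFERS adds I/O counters:

EXPLAIN (ANALYZE, BUFFERS)
SELECT (
    SELECT a.row_id
    FROM my_table a
    WHERE a.fld_1 = b.row_id AND a.fld_2 = c.row_id
      AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id
      AND a.deleted_id = 0
    ORDER BY a.tran_date DESC
    LIMIT 1
)
FROM table_1 b, table_2 c, table_3 d, table_4 e;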
Re: Slow SELECT
On 2020-05-26 11:27 AM, Charles Clavadetscher wrote: On 2020-05-26 11:10, Charles Clavadetscher wrote: Hello On 2020-05-26 10:38, Frank Millman wrote: On 2020-05-26 9:32 AM, Olivier Gautherot wrote: Hi Frank, On Tue, May 26, 2020 at 9:23 AM Frank Millman <mailto:fr...@chagford.com>> wrote: Hi all I have a SELECT that runs over 5 times slower on PostgreSQL compared with Sql Server and sqlite3. I am trying to understand why. I have a table that looks like this (simplified) - CREATE TABLE my_table ( row_id SERIAL PRIMARY KEY, deleted_id INT DEFAULT 0, fld_1 INT REFERENCES table_1(row_id), fld_2 INT REFERENCES table_2(row_id), fld_3 INT REFERENCES table_3(row_id), fld_4 INT REFERENCES table_4(row_id), tran_date DATE, tran_total DEC(21,2) ); CREATE UNIQUE INDEX my_table_ndx ON my_table (fld_1, fld_2, fld_3, fld_4, tran_date) WHERE deleted_id = 0; The table sizes are - my_table : 167 rows table_1 : 21 rows table_2 : 11 rows table_3 : 3 rows table_4 : 16 rows Therefore for each tran_date in my_table there are potentially 21x11x3x16 = 11088 rows. Most will be null. I want to select the row_id for the last tran_date for each of those potential groups. This is my select - SELECT ( SELECT a.row_id FROM my_table a WHERE a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id AND a.deleted_id = 0 ORDER BY a.tran_date DESC LIMIT 1 ) FROM table_1 b, table_2 c, table_3 d, table_4 e Out of 11088 rows selected, 103 are not null. On identical data, this takes 0.06 sec on SQL Server, 0.04 sec on sqlite3, and 0.31 sec on PostgreSQL. SQL Server does a good job at caching data in memory. PostgreSQL does too on consecutive calls to the same table. What execution time do you get if you issue the query a second time? My first guess would be to add an index on my_table.tran_date and check in EXPLAIN that you don't have a SEQUENTIAL SCAN on that table. I have looked at the EXPLAIN, but I don't really know what to look for. I can supply it if that would help. Thanks for any advice. Thanks Olivier. Unfortunately that did not help. I was already running the query twice and only timing the second one. I added the index on tran_date. The timing is the same, and EXPLAIN shows that it is using a SEQUENTIAL SCAN. Here is the EXPLAIN - Nested Loop (cost=0.00..64155.70 rows=11088 width=4) -> Nested Loop (cost=0.00..10.36 rows=528 width=12) -> Nested Loop (cost=0.00..2.56 rows=33 width=8) -> Seq Scan on table_2 c (cost=0.00..1.11 rows=11 width=4) -> Materialize (cost=0.00..1.04 rows=3 width=4) -> Seq Scan on table_3 d (cost=0.00..1.03 rows=3 width=4) -> Materialize (cost=0.00..1.24 rows=16 width=4) -> Seq Scan on table_4 e (cost=0.00..1.16 rows=16 width=4) -> Materialize (cost=0.00..1.31 rows=21 width=4) -> Seq Scan on table_1 b (cost=0.00..1.21 rows=21 width=4) SubPlan 1 -> Limit (cost=5.77..5.77 rows=1 width=8) -> Sort (cost=5.77..5.77 rows=1 width=8) Sort Key: a.tran_date DESC -> Seq Scan on my_table a (cost=0.00..5.76 rows=1 width=8) Filter: ((fld_1 = b.row_id) AND (fld_2 = c.row_id) AND (fld_3 = d.row_id) AND (fld_4 = e.row_id) AND (deleted_id = 0)) Frank If I see it correct, the query runs sequential scans on all tables, i.e. table_1 to table_4. Do you have an index on the referenced keys (row_id) in table_1 to table_4? It happens often that referenced keys are not indexed, leading to poor execution plans. Bye Charles I noticed later that you have very small tables. This will probably lead to a sequential scan althought there is an index in place. 
I am not sure if it makes a difference, but what about using explicit joins?

SELECT a.row_id
FROM my_table a
JOIN table_1 b ON (b.row_id = a.fld_1)
JOIN table_2 c ON (c.row_id = a.fld_2)
JOIN table_3 d ON (d.row_id = a.fld_3)
JOIN table_4 e ON (e.row_id = a.fld_4)
WHERE a.deleted_id = 0
ORDER BY a.tran_date DESC
LIMIT 1;

Thanks, Charles. I tried that, but unfortunately it produces a different result. I need to test for every possible combination of fld1-4, and get the highest date for each one. Using joins only tests existing combinations, and gets the highest date for all of them combined. See my reply to David Rowley. I do not fully understand his solution yet, but it seems to be what I am looking for.

Thanks again

Frank
Re: Slow SELECT
On 2020-05-26 12:04 PM, David Rowley wrote: On Tue, 26 May 2020 at 19:23, Frank Millman wrote: The table sizes are - my_table : 167 rows table_1 : 21 rows table_2 : 11 rows table_3 : 3 rows table_4 : 16 rows Therefore for each tran_date in my_table there are potentially 21x11x3x16 = 11088 rows. Most will be null. I want to select the row_id for the last tran_date for each of those potential groups. This is my select - SELECT ( SELECT a.row_id FROM my_table a WHERE a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id AND a.deleted_id = 0 ORDER BY a.tran_date DESC LIMIT 1 ) FROM table_1 b, table_2 c, table_3 d, table_4 e Out of 11088 rows selected, 103 are not null. Perhaps SQL Server is doing something to rewrite the subquery in the target list to a LEFT JOIN. PostgreSQL currently does not do that. Since "my_table" is small, you'd likely be much better doing a manual rewrite of the query to join a subquery containing the required details from "my_table". It looks like you want the row_id from the latest tran_date for each fld_N column. So something like: SELECT a.row_id FROM table_1 b CROSS JOIN table_2 c CROSS JOIN table_3 d CROSS JOIN table_4 e LEFT OUTER JOIN ( SELECT fld_1,fld_2,fld_3,fld_4,row_id,tran_date, ROW_NUMBER() OVER (PARTITION BY fld_1,fld_2,fld_3,fld_4 ORDER BY tran_date DESC) row_num FROM my_table WHERE deleted_id = 0 ) a ON a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id AND a.row_num = 1; Should do it. You could also perhaps do something with DISTINCT ON instead of using ROW_NUMBER(). That might be a bit more efficient, but it's unlikely to matter too much since there are only 167 rows in that table. Thank you David. I tried that and it produced the correct result in 53ms, which is what I am looking for. It will take me some time to understand it fully, so I have some homework to do! Much appreciated. Frank
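For completeness, a sketch of the DISTINCT ON variant David mentions (PostgreSQL-specific, so it would not carry over to SQL Server or sqlite3). DISTINCT ON keeps the first row of each fld_1..fld_4 group according to the ORDER BY, which with tran_date DESC is the latest transaction, so no row_num filter is needed:

SELECT a.row_id
FROM table_1 b
CROSS JOIN table_2 c
CROSS JOIN table_3 d
CROSS JOIN table_4 e
LEFT OUTER JOIN (
    SELECT DISTINCT ON (fld_1, fld_2, fld_3, fld_4)
           fld_1, fld_2, fld_3, fld_4, row_id
    FROM my_table
    WHERE deleted_id = 0
    ORDER BY fld_1, fld_2, fld_3, fld_4, tran_date DESC
) a ON a.fld_1 = b.row_id AND a.fld_2 = c.row_id
   AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id;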
Re: Slow SELECT
On 2020-05-26 1:45 PM, David Rowley wrote:
On Tue, 26 May 2020 at 22:31, Frank Millman wrote:
Thank you David. I tried that and it produced the correct result in 53ms, which is what I am looking for. It will take me some time to understand it fully, so I have some homework to do!

The main problem with your previous query was that the subquery was being executed 11088 times and could only ever find anything 167 times. The remaining number of times nothing would be found. I just changed the subquery which would be executed once per output row and altered it so it became a subquery that's joined and only executed once.

The ROW_NUMBER() is a windowing function, which is explained in [1]. I used this to get the row_id of the record with the latest tran_date, just like you were doing with the ORDER BY tran_date DESC LIMIT 1, but the subquery with the windowing function gets them all at once, rather than doing it in a way that requires it to be executed once for each row in the top-level query. In this case, the functionality that the LIMIT 1 does in your query is achieved with "AND a.row_num = 1;" in my version. This is pretty fast to execute once due to there only being 167 rows.

It's also important to know that there may be cases where the method I proposed is slower. For example, if my_table was very large and contained rows that were not in table_1 to table_4. Since the subquery in my version calculates everything then it could be wasteful to do that for values that would never be used. For you, you have foreign keys that ensure my_table does not contain records that are not in the other tables, but you could still see this issue if you were to add some restrictive WHERE clause to the outer query. Perhaps this won't be a problem for you, but it's likely good to know.

[1] https://www.postgresql.org/docs/current/tutorial-window.html

Thanks very much for the explanation. I will go through it carefully.

For the record, your query works without modification in both Sql Server and sqlite3. It is also much faster in all three cases - all around 0.005 sec instead of 0.05 sec.

Frank
Re: Slow SELECT
On 2020-05-26 12:04 PM, David Rowley wrote: Since "my_table" is small, you'd likely be much better doing a manual rewrite of the query to join a subquery containing the required details from "my_table". It looks like you want the row_id from the latest tran_date for each fld_N column. So something like: SELECT a.row_id FROM table_1 b CROSS JOIN table_2 c CROSS JOIN table_3 d CROSS JOIN table_4 e LEFT OUTER JOIN ( SELECT fld_1,fld_2,fld_3,fld_4,row_id,tran_date, ROW_NUMBER() OVER (PARTITION BY fld_1,fld_2,fld_3,fld_4 ORDER BY tran_date DESC) row_num FROM my_table WHERE deleted_id = 0 ) a ON a.fld_1 = b.row_id AND a.fld_2 = c.row_id AND a.fld_3 = d.row_id AND a.fld_4 = e.row_id AND a.row_num = 1; Should do it. You could also perhaps do something with DISTINCT ON instead of using ROW_NUMBER(). That might be a bit more efficient, but it's unlikely to matter too much since there are only 167 rows in that table. I have studied the above SELECT, and I now more or less understand it. I would not have come up with that unaided, so many thanks. I tried DISTINCT ON, and it was very efficient, but unfortunately that is not supported by SQL Server or sqlite3. Then I came up with this alternative, which works on all three platforms and seems a bit faster than the above - SELECT a.row_id FROM ( SELECT row_id, ROW_NUMBER() OVER (PARTITION BY fld_1, fld_2, fld_3, fld_4 ORDER BY tran_date DESC) row_num FROM my_table WHERE deleted_id = 0 ) as a WHERE a.row_num = 1 Do you see any problem with this? Thanks Frank
Re: Why password authentication failed for user "postgres"?
On 2022-06-04 00:32, BeginnerC wrote:
Hello everyone, I am a newbie to postgres. When I use psql to connect to postgres, an error message is printed. The commands look like this:

psql -U postgres
Password for user postgres: postgres
postgreSQL: password authentication failed for user "postgres"

How to solve this problem? Thanks in advance!

* Which OS are you working on?
* Can you show us your pg_hba.conf?

If you use -U with psql, the connection must not try to use method "peer" (which means "use the system user with this name", but also means "You must be logged in with the system user corresponding to the postgresql user"), but some kind of authorization, like md5, and use it with an IP address. This might be difficult for the user postgres, who often has no password set in the database. If you really need to connect with user postgres and some password, you can set one using the method described by Reid Thompson.

Regards,
Frank
Re: first order by then partition by x < a fixed value.
On Thu, Jul 14, 2022 at 01:23:55PM +0530, jian he wrote:
> This question (https://stackoverflow.com/q/72975669/15603477) is fun.
> DB fiddle link:
> https://dbfiddle.uk/?rdbms=postgres_14&fiddle=36d685ad463831877ae70361be2cfa3b
>
> account
>  size  id  name
>   100   1  John
>   200   2  Mary
>   300   3  Jane
>   400   4  Anne
>   100   5  Mike
>   600   6  Joanne
>
> Then expected output:
>  account group  size  id  name
>              1   100   1  John
>              1   200   2  Mary
>              1   300   3  Jane
>              2   400   4  Anne
>              2   100   5  Mike
>              3   600   6  Joanne
>
> Idea is fixed order by id then cumulative sum. If <= 600 then grouped
> together using the same row_number.
>
> But I imagine this kind of question has been solved many times.
> The currently posted solution uses a recursive query, which is really hard for me.
> Is there any simple or more intuitive way to solve this kind of problem?
> I can write some customized aggregate function that stops aggregating at 600.
>
> I can get the following result. So what's the next step?
>
> +------+----+--------+---------------+-----------------+
> | size | id | name   | capped_at_600 | large_or_eq_600 |
> +------+----+--------+---------------+-----------------+
> |  100 |  1 | John   |           100 | f               |
> |  200 |  2 | Mary   |           300 | f               |
> |  300 |  3 | Jane   |           600 | t               |
> |  400 |  4 | Anne   |           400 | f               |
> |  100 |  5 | Mike   |           500 | f               |
> |  600 |  6 | Joanne |          1100 | t               |
> +------+----+--------+---------------+-----------------+
>
> --
> I recommend David Deutsch's <>
>
> Jian

My proposal: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=7daef32ae39b2ec7c38a83cf9e19d4ae

select id, name, size
     , sum(size) over (order by id) as size_total
     , ((sum(size) over (order by id) - 1) / 600) + 1 as size_group
from account
order by id, name;

Best regards
Frank
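As a self-contained check of that formula against the sample data: the running totals are 100, 300, 600, 1000, 1100 and 1700, so integer division gives groups 1, 1, 1, 2, 2, 3, matching the expected output above. The VALUES list below simply stands in for the account table:

SELECT id, name, size,
       sum(size) OVER (ORDER BY id)                     AS size_total,
       ((sum(size) OVER (ORDER BY id) - 1) / 600) + 1   AS size_group
FROM (VALUES (1, 'John',   100),
             (2, 'Mary',   200),
             (3, 'Jane',   300),
             (4, 'Anne',   400),
             (5, 'Mike',   100),
             (6, 'Joanne', 600)) AS account(id, name, size)
ORDER BY id;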
Re: a database can be created but not dropped
On Mon, Aug 01, 2022 at 11:22:33AM +0200, Matthias Apitz wrote:
> testdb=# DROP DATABASE IF EXISTS SRP-27097 WITH FORCE;
> ERROR: syntax error at or near "-"
> ZEILE 1: DROP DATABASE IF EXISTS SRP-27097 WITH FORCE;
>
> Why is this?

Please try:

DROP DATABASE IF EXISTS "SRP-27097" WITH (FORCE);

Best regards
Frank
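The underlying rule: an unquoted identifier may not contain a hyphen and is folded to lower case, while a double-quoted identifier is taken exactly as written, so a name like SRP-27097 must always be quoted. For example (sketch, reusing the database name from the thread; the FORCE option requires PostgreSQL 13 or later):

CREATE DATABASE "SRP-27097";
DROP DATABASE IF EXISTS "SRP-27097" WITH (FORCE);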
Question about locking
Hi all Apologies if this is an FAQ, but I could not find the answer I was looking for. I want to run two SELECTs, one immediately after the other. I want to ensure that no other process updates the database in between the two. What is the best way to achieve this? Thanks Frank Millman
Re: Question about locking
On 2022-08-07 7:54 AM, Ron wrote: On 8/7/22 00:30, Frank Millman wrote: Hi all Apologies if this is an FAQ, but I could not find the answer I was looking for. I want to run two SELECTs, one immediately after the other. I want to ensure that no other process updates the database in between the two. What is the best way to achieve this? You probably want to put the queries in a Repeatable Read transaction, or even a READ ONLY serialized transaction. https://www.postgresql.org/docs/12/transaction-iso.html#XACT-REPEATABLE-READ https://www.postgresql.org/docs/12/transaction-iso.html#XACT-SERIALIZABLE https://www.postgresql.org/docs/12/sql-set-transaction.html Thanks, Ron. That is exactly what I am looking for. Frank
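A minimal sketch of what that looks like in practice. The two queries are placeholders (some_table is hypothetical); the point is that in REPEATABLE READ both statements see the same snapshot, taken at the first query of the transaction, regardless of concurrent writes:

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;
SELECT count(*)    FROM some_table;   -- first query
SELECT sum(amount) FROM some_table;   -- second query, same snapshot as the first
COMMIT;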
Re: pg_restore 12 "permission denied for schema" errors
Hi Ron,

On 20.10.22 at 02:58, Ron wrote:
RDS Postgresql 12.11. Backed up a database from one RDS instance, and now am trying to restore it to a new instance. (If you're wondering why, it's to restore in an instance with less disk space.) Here are the commands:

export PGHOST=${RDSENV}..us-east-1.rds.amazonaws.com
cd /migrate/TASK001793786/2022-10-19b
DB=sides
pg_restore -v --create --clean --jobs=`nproc` -Fd --dbname=template1 $DB

What database do you want to restore your data into? As far as I know your pg_restore command would import the data into template1, right? pg_restore's manpage says:

[…]
> -d dbname
> --dbname=dbname
>     Connect to database dbname and restore directly into the database. The dbname can be a connection string. If so, connection string parameters will override any conflicting command line options.
> […]

And I'm wondering if that's what you want. Possibly your errors could come from there…

Cheers,
Frank.
Re: pg_restore 12 "permission denied for schema" errors
Hi Tom,

On 20.10.22 at 07:12, Tom Lane wrote:
Frank Gard writes:
On 20.10.22 at 02:58, Ron wrote:
pg_restore -v --create --clean --jobs=`nproc` -Fd --dbname=template1 $DB
What database do you want to restore your data into? As far as I know your pg_restore command would import the data into template1, right?
No --- my ears went up about that too, but he's using --create. So template1 is just the springboard for the initial connection in which CREATE DATABASE will be issued.

You're right, of course. I overlooked that detail. I have to learn to read more carefully before answering. Sorry.

Frank.
Calling function from VFP changes character field to Memo
Hi, I have a simple table Patients with one field FirstName of type character(30). If I SELECT FirstName From public.patients; I get back the expected character(30) field. If however I have a function defined like this CREATE OR REPLACE FUNCTION public.testfunction( ) RETURNS TABLE ( Firstname character(30) ) LANGUAGE 'plpgsql' AS $BODY$ BEGIN RETURN QUERY SELECT p.cFirstName FROM patients p; END; $BODY$; And I call: SELECT * FROM public.testFunction(); Then FirstName returns as a Memo field (similar to a Text field). Any idea what I need to do to get it to return the character(30) type? -- Frank. Frank Cazabon
Re: Calling function from VFP changes character field to Memo
On 15/11/2022 2:48 pm, Adrian Klaver wrote:
On 11/15/22 10:43 AM, Frank Cazabon wrote:
Please reply to list also. Ccing list.
Sorry about that, first time using this list and just assumed I was replying to the list and the list would then notify you.
SELECT * FROM public.testFunction();
SELECT firstname FROM public.testFunction();
This has the same result.
How about?:
SELECT firstname::varchar(30) FROM public.testFunction();
That does work, thanks, but it may cause me some logic problems in the rest of the code. I'll try the type suggestion from Tom and see if that works and then decide which is my best approach.
Thanks
Then FirstName returns as a Memo field (similar to a Text field). Any idea what I need to do to get it to return the character(30) type?
Re: Calling function from VFP changes character field to Memo
On 15/11/2022 2:44 pm, Tom Lane wrote: Frank Cazabon writes: If however I have a function defined like this CREATE OR REPLACE FUNCTION public.testfunction( ) RETURNS TABLE ( Firstname character(30) ) LANGUAGE 'plpgsql' AS $BODY$ BEGIN RETURN QUERY SELECT p.cFirstName FROM patients p; END; $BODY$; And I call: SELECT * FROM public.testFunction(); Then FirstName returns as a Memo field (similar to a Text field). This is mostly about whatever software stack you're using on the client side --- Memo is certainly not something Postgres knows about. Any idea what I need to do to get it to return the character(30) type? There's no chance of getting back the "30" part with this structure, because function signatures do not carry length restrictions. What I expect is happening is that you get firstname as an unspecified-length "character" type, and something on the client side is deciding to cope with that by calling it "Memo" instead. You could perhaps work around that by defining a named composite type: create type testfunction_result as (firstname character(30), ...); create function testfunction() returns setof testfunction_result as ... regards, tom lane Thanks, so I could define the function like this - removed the (30): CREATE OR REPLACE FUNCTION public.testfunction( ) RETURNS TABLE ( Firstname character ) LANGUAGE 'plpgsql' I'll try the type definition and see if that helps.
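Spelled out, Tom's suggestion looks something like the sketch below. The column name selected inside the function is assumed to match the patients table (the thread shows both FirstName and cFirstName); with the length declared on the composite type, the character(30) typmod is carried through to the client:

CREATE TYPE testfunction_result AS (firstname character(30));

CREATE OR REPLACE FUNCTION public.testfunction()
    RETURNS SETOF testfunction_result
    LANGUAGE plpgsql
AS $BODY$
BEGIN
    RETURN QUERY SELECT p.firstname FROM patients p;
END;
$BODY$;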
Re: Calling function from VFP changes character field to Memo
I don't think that's necessary, I'm 100% certain that it's VFP not able to interpret the size of what is coming back to it so it just gives it the biggest type it can. Thanks 15 Nov 2022 14:59:59 Ron : > On 11/15/22 12:54, Frank Cazabon wrote: >> >> On 15/11/2022 2:44 pm, Tom Lane wrote: >>> Frank Cazabon writes: >>>> If however I have a function defined like this >>>> CREATE OR REPLACE FUNCTION public.testfunction( >>>> ) >>>> RETURNS TABLE >>>> ( >>>> Firstname character(30) >>>> ) >>>> LANGUAGE 'plpgsql' >>>> AS $BODY$ >>>> BEGIN >>>> RETURN QUERY SELECT p.cFirstName FROM patients p; >>>> END; >>>> $BODY$; >>>> And I call: >>>> SELECT * FROM public.testFunction(); >>>> Then FirstName returns as a Memo field (similar to a Text field). >>> This is mostly about whatever software stack you're using on the >>> client side --- Memo is certainly not something Postgres knows about. >>> >>>> Any idea what I need to do to get it to return the character(30) type? >>> There's no chance of getting back the "30" part with this structure, >>> because function signatures do not carry length restrictions. >>> What I expect is happening is that you get firstname as an >>> unspecified-length "character" type, and something on the client >>> side is deciding to cope with that by calling it "Memo" instead. >>> >>> You could perhaps work around that by defining a named composite >>> type: >>> >>> create type testfunction_result as (firstname character(30), ...); >>> >>> create function testfunction() returns setof testfunction_result as ... >>> >>> regards, tom lane >> Thanks, so I could define the function like this - removed the (30): >> >> CREATE OR REPLACE FUNCTION public.testfunction( >> ) >> RETURNS TABLE >> ( >> Firstname character >> ) >> LANGUAGE 'plpgsql' >> >> I'll try the type definition and see if that helps. > > I think you should try the original function in psql. That will delineate > Postgresql from framework. > > -- > Angular momentum makes the world go 'round.
Re: Calling function from VFP changes character field to Memo
On 15/11/2022 2:58 pm, Adrian Klaver wrote: On 11/15/22 10:54 AM, Frank Cazabon wrote: On 15/11/2022 2:44 pm, Tom Lane wrote: Frank Cazabon writes: If however I have a function defined like this CREATE OR REPLACE FUNCTION public.testfunction( ) RETURNS TABLE ( Firstname character(30) ) LANGUAGE 'plpgsql' AS $BODY$ BEGIN RETURN QUERY SELECT p.cFirstName FROM patients p; END; $BODY$; And I call: SELECT * FROM public.testFunction(); Then FirstName returns as a Memo field (similar to a Text field). This is mostly about whatever software stack you're using on the client side --- Memo is certainly not something Postgres knows about. Any idea what I need to do to get it to return the character(30) type? There's no chance of getting back the "30" part with this structure, because function signatures do not carry length restrictions. What I expect is happening is that you get firstname as an unspecified-length "character" type, and something on the client side is deciding to cope with that by calling it "Memo" instead. You could perhaps work around that by defining a named composite type: create type testfunction_result as (firstname character(30), ...); create function testfunction() returns setof testfunction_result as ... regards, tom lane Thanks, so I could define the function like this - removed the (30): CREATE OR REPLACE FUNCTION public.testfunction( ) RETURNS TABLE ( Firstname character ) LANGUAGE 'plpgsql' No you don't want to do that: select 'test'::char; bpchar t vs select 'test'::varchar; varchar - test Besides you missed the important part, after creating the type testfunction_result: create function testfunction() returns setof testfunction_result as ... Sorry about the confusion, I have got it working using the type definition. Thanks for the help I'll try the type definition and see if that helps.
PostgreSQL listens on localhost?
I have read that PostgreSQL listens on localhost. Is it possible to have it listen on an IP address? Thank you. FDG
Re: PostgreSQL listens on localhost?
Hi Adrian, I have just started with PostgreSQL after years away from Micro$oft SQL Server. Thank you very much for the quick reference as I have also been trying to get through the documentation (all 3063 pages of it!). FDG On Tue, Aug 1, 2023 at 11:42 AM Adrian Klaver wrote: > On 8/1/23 09:12, Frank Gunseor wrote: > > I have read that PostgreSQL listens on localhost is it possible to have > > it listen on an IP address? > > Read: > > > https://www.postgresql.org/docs/current/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS > > > > > Thank you. > > > > FDG > > -- > Adrian Klaver > adrian.kla...@aklaver.com > >
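For reference, the relevant settings look like this. The addresses below are examples only, and changing listen_addresses requires a server restart to take effect:

# postgresql.conf
listen_addresses = 'localhost, 192.168.1.50'   # or '*' to listen on all interfaces
port = 5432

# pg_hba.conf - an entry is still needed to allow clients from that network
# TYPE   DATABASE   USER   ADDRESS           METHOD
host     all        all    192.168.1.0/24    scram-sha-256

listen_addresses controls which interfaces the server accepts TCP connections on; pg_hba.conf then decides which clients and authentication methods are allowed once a connection arrives.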
pgBadger: Cannot find any log entries from systemd-journald
Hello,

I think I might just have made a small configuration mistake, but somehow I cannot figure it out by myself. I want to let pgbadger fetch data directly from journald -- so I ran the following, with an unexpected outcome:

$ pgbadger --journalctl "journalctl -u postgresql.service"
LOG: Ok, generating html report...s: 0, events: 0

Having a look into the journal, there is a lot of

---cut
Mar 05 09:06:23 myhost postgres[441820]: 2024-03-05 08:06:23 GMT [441820]: user=[unknown],db=[unknown],app=[unknown],client=xxx LOCATION: BackendInitialize, postmaster.c:4373
Mar 05 09:06:23 myhost postgres[441820]: 2024-03-05 08:06:23 GMT [441820]: user=[unknown],db=[unknown],app=[unknown],client=xxx LOG: 0: connection received: host=xxx port=34574
Mar 05 09:06:10 myhost postgres[441805]: 2024-03-05 08:06:10 GMT [441805]: user=postgres,db=xxx,app=psql,client=[local] LOCATION: log_disconnections, postgres.c:4722
Mar 05 09:06:10 myhost postgres[441805]: 2024-03-05 08:06:10 GMT [441805]: user=postgres,db=xxx,app=psql,client=[local] LOG: 0: disconnection: session time: 0:00:00.016 user=postgres database=xxx host=[local]
Mar 05 09:06:10 myhost postgres[441805]: 2024-03-05 08:06:10 GMT [441805]: user=postgres,db=xxx,app=[unknown],client=[local] LOCATION: PerformAuthentication, postinit.c:292
Mar 05 09:06:10 myhost postgres[441805]: 2024-03-05 08:06:10 GMT [441805]: user=postgres,db=xxx,app=[unknown],client=[local] LOG: 0: connection authorized: user=postgres database=xxx application_name=psql
---cut---

My postgresql.conf looks like:

---cut
log_autovacuum_min_duration = 0
log_checkpoints = yes
log_connections = yes
log_destination = 'stderr'
log_disconnections = yes
log_error_verbosity = 'verbose'
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
log_lock_waits = yes
log_min_duration_statement = 100
log_temp_files = 0
---cut

Any idea what I missed or did wrong?

Cheers,
Frank
Re: pgBadger: Cannot find any log entries from systemd-journald
Hello,

Sorry for the late response.

On 06.03.24 at 16:40, Greg Sabino Mullane wrote:
On Tue, Mar 5, 2024 at 3:14 AM Frank Lanitz <fr...@frank.uvena.de> wrote:
$ pgbadger --journalctl "journalctl -u postgresql.service"
You could try adding --verbose to see if it gives more clues.

No, unfortunately not. Though it adds some more nice information in general, none of it actually helped me.

Having a look into the journal there is a lot of
None of the snippets from journald you showed were actually things pgbadger cares about, FWIW.

It was just a random pick to show there is a bunch of logging -- I also can see checkpoints and vacuum etc. I just didn't want to paste 1GB of log ;)

You can get an idea of what is actually parsed by running "journalctl -u postgresql --output=short-iso"

Looks good to me.

log_error_verbosity = 'verbose'
This is probably not needed, and there is a finite chance that the extra context is confusing pgbadger.

I tried to change it, but in my case no difference.

Cheers,
Frank
SELECT is faster on SQL Server
Filter: ((tran_date < '2018-03-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1)) (24 rows) Maybe SQL Server has a way of optimising this, and there is nothing more I can do. I can live with that. But I just thought I would ask the question. Thanks for any advice. Frank Millman
Re: SELECT is faster on SQL Server
On 2021-03-19 10:29 AM, Thomas Kellerer wrote: Frank Millman schrieb am 19.03.2021 um 09:19: This may be a non-issue, and I don't want to waste your time. But perhaps someone can have a look to see if there is anything obvious I have missed. I am writing a cross-platform accounting app, and I test using Sql Server on Windows 10 and PostgreSql on Fedora 31. Performance is usually very similar, with a slight edge to PostgreSql. Now I have a SELECT which runs over twice as fast on Sql Server compared to PostgreSql. Can you change the SELECT statement? Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions Something along the lines (for the first derived table): SELECT ... FROM ( SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot FROM ( SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot FROM prop.ar_totals WHERE deleted_id = 0 AND tran_date <= '2018-03-31' AND ledger_row_id = 1 ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC ) AS a GROUP BY a.source_code_id ) as cl_bal ... Thanks, Thomas I tried that, and it ran about 10% faster. Every little helps, but SQL Server appears to have some secret sauce! Frank
Re: SELECT is faster on SQL Server
On 2021-03-19 11:04 AM, Thomas Kellerer wrote: Frank Millman schrieb am 19.03.2021 um 09:52: I am writing a cross-platform accounting app, and I test using Sql Server on Windows 10 and PostgreSql on Fedora 31. Performance is usually very similar, with a slight edge to PostgreSql. Now I have a SELECT which runs over twice as fast on Sql Server compared to PostgreSql. Can you change the SELECT statement? Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions Thanks, Thomas I tried that, and it ran about 10% faster. Every little helps, but SQL Server appears to have some secret sauce! The two derived tables (cl_bal, op_bal) seem to be doing exactly the same thing - at least I can't spot a difference. If that is correct, you can move them into a common table expression - maybe detecting that is SQL Server's secret sauce. with totals as ( SELECT a.source_code_id, SUM(a.tran_tot) AS total FROM ( SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot FROM prop.ar_totals WHERE deleted_id = 0 AND tran_date <= '2018-03-31' AND ledger_row_id = 1 ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC ) AS a GROUP BY a.source_code_id ) select '2018-03-01' AS op_date, '2018-03-31' AS cl_date, cl_bal.source_code_id, op_bal.total as op_tot, cl_bal.total.cl_tot FROM totals as cl_bal LEFT JOIN totals as op_bal ON op_bal.source_code_id = cl_bal.source_code_id; There is a difference. cl_bal selects WHERE tran_date <= '2018-03-31'. op_bal selects WHERE tran_date < '2018-03-01'. The second one could be written as WHERE tran_date <= '2018-02-28', but I don't think that would make any difference. Frank
Re: SELECT is faster on SQL Server
On 2021-03-19 10:56 AM, Pavel Stehule wrote:

On Fri, 19 Mar 2021 at 09:53, Frank Millman <mailto:fr...@chagford.com> wrote:

On 2021-03-19 10:29 AM, Thomas Kellerer wrote:
> Frank Millman wrote on 19.03.2021 at 09:19:
>> This may be a non-issue, and I don't want to waste your time. But perhaps someone can have a look to see if there is anything obvious I have missed.
>>
>> I am writing a cross-platform accounting app, and I test using Sql
>> Server on Windows 10 and PostgreSql on Fedora 31. Performance is
>> usually very similar, with a slight edge to PostgreSql. Now I have a
>> SELECT which runs over twice as fast on Sql Server compared to
>> PostgreSql.
>>
> Can you change the SELECT statement?
>
> Very often "distinct on ()" is faster in Postgres compared to the equivalent solution using window functions
>
> Something along the lines (for the first derived table):
>
> SELECT ...
> FROM (
>     SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot
>     FROM (
>         SELECT distinct on (location_row_id, function_row_id, source_code_id) source_code_id, tran_tot
>         FROM prop.ar_totals
>         WHERE deleted_id = 0
>           AND tran_date <= '2018-03-31'
>           AND ledger_row_id = 1
>         ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
>     ) AS a
>     GROUP BY a.source_code_id
> ) as cl_bal
> ...

Thanks, Thomas

I tried that, and it ran about 10% faster. Every little helps, but SQL Server appears to have some secret sauce!

can you send a result of EXPLAIN ANALYZE?

QUERY PLAN
Nested Loop Left Join  (cost=5.66..5.74 rows=1 width=132) (actual time=0.213..0.248 rows=5 loops=1)
  Join Filter: (a_1.source_code_id = a.source_code_id)
  Rows Removed by Join Filter: 4
  ->  GroupAggregate  (cost=3.65..3.67 rows=1 width=36) (actual time=0.144..0.157 rows=5 loops=1)
        Group Key: a.source_code_id
        ->  Sort  (cost=3.65..3.65 rows=1 width=10) (actual time=0.131..0.135 rows=29 loops=1)
              Sort Key: a.source_code_id
              Sort Method: quicksort  Memory: 26kB
              ->  Subquery Scan on a  (cost=2.36..3.64 rows=1 width=10) (actual time=0.063..0.116 rows=29 loops=1)
                    Filter: (a.row_num = 1)
                    Rows Removed by Filter: 3
                    ->  WindowAgg  (cost=2.36..3.24 rows=32 width=34) (actual time=0.062..0.107 rows=32 loops=1)
                          ->  Sort  (cost=2.36..2.44 rows=32 width=26) (actual time=0.054..0.059 rows=32 loops=1)
                                Sort Key: ar_totals.location_row_id, ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date DESC
                                Sort Method: quicksort  Memory: 27kB
                                ->  Seq Scan on ar_totals  (cost=0.00..1.56 rows=32 width=26) (actual time=0.014..0.028 rows=32 loops=1)
                                      Filter: ((tran_date <= '2018-03-31'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
  ->  GroupAggregate  (cost=2.01..2.03 rows=1 width=36) (actual time=0.017..0.017 rows=1 loops=5)
        Group Key: a_1.source_code_id
        ->  Sort  (cost=2.01..2.02 rows=1 width=10) (actual time=0.012..0.013 rows=8 loops=5)
              Sort Key: a_1.source_code_id
              Sort Method: quicksort  Memory: 25kB
              ->  Subquery Scan on a_1  (cost=1.68..2.00 rows=1 width=10) (actual time=0.032..0.047 rows=8 loops=1)
                    Filter: (a_1.row_num = 1)
                    ->  WindowAgg  (cost=1.68..1.90 rows=8 width=34) (actual time=0.031..0.043 rows=8 loops=1)
                          ->  Sort  (cost=1.68..1.70 rows=8 width=26) (actual time=0.023..0.024 rows=8 loops=1)
                                Sort Key: ar_totals_1.location_row_id, ar_totals_1.function_row_id, ar_totals_1.source_code_id, ar_totals_1.tran_date DESC
                                Sort Method: quicksort  Memory: 25kB
                                ->  Seq Scan on ar_totals ar_totals_1  (cost=0.00..1.56 rows=8 width=26) (actual time=0.006..0.013 rows=8 loops=1)
                                      Filter: ((tran_date < '2018-03-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                      Rows Removed by Filter: 24
Planning Time: 0.479 ms
Execution Time: 0.344 ms
(33 rows)
Re: SELECT is faster on SQL Server
On 2021-03-19 12:00 PM, Pavel Stehule wrote:

In this query the slowest operation is query planning. You are testing on almost empty tables, which has no practical value. You should test queries on tables with a size similar to the production size.

Sorry about that. I hope this one is better. Same query, different data set.

QUERY PLAN
Merge Left Join  (cost=1401.00..1401.12 rows=1 width=132) (actual time=3.595..3.611 rows=5 loops=1)
  Merge Cond: (a.source_code_id = a_1.source_code_id)
  ->  GroupAggregate  (cost=673.16..673.18 rows=1 width=36) (actual time=1.101..1.108 rows=5 loops=1)
        Group Key: a.source_code_id
        ->  Sort  (cost=673.16..673.16 rows=1 width=12) (actual time=1.092..1.093 rows=5 loops=1)
              Sort Key: a.source_code_id
              Sort Method: quicksort  Memory: 25kB
              ->  Subquery Scan on a  (cost=670.67..673.15 rows=1 width=12) (actual time=1.008..1.086 rows=5 loops=1)
                    Filter: (a.row_num = 1)
                    Rows Removed by Filter: 59
                    ->  WindowAgg  (cost=670.67..672.37 rows=62 width=36) (actual time=1.006..1.076 rows=64 loops=1)
                          ->  Sort  (cost=670.67..670.82 rows=62 width=28) (actual time=0.996..1.004 rows=64 loops=1)
                                Sort Key: ar_totals.location_row_id, ar_totals.function_row_id, ar_totals.source_code_id, ar_totals.tran_date DESC
                                Sort Method: quicksort  Memory: 30kB
                                ->  Seq Scan on ar_totals  (cost=0.00..668.82 rows=62 width=28) (actual time=0.012..0.933 rows=64 loops=1)
                                      Filter: ((tran_date <= '2015-04-30'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                      Rows Removed by Filter: 840
  ->  GroupAggregate  (cost=727.85..727.89 rows=2 width=36) (actual time=2.490..2.495 rows=5 loops=1)
        Group Key: a_1.source_code_id
        ->  Sort  (cost=727.85..727.85 rows=3 width=12) (actual time=2.485..2.485 rows=5 loops=1)
              Sort Key: a_1.source_code_id
              Sort Method: quicksort  Memory: 25kB
              ->  Subquery Scan on a_1  (cost=700.70..727.82 rows=3 width=12) (actual time=1.684..2.479 rows=5 loops=1)
                    Filter: (a_1.row_num = 1)
                    Rows Removed by Filter: 674
                    ->  WindowAgg  (cost=700.70..719.35 rows=678 width=36) (actual time=1.682..2.397 rows=679 loops=1)
                          ->  Sort  (cost=700.70..702.40 rows=678 width=28) (actual time=1.676..1.758 rows=679 loops=1)
                                Sort Key: ar_totals_1.location_row_id, ar_totals_1.function_row_id, ar_totals_1.source_code_id, ar_totals_1.tran_date DESC
                                Sort Method: quicksort  Memory: 78kB
                                ->  Seq Scan on ar_totals ar_totals_1  (cost=0.00..668.82 rows=678 width=28) (actual time=0.007..0.836 rows=679 loops=1)
                                      Filter: ((tran_date < '2015-09-01'::date) AND (deleted_id = 0) AND (ledger_row_id = 1))
                                      Rows Removed by Filter: 225
Planning Time: 0.496 ms
Execution Time: 3.695 ms
(34 rows)
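(For anyone who wants to reproduce a data set of roughly this shape, a generate_series sketch like the one below can help. The column list is an assumption based only on the columns visible in these plans, and the value distributions are arbitrary.)

INSERT INTO prop.ar_totals
       (ledger_row_id, location_row_id, function_row_id,
        source_code_id, tran_date, tran_tot, deleted_id)
SELECT 1,
       (g % 10) + 1,                              -- spread rows over a few locations
       (g % 5) + 1,                               -- ... and a few functions
       (g % 20) + 1,                              -- ... and a few source codes
       DATE '2015-01-01' + (g % 365),             -- one year of transaction dates
       round((random() * 1000)::numeric, 2),
       0
FROM generate_series(1, 100000) AS g;

ANALYZE prop.ar_totals;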
Re: SELECT is faster on SQL Server
On 2021-03-19 12:58 PM, Frank Millman wrote:

[snip - EXPLAIN ANALYZE output quoted in full from the previous message]

@Pavel & depesz

Thanks for the replies. I am now focusing on the index.

I tried dropping the index 'ar_tots_cover', and then adding back the index columns one at a time. Adding 'tran_date desc' made a small difference. Adding 'tran_day' and 'tran_tot' made a big difference. This changed the index into a 'covering' index, and this is reflected in the new EXPLAIN ANALYSE (see below).

Execution of my main query has improved from 50ms to 33ms. Sql Server takes 25ms, but this is much better than it was.

However, the bizarre thing is that I have simply restored the index to what it was in the first place. If you look at the table definition in my original message you can see that all the columns were included in the index. But the query did not use it as a covering index. Now the EXPLAIN ANALYSE clearly shows 'Index Only Scan using ar_tots_cover'. I have no idea what changed.
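For reference, a covering index of the kind being described would look roughly like this. This is a hypothetical definition - the real ar_tots_cover DDL is in the original post, which is not quoted in this thread - and INCLUDE needs PostgreSQL 11 or later; on older versions the extra columns would simply be appended as trailing key columns.

CREATE INDEX ar_tots_cover
    ON prop.ar_totals (ledger_row_id, location_row_id, function_row_id,
                       source_code_id, tran_date DESC)
    INCLUDE (tran_day, tran_tot);   -- payload columns, so the scan never touches the heap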
Here is the new EXPLAIN ANALYSE -

QUERY PLAN
Merge Left Join  (cost=161.39..161.51 rows=1 width=132) (actual time=1.566..1.581 rows=5 loops=1)
  Merge Cond: (a.source_code_id = a_1.source_code_id)
  ->  GroupAggregate  (cost=50.27..50.29 rows=1 width=36) (actual time=0.226..0.232 rows=5 loops=1)
        Group Key: a.source_code_id
        ->  Sort  (cost=50.27..50.28 rows=1 width=12) (actual time=0.217..0.218 rows=5 loops=1)
              Sort Key: a.source_code_id
              Sort Method: quicksort  Memory: 25kB
              ->  Subquery Scan on a  (cost=47.78..50.26 rows=1 width=
Re: SELECT is faster on SQL Server
On 2021-03-19 4:38 PM, Tom Lane wrote:

Frank Millman writes:
However, the bizarre thing is that I have simply restored the index to what it was in the first place. If you look at the table definition in my original message you can see that all the columns were included in the index. But the query did not use it as a covering index. Now the EXPLAIN ANALYSE clearly shows 'Index Only Scan using ar_tots_cover'. I have no idea what changed.

VACUUM, maybe? Even if there's a covering index, the planner is not likely to prefer an index-only scan unless it thinks that most of the table's pages are known all-visible. If they're not, most of the rows will require heap probes anyway to check row visibility, meaning that the "index-only" scan's performance degrades to about that of a regular indexscan.

In this example, since you're fetching such a large fraction of the table (which the planner is accurately estimating), there's not a lot of daylight between the estimated costs of seqscan and index-only scan to begin with. I'm not surprised that it'd prefer the former if the table isn't recently vacuumed.

It is possible. I know that I *did* vacuum. But I also ran a program to generate a few hundred additional rows, and I cannot remember if I ran the vacuum before or after that.

Frank
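A quick way to check the point about all-visible pages on this table (a sketch only; relpages and relallvisible are standard pg_class columns, and EXPLAIN ANALYZE reports "Heap Fetches" under an Index Only Scan node):

VACUUM (ANALYZE) prop.ar_totals;

SELECT relpages, relallvisible            -- index-only scans look attractive when
FROM   pg_class                           -- relallvisible is close to relpages
WHERE  oid = 'prop.ar_totals'::regclass;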
Re: SELECT is faster on SQL Server
On 2021-03-19 5:32 PM, Jehan-Guillaume de Rorthais wrote:

On Fri, 19 Mar 2021 14:28:27 +0200 Frank Millman wrote:
[...]
Execution of my main query has improved from 50ms to 33ms. Sql Server takes 25ms, but this is much better than it was.
[...]
Here is the new EXPLAIN ANALYSE -
QUERY PLAN
Merge Left Join (...) (actual time=1.566..1.581 rows=5 loops=1)

1.581ms to output the very last row of this plan. This is in contradiction with the 33ms you are referencing above. What do I miss here? Maybe your 33ms comes from yet another set of data?

Could you share an explain analyze actually showing this 33ms total execution time?

Sorry, I should have explained.

The query I showed selects data for a single month. The 'real' query repeats this 12 times, each with different dates, and combines the results using UNION ALL. This was the timing mentioned above.

BTW, I know that I can improve this by setting up the dates in a CTE and using JOIN LATERAL. I am avoiding this as it is not supported by SQL Server or sqlite3, and I am trying to stick to one code base for all databases. But I will look into it further.

Frank
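For the record, a rough sketch of the LATERAL variant mentioned above, using the table and column names that appear earlier in the thread; the month list, the aliases and the restriction to cl_tot are placeholders, not the real twelve-month query:

SELECT m.op_date, m.cl_date, t.source_code_id, t.cl_tot
FROM (
    VALUES (DATE '2015-04-01', DATE '2015-04-30'),
           (DATE '2015-05-01', DATE '2015-05-31')
           -- ... one row per month
) AS m (op_date, cl_date)
CROSS JOIN LATERAL (
    SELECT a.source_code_id, SUM(a.tran_tot) AS cl_tot
    FROM (
        SELECT DISTINCT ON (location_row_id, function_row_id, source_code_id)
               source_code_id, tran_tot
        FROM prop.ar_totals
        WHERE deleted_id = 0
          AND tran_date <= m.cl_date          -- the lateral reference replaces the hard-coded date
          AND ledger_row_id = 1
        ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
    ) AS a
    GROUP BY a.source_code_id
) AS t;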
Re: SELECT is faster on SQL Server
On 2021-03-19 7:11 PM, Thomas Kellerer wrote:

Frank Millman wrote on 19.03.2021 at 10:16:
cl_bal selects WHERE tran_date <= '2018-03-31'. op_bal selects WHERE tran_date < '2018-03-01'. The second one could be written as WHERE tran_date <= '2018-02-28', but I don't think that would make any difference.

I knew I overlooked something ;)

But as one is a true subset of the other, I think you can merge that into a single SELECT statement:

select '2018-03-01' AS op_date, '2018-03-31' AS cl_date,
       a.source_code_id,
       sum(a.tran_tot) AS cl_tot,
       sum(a.tran_tot) filter (where tran_date < '2018-03-01') AS op_tot
FROM (
    SELECT distinct on (location_row_id, function_row_id, source_code_id)
           source_code_id, tran_tot, tran_date
    FROM prop.ar_totals
    WHERE deleted_id = 0
      AND tran_date <= '2018-03-31'
      AND ledger_row_id = 1
    ORDER BY location_row_id, function_row_id, source_code_id, tran_date DESC
) AS a
GROUP BY a.source_code_id

Thanks very much Thomas - I did not know about FILTER.

But it does not quite work. If the SELECT does find a row where the max tran_date is <= '2018-03-31', it correctly includes it in 'cl_tot'. But the filter returns nothing for 'op_tot', because there is no corresponding row where tran_date < '2018-03-01'.

But I have learned something new, so thanks for that.

Frank
Empty tablespace directory after restore with psql -d dbname -f dump.sql
We cancelled a web application upgrade part way through when it was realized Postgres needed a version upgrade first. Using a dump file extracted from a full pg_dumpall dump, the DB was restored with a line like:

psql -d dbname -f dump.sql

The file dump.sql was extracted from the full dump using the pg_extract.sh script provided on Github by someone (not sure which fork). Others may want to take note: this tool does not include the declaration of the TABLESPACE and its location, so the tablespace files end up in the general data area. \db+ reports that the tablespace is defined, but it's just empty.

If you have a CREATE TABLESPACE with LOCATION in your full dump, you should add it back into the extracted result for a single DB.
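In other words, something along these lines needs to run before the extracted script; the tablespace name, owner and path below are placeholders only:

CREATE TABLESPACE app_ts OWNER app_owner LOCATION '/pgdata/tablespaces/app_ts';
-- then restore the single-database dump as before:
--   psql -d dbname -f dump.sql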
Re: case insensitive collation of Greek's sigma
On 26.11.21 at 08:37, Jakub Jedelsky wrote:

Hello,

during our tests of Postgres with ICU we found an issue with ILIKE of upper and lowercase sigma (Σ). The letter has two lowercase variants, σ and ς (at the end of a word). I'm working with the en_US and en-US-x-icu collations, and the results are a bit unexpected - they are inverted:

postgres=# SELECT
postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en_US",
postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en_US"
postgres-# ;
 ?column? | ?column?
----------+----------
 t        | f
(1 row)

postgres=# SELECT
postgres-# 'ΣΣ' ILIKE 'σσ' COLLATE "en-US-x-icu",
postgres-# 'ΣΣ' ILIKE 'σς' COLLATE "en-US-x-icu";
 ?column? | ?column?
----------+----------
 f        | t
(1 row)

I ran those commands on the latest (14.1) official docker image. Is it possible to unify the behaviour? And which one is correct from the community's point of view?

If I may start: I think both results are wrong, as both should return true. If I understand it correctly, in the background a lower() function is run to compare the strings, which is not enough for cases like this (unless the left side is treated as a standalone word).

Thanks,
- jj

Have you seen the subtle intricacies in this example?

=> SELECT 'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en_US" AS c0,
          'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en_US" AS c1,
          'ΣΣ Μ' ~* 'σσ Μ' COLLATE "el-GR-x-icu" AS c2,
          'ΣΣ Μ' ~* 'σς Μ' COLLATE "el-GR-x-icu" AS c3,
          'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "el-GR-x-icu" AS c4,
          'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "el-GR-x-icu" AS c5,
          'ΣΣ Μ' ~* 'σσ Μ' COLLATE "en-US-x-icu" AS c6,
          'ΣΣ Μ' ~* 'σς Μ' COLLATE "en-US-x-icu" AS c7,
          'ΣΣ Μ' ILIKE 'σσ Μ' COLLATE "en-US-x-icu" AS c8,
          'ΣΣ Μ' ILIKE 'σς Μ' COLLATE "en-US-x-icu" AS c9;

 c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9
----+----+----+----+----+----+----+----+----+----
 t  | f  | t  | t  | f  | t  | t  | t  | f  | t
(1 row)

Obviously, the ILIKE operator is really strict regarding the correct letter at the end of the word. The regular expression operator works as you expected.

Happy computing...

Frank
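One way to see what each collation's case folding actually produces, which is where the difference seems to come from (a sketch; the exact output depends on the libc and ICU versions installed):

SELECT lower('ΣΣ' COLLATE "en_US")       AS libc_lower,
       lower('ΣΣ' COLLATE "en-US-x-icu") AS icu_lower,
       upper('σς' COLLATE "en-US-x-icu") AS icu_upper;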
From: rammohan ganapavarapu
To: gp...@free.fr, pgsql-ad...@postgresql.org
Date: 11/20/2017 01:25 PM
Subject: Re: [ADMIN] Can master and slave on different PG versions?

Gilles,

Thank you. If we set wal_level to "logical" in a version <= 9.6, what does it mean and what does it do? Do we still call it logical replication? Also, how can I tell if my replication is logical or physical? Please excuse me if I am asking stupid questions :) I am new to Postgres.

Ram

On Mon, Nov 20, 2017 at 2:20 AM, wrote:

Hello,

Yes, before Postgres version 10, you must use an extension such as pg_logical to build logical replication between master and slave. For pg_logical, see the following links to help you start:

https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/
http://bonesmoses.org/2016/10/14/pg-phriday-perfectly-logical/
https://www.depesz.com/2016/11/08/major-version-upgrading-with-minimal-downtime/

Regards
Gilles

- Original Message -
From: "rammohan ganapavarapu"
To: gp...@free.fr, pgsql-ad...@postgresql.org
Sent: Friday 17 November 2017 16:43:57
Subject: Re: [ADMIN] Can master and slave on different PG versions?

Hi,

I was using pg_basebackup to rebuild the slave, so pglogical is not built into postgres and I have to install it?

Thanks

On Nov 17, 2017 12:42 AM, < gp...@free.fr > wrote:

Hello,

How did you create the slave? With pg_basebackup or via rsync? In any case, the slave MUST have been built with a backup/rsync made AFTER the master was restarted when you changed wal_level to logical.

Note that what you did so far, setting wal_level=logical, is just a prerequisite to using logical replication. After that, you need a tool which will extract the changes in the WALs from the master and will replay them on the slave. Such a tool is pg_logical: https://www.2ndquadrant.com/en/resources/pglogical/

Regards

- Original Message -
From: "rammohan ganapavarapu" < rammohanga...@gmail.com >
To: gp...@free.fr, pgsql-ad...@postgresql.org
Sent: Thursday 16 November 2017 17:58:42
Subject: Re: [ADMIN] Can master and slave on different PG versions?

From Slave (9.6):
Database cluster state: shut down in recovery
wal_level setting: logical

From master (9.4):
Database cluster state: in production
Current wal_level setting: logical

Quite frankly I couldn't find a proper document on how to set up logical replication in 9.4; I think I am using pg_logical. All I did was set wal_level to logical and add primary_slot_name in recovery.conf. Do I need to install the pglogical extension? I couldn't find one for Amazon Linux.

Thanks,
Ram

On Thu, Nov 16, 2017 at 6:12 AM, < gp...@free.fr > wrote:

Hello again,

Can you execute the pg_controldata command on both Master and Slave and check the following properties:

Database cluster state
wal_level setting

Also, if I understand correctly, you want to establish logical replication between your 9.4 master and your 9.6 slave, but you don't explain how? Using pg_logical?

Regards

- Original Message -
From: "rammohan ganapavarapu" < rammohanga...@gmail.com >
To: "David G. Johnston" < david.g.johns...@gmail.com >
Cc: pgsql-ad...@postgresql.org
Sent: Wednesday 15 November 2017 22:23:57
Subject: Re: [ADMIN] Can master and slave on different PG versions?

Anyone tried this before?

On Wed, Nov 8, 2017 at 8:17 AM, rammohan ganapavarapu < rammohanga...@gmail.com > wrote:

David,

So initially I had physical replication configured, so to upgrade the slave to 9.6 I converted master/slave to logical replication and then tried to upgrade the slave to 9.6.
I am getting this error when I try to start the slave. Any idea?

2017-11-08 05:08:49 UTC [20063]: [1-1] user= db= host= LOG: shutting down
2017-11-08 05:08:49 UTC [20063]: [2-1] user= db= host= LOG: database system is shut down
2017-11-08 05:09:45 UTC [20322]: [1-1] user= db= host= LOG: database system was shut down at 2017-11-08 05:08:55 UTC
2017-11-08 05:09:45 UTC [20322]: [2-1] user= db= host= LOG: entering standby mode
2017-11-08 05:09:45 UTC [20322]: [3-1] user= db= host= WARNING: WAL was generated with wal_level=minimal, data may be missing
2017-11-08 05:09:45 UTC [20322]: [4-1] user= db= host= HINT: This happens if you temporarily set wal_level=minimal without taking a new base backup.
2017-11-08 05:09:45 UTC [20322]: [5-1] user= db= host= FATAL: hot standby is not possible because wal_level was not set to "replica" or higher on the master server
2017-11-08 05:09:45 UTC [20322]: [6-1] user= db= host= HINT: Either set wal_level to "replica" on the master, or turn off hot_standby here.
2017-11-08 05:09:45 UTC [20312]: [3-1] user= db= host= LOG: startup process (PID 20322) exited with exit code 1
2017-11-08 05:09:45 UTC [20312]: [4-1] user= db= host= LOG: aborting startup due to startup process failure
2017-11-08 05:09:45 UTC [20312]: [5-1] user= db= host= LOG: da
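For what it's worth, the HINTs in that log can be acted on with something like the following on the master (a sketch only; wal_level cannot be changed with a reload, and a fresh base backup for the standby is needed afterwards, exactly as the HINT and the earlier replies say):

SHOW wal_level;

ALTER SYSTEM SET wal_level = 'replica';  -- 'hot_standby' on 9.4/9.5; 'logical' if logical decoding is needed
-- restart the postmaster, then take a new base backup for the standby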
Re: Postgresql database terminates abruptly with too many open files error
Hello,

Have you checked with something like lsof which file descriptors are actually open?

Cheers, frank

On 14.01.25 at 13:58, Sri Mrudula Attili wrote:

Hello Team,

We have a postgresql VDB (virtual database - Delphix) that keeps terminating due to "too many open files".

Below are a few alerts that we could see in the postgresql.log:

< 2025-01-14 11:37:20.724 GMT >LOG: out of file descriptors: Too many open files in system; release and retry
< 2025-01-14 11:37:20.724 GMT >FATAL: epoll_create1 failed: Too many open files in system

The number of allowed open files at OS level is 65000. Even then we see all of these being used up, causing the database to terminate. Is there a way we could avoid this?

Thanks,
Sri
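A couple of quick checks from inside the database that may help narrow it down (a sketch; the OS-level limit itself still has to be raised outside PostgreSQL, and each backend can open up to max_files_per_process files, so the product of these settings roughly bounds PostgreSQL's own usage):

SHOW max_files_per_process;
SHOW max_connections;
SELECT count(*) AS current_backends FROM pg_stat_activity;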
Re: duplicate key value violates unique constraint "chinese_price_infos_pkey"
You can fix the problem with this query:

SELECT setval('chinese_price_infos_id_seq', sq.val)
FROM (
    SELECT MAX(id) AS val FROM chinese_price_infos
) sq;

But you have to search in your application, because at some point the app is inserting the id column itself instead of leaving this task to the DB. If you are using some entity framework and the app fills the ID field of a new entity at some point in the workflow, then when you save the entity the framework automatically saves that ID in the DB without checking whether the ID already exists.

Cheers

On Mon, 2019-05-06 at 16:40 +0530, Arup Rakshit wrote:
> Hi,
>
> Thanks for your reply. It is automatic, my app doesn't create the ID, it delegates
> it to the DB. I am using a Ruby on Rails app, where we use Postgresql.
>
> docking_dev=# \d chinese_price_infos;
>                                    Table "public.chinese_price_infos"
>    Column    |            Type             | Collation | Nullable |                     Default
> -------------+-----------------------------+-----------+----------+-------------------------------------------------
>  id          | integer                     |           | not null | nextval('chinese_price_infos_id_seq'::regclass)
>  created_at  | timestamp without time zone |           |          |
>  updated_at  | timestamp without time zone |           |          |
>  item_code   | character varying(255)      |           |          |
>  description | character varying(255)      |           |          |
>  unit        | character varying(255)      |           |          |
>  price_cents | integer                     |           |          |
>  uuid        | uuid                        |           |          | uuid_generate_v4()
>  company_id  | uuid                        |           |          |
> Indexes:
>     "chinese_price_infos_pkey" PRIMARY KEY, btree (id)
>     "index_chinese_price_infos_on_company_id" btree (company_id)
>
> Thanks,
>
> Arup Rakshit
> a...@zeit.io
>
> > On 06-May-2019, at 4:38 PM, Ray O'Donnell wrote:
> >
> > On 06/05/2019 12:05, Arup Rakshit wrote:
> > > Every time I try to insert I get the error:
> > > docking_dev=# INSERT INTO "chinese_price_infos" ("item_code", "price_cents", "unit", "description", "company_id", "created_at", "updated_at") VALUES ('01GS10001', 6000, 'Lift', 'Shore Crane Rental', '9ae3f8b8-8f3f-491c-918a-efd8f5100a5e', '2019-05-06 10:49:03.894725', '2019-05-06 10:49:03.894725');
> > > ERROR: duplicate key value violates unique constraint "chinese_price_infos_pkey"
> > > DETAIL: Key (id)=(71165) already exists.
> > > docking_dev=# INSERT INTO "chinese_price_infos" ("item_code", "price_cents", "unit", "description", "company_id", "created_at", "updated_at") VALUES ('01GS10001', 6000, 'Lift', 'Shore Crane Rental', '9ae3f8b8-8f3f-491c-918a-efd8f5100a5e', '2019-05-06 10:49:03.894725', '2019-05-06 10:49:03.894725');
> > > ERROR: duplicate key value violates unique constraint "chinese_price_infos_pkey"
> > > DETAIL: Key (id)=(71166) already exists.
> > >
> > > Then I found:
> > > docking_dev=# SELECT MAX(id) FROM chinese_price_infos;
> > >   max
> > > --------
> > >  128520
> > > (1 row)
> > >
> > > docking_dev=# SELECT nextval('chinese_price_infos_id_seq');
> > >  nextval
> > > ---------
> > >    71164
> > > (1 row)
> > >
> > > Not sure how it is out of sync. How can I fix this permanently. I ran
> > > vacuum analyze verbose; still same error.
> >
> > You can fix it by using setval() to set the sequence manually to something
> > higher than the highest current id value in the table. However, it sounds as
> > if something in the application code may be causing problems. For example,
> > is something generating id values without reference to the sequence?
> >
> > Ray.
> >
> > --
> > Raymond O'Donnell // Galway // Ireland
> > r...@rodonnell.ie
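To illustrate the point about leaving id assignment to the DB (a sketch using the table from this thread; the timestamps are placeholders): the id column is simply omitted, so the DEFAULT nextval('chinese_price_infos_id_seq') fires and the sequence stays in step.

INSERT INTO chinese_price_infos
       (item_code, price_cents, unit, description, company_id, created_at, updated_at)
VALUES ('01GS10001', 6000, 'Lift', 'Shore Crane Rental',
        '9ae3f8b8-8f3f-491c-918a-efd8f5100a5e', now(), now())
RETURNING id;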
Re: PITR based recovery in a primary/standby cluster setup
Hello. I'm not sure which replication issues you have, and I have never used WAL-E before, but I had similar issues with PostgreSQL 10 and Barman.

Try starting the primary server first; when it finishes recovery it should start as primary. If not, go to the postgresql data directory, rename recovery.conf to recovery.done, and start the server as primary.

Then start the standby server. When the recovery target is reached, the standby server should not leave recovery status: it should keep receiving WAL through the archive_command, and should not rename its recovery.conf file.

Regards

On Tue, 2019-05-21 at 14:27 +0530, Abhijit Gharami wrote:
> Hi,
>
> We have a primary and standby PostgreSQL cluster setup and also we have
> PITR enabled on it. To improve the recovery time we are thinking of
> recovering the database to both primary and standby at the same time.
>
> These are the steps we are following:
> 1. Restore the base backup to both the primary and the standby server
> 2. Replay the WAL files on both primary and standby and once the
> recovery target is reached stop the servers
> 3. Start one of the servers as primary and the other one as standby
>
> We have followed the above steps, but when we are trying to start the
> servers as primary and standby we are having replication issues.
>
> Could you please suggest what should be done here so that we can
> recover the database on both the primary as well as the standby server?
>
> We are using PostgreSQL version 9.6.12 and for PITR we are using
> WAL-E.
>
> Regards,
> Abhijit
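For reference, on 9.6 with WAL-E the recovery.conf used during step 2 would look roughly like this on both nodes (a sketch only - the envdir path and the target time are placeholders, and the node that is to become the standby additionally keeps standby_mode = 'on' plus a primary_conninfo when it is restarted in step 3):

restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
recovery_target_time = '2019-05-20 12:00:00+00'
recovery_target_action = 'pause'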
Re: Data entry / data editing tools (more end-user focus).
If you have the database modelled, the quickest thing I can think of is the Python framework Django. Configure the connection to the DB and reverse-engineer the models with Django; this creates the entity classes. Then activate the admin site and register a form for each entity (a few lines), and Django does the magic and builds a GUI for the DB. If the entities are related, it also gives you the option to insert the required related entities first. As a plus, you get the access-control module for free.

Regards

On Thu, 2019-05-23 at 11:52 +0200, Tony Shelver wrote: > I looked at quite a few options. Some constraints on my side that > our direction is open source, with Linux development and servers. > Radzen is .NET: I could just as well use MS Access to cobble > together a front end. > > CUBA and OpenXava are Java based and seem to require writing Java for > logic: I last used nearly 20 years ago and 'fast' development and > Java IMHO is an oxymoron. > > Aurelia looks a bit promising, but I am not sure if it gains anything > over the current crop of JS libraries and frameworks, such as Vue, > React et al, and then libraries / frameworks built on top of those > such as Nuxt / Vue, Quasar / Vue or Vuetify / Vue, which seem to > have far more activity on Github. > > I managed to set up a quick and dirty front end using LibreOffice > Base over a weekend, next iteration i will probably move over to a > Vue framework, probably using Quasar.. > > On Sat, 18 May 2019 at 00:26, Stefan Keller > wrote: > > Dear all > > > > > > > > What about following „Rapid App Development Tools"? > > > > * OpenXava (Java): > > https://www.openxava.org/ate/visual-studio-lightswitch > > > > * Radzen (.NET): > > https://www.radzen.com/visual-studio-lightswitch-alternative/ > > > > * Other: https://aurelia.io/ (JS) or CUBA > > https://www.cuba-platform.com/ (Java) > > > > > > > > :Stefan > > > > > > > > Am Do., 28. März 2019 um 15:39 Uhr schrieb Adrian Klaver > > > > : > > > > > > > > > > On 3/27/19 11:49 PM, Tony Shelver wrote: > > > > > > > > > > Please reply to list also, more eyes on the the problem. > > > > > Ccing list > > > > > > > > > > My take on below is since you are feeding a Website why not use > > Web > > > > > technologies for your data entry. My language of choice is > > Python. I > > > > > have done something similar to this(on small scale) using the > > Django > > > > > framework. For something lighter weight there is Flask. Then your > > client > > > > > becomes a browser and you do not have to distribute forms around. > > You > > > > > could integrate with the existing Web apps you are using e.g. > > SnipCart. > > > > > > > > > > > > > > > > Actually I found a possibility. LibreOffice Base on top of PG > > lets me > > > > > > paste photos into a Postgresql bytea field no problem. MS > > Access should > > > > > > work well also, but I am not going to buy it, and running > > Ubuntu most of > > > > > > the time. > > > > > > Possibly will distribute the Base forms to select users to > > enter data. > > > > > > We are a startup company, so this is an affordable temporary > > fix, until > > > > > > the product I have been looking at matures, or we can roll our > > own. > > > > > > > > > > > > We are building a company website, including an eStore, and > > have a few > > > > > > hundred products to load and maintain. Our product data > > currently isn't > > > > > > suitable for a sales catalog. 
> > > > > > (Brands, categories, products, pricing and deal info, specials, > > images, > > > > > > product comparisons and so on). > > > > > > > > > > > > Right now I input / maintain this via CSV files maintained > > through a > > > > > > spreadsheet (LibreOffice Calc) which our site generator > > (Jekyll) uses > > > > > > to build out the static HTML product [pages automatically. > > > > > > This is really quick to enter basic data, but I have to > > manually > > > > > > maintain image uploads, image names and so on manually in the > > > > > > spreadsheet and through individual file uploads. We have at > > least one, > > > > > > preferably 3 and up to 6 photos per product to maintain. Call > > it a 1000 > > > > > > images right now, and that will only go up. > > > > > > Invalid text / characters in product descriptions and so on can > > break > > > > > > the CSV as well. > > > > > > > > > > > > There are headless CMS solutions out on the market targeting > > this same > > > > > > area, but for various reasons the suitable ones are still > > maturing and > > > > > > shaking out in the marketplace, so I am not in a hurry to make > > a choice. > > > > > > > > > > > > So the goal is to replace CSV with JSON file input. This will > > also make > > > > > > my life easier for more complex structures such as multiple > > categories > > > > > > and specs per product. > > > > > > I also want to migrate text that can change from the HTML pages > > into the > > >
Re: One way replication in PostgreSQL
You could use an FDW to replicate what you need to an external server - from the provider/primary/master to the subscriber/secondary/slave. Using triggers on the master tables that you want to replicate, you can execute the insert/update/delete actions on the secondary tables through the FDW. With this approach you only need a connection from the provider to the subscriber.

Regards

On Mon, 2019-06-03 at 18:00 +0200, PALAYRET Jacques wrote:
> Hello,
>
> If, for security reasons, I can't create a connection or a flow from
> subscriber/secondary/slave towards provider/primary/master, which
> replication systems can I use ?
>
> If possible, I would prefer partial replication (only some tables) to
> full base replication (all instances).
>
> Do trigger-based replication systems (like Slony or Londiste or
> others) need a connection or flow from subscriber to the provider ?
>
> Thanks in advance
> - Météo-France -
> PALAYRET JACQUES
> DCSC/MBD
> jacques.palay...@meteo.fr
> Fixe : +33 561078319
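A minimal sketch of that trigger-over-FDW approach with postgres_fdw; every object name, host and credential below is invented for illustration, and the column list is kept deliberately tiny:

-- on the provider (P), the only side that opens an outbound connection
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER subscriber_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 's.example.com', dbname 'appdb', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER subscriber_srv
    OPTIONS (user 'repl_writer', password 'secret');

-- foreign table pointing at the subscriber's copy
CREATE FOREIGN TABLE accounts_remote (
    id      integer,
    balance numeric
) SERVER subscriber_srv OPTIONS (schema_name 'public', table_name 'accounts');

-- push every change on the local table through the FDW
CREATE OR REPLACE FUNCTION push_accounts() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO accounts_remote (id, balance) VALUES (NEW.id, NEW.balance);
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE accounts_remote SET balance = NEW.balance WHERE id = NEW.id;
    ELSIF TG_OP = 'DELETE' THEN
        DELETE FROM accounts_remote WHERE id = OLD.id;
    END IF;
    RETURN NULL;  -- AFTER trigger, return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_push
    AFTER INSERT OR UPDATE OR DELETE ON accounts
    FOR EACH ROW EXECUTE PROCEDURE push_accounts();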
Re: One way replication in PostgreSQL
The FDW is a PostgreSQL extension for connecting to another server from inside a PostgreSQL server. With this solution you only need connections from P to S and no third (external) server; just use triggers to push the INSERT/UPDATE/DELETE information you want to replicate from P to S through the Foreign Data Wrapper.

If you need integrity of the replicated information, then I suggest using a control table to store the actions to replicate, so that when a push fails you can keep retrying until the action succeeds.

Regards

On Tue, 2019-06-04 at 09:02 +0200, PALAYRET Jacques wrote:
> Hello,
>
> Thanks a lot for the suggested solutions.
>
> So, I can use WAL-shipping replication from Primary to the Secondary
> server, but it's only for full replication.
>
> Let's call " P " the provider/primary/master and " S " the
> subscriber/secondary/slave one.
> For partial replication (not all the tables), the solutions should
> use a third (intermediate / middle) server which could have both ways
> flow with the server P but only one way flow towards the server S.
> For example, a logical replication (pglogical or logical Postgresql
> replication) between server P and the middle server and then a WAL-
> shipping replication between middle server and server S.
> Is that right ?
>
> About the " FDW " solution in " an external server " (a middle one),
> is it possible to integrate the FDW in the P server to avoid the "
> external server " ?
>
> => What about the trigger-based replication systems like Slony or
> Londiste ; is it really necessary to have a connection or flow from
> the server S towards the server P ?
>
> Regards
>
> From: "PALAYRET Jacques"
> To: pgsql-general@lists.postgresql.org
> Sent: Monday 3 June 2019 18:00:51
> Subject: One way replication in PostgreSQL
>
> Hello,
>
> If, for security reasons, I can't create a connection or a flow from
> subscriber/secondary/slave towards provider/primary/master, which
> replication systems can I use ?
>
> If possible, I would prefer partial replication (only some tables) to
> full base replication (all instances).
>
> Do trigger-based replication systems (like Slony or Londiste or
> others) need a connection or flow from subscriber to the provider ?
>
> Thanks in advance
> - Météo-France -
> PALAYRET JACQUES
> DCSC/MBD
> jacques.palay...@meteo.fr
> Fixe : +33 561078319
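A sketch of the control-table idea (again with invented names): the trigger only queues the change locally, and a separate job replays unapplied rows over the FDW, so a subscriber outage does not fail the local transaction.

CREATE TABLE replication_queue (
    id         bigserial   PRIMARY KEY,
    table_name text        NOT NULL,
    op         text        NOT NULL,           -- 'INSERT' / 'UPDATE' / 'DELETE'
    payload    jsonb       NOT NULL,           -- e.g. to_jsonb(NEW) or to_jsonb(OLD)
    queued_at  timestamptz NOT NULL DEFAULT now(),
    applied_at timestamptz                     -- set once the row has been pushed to S
);

-- In the trigger function, instead of writing to the foreign table directly:
--   INSERT INTO replication_queue (table_name, op, payload)
--   VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
-- A cron job or worker then replays rows WHERE applied_at IS NULL over the FDW,
-- setting applied_at on success and leaving them in place to retry on failure.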
Re: Forks of pgadmin3?
You could try OmniDB. It is a web app, but it has a version that feels just like a desktop application. It is supported by 2ndQuadrant. This is the official website: https://omnidb.org/en/

Greetings

On Fri, 22 Mar 2019 at 4:56, Christian Henz () wrote:
> I know I'm late to the party, but we're only now migrating from
> Postgres 9.x, realizing that pgadmin3 does not support Postgres 11.
>
> I have checked out pgadmin4, but I don't like it at all. My colleagues
> feel the same way, and some web searching suggests that we are not
> alone.
>
> So I wonder if there are any active forks of pgadmin3?
>
> I found some on Github with some significant changes that I assume
> were done by people working for VK, the Russian social network. These
> appear to be personal hacks though (monosyllabic commit messages, build
> scripts added with hard coded local paths etc.).
>
> There are also the Debian packages that have patches adding Postgres
> 10 support among other things. Not sure if there would be interest
> there in continuing to support newer Postgres versions.
>
> Are there other, more organized efforts to continue pgadmin3?
>
> Are there technical reasons why such a continuation would not make
> sense?
>
> Cheers,
> Christian
>
> --
> Christian Henz
> Software Developer, software & vision Sarrazin GmbH & Co. KG