Re: Table and data comparison
Yes, PostgreSQL. Why not just do this via a SQL join? (See the sketch at the end of this message.) On Tue, 3 Sept 2024, 17:34 arun chirappurath, wrote: > Hi All, > > Do we have any open-source utility to compare two table data not structure > and then provide some sort of reports? > > Row by row comparison > > Thanks, > Arun >
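For example, assuming both tables have identical column lists (t1 and t2 are placeholder names), set operations give a row-by-row diff:

    -- Rows present in one table but not the other, labelled by side.
    SELECT 'only_in_t1' AS side, * FROM (TABLE t1 EXCEPT TABLE t2) AS d
    UNION ALL
    SELECT 'only_in_t2' AS side, * FROM (TABLE t2 EXCEPT TABLE t1) AS d;

A FULL OUTER JOIN on the primary key works too, and additionally tells you whether the same key exists on both sides with different values.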
Re: Will hundred of thousands of this type of query cause Parsing issue
I would usually try to avoid such long IN clauses. Why not try this: create a temp table of one column, bulk insert all your IDs into that table, then change your query to join to the temp table (see the sketch at the end of this message)? This also has the advantage of scaling to thousands of values. On Fri, 13 Sept 2024, 16:35 Wong, Kam Fook (TR Technology), < kamfook.w...@thomsonreuters.com> wrote: > Follow Postgres expert, > > We have a flavor of this type of query with long in-list/bind variables > (see below). We notice that some of the bind variables come in as 0 which > causes the optimizer to choose to full scan two of the following 3 tables. > One thought to fix a full table scan is to chop off the not-needed bind > variables (proven to work after some tests). But my other worry is will > cause parsing issues because the app will be executing > 100k/sec with this > type of query. > > I am an Oracle DBA, and this change for sure will generate a different > query id. Which in turn generates tons of extra parsing to the DB because > all soft and hard parsing occurs at the DB level. But my understanding for > Postgres is parsing occurs at the client jdbc level. Am I understanding > this correctly? > > > In summary/my concern: > > 1) Where does query parsing occur? > 2) Will this cause extra parsing to the posgress DB? Any pg system table > to measure parsing? > > > > SELECT abc, efg from DOCLOC a, COLLECTION b WHERE a.colum1 IN ($1, $2, > $3, $4, $5, ... [several hundred consecutive bind parameters, elided here; the original message runs past $597 and is truncated]
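The temp-table approach sketched above could look like this (wanted_ids is a made-up name, and the join condition between DOCLOC and COLLECTION is not shown in the original query, so an assumed condition stands in for it):

    -- One-column temp table holding the IDs; it disappears at session end.
    CREATE TEMP TABLE wanted_ids (id bigint PRIMARY KEY);

    -- Bulk-load the values from the client, e.g. via COPY:
    COPY wanted_ids (id) FROM STDIN;

    -- Join instead of a long IN list; works the same for 10 or 100,000 IDs,
    -- and the query text (and so its query id) stays stable.
    SELECT abc, efg
    FROM docloc a
    JOIN wanted_ids w ON w.id = a.colum1
    JOIN collection b ON b.colum1 = a.colum1;  -- assumed join condition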
Re: Index Partition Size Double of its Table Partition?
Are you able to cluster the table? The idea is that rows ordered in the same way as the index might reduce its size; see the sketch at the end of this message. On Wed, 30 Oct 2024, 16:29 Don Seiler, wrote: > On Wed, Oct 30, 2024 at 11:23 AM Peter Geoghegan wrote: > >> >> If a substantial amount of the index was written by CREATE INDEX (and >> not by retail inserts) then my theory is unlikely to be correct. It >> could just be that you managed to absorb most inserts in one >> partition, but not in the other. That's probably possible when there >> are only relatively small differences in the number of inserts that >> need to use of the space left behind by fillfactor in each case. In >> general page splits tend to come in distinct "waves" after CREATE >> INDEX is run. >> > > What do you mean by "absorb" the inserts? > > It sounds like the answer will be "No", but: Would rebuilding the index > after the month-end (when inserts have stopped on this partition) change > anything? > > Don. > -- > Don Seiler > www.seiler.us >
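For reference, a minimal sketch of that, with made-up partition and index names:

    -- Rewrite the partition in index order. Note this takes an ACCESS
    -- EXCLUSIVE lock for the duration, and the ordering is not maintained
    -- for rows inserted afterwards.
    CLUSTER docs_p202410 USING docs_p202410_pkey;
    ANALYZE docs_p202410;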
Re: Performance Issue with Hash Partition Query Execution in PostgreSQL 16
Thanks for the correction. At this point I would try modifying plan_cache_mode for the session which uses the bind variable: alter it so that plan_cache_mode = force_custom_plan (see the sketch at the end of this message). One hypothesis is that a bad plan got cached for that SQL pattern. Obviously, when you run it *manually* you are always getting a *custom* plan, as it's not a prepared statement. On Sat, 9 Nov 2024, 03:46 ravi k, wrote: > Sorry, it was typo. Bind variable is bigint only. > > Thanks > > On Fri, 8 Nov, 2024, 7:09 pm David Mullineux, wrote: > >> Just spotted a potential problem. The indexed column is a bigint. Are >> you, in your prepared statement passing a string or a big int ? >> I notice your plan is doing an implicit type conversion when you run it >> manually. >> Sometimes the wrong type will make it not use the index. >> >> On Fri, 8 Nov 2024, 03:07 ravi k, wrote: >> >>> Hi , >>> >>> Thanks for the suggestions. >>> >>> Two more observations: >>> >>> 1) no sequence scan noticed from pg_stat_user_tables ( hope stats are >>> accurate in postgres 16) if parameter sniffing happens the possibility of >>> going to sequence scan is more right. >>> >>> 2) no blockings or IO issue during the time. >>> >>> 3) even with limit clause if touch all partitions also it could have >>> been completed in milliseconds as this is just one record. >>> >>> 4) auto_explain in prod we cannot enable as this is expensive and with >>> high TPS we may face latency issues and lower environment this issue cannot >>> be reproduced,( this is happening out of Million one case) >>> >>> This looks puzzle to us, just in case anyone experianced pls share your >>> experience. >>> >>> Regards, >>> Ravi >>> >>> On Thu, 7 Nov, 2024, 3:41 am David Mullineux, wrote: >>> >>>> It might be worth eliminating the use of cached plans here. Is your app >>>> using prepared statements at all? >>>> Point is that if the optimizer sees the same prepared query , 5 times, >>>> the it locks the plan that it found at that time. This is a good trade off >>>> as it avoids costly planning-time for repetitive queries. But if you are >>>> manually querying, the a custom plan will be generated anew. >>>> A quick analyze of the table should reset the stats and invalidate any >>>> cached plans. >>>> This may not be your problem just worth eliminating it from the list >>>> of potential causes. >>>> >>>> On Wed, 6 Nov 2024, 17:14 Ramakrishna m, wrote: >>>> >>>>> Hi Team, >>>>> >>>>> One of the queries, which retrieves a single record from a table with >>>>> 16 hash partitions, is taking more than 10 seconds to execute. In >>>>> contrast, >>>>> when we run the same query manually, it completes within milliseconds. >>>>> This >>>>> issue is causing exhaustion of the application pools. Do we have any bugs >>>>> in postgrs16 hash partitions? Please find the attached log, table, and >>>>> execution plan. >>>>> >>>>> size of the each partitions : 300GB >>>>> Index Size : 12GB >>>>> >>>>> Postgres Version : 16.x >>>>> Shared Buffers : 75 GB >>>>> Effective_cache : 175 GB >>>>> Work _mem : 4MB >>>>> Max_connections : 3000 >>>>> >>>>> OS : Ubuntu 22.04 >>>>> Ram : 384 GB >>>>> CPU : 64 >>>>> >>>>> Please let us know if you need any further information or if there are >>>>> additional details required. >>>>> >>>>> >>>>> Regards, >>>>> Ram. >>>>>
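A sketch of that change (app_user is a placeholder role name):

    -- For the current session only:
    SET plan_cache_mode = force_custom_plan;

    -- Or persist it for the application's role, so every new session
    -- picks it up:
    ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;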
Re: Help with query optimizer
Hi, It's usually helpful to give some detail on what 'slow' means, and then to define what 'fast enough' means, just so we know when to stop optimizing. You should also help by giving the DDL of those tables and any indexes they have. The plan shows me that there are expensive seq scans on two tables: conversations and tasks. So it would be good to get the pg_stats entries for the columns used in the filters on those tables; see the sketch at the end of this message. I would then look at what indexes you have on those tables to find out whether some are not being used, and why (there are lots of possible reasons). Thanks
On Tue, 1 Oct 2024, 08:53 Siraj G, wrote: > Hello Experts! > > Can you please help with the attached query as it is taking time to execute. The execution plan for the same is below: > Thank you!
> Nested Loop Left Join  (cost=8176.99..168603.11 rows=1 width=294) (actual time=3301.886..3303.715 rows=0 loops=1)
>   Join Filter: (conversation.id = "jobReminders"."conversationId")
>   Buffers: shared hit=743696 read=78069 dirtied=16 written=10
>   I/O Timings: shared read=7224.068 write=0.287
>   ->  Gather  (cost=8176.99..168472.90 rows=1 width=246) (actual time=3301.885..3303.711 rows=0 loops=1)
>         Workers Planned: 2
>         Workers Launched: 2
>         Buffers: shared hit=743696 read=78069 dirtied=16 written=10
>         I/O Timings: shared read=7224.068 write=0.287
>         ->  Nested Loop  (cost=7176.99..167472.80 rows=1 width=246) (actual time=3286.866..3286.874 rows=0 loops=3)
>               Buffers: shared hit=743696 read=78069 dirtied=16 written=10
>               I/O Timings: shared read=7224.068 write=0.287
>               ->  Nested Loop  (cost=7176.56..61685.83 rows=59320 width=174) (actual time=68.447..1978.593 rows=57917 loops=3)
>                     Buffers: shared hit=284214 read=56135 dirtied=11 written=9
>                     I/O Timings: shared read=4546.889 write=0.242
>                     ->  Hash Join  (cost=7176.13..20450.20 rows=15571 width=142) (actual time=68.196..211.016 rows=17903 loops=3)
>                           Hash Cond: (conversation."mailBoxId" = "mailBox".id)
>                           Buffers: shared hit=7428 read=11254 dirtied=9 written=3
>                           I/O Timings: shared read=85.214 write=0.085
>                           ->  Parallel Hash Join  (cost=6944.87..20143.53 rows=26942 width=118) (actual time=67.736..197.476 rows=21937 loops=3)
>                                 Hash Cond: (conversation."taskId" = task.id)
>                                 Buffers: shared hit=6678 read=11254 dirtied=9 written=3
>                                 I/O Timings: shared read=85.214 write=0.085
>                                 ->  Parallel Seq Scan on conversations conversation  (cost=0.00..13029.34 rows=64502 width=102) (actual time=0.017..79.191 rows=51677 loops=3)
>                                       Filter: ((NOT "isCompose") AND ("parentConversationId" IS NULL) AND (id <> '559702f9-55a8-47c7-9b5c-93b29baabf3d'::uuid))
>                                       Rows Removed by Filter: 622
>                                       Buffers: shared hit=3605 read=8608 dirtied=5 written=3
>                                       I/O Timings: shared read=57.862 write=0.085
>                                 ->  Parallel Hash  (cost=6583.67..6583.67 rows=28896 width=16) (actual time=67.046..67.047 rows=22266 loops=3)
>                                       Buckets: 131072 Batches: 1 Memory Usage: 4224kB
>                                       Buffers: shared hit=3073 read=2646 dirtied=4
>                                       I/O Timings: shared read=27.352
>                                       ->  Parallel Seq Scan on tasks task  (cost=0.00..6583.67 rows=28896 width=16) (actual time=0.015..48.469 rows=22266 loops=3)
>                                             Filter: (("deletedAt" IS NULL) AND (type = 'MESSAGE'::enum_tasks_type))
>                                             Rows Removed by Filter: 32166
>                                             Buffers: shared hit=3073 read=2646 dirtied=4
>                                             I/O Timings: shared read=27.352
>                           ->  Hash  (cost=230.80..230.80 rows=37 width=24) (actual time=0.382..0.383 rows=37 loops=3)
>                                 Buckets: 1024 Batches: 1 Memory Usage: 11kB
>                                 Buffers: shared hit=690
>                                 ->  Seq Scan on "mailBoxes" "mailBox"  (cost=0.00..230.80 rows=37 width=24) (actual time=0.021..0.367 rows=37 loops=3)
>                                       Filter: (status = ANY ('{ACTIVE,SYNCING}'::"enum_mailBoxes_status"[]))
>                                       Rows Removed by Filter: 27
>                                       Buffers: shared hit=690
>                     ->  Index Scan using idx_message_fk_conversation_i... [the plan is truncated here in the original]
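Something like this would pull the planner statistics for the filtered columns, with the table and column names taken from the plan above:

    SELECT tablename, attname, null_frac, n_distinct, most_common_vals
    FROM pg_stats
    WHERE tablename IN ('conversations', 'tasks')
      AND attname IN ('isCompose', 'parentConversationId', 'deletedAt', 'type');

    -- And the indexes that already exist on those tables:
    SELECT indexname, indexdef
    FROM pg_indexes
    WHERE tablename IN ('conversations', 'tasks');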
Re: Performance Issue with Hash Partition Query Execution in PostgreSQL 16
It might be worth eliminating the use of cached plans here. Is your app using prepared statements at all? The point is that once the optimizer has seen the same prepared query about five times, it may lock in a generic plan for it. This is a good trade-off, as it avoids costly planning time for repetitive queries, but it means a manually typed query gets a fresh custom plan while the application's prepared one may not (see the sketch at the end of this message). A quick ANALYZE of the table should reset the stats and invalidate any cached plans. This may not be your problem; it's just worth eliminating it from the list of potential causes. On Wed, 6 Nov 2024, 17:14 Ramakrishna m, wrote: > Hi Team, > > One of the queries, which retrieves a single record from a table with 16 > hash partitions, is taking more than 10 seconds to execute. In contrast, > when we run the same query manually, it completes within milliseconds. This > issue is causing exhaustion of the application pools. Do we have any bugs > in postgrs16 hash partitions? Please find the attached log, table, and > execution plan. > > size of the each partitions : 300GB > Index Size : 12GB > > Postgres Version : 16.x > Shared Buffers : 75 GB > Effective_cache : 175 GB > Work _mem : 4MB > Max_connections : 3000 > > OS : Ubuntu 22.04 > Ram : 384 GB > CPU : 64 > > Please let us know if you need any further information or if there are > additional details required. > > > Regards, > Ram. >
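One way to observe this in psql, sketched against a hypothetical orders table:

    -- With the default plan_cache_mode = auto, a prepared statement can
    -- switch to a cached generic plan after it has been planned five times.
    PREPARE q (bigint) AS SELECT * FROM orders WHERE customer_id = $1;

    -- Run this six or more times and compare the plans printed:
    EXPLAIN (ANALYZE) EXECUTE q(42);

    -- ANALYZE refreshes the stats and invalidates cached plans on the table:
    ANALYZE orders;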
Re: Performance Issue with Hash Partition Query Execution in PostgreSQL 16
Just spotted a potential problem. The indexed column is a bigint. Are you, in your prepared statement, passing a string or a bigint? I notice your plan does an implicit type conversion when you run it manually. Sometimes the wrong type will prevent the index from being used; see the sketch at the end of this message. On Fri, 8 Nov 2024, 03:07 ravi k, wrote: > Hi , > > Thanks for the suggestions. > > Two more observations: > > 1) no sequence scan noticed from pg_stat_user_tables ( hope stats are > accurate in postgres 16) if parameter sniffing happens the possibility of > going to sequence scan is more right. > > 2) no blockings or IO issue during the time. > > 3) even with limit clause if touch all partitions also it could have been > completed in milliseconds as this is just one record. > > 4) auto_explain in prod we cannot enable as this is expensive and with > high TPS we may face latency issues and lower environment this issue cannot > be reproduced,( this is happening out of Million one case) > > This looks puzzle to us, just in case anyone experianced pls share your > experience. > > Regards, > Ravi > > On Thu, 7 Nov, 2024, 3:41 am David Mullineux, wrote: > >> It might be worth eliminating the use of cached plans here. Is your app >> using prepared statements at all? >> Point is that if the optimizer sees the same prepared query , 5 times, >> the it locks the plan that it found at that time. This is a good trade off >> as it avoids costly planning-time for repetitive queries. But if you are >> manually querying, the a custom plan will be generated anew. >> A quick analyze of the table should reset the stats and invalidate any >> cached plans. >> This may not be your problem just worth eliminating it from the list of >> potential causes. >> >> On Wed, 6 Nov 2024, 17:14 Ramakrishna m, wrote: >> >>> Hi Team, >>> >>> One of the queries, which retrieves a single record from a table with 16 >>> hash partitions, is taking more than 10 seconds to execute. In contrast, >>> when we run the same query manually, it completes within milliseconds. This >>> issue is causing exhaustion of the application pools. Do we have any bugs >>> in postgrs16 hash partitions? Please find the attached log, table, and >>> execution plan. >>> >>> size of the each partitions : 300GB >>> Index Size : 12GB >>> >>> Postgres Version : 16.x >>> Shared Buffers : 75 GB >>> Effective_cache : 175 GB >>> Work _mem : 4MB >>> Max_connections : 3000 >>> >>> OS : Ubuntu 22.04 >>> Ram : 384 GB >>> CPU : 64 >>> >>> Please let us know if you need any further information or if there are >>> additional details required. >>> >>> >>> Regards, >>> Ram. >>> >>
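A quick way to see the effect, sketched against a hypothetical accounts(id bigint) table with a btree index on id:

    -- A bigint value compares directly against the column; the index is usable:
    EXPLAIN SELECT * FROM accounts WHERE id = 42;

    -- If the bound value arrives as numeric (or text), the column side may be
    -- cast instead, which defeats a plain btree index on id:
    EXPLAIN SELECT * FROM accounts WHERE id::numeric = 42.0;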
Re: Error creating materialized view
On Mon, 25 Nov 2024, 06:08 Shaun Robinson, wrote: > Hi, > > I'm currently testing an application with Postgres 17.2 and am getting an > error when creating a materialized view which works in version 16 and > below. The sql works fine running as a query, but adding the > create materialized view breaks it. > > The error comes when calling a custom function and the error is that a > relation doesn't exist (which it does as it works within the same query > when not creating a view). > > Is this a known issue in the version 17.2? > > Thanks > Shaun > Can you please post the DDL statements so we can see? Thanks
Re: Memory settings when running postgres in a docker container
I don't get why you think all memory will be used. When you say shared_buffers = 16GB, effective_cache_size = 48GB, then only the 16GB of shared buffers is actually allocated. effective_cache_size doesn't cause any memory to be allocated; it's just a hint to the optimizer about how much of the OS page cache it can assume is available. (On the /dev/shm question, see the sketch at the end of this message.) On Wed, 20 Nov 2024, 11:16 Koen De Groote, wrote: > Assuming a machine with: > > * 16 CPU cores > * 64GB RAM > > Set to 500 max connections > > A tool like this: https://pgtune.leopard.in.ua/ > > Will output recommended settings: > > max_connections = 500 > shared_buffers = 16GB > effective_cache_size = 48GB > maintenance_work_mem = 2GB > checkpoint_completion_target = 0.9 > wal_buffers = 16MB > default_statistics_target = 100 > random_page_cost = 1.1 > effective_io_concurrency = 200 > work_mem = 8388kB > huge_pages = try > min_wal_size = 1GB > max_wal_size = 4GB > max_worker_processes = 16 > max_parallel_workers_per_gather = 4 > max_parallel_workers = 16 > max_parallel_maintenance_workers = 4 > > And they basically use up all the memory of the machine. > > 16GB shared buffers, 48GB effective cache size, 8MB of work_mem for some > reason... > > This seems rather extreme. I feel there should be free memory for > emergencies and monitoring solutions. > > And then there's the fact that postgres on this machine will be run in a > docker container. Which, on Linux, receives 64MB of /dev/shm shared memory > by default, but can be increased. > > I feel like I should probably actually lower my upper limit for memory, > regardless of what the machine actually has, so I can have free memory, and > also not bring the container process itself into danger. > > Is it as straightforward as putting my limit on, say 20GB, and then giving > more /dev/shm to the container? Or is there more to consider? > > Regards, > Koen De Groote >
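If you do decide to cap the container, the usual Docker knobs look like this; the values here are purely illustrative, not recommendations:

    # Hard memory cap for the whole container, plus a larger /dev/shm
    # (PostgreSQL's parallel workers allocate dynamic shared memory there).
    docker run -d \
      --memory=20g \
      --shm-size=2g \
      -e POSTGRES_PASSWORD=secret \
      postgres:16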
Re: Wired behaviour from SELECT
Instead of nickname, you probably want the predicate where password = 'Arbol' ... or am I missing something? (See the sketch at the end of this message.) On Fri, 22 Nov 2024, 20:13 David G. Johnston, wrote: > On Fri, Nov 22, 2024 at 1:07 PM Arbol One wrote: > >> The below sql statement produces the right output >> SELECT nickname, password FROM password WHERE id='0938105618107N1'; >> nickname | password >> -+-- >> Piccard@@21 | Arbol >> (1 row) >> However, this sql statement produces the wrong output >> SELECT nickname, password FROM password WHERE nickname='Arbol'; >> nickname | password >> -+-- >> (0 rows) >> What am I doing wrong? >> > Naming a column in your table the same name as the table is problematic > generally. As for the query, if they are both intended to return the same > row the value Arbol is in the password column, not the nickname column. > You seem to have reversed the data for the two columns. > > David J. > >
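In other words, sketched with the poster's own table and data:

    -- To look the row up by the value stored in the password column:
    SELECT nickname, password FROM password WHERE password = 'Arbol';

    -- A nickname lookup has to match what is actually in that column:
    SELECT nickname, password FROM password WHERE nickname = 'Piccard@@21';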