Re: pb with big volumes
No, if there is a pumping effect, i.e. data loaded into the cache is flushed by the other request, again and again. I had this on an Oracle db years ago: iostat at 100% for days. Managed it by partitioning and by locking some data in the cache.

On Fri, 11 Aug 2023 at 03:54, Ron wrote:
> Wouldn't IO contention make for additive timings instead of exponential?
>
> On 8/10/23 20:41, Adam Scott wrote:
>> I think your concern is that 20 min + 30 min does not equal 3 hours.
>> It might be natural to think the contention would, at max, be 50 min x 2
>> (1 hr 40 min). So what's going on?
>>
>> It seems disk I/O is a primary suspect, since you hint at wanting an
>> iostat replacement inside of Postgres. If it is due to disk I/O, the
>> resolution will be to add RAID 0 SSDs at best.
>>
>> Consider looking at I/O stats on the container's persistent volumes.
>>
>> What is the pipe connecting the database server to the disks? If it's
>> NAS, well, that would explain it.
>>
>> HTH,
>> Adam
>>
>> On Thu, Aug 10, 2023 at 2:37 PM Marc Millas wrote:
>>> Hi,
>>>
>>> I have a 15 TB db on postgres 14 (soon 15). shared_buffers is 32 GB.
>>>
>>> It's a db with max 15 users, often fewer, and currently 1 or 2.
>>> The biggest table has 133 partitions of 150M to 200M+ rows each.
>>> Lots of requests explicitly access one of those.
>>>
>>> When I, alone, run a query "reading" 15M buffers, it takes 20 minutes
>>> (+-5 minutes). Inside the query there are 2 GROUP BYs on a 200M-row
>>> partition, with all the rows in each group by.
>>> When a colleague runs the same kind of request (not the same request,
>>> but something reading roughly the same volume), on a different set of
>>> data, his request completes in less than half an hour.
>>> If we run our requests simultaneously... my request takes hours.
>>> Around 3 hours.
>>>
>>> I am making a supposition that it's some kind of "pumping" effect in
>>> the cache.
>>>
>>> I cannot get access to the underlying OS. I can, for sure, do some
>>> COPY xx FROM PROGRAM 'some command', but it's a container with very
>>> limited possibilities, not even 'ps'. So I would like to monitor from
>>> inside the db (so without iostat and the like) the volume of reads
>>> postgres does against the OS.
>>> I did activate track_io_timing, but the volumes I get in EXPLAIN
>>> (ANALYZE, BUFFERS) are roughly the same alone or not alone (the 15M
>>> buffers mentioned above).
>>> To my understanding, the volumes shown in pg_stat_database are the
>>> useful ones, i.e. a block counts even if the db has to read it from
>>> disk more than once. True? Or false?
>>>
>>> So... either my supposition is not correct, and I will read other
>>> ideas with a lot of interest, or it is correct and I would like to
>>> know how to monitor this (in the current context, installing a
>>> dedicated extension is not impossible, but it is a very boring
>>> process).
>>>
>>> Thanks for your help :-)
>>>
>>> regards,
>>>
>>> PS: I know that providing the complete data model and the exact
>>> requests can be considered mandatory, but when I change the request I
>>> get the very same behaviour...
>>>
>>> Marc MILLAS
>>> Senior Architect
>>> +33607850334
>>> www.mokadb.com
>
> --
> Born in Arizona, moved to Babylonia.
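For reference, the counters Marc is asking about can be watched with only the stock statistics views. A minimal sketch (assuming track_io_timing is on; note that blks_read increments on every block requested from the kernel, so a block that is evicted and re-read counts each time, and the kernel may still serve it from the OS page cache rather than the disk):

    -- Database-wide: cache misses vs. hits, and time spent reading.
    SELECT blks_read,     -- blocks requested from the OS
           blks_hit,      -- blocks found in shared_buffers
           blk_read_time  -- total ms spent reading (needs track_io_timing)
    FROM   pg_stat_database
    WHERE  datname = current_database();

    -- Per-table breakdown, to spot which partitions are being re-read.
    SELECT relname, heap_blks_read, heap_blks_hit
    FROM   pg_statio_user_tables
    ORDER  BY heap_blks_read DESC
    LIMIT  10;

Snapshot these before and after a solo run, then again around a concurrent run; if blks_read grows far more in the concurrent case while the EXPLAIN buffer counts stay flat, the pumping theory holds.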
Re: pb with big volumes
On 8/10/23 23:40, David Rowley wrote:
> On Fri, 11 Aug 2023 at 13:54, Ron wrote:
>> Wouldn't IO contention make for additive timings instead of exponential?
>
> No, not necessarily. Imagine one query running that's doing a
> parameterised nested loop join, resulting in the index on the inner side
> being descended several, say, million times. Let's say there's *just*
> enough RAM/shared buffers so that once the index is scanned the first
> time, all the required pages are cached, which results in no I/O on
> subsequent index scans.
>
> Now, imagine another similar query but with another index; let's say
> this index also *just* fits in cache. When these two queries run
> concurrently, they each evict buffers the other one uses. Of course, the
> shared buffers code is written in such a way as to try and evict
> lesser-used buffers first, but if they're all used about the same
> amount, then this stuff can occur. The slowdown isn't linear.

But that's cache thrashing (which was OP's concern), not IO contention.

--
Born in Arizona, moved to Babylonia.
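If installing contrib modules is an option, pg_buffercache gives a direct view of who owns shared_buffers while the eviction cycle David describes is happening. A sketch along the lines of the standard query from its documentation (this assumes the contrib package is present in the container image):

    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    -- Top consumers of shared_buffers, with average usage count.
    -- Large relations whose avg_usage hovers near the minimum are
    -- candidates for being thrashed.
    SELECT c.relname,
           count(*)                    AS buffers,
           round(avg(b.usagecount), 2) AS avg_usage
    FROM   pg_buffercache b
    JOIN   pg_class c
      ON   b.relfilenode = pg_relation_filenode(c.oid)
     AND   b.reldatabase IN (0, (SELECT oid FROM pg_database
                                 WHERE  datname = current_database()))
    GROUP  BY c.relname
    ORDER  BY buffers DESC
    LIMIT  10;

Run it a few times while both queries are active: if the two working sets take turns dominating the buffer pool, that is the "pumping" effect made visible.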
Re: PgSQL 15.3: Execution plan not using index as expected
On 8/11/23 03:11, Dürr Software wrote:

Please reply to list also.
Ccing list.

> Dear Adrian, thanks for the reply. Of course I ran ANALYZE on the 15.3
> system; it's in the second part of my post, but here again, FYI:

That is EXPLAIN ANALYZE, where it is an option to the command:
https://www.postgresql.org/docs/current/sql-explain.html
    ANALYZE
        Carry out the command and show actual run times and other
        statistics. This parameter defaults to FALSE.

What I was talking about was the ANALYZE command:
https://www.postgresql.org/docs/current/sql-analyze.html
    ANALYZE collects statistics about the contents of tables in the
    database, and stores the results in the pg_statistic system catalog.
    Subsequently, the query planner uses these statistics to help
    determine the most efficient execution plans for queries.

> test=# \d client_session
>                                 Table "client_session"
>     Column    |              Type              | Collation | Nullable |                     Default
> --------------+--------------------------------+-----------+----------+--------------------------------------------------
>  id           | bigint                         |           | not null | nextval('admin.client_session_id_seq'::regclass)
>  tstamp_start | timestamp(3) without time zone |           | not null | now()
>  permit_id    | character varying(63)          |           | not null | "current_user"()
>  user_id      | character varying(63)          |           | not null | "session_user"()
> Indexes:
>     "client_session_pkey" PRIMARY KEY, btree (id)
>     "client_session_user_id_idx" btree (user_id, tstamp_start DESC)
>
> test=# explain analyze SELECT permit_id FROM client_session
>            WHERE user_id::character varying(63)=SESSION_USER::character varying(63)
>            ORDER BY tstamp_start DESC LIMIT 1;
>                                                                  QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=2852336.36..2852336.48 rows=1 width=23) (actual time=5994.540..6000.702 rows=1 loops=1)
>    ->  Gather Merge  (cost=2852336.36..2852697.59 rows=3096 width=23) (actual time=5946.422..5952.583 rows=1 loops=1)
>          Workers Planned: 2
>          Workers Launched: 2
>          ->  Sort  (cost=2851336.34..2851340.21 rows=1548 width=23) (actual time=5934.963..5934.964 rows=1 loops=3)
>                Sort Key: tstamp_start DESC
>                Sort Method: quicksort  Memory: 25kB
>                Worker 0:  Sort Method: quicksort  Memory: 25kB
>                Worker 1:  Sort Method: quicksort  Memory: 25kB
>                ->  Parallel Seq Scan on client_session  (cost=0.00..2851328.60 rows=1548 width=23) (actual time=3885.774..5934.915 rows=1 loops=3)
>                      Filter: ((user_id)::text = ((SESSION_USER)::character varying(63))::text)
>                      Rows Removed by Filter: 37163374
>  Planning Time: 0.167 ms
>  JIT:
>    Functions: 13
>    Options: Inlining true, Optimization true, Expressions true, Deforming true
>    Timing: Generation 0.940 ms, Inlining 119.027 ms, Optimization 79.333 ms, Emission 29.624 ms, Total 228.924 ms
>  Execution Time: 6001.014 ms
> (18 rows)
>
> Funny thing: if I create an index on tstamp_start alone, it is used just
> perfectly:
>
> Indexes:
>     "client_session_pkey" PRIMARY KEY, btree (id)
>     "client_session_tstamp_start" btree (tstamp_start)
>     "client_session_user_id_idx" btree (user_id, tstamp_start DESC)
>
> test=# explain analyze SELECT permit_id FROM admin.client_session
>            WHERE user_id::character varying(63)=SESSION_USER::character varying(63)
>            ORDER BY tstamp_start DESC LIMIT 1;
>                                                                       QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.57..1787.85 rows=1 width=23) (actual time=0.721..0.723 rows=1 loops=1)
>    ->  Index Scan Backward using client_session_tstamp_start on client_session  (cost=0.57..6639766.39 rows=3715 width=23) (actual time=0.719..0.719 rows=1 loops=1)
>          Filter: ((user_id)::text = ((SESSION_USER)::character varying(63))::text)
>  Planning Time: 0.227 ms
>  Execution Time: 0.761 ms
> (5 rows)
>
> ==
> Dürr Software Entw.
> Guggenberg 26, DE-82380 Peißenberg
> fon: +49-8803-4899016  fax: +49-8803-4899017
> i...@fduerr.de
>
> On 10.08.23 at 16:41, Adrian Klaver wrote:
>> On 8/9/23 01:14, Dürr Software wrote:
>>> Dear list, I have a strange problem when migrating a DB from version
>>> 9.3.4 to 15.3: an index which seems perfect for the query and is used
>>> in 9.3.4 as expected is not used in 15.3.
>>
>> Did you run ANALYZE on the 15.3 database after the migration?

--
Adrian Klaver
adrian.kla...@aklaver.com
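An aside on the query itself: a minimal sketch of a rewrite, on the assumption that the explicit varchar(63) casts are not semantically required. With a plain equality on the leading index column, the (user_id, tstamp_start DESC) index can in principle serve both the filter and the ORDER BY with no sort at all (the SESSION_USER::text cast is my assumption about the intended comparison):

    EXPLAIN ANALYZE
    SELECT permit_id
    FROM   admin.client_session
    WHERE  user_id = SESSION_USER::text   -- equality on the leading key
    ORDER  BY tstamp_start DESC           -- matches the second key's DESC order
    LIMIT  1;

Whether 15.3 actually picks the index also depends on the state of the migrated index, which the next reply addresses.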
Re: PgSQL 15.3: Execution plan not using index as expected
Hello,

> PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc
> (Debian 4.8.3-2) 4.8.3, 64-bit
>
> PostgreSQL 15.3 (Debian 15.3-1.pgdg120+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
> (1 row)
>
> 9.3 plan:
>  ->  Index Scan using client_session_user_id_idx on client_session

Looks like a collation issue, given the difference in compilers used: the two builds come from different Debian releases, and therefore different glibc versions, whose collation rules have changed. In the 9.3 plan the index is used. Maybe try a reindex of the table.

HTH,
Rob
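For concreteness, a sketch of that remediation using the object names from this thread (REINDEX without CONCURRENTLY blocks writes to the table while it runs, and the REFRESH COLLATION VERSION step exists only from PostgreSQL 15 on):

    -- Rebuild the index whose sort order depends on the glibc collation:
    REINDEX INDEX client_session_user_id_idx;

    -- or rebuild all of the table's indexes without blocking writes (PG 12+):
    REINDEX TABLE CONCURRENTLY admin.client_session;

    -- Once rebuilt, clear the collation-version mismatch warning (PG 15+):
    ALTER DATABASE test REFRESH COLLATION VERSION;

If the migration was done via pg_dump/restore, the indexes were already built under the new glibc, so if REINDEX changes nothing the collation theory can be ruled out cheaply.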