Re: Slow GroupAggregate and Sort

Darwin Correa Mon, 01 Jan 2024 08:37:57 -0800

Hello, Happy New Year! I add my responses in blue.







---- El Thu, 28 Dec 2023 13:06:18 -0500, Jeff Janes <jeff.ja...@gmail.com> 
escribió ----



On Thu, Dec 28, 2023 at 12:03 PM Darwin Correa <mailto:dcorrea@jedai.group> 
wrote:





when run the query (query.sql)  as you can see in explain (plan4_v3.txt) citus 
take about 18s to run all fragments






Where is plan4_v3.txt?  Is that hidden in some non-obvious way in one of your 
links?








sorry by the wrong name, Yes The explain plan is in the link that said plan, is 
this



https://explain-postgresql.com/archive/explain/8e4b573c5f7bcf3a0d30675a430051fd:0:2023-12-26
    (plan updated)







 



but each fragment take at most 2s, so my questions are- why citus take this 
time in run all fragments?






I only see that one arbitrary fragment takes 2.7s, with no indication whether 
that one is the slowest one or not.  But I am not used to reading citus plans.








In the explain plan citus show one of 72 subtask and show the most slow



 

also we remove partitions, and test only with citus, but query took more than a 
minute.

as a note, we not have 72 shards on the same node we have 72 in total, 24 
shards each node.






I thought the point of sharding was to bring more CPU and RAM to bear than can 
feasibly be obtained in one machine.  Doesn't that make 24 shards per machine 
completely nuts?








Based o citus docs the recommended shards is 2x cpu cores in my case I've 
tested with few shards and 1:1, 2:1 shards but always have slow query time in 
the last step (sorting and grouping) in máster node.







I think the problem was in Sort and in GroupAggregate  I no have idea how speed 
up this in master node, because the Custom Scan (Citus Adaptive)  is not too 
slow, the most time is consumed in master on Sort and group






You want to know why citus is so slow here, but also say it isn't slow and 
something else is slow instead? 








I'm refering in general that this query run slow in Citus cluster, but 
analizing explain plan I think that the specific part of citus (Adaptive 
executor) is not the slow part, instead of I can show that the “postgres only 
part” is slow (Sort and GroupAggregate)





I'd break this down into more manageable chunks for investigation.  Populate 
one scratch table (on one node, not a hypertable) with all 2.6 million rows.  
See how long it takes to populate it based on the citus query, and separately 
see how long it takes to run the aggregate query on the populated scratch table.









Populate table based with citus query, took 1.45 seconds each fragment, I don't 
know how citus run all fragments in parallel  but running secuential each 
fragment, total took 51s



After scratch table filled sort took 32s, explain 
(https://explain.dalibo.com/plan/8a3h26hcc6328c11)



and sort+aggregation took 34s explain 
(https://explain.dalibo.com/plan/c5e4d62ge87cafg4)



I don't understand "actual time" metric, because accordind plan (citus) startup 
time is high in Sort step






What version of PostgreSQL (and citus) are you using?  In my hands (without 
citus being involved), the sort includes "users" as the last column, to support 
the count(distinct users) operation.  I don't know why yours doesn't do that.






I'm using citus 12.0 wich comes with postgreSQL 16, I upgrade to 12.1 this is 
the updated plan:  (now took more time)



https://explain-postgresql.com/archive/explain/3849220d3e3ff2850fe39c62f954cd32:0:2024-01-01







Cheers,



Jeff










 


Darwin    

 






Correa P.   



//     software architect    




























 



 



 


Veintimilla y Leonidas Plaza   



0999965925           



mailto:dcor...@uqaidev.com  /  mailto:dcorrea@jedai.group   



  //      DESARROLLO E INNOVACIÓN TECNOLÓGICA

Re: Slow GroupAggregate and Sort

Reply via email to