xinyiZzz commented on pull request #8695:
URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080590449


   > > Why add a memory control to limit the broadcast memory, instead of using mem limit uniformly?
   > 
   > There are 2 reasons:
   > 
   > 1. Broadcast is not always faster than shuffle. The cost of creating a FULL TABLE hash table is not negligible when the broadcast table is large.
   > 2. In BE, we allocate the hash table in the buffer pool, and it is not limited by mem limit.
   
   1. Adding a new memory parameter will make it more difficult for users to understand and debug.
   I understand that broadcast is faster than shuffle in most cases. When shuffle is faster than broadcast, that is not directly related to the size of the hash table, but to the gap between the data sizes of the left and right tables.
   In that case, users can specify the join method manually with a hint.
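
   To illustrate the point about the size gap, here is a minimal sketch of a simplified network-transfer cost model. It is not Doris's actual planner logic; the names `JoinInput`, `broadcast_cost`, `shuffle_cost`, and `cheaper_join` are hypothetical, and the model assumes that broadcast sends the right table to every instance while shuffle sends each table once.

```python
from dataclasses import dataclass


@dataclass
class JoinInput:
    """Hypothetical description of a join, for illustration only."""
    left_bytes: int      # estimated size of the left (probe) table
    right_bytes: int     # estimated size of the right (build) table
    num_instances: int   # number of instances executing the join


def broadcast_cost(j: JoinInput) -> int:
    # Assumption: the whole right table is copied to every instance.
    return j.right_bytes * j.num_instances


def shuffle_cost(j: JoinInput) -> int:
    # Assumption: both tables are repartitioned, each row sent once.
    return j.left_bytes + j.right_bytes


def cheaper_join(j: JoinInput) -> str:
    # Pick the method with the lower estimated transfer cost.
    return "broadcast" if broadcast_cost(j) <= shuffle_cost(j) else "shuffle"


GB = 1024 ** 3

# Same right-table (hash table) size in both cases; only the left table differs.
large_gap = JoinInput(left_bytes=100 * GB, right_bytes=1 * GB, num_instances=10)
small_gap = JoinInput(left_bytes=1 * GB, right_bytes=1 * GB, num_instances=10)

print(cheaper_join(large_gap))
print(cheaper_join(small_gap))
```

   Under these assumptions, the two cases build the same 1 GB hash table but pick different join methods, so in this model the choice depends on the ratio between the two sides rather than on the hash table size alone.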
   
   2. From what I see, HashJoinNode currently allocates the memory of its HashTable from a MemPool, and the BufferPool is only used for the HashTable of the Partitioned Agg.
   
   If the remaining 1G is meant to reserve memory for the parts of a query other than the hash join, we should instead try to estimate the memory consumption of all nodes in a fragment, and do so by collecting statistics.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


