morrySnow commented on pull request #8695: URL: https://github.com/apache/incubator-doris/pull/8695#issuecomment-1080649337
> > > Why add a memory control to limit the broadcast memory, instead of using mem limit uniformly?
> >
> > There are 2 reasons:
> >
> > 1. Broadcast is not always faster than shuffle. The cost of creating a FULL TABLE hash table is not negligible when the broadcast table is large.
> > 2. In BE, we allocate the hash table in the buffer pool, and it is not limited by mem limit.
>
> 1. Adding a new memory parameter will make it more difficult for users to understand and debug.
> I understand that broadcast is faster than shuffle in most cases. If shuffle is faster than broadcast, that is not directly related to the size of the hash table, but to the gap between the data sizes of the left and right tables.
> In this case, the join method can be specified manually with a hint. As for hash table creation being expensive when the hash table expands: if we need an accurate cost model, it can't include only network overhead.
> 2. From what I see, the MemPool currently used by HashJoinNode allocates the memory of the HashTable, and the BufferPool is only used in the HashTable of the Partitioned Agg.
>
> If the remaining 1G is to reserve memory for the parts of a query other than hash join, we should try to estimate the memory consumption of all nodes in a fragment, and do so by collecting statistics.

I will recheck it, thanks.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org