alamb commented on issue #16490:
URL: https://github.com/apache/datafusion/issues/16490#issuecomment-2993504534

   > I think that's kind of what the morsel paper suggests as well. Avoid 
processing imbalance by partitioning more dynamically and compensate for data 
skew by coalescing in a multi-threaded build side. Easy to write that sentence, 
actually building it is a different matter.
   
   Yeah, I agree -- in theory, multi-threaded operators could manage the 
asymmetry better, but in practice I think it gets very tricky. For example, 
DuckDB uses thread-local storage to keep state per thread (rather than a fully 
concurrent hash table), and that design does not obviously have any better 
skew-avoidance properties.
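
   To make the thread-local point concrete, here is a minimal sketch. It is 
not DuckDB's or DataFusion's actual code; the function names 
`build_thread_local` and `rows_per_thread` are hypothetical, and it assumes 
each thread is handed one fixed input partition. Each thread builds its own 
private hash table with no shared state:

```rust
use std::collections::HashMap;
use std::thread;

/// Build one private hash table per input partition, each on its own thread.
/// Nothing is shared between threads while building (hypothetical helper).
fn build_thread_local(partitions: &[Vec<(u64, u64)>]) -> Vec<HashMap<u64, Vec<u64>>> {
    thread::scope(|s| {
        // Spawn one scoped thread per partition; collect() forces the spawns.
        let handles: Vec<_> = partitions
            .iter()
            .map(|part| {
                s.spawn(move || {
                    let mut local: HashMap<u64, Vec<u64>> = HashMap::new();
                    for &(key, value) in part {
                        local.entry(key).or_default().push(value);
                    }
                    local
                })
            })
            .collect();
        // Wait for every thread and gather the per-thread tables in order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

/// Number of rows each thread inserted into its private table.
fn rows_per_thread(tables: &[HashMap<u64, Vec<u64>>]) -> Vec<usize> {
    tables
        .iter()
        .map(|t| t.values().map(Vec::len).sum::<usize>())
        .collect()
}
```

   If one partition holds most of the rows, the thread that owns it still 
does most of the build work. Keeping state per thread removes contention, but 
by itself it does not rebalance skewed input.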


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

