alamb commented on issue #16490: URL: https://github.com/apache/datafusion/issues/16490#issuecomment-2993504534
> I think that's kind of what the morsel paper suggests as well. Avoid processing imbalance by partitioning more dynamically and compensate for data skew by coalescing in a multi-threaded build side. Easy to write that sentence, actually building it is a different matter. Yeah, I agree -- in theory building multi-threaded operators could potentially better manage the asymmetry, but in practice I think it gets very tricky. For example, in DuckDB they use thread-local storage to store state per thread (rather than have a fully concurrent hash table); That design does not obviously have any better skew avoidance properties -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org