Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

via GitHub Fri, 22 Nov 2024 14:51:14 -0800


alamb commented on PR #13424:
URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2495013307


   > At the risk of repeating myself from 
[datafusion-contrib/datafusion-dft#248 
(comment)](https://github.com/datafusion-contrib/datafusion-dft/pull/248#issuecomment-2489110287)
 I would strongly discourage overloading the ObjectStore trait as some sort of 
IO/CPU boundary.
   
   I know you have said you suggest doing something different, 
but I don't know how to translate your suggestions into actual code. I am 
pretty happy now that this PR illustrates the core use case of running 
DataFusion plans on a separate runtime/threadpool. 
   
   If you can give me some hints on how to update the example in this PR to do 
what you have in mind, I would be glad to try.
   
   > Forcing every individual IO operation to be spawned to a separate runtime 
feels like the wrong solution to be encouraging. Instead DF should make this 
judgement call at a meaningful semantic boundary.
   
   In my mind the ObjectStore is both a meaningful and obvious semantic 
boundary (it is the IO abstraction used by DataFusion), so I don't fully 
understand this point. Also, I thought having all the IO on a separate 
threadpool was best practice 🤔 
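
   To make the core use case concrete, here is a minimal std-only sketch of 
handing CPU-bound work to a dedicated worker thread and receiving the result 
over a channel. It is not the code from this PR and uses neither DataFusion 
nor an async runtime; `CpuPool` and `sum_of_squares` are hypothetical names 
invented for illustration, and a real pool would use several workers rather 
than one.

```rust
use std::sync::mpsc;
use std::thread;

/// A job is any closure that can be sent to the worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a dedicated CPU runtime: a single named
/// worker thread that drains a queue of jobs.
pub struct CpuPool {
    sender: Option<mpsc::Sender<Job>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl CpuPool {
    pub fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let worker = thread::Builder::new()
            .name("cpu-worker".to_string())
            .spawn(move || {
                // Runs until every sender has been dropped.
                for job in receiver {
                    job();
                }
            })
            .expect("failed to spawn cpu worker");
        Self {
            sender: Some(sender),
            worker: Some(worker),
        }
    }

    /// Run `work` on the worker thread and return a receiver for its result,
    /// so the calling thread stays free for other work in the meantime.
    pub fn spawn<T, F>(&self, work: F) -> mpsc::Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the send error: the caller may have dropped the receiver.
            let _ = tx.send(work());
        });
        self.sender
            .as_ref()
            .expect("pool already shut down")
            .send(job)
            .expect("cpu worker exited");
        rx
    }
}

impl Drop for CpuPool {
    fn drop(&mut self) {
        // Closing the channel ends the worker loop; then wait for it to exit.
        drop(self.sender.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

/// Hypothetical stand-in for a CPU-bound query plan.
pub fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|x| x * x).sum()
}
```

   A caller would write something like `let rx = pool.spawn(|| sum_of_squares(10));` 
and later call `rx.recv()` to collect the result. The open question in this 
thread, where the IO/CPU boundary should sit, is not settled by this sketch.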
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

