Dandandan commented on PR #20481:
URL: https://github.com/apache/datafusion/pull/20481#issuecomment-4010938363

   > I spent some more time this evening reading this PR -- I think if we did 
some rearranging of the code it would likely be easier to read / understand 
(and possibly test)
   > 
   > Specifically I am thinking of trying to pull it out into a more explicit 
pipeline like this
   > 
   > ```
   > ┌────────────────────┐      ┌────────────────────┐    
┌───────────────────────┐   ┌────────────────────┐ 
   > │     Open File      │      │        Plan        │    │      Fetch Data    
   │   │                    │ 
   > │  (fetch / decode   │─────▶│(Apply statistics + │───▶│(fetch Row Group 
data) │──▶│    Decode Data     │ 
   > │  ParquetMetadata)  │      │      pruning)      │    │                    
   │   │                    │ 
   > └────────────────────┘      └────────────────────┘    
└───────────────────────┘   └────────────────────┘ 
   >                                                                            
                              
   >         IO+CPU                       CPU                         IO        
                 CPU          
   > ```
   > 
   > I will try and spend some of my time on the airplane tomorrow working to 
see if I can rearrange the opener code towards this direction.
   > 
   > My idea is to split the implementation into a few steps:
   > 
   > 1. Introduce the abstraction of ParquetMorsels (but don't yet implement 
work stealing across partitions)
   > 2. (maybe) restructure the existing FileOpener so it produces a Stream of 
ParquetMorsels
   > 
   > I think then we would be in a better position to add in the (key) idea of 
"steal morsels from other partitions" idea if the next file/row group wasn't 
yet ready
   > 
   > But I realize this is all just a theory until I actually produce code
   
   I like the direction it is going to split the control flow into isolated 
steps so we can control (and minimize) the IO.
   
   @adriangb I also did some profiling for ClickBench.
   
   The queries it seems to benefit the most from morselizing currently are the 
ones with enough row groups to read (see e.g. picture from my presentation 
below).
   But there are also some with only a _few_ row groups (and the upstream)
   
   One strategy that might be useful for parquet is:
   
   * first split the query into row group morsels
   * when the remaining queue is small, further split the morsels into smaller 
ones (by changing).
   
   I think we should carefully measure this to avoid any overhead, I think we 
have to split such changes into a couple of PRs so we can handle each change 
individually.
   
   <img width="981" height="361" alt="image" 
src="https://github.com/user-attachments/assets/6b0c7748-fd50-4a86-8d3b-62c79d59da6d";
 />
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to