andygrove opened a new issue, #1098:
URL: https://github.com/apache/datafusion-comet/issues/1098

   ### What is the problem the feature request solves?
   
   For each query stage, the serialized query plan is sent to the executor with each task. Each task deserializes the protobuf, creates a `PhysicalPlanner`, and builds a native query plan. The query plan for each partition in a stage is essentially identical, except for the scan input JNI references, so we are duplicating this query planning work across every partition.
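
   For illustration, each task currently repeats roughly the following work. This is only a shape sketch in Scala for readability (in Comet this step actually runs in native code), and `NativePlan`, `deserializePlan`, and `buildNativePlan` are hypothetical stand-ins rather than the real API:
   
   ```scala
   // Hypothetical sketch of the current per-task flow. Every task in the
   // stage repeats the same deserialize + plan steps, even though the
   // resulting plans differ only in their scan input JNI references.
   def prepareNativePlan(serializedPlan: Array[Byte]): NativePlan = {
     val proto   = deserializePlan(serializedPlan) // protobuf decode, once per task
     val planner = new PhysicalPlanner()           // new planner, once per task
     planner.buildNativePlan(proto)                // full planning, once per task
   }
   ```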
   
   In some cases, planning is very expensive: for TPC-H q3, stage 18 seems to take around 90 seconds in total. Here is partial debug output. Note that each partition appears to create the query plan twice, which needs further investigation.
   
   ```
   executePlan() stage 18 partition 6 of 29: planning took 1.482816587s
   executePlan() stage 18 partition 6 of 29: planning took 1.748504654s
   executePlan() stage 18 partition 7 of 29: planning took 1.552462415s
   executePlan() stage 18 partition 7 of 29: planning took 1.822570717s
   executePlan() stage 18 partition 8 of 29: planning took 1.498230863s
   executePlan() stage 18 partition 8 of 29: planning took 1.771406765s
   executePlan() stage 18 partition 9 of 29: planning took 1.457221672s
   executePlan() stage 18 partition 9 of 29: planning took 1.771535457s
   ...
   ```
   
   Here is another example where planning is relatively cheap but is repeated many times, resulting in 1.76 seconds of total planning time.
   
   ```
   executePlan() stage 10 partition 171 of 176: planning took 10.97809ms
   executePlan() stage 10 partition 171 of 176: planning took 11.395246ms
   executePlan() stage 10 partition 172 of 176: planning took 10.283634ms
   executePlan() stage 10 partition 172 of 176: planning took 10.669009ms
   executePlan() stage 10 partition 173 of 176: planning took 9.233809ms
   executePlan() stage 10 partition 173 of 176: planning took 9.651204ms
   executePlan() stage 10 partition 174 of 176: planning took 9.536889ms
   executePlan() stage 10 partition 174 of 176: planning took 9.927454ms
   ...
   ```
   
   Questions:
   
   1. Are there any general optimizations we can make?
   2. Can we cache query plans in each executor and copy them to each task rather than duplicating the planning work? (A rough sketch follows this list.)
   3. Why is the plan created twice per partition, or is there an error in how I am logging this? (See the instrumentation sketch after the Scala snippet below.)
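
   A rough shape for the caching idea in question 2, sketched in Scala. This is a sketch only: the real cache would have to live wherever planning happens, and `NativePlan` and `buildPlanForStage` are hypothetical stand-ins for today's deserialize-and-plan work:
   
   ```scala
   import java.util.concurrent.ConcurrentHashMap
   
   object PlanCache {
     // Hypothetical handle to an already-built native plan.
     type NativePlan = Long
   
     // Hypothetical stand-in for the existing deserialize + plan work.
     private def buildPlanForStage(serializedPlan: Array[Byte]): NativePlan = ???
   
     private val cachedPlans = new ConcurrentHashMap[Int, NativePlan]()
   
     // Build the plan once per stage on each executor; later tasks reuse it.
     def getOrBuild(stageId: Int, serializedPlan: Array[Byte]): NativePlan =
       cachedPlans.computeIfAbsent(stageId, _ => buildPlanForStage(serializedPlan))
   }
   ```
   
   Any cached plan would still need the per-partition scan input JNI references injected at execution time, and entries would need to be evicted once a stage completes.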
   
   I used the following code to pass the partition numbers to the native code:
   
   ```scala
           nativeLib.executePlan(
             plan,
             arrayAddrs,
             schemaAddrs,
             taskContext.stageId(),
             taskContext.partitionId(),
             taskContext.numPartitions())
   ```
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_

