pepijnve commented on PR #16398:
URL: https://github.com/apache/datafusion/pull/16398#issuecomment-2987865677

   I added some output to see what the coop logic is doing. Ignore 
the times; this is a dev profile build.
   The output shows that the PR hardly ever forces a yield, while main does. This 
is explained by the yield frequency being 64 on main and 128 in the PR (to 
match Tokio's budget). So it doesn't look like increased yielding is the 
culprit behind the benchmark slowdown.
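
To illustrate why a larger budget produces fewer forced yields, here is a minimal, self-contained model. It is not the DataFusion implementation: `Inner`, `Stats` and `simulate` are hypothetical names, and the rule that a natural `Pending` refills the budget is an assumption of this sketch.

```rust
/// Outcome of polling the wrapped (inner) stream once.
#[derive(Clone, Copy)]
enum Inner {
    Ready,
    Pending,
}

/// Counters in the same shape as the debug output in this comment.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    polled: u32,
    pending: u32,
    ready: u32,
    forced_yields: u32,
}

/// Hypothetical model: every `Ready` consumes one unit of budget, and an
/// inner `Pending` refills it (assumption: control went back to the
/// scheduler anyway). Once the budget is exhausted, the next poll is
/// answered with a forced yield instead of polling the inner stream.
fn simulate(inner: &[Inner], budget: u32) -> Stats {
    assert!(budget > 0, "a zero budget would never make progress");
    let mut stats = Stats::default();
    let mut remaining = budget;
    let mut i = 0;
    while i < inner.len() {
        stats.polled += 1;
        if remaining == 0 {
            // Budget exhausted: yield without consuming an inner item.
            stats.forced_yields += 1;
            remaining = budget;
            continue;
        }
        match inner[i] {
            Inner::Ready => {
                stats.ready += 1;
                remaining -= 1;
            }
            Inner::Pending => {
                stats.pending += 1;
                remaining = budget;
            }
        }
        i += 1;
    }
    stats
}

fn main() {
    // 200 ready batches, one natural pending, then 200 more ready batches.
    let mut inner = vec![Inner::Ready; 200];
    inner.push(Inner::Pending);
    inner.extend(vec![Inner::Ready; 200]);

    println!("budget  64: {:?}", simulate(&inner, 64));
    println!("budget 128: {:?}", simulate(&inner, 128));
}
```

In this model, as in the output below, every poll is counted exactly once, so `polled` equals `pending + ready + forced_yields`.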
   
   **Yield output:**
   <details>
   
   ```
   Main (same for datafusion_coop="per_stream" and datafusion_coop="tokio_fallback")
   Polled 1146 times; 20 pending, 1118 ready, 8 forced yields
   Polled 1262 times; 20 pending, 1230 ready, 12 forced yields
   Polled 1076 times; 20 pending, 1048 ready, 8 forced yields
   Polled 1124 times; 24 pending, 1095 ready, 5 forced yields
   Polled 1183 times; 25 pending, 1153 ready, 5 forced yields
   Polled 1209 times; 25 pending, 1179 ready, 5 forced yields
   Polled 1157 times; 28 pending, 1127 ready, 2 forced yields
   Polled 1218 times; 26 pending, 1187 ready, 5 forced yields
   Polled 1572 times; 28 pending, 1535 ready, 9 forced yields
   Polled 1594 times; 29 pending, 1556 ready, 9 forced yields
   Query 4 iteration 2 took 6382.8 ms and returned 1 rows
   ```
   
   vs
   
   ```
   PR
   Polled 1138 times; 20 pending, 1118 ready, 0 forced yields
   Polled 1251 times; 20 pending, 1230 ready, 1 forced yields
   Polled 1068 times; 19 pending, 1048 ready, 1 forced yields
   Polled 1118 times; 23 pending, 1095 ready, 0 forced yields
   Polled 1178 times; 25 pending, 1153 ready, 0 forced yields
   Polled 1154 times; 27 pending, 1127 ready, 0 forced yields
   Polled 1204 times; 25 pending, 1179 ready, 0 forced yields
   Polled 1213 times; 26 pending, 1187 ready, 0 forced yields
   Polled 1584 times; 28 pending, 1556 ready, 0 forced yields
   Polled 1563 times; 28 pending, 1535 ready, 0 forced yields
   Query 4 iteration 6 took 6362.7 ms and returned 1 rows
   ```
   
   </details>
   
   The next thing I can think of is the access to the budget thread-local. But 
that doesn't explain why we got a very similar benchmark delta with the 
non-thread-local counter.
   Compare the clickbench1 run in 
https://github.com/apache/datafusion/pull/16398#issuecomment-2980961066 with 
per_stream
   
   ```
   Total Time (HEAD)          │ 56209.26ms
   Total Time (task_budget)   │ 57381.15ms
   Average Time (HEAD)        │  1307.19ms
   Average Time (task_budget) │  1334.45ms
   ```
   
   vs the last result in 
https://github.com/apache/datafusion/pull/16398#issuecomment-2981793916 with 
tokio_fallback
   
   ```
   Total Time (HEAD)          │ 55925.88ms
   Total Time (task_budget)   │ 56847.01ms
   Average Time (HEAD)        │  1300.60ms
   Average Time (task_budget) │  1322.02ms
   ```
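
To put the two deltas side by side, the relative slowdown can be computed from the total times quoted above. This is just arithmetic on those numbers; `slowdown_pct` is a helper name made up for this sketch.

```rust
/// Relative slowdown of `candidate_ms` versus `baseline_ms`, in percent.
fn slowdown_pct(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (candidate_ms - baseline_ms) / baseline_ms * 100.0
}

fn main() {
    // Total Time (HEAD) vs Total Time (task_budget) from the two runs above.
    let per_stream = slowdown_pct(56209.26, 57381.15);
    let tokio_fallback = slowdown_pct(55925.88, 56847.01);

    println!("per_stream:     {per_stream:.2}% slower");
    println!("tokio_fallback: {tokio_fallback:.2}% slower");
}
```

That works out to roughly 2.1% for per_stream and 1.6% for tokio_fallback, so both variants show a slowdown of similar size.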


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

