peter-toth commented on issue #13692:
URL: https://github.com/apache/datafusion/issues/13692#issuecomment-2577048321
This is a very interesting issue.
I tried to reproduce the results of the experiment above, but got matching
results with rayon and tokio. If anything, rayon is slightly faster on my M3:
```
% cargo run --release --bin tokio -- --cpu-duration 1s --concurrency 7
Finished `release` profile [optimized] target(s) in 0.46s
Running `target/release/tokio --cpu-duration 1s --concurrency 7`
Average duration of 1061 ms (IO 57 ms) over 1 samples, throughput 0.94205743 rps
Average duration of 1060 ms (IO 57 ms) over 7 samples, throughput 6.741966 rps
Average duration of 1039 ms (IO 35 ms) over 7 samples, throughput 6.7168765 rps
Average duration of 1041 ms (IO 37 ms) over 7 samples, throughput 6.7122035 rps
Average duration of 1039 ms (IO 37 ms) over 7 samples, throughput 6.7568874 rps
Average duration of 1039 ms (IO 36 ms) over 7 samples, throughput 6.7319846 rps
Average duration of 1037 ms (IO 34 ms) over 7 samples, throughput 6.7261376 rps
Average duration of 1042 ms (IO 38 ms) over 7 samples, throughput 6.720298 rps
^C
% cargo run --release --bin rayon -- --cpu-duration 1s --concurrency 7
Finished `release` profile [optimized] target(s) in 1.89s
Running `target/release/rayon --cpu-duration 1s --concurrency 7`
Average duration of 1042 ms (IO 37 ms) over 1 samples, throughput 0.9594442 rps
Average duration of 1044 ms (IO 41 ms) over 7 samples, throughput 6.8305006 rps
Average duration of 1028 ms (IO 27 ms) over 7 samples, throughput 6.8629923 rps
Average duration of 1023 ms (IO 22 ms) over 7 samples, throughput 6.8896527 rps
Average duration of 1019 ms (IO 18 ms) over 7 samples, throughput 6.9374223 rps
Average duration of 1027 ms (IO 22 ms) over 7 samples, throughput 6.8580756 rps
Average duration of 1018 ms (IO 16 ms) over 7 samples, throughput 6.847353 rps
Average duration of 1018 ms (IO 16 ms) over 7 samples, throughput 6.8783445 rps
^C
```
Honestly, I don't see why there would be a significant difference between the
two, as both apps seem to work the same way.
We have the main thread that spawns tasks, and those tasks are executed either
on the 8 threads of tokio or on the 7 threads of rayon (more on this later). In
the rayon app the tokio worker threads don't do anything, do they? So I would
attribute the slight discrepancy to the different work-stealing logic of tokio
and rayon.
Where I do see a difference is with `--concurrency 8` and above:
```
% cargo run --release --bin tokio -- --cpu-duration 1s --concurrency 8
Finished `release` profile [optimized] target(s) in 0.25s
Running `target/release/tokio --cpu-duration 1s --concurrency 8`
Average duration of 1064 ms (IO 59 ms) over 1 samples, throughput 0.93955696 rps
Average duration of 1072 ms (IO 69 ms) over 8 samples, throughput 7.634239 rps
Average duration of 1045 ms (IO 43 ms) over 8 samples, throughput 7.6972647 rps
Average duration of 1040 ms (IO 37 ms) over 8 samples, throughput 7.66677 rps
Average duration of 1039 ms (IO 36 ms) over 8 samples, throughput 7.762669 rps
Average duration of 1037 ms (IO 34 ms) over 8 samples, throughput 7.6279187 rps
Average duration of 1044 ms (IO 40 ms) over 8 samples, throughput 7.680044 rps
Average duration of 1042 ms (IO 39 ms) over 8 samples, throughput 7.695585 rps
^C
% cargo run --release --bin rayon -- --cpu-duration 1s --concurrency 8
Finished `release` profile [optimized] target(s) in 1.48s
Running `target/release/rayon --cpu-duration 1s --concurrency 8`
Average duration of 1036 ms (IO 31 ms) over 1 samples, throughput 0.96456635 rps
Average duration of 1198 ms (IO 196 ms) over 7 samples, throughput 6.9807143 rps
Average duration of 1167 ms (IO 166 ms) over 7 samples, throughput 6.969797 rps
Average duration of 1158 ms (IO 156 ms) over 7 samples, throughput 6.9852505 rps
Average duration of 1145 ms (IO 143 ms) over 7 samples, throughput 6.9620543 rps
Average duration of 1150 ms (IO 148 ms) over 7 samples, throughput 6.962559 rps
Average duration of 1148 ms (IO 146 ms) over 7 samples, throughput 6.959907 rps
^C
```
But that's because of the aforementioned `.use_current_thread()` initialization
of the rayon thread pool, which makes the main thread part of the rayon pool
without running the work-stealing loop on it. That leaves only 7 threads
executing tasks. Removing that line makes the two match again.
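
As a rough sanity check on the numbers above, here is a back-of-the-envelope
model in Rust. It is only a sketch: `estimated_rps` is a hypothetical helper,
not part of the benchmark, and it assumes that the IO part of a request does
not occupy a worker thread and that the CPU part takes exactly `cpu_secs`.

```rust
/// Rough steady-state throughput estimate, in requests per second.
///
/// Assumptions (not taken from the benchmark code):
/// - each request needs `cpu_secs` on a worker plus `io_secs` off-worker;
/// - `effective_workers` counts only threads that actually pick up tasks.
fn estimated_rps(concurrency: u32, effective_workers: u32, cpu_secs: f64, io_secs: f64) -> f64 {
    // Limit imposed by the number of requests in flight at once.
    let latency_bound = f64::from(concurrency) / (cpu_secs + io_secs);
    // Limit imposed by the threads available to run the CPU part.
    let cpu_bound = f64::from(effective_workers) / cpu_secs;
    latency_bound.min(cpu_bound)
}
```

With `cpu_secs = 1.0` and `io_secs = 0.04`, `estimated_rps(8, 8, ..)` gives
about 7.69 rps (8 tokio workers), while `estimated_rps(8, 7, ..)` gives 7.0 rps
(rayon with the main thread counted in the pool but not executing tasks). Both
are close to the measured throughput at `--concurrency 8`.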
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]