makssent commented on issue #36972:
URL: https://github.com/apache/shardingsphere/issues/36972#issuecomment-3596300010

   Let me refresh the numbers, since in the meantime I have tested many different configurations and made various adjustments.
   
   ## Cluster Benchmark (200 threads)
   
   | Duration | QPS #1 | QPS #2 | QPS #3 | **QPS Avg** | p95 #1 | p95 #2 | p95 #3 | **p95 Avg (ms)** |
   |----------|--------|--------|--------|-------------|--------|--------|--------|------------------|
   | **30s**  | 12101.23 | 13172.53 | 12348.53 | **12540.76** | 26.753 | 25.773 | 27.758 | **26.76** |
   | **60s**  | 14958.77 | 16880.63 | 16072.38 | **15970.59** | 24.482 | 20.582 | 21.288 | **22.12** |
   | **120s** | 20357.69 | 14640.98 | 17766.53 | **17588.40** | 18.248 | 20.635 | 20.587 | **19.82** |
   
   ## SingleDB Benchmark (200 threads)
   
   | Duration | QPS #1  | QPS #2  | QPS #3  | **QPS Avg** | p95 #1  | p95 #2  | p95 #3  | **p95 Avg (ms)** |
   |----------|---------|---------|---------|-------------|---------|---------|---------|------------------|
   | **30s**  | 9282.47 | 9295.67 | 9088.30 | **9222.15** | 72.246  | 66.671  | 69.021  | **69.31**  |
   | **60s**  | 9375.52 | 9648.97 | 9532.12 | **9518.87** | 88.779  | 87.511  | 90.904  | **89.07**  |
   | **120s** | 8935.35 | 9191.13 | 9431.38 | **9185.95** | 110.902 | 111.966 | 109.380 | **110.75** |
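
For reference, both benchmarks used a workload equivalent to Sysbench OLTP Point Select. A sketch of the corresponding `sysbench` invocation for the 120-second Single run; the host, credentials, and database name are placeholders, and the exact flags in my runs may have differed:

```shell
# Hypothetical reproduction sketch; host/user/password/db are placeholders.
# --table-size matches the 40M-row Single dataset; scale down per cluster node.
sysbench oltp_point_select \
  --mysql-host=203.0.113.10 --mysql-port=3306 \
  --mysql-user=sbtest --mysql-password=secret --mysql-db=sbtest \
  --tables=1 --table-size=40000000 \
  --threads=200 --time=120 --report-interval=10 \
  run
```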
   
   ## Comparative Benchmark Results (Cluster vs Single, 200 threads)
   
   | Duration | QPS Single | QPS Cluster | QPS Gain (%)       | p95 Single (ms) | p95 Cluster (ms) |
   |----------|-----------:|------------:|-------------------:|----------------:|-----------------:|
   | **30s**  | 9222.15    | 12540.76    | **+35.9% (×1.36)** | 69.31           | 26.76            |
   | **60s**  | 9518.87    | 15970.59    | **+67.7% (×1.68)** | 89.07           | 22.12            |
   | **120s** | 9185.95    | 17588.40    | **+91.5% (×1.92)** | 110.75          | 19.82            |
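
The gain column follows directly from the two QPS averages; a quick check of the arithmetic (values taken from the tables above; small rounding differences from the table are expected):

```python
# Recompute the speedup ratio and relative QPS gain from the averaged results.
def gain(single_qps: float, cluster_qps: float) -> tuple[float, float]:
    ratio = cluster_qps / single_qps      # speedup factor (×N)
    percent = (ratio - 1.0) * 100.0       # relative gain in %
    return round(ratio, 2), round(percent, 1)

for duration, single, cluster in [("30s", 9222.15, 12540.76),
                                  ("60s", 9518.87, 15970.59),
                                  ("120s", 9185.95, 17588.40)]:
    ratio, pct = gain(single, cluster)
    print(f"{duration}: x{ratio} (+{pct}%)")
```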
   
   ---
   
   ### CPU Load on Single (120s)
   
   <img width="534" height="792" alt="Single CPU Load" src="https://github.com/user-attachments/assets/4a90c04c-e040-4404-9893-e16df5b46f01" />
   
   As visible in the chart, during the 120-second runs the Single node reaches 
**93–100% total CPU load**.
   
   ### CPU Load on a Cluster Node (120s)
   
   <img width="394" height="592" alt="Cluster Node CPU Load" src="https://github.com/user-attachments/assets/aac29d2d-08e7-4018-b465-c32e94e97a17" />
   
   By contrast, the cluster node's CPU stays far from saturation under the same load.
   
   I included CPU graphs because one colleague suggested that Proxmox might 
distribute virtual CPU cores incorrectly, causing cluster VM nodes to compete 
for the same physical cores. However, the graphs show no evidence of CPU 
contention.
   
   If virtualization were limiting RAM, cluster nodes would experience buffer 
pool pressure and fall back to disk I/O — but this does not happen.  
   In contrast, the Single node frequently performs physical reads.  
   Therefore, the issue does not appear to be related to virtualization.
   
   ---
   
   ### SphereEX Reference Hardware
   
   | Component | Value      |
   |-----------|------------|
   | CPU       | 48 cores   |
   | Memory    | 96 GB      |
   | Disk      | 820 GB SSD |
   
   The documentation does not specify whether this refers to a single physical 
machine or a set of VMs.
   
   Comparing your p95 results with mine: in our 60-second run, the cluster p95 is about **22 ms**, while the Single node shows about **89 ms**.  
   The latency reduction is significant, yet QPS improves by only **×1.36–×1.92**, far from the **×4.5** reported in your benchmark.
   
   If there is anything else I can provide to help clarify the situation, I am 
ready to collect additional metrics. I do not believe Proxy optimizations alone 
could realistically increase performance from this range up to ×4.5 in the 
current environment.
   
   ---
   
   ### Current Situation Summary
   
   - The Single node pushes CPU usage close to 100%.  
   - RAM is distributed correctly across the cluster nodes, with no signs of virtualization-related memory pressure.  
   - Network throughput has not been tested yet; I can check it if needed.  
   - The workload is equivalent to Sysbench OLTP Point Select, and correctness was verified directly against MySQL.  
   - Dataset sizes match your example: 40M rows on Single, 8M rows per cluster node; each MySQL instance has a 2 GB buffer pool.  
     The Single node performs about 80% physical reads, which hurts its numbers, yet the cluster's gain still does not approach ×4.5.  
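
The buffer pool point can be sanity-checked with rough arithmetic. Assuming a typical sysbench `sbtest` row costs on the order of 250 bytes on disk including the primary key and secondary index (an assumption, not a measured value; the real footprint depends on row format and schema), the working sets compare to the 2 GB buffer pool as follows:

```python
# Rough working-set estimate vs. the 2 GB InnoDB buffer pool per instance.
# ROW_BYTES is an assumed on-disk cost per sbtest row (data + indexes).
ROW_BYTES = 250          # assumption, not a measured value
BUFFER_POOL_GB = 2.0

def working_set_gb(rows: int) -> float:
    return rows * ROW_BYTES / 1e9

single_gb = working_set_gb(40_000_000)   # 40M rows on the Single node
per_node_gb = working_set_gb(8_000_000)  # 8M rows per cluster node

print(f"Single: {single_gb:.1f} GB vs {BUFFER_POOL_GB} GB pool")
print(f"Cluster node: {per_node_gb:.1f} GB vs {BUFFER_POOL_GB} GB pool")
```

Under this assumption the Single working set is roughly 5× the buffer pool while each cluster node's fits, which is consistent with the ~80% physical reads observed on Single.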
   
   If you need any additional information, I will be happy to provide it.  
   Your questions may help point me toward the root cause.
   