Kontinuation commented on code in PR #14644:
URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957056835


##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -641,7 +702,15 @@ pub fn sort_batch(
         lexsort_to_indices(&sort_columns, fetch)?
     };
 
-    let columns = take_arrays(batch.columns(), &indices, None)?;
+    let mut columns = take_arrays(batch.columns(), &indices, None)?;
+
+    // The columns may be larger than the unsorted columns in `batch` 
especially for variable length
+    // data types due to exponential growth when building the sort columns. We 
shrink the columns
+    // to prevent memory reservation failures, as well as excessive memory 
allocation when running
+    // merges in `SortPreservingMergeStream`.
+    columns.iter_mut().for_each(|c| {

Review Comment:
   This `shrink_to_fit` is basically a no-op for primitive type columns 
produced by `take_arrays` because their internal buffer already has the right 
capacity, and it will only perform reallocations for columns with variable 
length data. so I don't there's a need for cherry-picking which columns to 
shrink.
   
   I've also benchmarked sorting using utf8 columns and have not observed 
significant performance overhead:
   
   ```
   merge sorted utf8 low cardinality
                           time:   [3.8824 ms 3.8903 ms 3.8987 ms]
                           change: [-2.8027% -2.1227% -1.5573%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   sort merge utf8 low cardinality
                           time:   [4.2295 ms 4.2360 ms 4.2430 ms]
                           change: [+0.5975% +0.8722% +1.1242%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 15 outliers among 100 measurements (15.00%)
     2 (2.00%) low mild
     5 (5.00%) high mild
     8 (8.00%) high severe
   
   sort utf8 low cardinality
                           time:   [6.4265 ms 6.4369 ms 6.4483 ms]
                           change: [-0.3276% -0.0658% +0.1908%] (p = 0.62 > 
0.05)
                           No change in performance detected.
   Found 7 outliers among 100 measurements (7.00%)
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   sort partitioned utf8 low cardinality
                           time:   [343.87 µs 347.16 µs 351.07 µs]
                           change: [-1.1291% +0.3066% +1.8160%] (p = 0.68 > 
0.05)
                           No change in performance detected.
   Found 15 outliers among 100 measurements (15.00%)
     5 (5.00%) high mild
     10 (10.00%) high severe
   
   merge sorted utf8 high cardinality
                           time:   [5.9968 ms 6.0083 ms 6.0207 ms]
                           change: [-1.9398% -1.6215% -1.2928%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     7 (7.00%) high mild
     1 (1.00%) high severe
   
   sort merge utf8 high cardinality
                           time:   [6.4266 ms 6.4399 ms 6.4558 ms]
                           change: [-0.5594% -0.2292% +0.1020%] (p = 0.19 > 
0.05)
                           No change in performance detected.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   
   sort utf8 high cardinality
                           time:   [7.7403 ms 7.7541 ms 7.7693 ms]
                           change: [-2.7779% -2.1541% -1.6176%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   
   sort partitioned utf8 high cardinality
                           time:   [364.21 µs 370.21 µs 376.41 µs]
                           change: [+2.2461% +4.2833% +6.3333%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   ```
   
   sort, sort_tpch and tpch10 benchmarks also showed not much difference in 
performance.
   
   ```
   Comparing main and fix-sort-mem-usage-reserve-mem-for-sort-merging
   --------------------
   Benchmark sort.json
   --------------------
   
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Query        ┃       main ┃ 
fix-sort-mem-usage-reserve-mem-for-sort-merging ┃    Change ┃
   
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ Qsort utf8   │ 28514.66ms │                                      
28451.16ms │ no change │
   │ Qsort int    │ 29749.57ms │                                      
29879.78ms │ no change │
   │ Qsort        │ 28608.24ms │                                      
29085.31ms │ no change │
   │ decimal      │            │                                                
 │           │
   │ Qsort        │ 31013.98ms │                                      
31126.24ms │ no change │
   │ integer      │            │                                                
 │           │
   │ tuple        │            │                                                
 │           │
   │ Qsort utf8   │ 28925.23ms │                                      
29281.38ms │ no change │
   │ tuple        │            │                                                
 │           │
   │ Qsort mixed  │ 30579.63ms │                                      
30550.25ms │ no change │
   │ tuple        │            │                                                
 │           │
   
└──────────────┴────────────┴─────────────────────────────────────────────────┴───────────┘
   
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
   ┃ Benchmark Summary                                              ┃           
  ┃
   
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
   │ Total Time (main)                                              │ 
177391.31ms │
   │ Total Time (fix-sort-mem-usage-reserve-mem-for-sort-merging)   │ 
178374.12ms │
   │ Average Time (main)                                            │  
29565.22ms │
   │ Average Time (fix-sort-mem-usage-reserve-mem-for-sort-merging) │  
29729.02ms │
   │ Queries Faster                                                 │           
0 │
   │ Queries Slower                                                 │           
0 │
   │ Queries with No Change                                         │           
6 │
   
└────────────────────────────────────────────────────────────────┴─────────────┘
   --------------------
   Benchmark sort_tpch.json
   --------------------
   
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Query        ┃     main ┃ fix-sort-mem-usage-reserve-mem-for-sort-merging 
┃    Change ┃
   
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ Q1           │ 187.27ms │                                        187.75ms 
│ no change │
   │ Q2           │ 154.92ms │                                        157.82ms 
│ no change │
   │ Q3           │ 885.18ms │                                        893.74ms 
│ no change │
   │ Q4           │ 184.50ms │                                        189.54ms 
│ no change │
   │ Q5           │ 315.13ms │                                        322.19ms 
│ no change │
   │ Q6           │ 335.00ms │                                        338.65ms 
│ no change │
   │ Q7           │ 584.88ms │                                        594.44ms 
│ no change │
   │ Q8           │ 452.66ms │                                        460.51ms 
│ no change │
   │ Q9           │ 472.15ms │                                        475.38ms 
│ no change │
   │ Q10          │ 681.58ms │                                        685.07ms 
│ no change │
   
└──────────────┴──────────┴─────────────────────────────────────────────────┴───────────┘
   
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Benchmark Summary                                              ┃           
┃
   
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ Total Time (main)                                              │ 4253.28ms 
│
   │ Total Time (fix-sort-mem-usage-reserve-mem-for-sort-merging)   │ 4305.10ms 
│
   │ Average Time (main)                                            │  425.33ms 
│
   │ Average Time (fix-sort-mem-usage-reserve-mem-for-sort-merging) │  430.51ms 
│
   │ Queries Faster                                                 │         0 
│
   │ Queries Slower                                                 │         0 
│
   │ Queries with No Change                                         │        10 
│
   
└────────────────────────────────────────────────────────────────┴───────────┘
   --------------------
   Benchmark sort_tpch_sf10.json
   --------------------
   
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query        ┃       main ┃ 
fix-sort-mem-usage-reserve-mem-for-sort-merging ┃        Change ┃
   
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ Q1           │  2617.74ms │                                       
2652.65ms │     no change │
   │ Q2           │  2019.64ms │                                       
2034.08ms │     no change │
   │ Q3           │ 10748.55ms │                                      
11028.78ms │     no change │
   │ Q4           │  2565.69ms │                                       
2581.39ms │     no change │
   │ Q5           │  3182.88ms │                                       
3226.93ms │     no change │
   │ Q6           │  3379.76ms │                                       
3432.35ms │     no change │
   │ Q7           │  7200.46ms │                                       
7245.30ms │     no change │
   │ Q8           │  4932.09ms │                                       
5133.81ms │     no change │
   │ Q9           │  5488.64ms │                                       
5473.89ms │     no change │
   │ Q10          │ 18188.22ms │                                      
17129.05ms │ +1.06x faster │
   
└──────────────┴────────────┴─────────────────────────────────────────────────┴───────────────┘
   
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ Benchmark Summary                                              ┃           
 ┃
   
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
   │ Total Time (main)                                              │ 
60323.67ms │
   │ Total Time (fix-sort-mem-usage-reserve-mem-for-sort-merging)   │ 
59938.23ms │
   │ Average Time (main)                                            │  
6032.37ms │
   │ Average Time (fix-sort-mem-usage-reserve-mem-for-sort-merging) │  
5993.82ms │
   │ Queries Faster                                                 │          
1 │
   │ Queries Slower                                                 │          
0 │
   │ Queries with No Change                                         │          
9 │
   
└────────────────────────────────────────────────────────────────┴────────────┘
   --------------------
   Benchmark tpch_sf10.json
   --------------------
   
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
   ┃ Query        ┃      main ┃ fix-sort-mem-usage-reserve-mem-for-sort-merging 
┃       Change ┃
   
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
   │ QQuery 1     │ 1003.40ms │                                        974.06ms 
│    no change │
   │ QQuery 2     │  142.90ms │                                        142.05ms 
│    no change │
   │ QQuery 3     │  437.35ms │                                        429.21ms 
│    no change │
   │ QQuery 4     │  218.20ms │                                        219.31ms 
│    no change │
   │ QQuery 5     │  638.99ms │                                        633.81ms 
│    no change │
   │ QQuery 6     │  152.49ms │                                        151.94ms 
│    no change │
   │ QQuery 7     │  937.33ms │                                        952.74ms 
│    no change │
   │ QQuery 8     │  690.88ms │                                        675.75ms 
│    no change │
   │ QQuery 9     │ 1055.28ms │                                       1039.38ms 
│    no change │
   │ QQuery 10    │  621.41ms │                                        632.68ms 
│    no change │
   │ QQuery 11    │   93.62ms │                                        100.54ms 
│ 1.07x slower │
   │ QQuery 12    │  321.36ms │                                        329.27ms 
│    no change │
   │ QQuery 13    │  442.88ms │                                        434.09ms 
│    no change │
   │ QQuery 14    │  252.07ms │                                        252.79ms 
│    no change │
   │ QQuery 15    │  419.63ms │                                        414.17ms 
│    no change │
   │ QQuery 16    │  106.30ms │                                        107.51ms 
│    no change │
   │ QQuery 17    │ 1088.73ms │                                       1083.62ms 
│    no change │
   │ QQuery 18    │ 1795.68ms │                                       1785.46ms 
│    no change │
   │ QQuery 19    │  462.31ms │                                        458.10ms 
│    no change │
   │ QQuery 20    │  403.54ms │                                        428.10ms 
│ 1.06x slower │
   │ QQuery 21    │ 1453.76ms │                                       1454.77ms 
│    no change │
   │ QQuery 22    │  158.43ms │                                        151.23ms 
│    no change │
   
└──────────────┴───────────┴─────────────────────────────────────────────────┴──────────────┘
   
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
   ┃ Benchmark Summary                                              ┃           
 ┃
   
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
   │ Total Time (main)                                              │ 
12896.55ms │
   │ Total Time (fix-sort-mem-usage-reserve-mem-for-sort-merging)   │ 
12850.56ms │
   │ Average Time (main)                                            │   
586.21ms │
   │ Average Time (fix-sort-mem-usage-reserve-mem-for-sort-merging) │   
584.12ms │
   │ Queries Faster                                                 │          
0 │
   │ Queries Slower                                                 │          
2 │
   │ Queries with No Change                                         │         
20 │
   
└────────────────────────────────────────────────────────────────┴────────────┘
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to