HeartSaVioR opened a new pull request, #51036:
URL: https://github.com/apache/spark/pull/51036

   ### What changes were proposed in this pull request?
   
   This PR proposes to streamline the protocol for retrieving timers for
   transformWithState in PySpark, which helps considerably when the number of
   timers is not huge.
   
   Here are the changes:
   
   * StatefulProcessorHandleImpl.listTimers() and
     StatefulProcessorHandleImpl.getExpiredTimers() no longer require an
     additional request to detect that there is no further data to read.
     * We inline the data into the proto message, which makes it easy to
       determine whether the iterator has been fully consumed.
   
   This change uses the same mechanism we applied to ListState & MapState. We
   saw a performance improvement in that case, and this change also proves
   helpful in our internal benchmark.
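
As a rough illustration of why inlining saves a round-trip, here is a minimal sketch in plain Python. It is not the actual Spark implementation: `FakeTimerServer`, `fetch_legacy`, `fetch_inlined`, the `has_more` flag, and the batch size are all hypothetical names and values chosen for the example. The only point is to count how many requests each style needs before the client knows the iterator is exhausted.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class FakeTimerServer:
    """Hypothetical stand-in for the JVM side; counts the requests it serves."""
    timers: List[int]
    batch_size: int = 1000  # assumed value, for illustration only
    requests: int = 0

    def fetch_legacy(self, offset: int) -> Tuple[str, List[int]]:
        # Legacy style: the reply only says whether a batch was produced, so
        # the client must ask once more to learn that nothing is left.
        self.requests += 1
        batch = self.timers[offset:offset + self.batch_size]
        return ("OK", batch) if batch else ("NO_MORE_DATA", [])

    def fetch_inlined(self, offset: int) -> Tuple[List[int], bool]:
        # Inlined style: the reply carries the data plus a flag telling the
        # client whether another fetch is required.
        self.requests += 1
        end = offset + self.batch_size
        return self.timers[offset:end], end < len(self.timers)


def list_timers_legacy(server: FakeTimerServer) -> Iterator[int]:
    offset = 0
    while True:
        status, batch = server.fetch_legacy(offset)
        if status != "OK":
            return
        yield from batch
        offset += len(batch)


def list_timers_inlined(server: FakeTimerServer) -> Iterator[int]:
    offset, has_more = 0, True
    while has_more:
        batch, has_more = server.fetch_inlined(offset)
        yield from batch
        offset += len(batch)


if __name__ == "__main__":
    legacy = FakeTimerServer(list(range(100)))
    inlined = FakeTimerServer(list(range(100)))
    list(list_timers_legacy(legacy))
    list(list_timers_inlined(inlined))
    print("legacy requests:", legacy.requests)
    print("inlined requests:", inlined.requests)
```

With 100 timers that fit in a single batch, the legacy loop in this sketch issues two requests and the inlined loop issues one, which is the kind of saving that matters most when the number of timers is small.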
   
   
   
   ### Why are the changes needed?
   
   To further optimize some timer operations.
   
   We benchmarked the change by listing 100 timers (PR for benchmarking:
   #50952) and saw overall performance improvements.
   
   > Before the fix
   
   ```
    ==================== SET IMPLICIT KEY latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   78.250          141.583         184.375         635.792         962743.500
    ==================== REGISTER latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   65.375          126.125         162.792         565.833         60809.333
    ==================== DELETE latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   68.500          130.000         170.292         610.083         156733.125
    ==================== LIST latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   486.833         714.961         998.625         2695.417        167039.959
   
    ==================== SET IMPLICIT KEY latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   77.916          139.000         182.375         671.792         521809.958
    ==================== REGISTER latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   65.000          124.333         160.875         596.667         30860.208
    ==================== DELETE latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   67.125          127.916         170.250         740.051         64404.416
    ==================== LIST latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   482.041         710.333         1050.333        2685.500        76762.583
   
    ==================== SET IMPLICIT KEY latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   78.208          139.959         181.459         722.459         713788.250
    ==================== REGISTER latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   65.209          125.125         159.625         636.666         27963.167
    ==================== DELETE latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   67.417          129.000         168.875         764.602         12991.667
    ==================== LIST latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   479.000         709.584         1045.543        2776.541        92247.542
   ```
   
   > After the fix
   
   ```
    ==================== SET IMPLICIT KEY latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   31.250          47.250          75.875          150.000         551557.750
    ==================== REGISTER latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   26.958          39.208          65.208          122.667         78609.292
    ==================== DELETE latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   23.500          41.125          64.542          125.958         52641.042
    ==================== LIST latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   93.125          118.542         156.500         284.625         19910.000
   
    ==================== SET IMPLICIT KEY latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   30.875          44.083          70.417          128.875         628912.209
    ==================== REGISTER latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   26.917          36.416          61.292          109.917         164584.666
    ==================== DELETE latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   23.333          38.375          59.542          113.839         114350.250
    ==================== LIST latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   94.125          115.208         148.917         246.292         36924.292
   
    ==================== SET IMPLICIT KEY latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   31.375          58.375          93.041          243.750         719545.583
    ==================== REGISTER latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   26.959          50.167          81.833          194.375         67609.583
    ==================== DELETE latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   24.208          50.834          83.000          211.018         20611.959
    ==================== LIST latency (micros) ======================
   perc:50         perc:95         perc:99         perc:99.9       perc:100
   95.291          132.375         183.875         427.584         36971.792
   ```
   
   It is worth noting that the change affects not only the LIST operation but
   the other operations as well. It is not clear why this happens, but the
   results support reducing round-trips as the right direction.
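
To put a rough number on the improvement, the short script below averages the perc:50 values from the three runs reported above and prints the before/after ratio per operation. The latencies are copied from the benchmark output; the dictionary layout and the `speedup` helper are just one way to do the arithmetic and are not part of the benchmark PR.

```python
from statistics import mean

# perc:50 latencies (micros) copied from the three "before" runs above.
P50_BEFORE = {
    "SET IMPLICIT KEY": [78.250, 77.916, 78.208],
    "REGISTER": [65.375, 65.000, 65.209],
    "DELETE": [68.500, 67.125, 67.417],
    "LIST": [486.833, 482.041, 479.000],
}

# perc:50 latencies (micros) copied from the three "after" runs above.
P50_AFTER = {
    "SET IMPLICIT KEY": [31.250, 30.875, 31.375],
    "REGISTER": [26.958, 26.917, 26.959],
    "DELETE": [23.500, 23.333, 24.208],
    "LIST": [93.125, 94.125, 95.291],
}


def speedup(op: str) -> float:
    """Ratio of mean median latency before the fix to after the fix."""
    return mean(P50_BEFORE[op]) / mean(P50_AFTER[op])


if __name__ == "__main__":
    for op in P50_BEFORE:
        print(f"{op}: {speedup(op):.2f}x")
```

On these numbers the median LIST latency drops by roughly 5x, while the other three operations improve by roughly 2.4x to 2.9x.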
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
