steveloughran commented on PR #3562:
URL: https://github.com/apache/parquet-java/pull/3562#issuecomment-4591384114

   Latest status. Ran benchmarks unattended; there is still a lot of variation, 
to the point where even some tests which only write data are reported as 
statistically significantly faster to serialize. That's a codepath which isn't 
being updated, which implies that system/other work is interfering. 
   
   Results: 
https://github.com/steveloughran/benchmarking-variants/tree/main/json/hardening
   [final hardened 
json](https://github.com/steveloughran/benchmarking-variants/blob/main/json/hardening/2026-06-01-parquet-hardened-01.json)
 and [baseline 
master](https://github.com/steveloughran/benchmarking-variants/blob/main/json/hardening/2026-05-31-parquet-master-02.json)
   
   Comparing these with [JMH 
tabulate](https://github.com/steveloughran/jmh-tabulate) (important: use my 
fork as it strictly enforces a safe version of the charting.js lib from npm):
   <img width="1543" height="627" alt="Screenshot 2026-06-01 at 10 18 59" 
src="https://github.com/user-attachments/assets/cea32db7-e770-40f8-a552-59eadc6205b5"
 />
   
   
   This looks like a slowdown, but filter on statistical significance and most 
of the differences vanish. Of the three which are significant, *two are writing 
data, not reading it*:
   <img width="1808" height="952" alt="Screenshot 2026-06-01 at 10 17 32" 
src="https://github.com/user-attachments/assets/1d0de7ac-59ef-4e74-8883-3599d5c0584d"
 />
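
For anyone wanting to repeat the significance filter without the charting tool, here is a minimal sketch. It assumes "significant" means the confidence intervals in the two JMH JSON files (`primaryMetric.scoreConfidence`) do not overlap; that is my simplification, not necessarily the test JMH tabulate applies. The function names `load_scores` and `compare` are made up for illustration.

```python
import json


def load_scores(path):
    """Read a JMH JSON result file into {benchmark key: (score, ci_low, ci_high)}."""
    with open(path) as f:
        results = json.load(f)
    scores = {}
    for r in results:
        key = r["benchmark"]
        # Parameterised benchmarks share a name, so fold the params into the key.
        params = r.get("params") or {}
        if params:
            key += "[" + ",".join(f"{k}={v}" for k, v in sorted(params.items())) + "]"
        metric = r["primaryMetric"]
        low, high = metric["scoreConfidence"]
        # JMH may emit non-numeric values (e.g. "NaN") when it has too few samples.
        if not all(isinstance(v, (int, float)) for v in (low, high)):
            continue
        scores[key] = (metric["score"], low, high)
    return scores


def compare(baseline, candidate):
    """Return (name, percent change, significant) for benchmarks present in both runs.

    Assumption: a change counts as significant only when the two
    confidence intervals do not overlap at all.
    """
    rows = []
    for name in sorted(baseline.keys() & candidate.keys()):
        b_score, b_low, b_high = baseline[name]
        c_score, c_low, c_high = candidate[name]
        overlap = b_low <= c_high and c_low <= b_high
        change = (c_score - b_score) / b_score * 100.0
        rows.append((name, change, not overlap))
    return rows
```

Usage would be `compare(load_scores("master.json"), load_scores("hardened.json"))`, with the two filenames standing in for the result files linked above. Whether a positive change is good or bad depends on the benchmark mode (throughput vs. average time), which this sketch does not inspect.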
   
   The two `write` variants just serialize the prebuilt variant, so they don't 
stress the modified code. The reader benchmark does, on a deeply nested 
structure, but there I'm not sure anything is showing:
   <img width="1554" height="1551" alt="Screenshot 2026-06-01 at 10 39 29" 
src="https://github.com/user-attachments/assets/f7764715-fa34-41be-8964-a4140ab72c5a"
 />
   
   Summary: even though some minor, statistically significant slowdown is being 
reported, the fact that speedups in unmodified codepaths are also observed 
tells me the results aren't reliable.
   
   Whatever changes are being made here, they aren't actually measurable in the 
new dataset; general OS/execution/JVM noise is more of a factor.
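
The reasoning in the summary can be written down as a small check: treat benchmarks that only exercise unmodified code as controls, and distrust the whole run if any control shows a significant change. This is only a sketch; the function name `controls_are_quiet`, the row shape, and the benchmark names in the example are hypothetical, not taken from the actual results.

```python
def controls_are_quiet(rows, is_control):
    """Check that no control benchmark changed significantly.

    rows: iterable of (name, percent_change, significant) tuples.
    is_control: predicate that is True for benchmarks which only
    exercise unmodified code (an assumption the caller must supply).
    Returns (quiet, noisy_names).
    """
    noisy = [name for name, _, significant in rows if significant and is_control(name)]
    return (not noisy), noisy


# Hypothetical rows: two write-only controls and one read benchmark.
rows = [
    ("Bench.writeShallow", -4.2, True),
    ("Bench.writeNested", 0.3, False),
    ("Bench.readNested", 2.1, True),
]
quiet, noisy = controls_are_quiet(rows, lambda name: "write" in name.lower())
print(quiet, noisy)  # False ['Bench.writeShallow']
```

If the controls are not quiet, a significant result on the modified path carries little weight, because the same run also "measured" a change where none was made.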
   
   
   + updated all the constructor javadocs
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

