gowa commented on PR #3121:
URL: https://github.com/apache/parquet-java/pull/3121#issuecomment-2608733828

   Hi @gszadovszky, @wgtmac. Thank you for your feedback.
   Yes, I see that it is a big feature and the implementation is far from being 
a simple fix. Maybe it should be a pluggable thing instead of being a 
first-class resident in the code. However, if you feel the changes can be 
incorporated into the main codebase, I could try to find someone to review 
the ByteBuddy part and implement the reader part as well.
   
   As for benchmarks: I've implemented some and committed them. I attempted to 
replicate the original org.apache.parquet.benchmarks.WriteBenchmarks with some 
proto stuff in org.apache.parquet.benchmarks.ProtoWriteBenchmarks.
   
   The results are as follows: the larger the number of fields (especially 
primitives), the bigger the gain.
   E.g. for 100 int32 fields:
   ```
   Benchmark                                                            (codegenMode)  (protoClass)  Mode  Cnt   Score   Error  Units
   ProtoWriteBenchmarks.write1MRowsBS256MPS4MUncompressed                         OFF  Test100Int32    ss    5  13.171 ± 1.206   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS4MUncompressed                REQUIRED_ALL  Test100Int32    ss    5   6.075 ± 1.258   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS8MUncompressed                         OFF  Test100Int32    ss    5  13.304 ± 1.497   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS8MUncompressed                REQUIRED_ALL  Test100Int32    ss    5   6.235 ± 0.617   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS4MUncompressed                         OFF  Test100Int32    ss    5  13.450 ± 3.429   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS4MUncompressed                REQUIRED_ALL  Test100Int32    ss    5   5.947 ± 0.430   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS8MUncompressed                         OFF  Test100Int32    ss    5  13.433 ± 3.879   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS8MUncompressed                REQUIRED_ALL  Test100Int32    ss    5   6.523 ± 2.831   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeGZIP                    OFF  Test100Int32    ss    5  13.288 ± 0.429   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeGZIP           REQUIRED_ALL  Test100Int32    ss    5   6.333 ± 0.444   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeSNAPPY                  OFF  Test100Int32    ss    5  13.197 ± 1.396   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeSNAPPY         REQUIRED_ALL  Test100Int32    ss    5   6.855 ± 2.689   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeUncompressed            OFF  Test100Int32    ss    5  13.473 ± 1.930   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeUncompressed   REQUIRED_ALL  Test100Int32    ss    5   6.006 ± 0.285   s/op
   ```
   
   For 30 int32 fields:
   ```
   Benchmark                                                            (codegenMode)  (protoClass)  Mode  Cnt  Score   Error  Units
   ProtoWriteBenchmarks.write1MRowsBS256MPS4MUncompressed                         OFF   Test30Int32    ss    5  3.421 ± 1.303   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS4MUncompressed                REQUIRED_ALL   Test30Int32    ss    5  2.410 ± 0.357   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS8MUncompressed                         OFF   Test30Int32    ss    5  3.396 ± 0.708   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS8MUncompressed                REQUIRED_ALL   Test30Int32    ss    5  2.362 ± 0.174   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS4MUncompressed                         OFF   Test30Int32    ss    5  3.250 ± 0.721   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS4MUncompressed                REQUIRED_ALL   Test30Int32    ss    5  2.310 ± 0.168   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS8MUncompressed                         OFF   Test30Int32    ss    5  3.447 ± 0.884   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS8MUncompressed                REQUIRED_ALL   Test30Int32    ss    5  2.416 ± 0.387   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeGZIP                    OFF   Test30Int32    ss    5  3.156 ± 0.276   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeGZIP           REQUIRED_ALL   Test30Int32    ss    5  2.514 ± 0.687   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeSNAPPY                  OFF   Test30Int32    ss    5  3.398 ± 0.853   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeSNAPPY         REQUIRED_ALL   Test30Int32    ss    5  2.501 ± 0.323   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeUncompressed            OFF   Test30Int32    ss    5  3.644 ± 3.423   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeUncompressed   REQUIRED_ALL   Test30Int32    ss    5  2.384 ± 0.203   s/op
   ```
   
   
   For 30 strings ("fieldXX:XX"):
   ```
   Benchmark                                                            (codegenMode)  (protoClass)  Mode  Cnt   Score   Error  Units
   ProtoWriteBenchmarks.write1MRowsBS256MPS4MUncompressed                         OFF  Test30String    ss    5   9.426 ± 3.621   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS4MUncompressed                REQUIRED_ALL  Test30String    ss    5   8.257 ± 1.113   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS8MUncompressed                         OFF  Test30String    ss    5   9.848 ± 1.141   s/op
   ProtoWriteBenchmarks.write1MRowsBS256MPS8MUncompressed                REQUIRED_ALL  Test30String    ss    5   8.302 ± 1.910   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS4MUncompressed                         OFF  Test30String    ss    5  10.216 ± 1.843   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS4MUncompressed                REQUIRED_ALL  Test30String    ss    5   8.173 ± 1.419   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS8MUncompressed                         OFF  Test30String    ss    5   9.940 ± 1.680   s/op
   ProtoWriteBenchmarks.write1MRowsBS512MPS8MUncompressed                REQUIRED_ALL  Test30String    ss    5   8.242 ± 1.270   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeGZIP                    OFF  Test30String    ss    5   9.833 ± 1.010   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeGZIP           REQUIRED_ALL  Test30String    ss    5   8.247 ± 1.284   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeSNAPPY                  OFF  Test30String    ss    5   9.638 ± 0.502   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeSNAPPY         REQUIRED_ALL  Test30String    ss    5   7.935 ± 0.889   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeUncompressed            OFF  Test30String    ss    5   9.968 ± 1.651   s/op
   ProtoWriteBenchmarks.write1MRowsDefaultBlockAndPageSizeUncompressed   REQUIRED_ALL  Test30String    ss    5   8.356 ± 1.319   s/op
   ```
   
   For 5-7 fields the gain is negligible.
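
To read the tables above as speedup ratios, here is a minimal sketch that divides the OFF score by the REQUIRED_ALL score for the `write1MRowsBS256MPS4MUncompressed` rows. The class `SpeedupSummary` and its method are hypothetical helpers for illustration only and are not part of the PR; the numbers are copied from the tables.

```java
public class SpeedupSummary {
    /** Baseline time divided by optimized time; values above 1.0 mean the optimized run is faster. */
    public static double speedup(double baselineSecPerOp, double optimizedSecPerOp) {
        return baselineSecPerOp / optimizedSecPerOp;
    }

    public static void main(String[] args) {
        // Scores (s/op) from the write1MRowsBS256MPS4MUncompressed rows of each table.
        String[] protoClasses = {"Test100Int32", "Test30Int32", "Test30String"};
        double[] off = {13.171, 3.421, 9.426};
        double[] requiredAll = {6.075, 2.410, 8.257};

        for (int i = 0; i < protoClasses.length; i++) {
            System.out.printf("%-13s %.2fx%n", protoClasses[i], speedup(off[i], requiredAll[i]));
        }
    }
}
```

Since the runs are single-shot (`ss`) with 5 iterations and the reported errors are sizeable, these ratios should be treated as rough indications rather than precise figures.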


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

