the-other-tim-brown opened a new pull request, #13686:
URL: https://github.com/apache/hudi/pull/13686

   ### Change Logs
   
   Currently, our payload-based paths use `HoodieAvroRecord` to transport data 
between Spark executors. As we move away from payloads, though, we can start 
relying on the other record objects directly. The `HoodieAvroIndexedRecord` 
fits our needs for transporting the Avro data but needs some changes to match 
the existing performance.
   
   This change introduces a new class, `SerializableIndexedRecord`, which 
manages the serialization of the data in the `HoodieAvroIndexedRecord`. 
Unlike payloads, the data is only written out to a byte array when it is 
required. This keeps performance on par with the existing behavior when 
working with data that resides on a single machine.
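   
   As a rough illustration of the lazy serialization idea described above, 
here is a minimal sketch. It is not the code in this PR: `LazyBytesRecord`, 
its fields, and `roundTrip` are hypothetical names, a plain `String` stands in 
for the Avro record, and Java serialization stands in for whatever serializer 
the engine actually uses. The only point is that encoding to a byte array 
happens inside the serialization hook, not at construction or on local reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a lazily serialized record wrapper; not Hudi's class.
final class LazyBytesRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    // Counts how often the in-memory value was encoded (for demonstration only).
    static int encodeCount = 0;

    // In-memory form; transient, so Java serialization skips it.
    private transient String value;
    // Encoded form; stays null until the record is actually serialized.
    private byte[] bytes;

    LazyBytesRecord(String value) {
        this.value = value;
    }

    // Local reads use the in-memory form; decoding happens only after deserialization.
    String getValue() {
        if (value == null && bytes != null) {
            value = new String(bytes, StandardCharsets.UTF_8);
        }
        return value;
    }

    // Serialization hook: encode only now, when the bytes are required.
    private void writeObject(ObjectOutputStream out) throws IOException {
        if (bytes == null) {
            bytes = value.getBytes(StandardCharsets.UTF_8);
            encodeCount++;
        }
        out.defaultWriteObject();
    }

    // Sends a record through Java serialization and back, as a shuffle between machines would.
    static LazyBytesRecord roundTrip(LazyBytesRecord record) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(record);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (LazyBytesRecord) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

   In this sketch, a record that is only read locally never pays the encoding 
cost; `encodeCount` changes only once the record crosses a serialization 
boundary.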
   
   For existing workflows that use `HoodieAvroIndexedRecord`, like compaction, 
we expect to see the same performance. This is validated with a JMH 
microbenchmark, which confirms that the call to `setSchema` does not change 
throughput when working with the object.
   
   With Kryo, the serialized size of the object is also about 2/3 that of the 
existing record, measured on a fairly basic object with 15 fields holding 
mainly numeric or small string values.
   
   ### Impact
   
   Allows us to move away from payloads without degrading serialization 
performance.
   
   ### Risk level (write none, low, medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
     ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
