YuweiXiao commented on issue #5107:
URL: https://github.com/apache/hudi/issues/5107#issuecomment-1077404381


   @boneanxs 
   True, full support of Dataset is the long-term solution. In my experiment, 
optimizing the usage of `AvroSerializer` could cut the cost of reading the 
source data by 80%. However, that optimization requires modifying the 
`AvroSerializer` source code on the Spark side.
   
   @qjqqyy Yes, an `AvroSerializer` is initialized for every row (see the 
variables named `converter` in the lambda).
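
   To illustrate why per-row initialization is costly, here is a minimal 
sketch in plain Java. It does not use Spark or Hudi code and does not 
reproduce the Spark-side change mentioned above. `CostlyConverter` is a 
hypothetical stand-in for a serializer that is expensive to construct, and 
all class and method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class ConverterReuse {

    /** Hypothetical stand-in for a serializer that is costly to construct. */
    static final class CostlyConverter {
        // Counts how many instances have been created so far.
        static int constructions = 0;

        CostlyConverter() {
            constructions++;
        }

        String convert(int row) {
            return "rec-" + row;
        }
    }

    /** Per-row pattern: a new converter is created inside the loop body. */
    static List<String> convertPerRow(List<Integer> rows) {
        List<String> out = new ArrayList<>();
        for (int row : rows) {
            CostlyConverter converter = new CostlyConverter();
            out.add(converter.convert(row));
        }
        return out;
    }

    /** Shared pattern: one converter is created up front and reused. */
    static List<String> convertWithSharedConverter(List<Integer> rows) {
        CostlyConverter converter = new CostlyConverter();
        List<String> out = new ArrayList<>();
        for (int row : rows) {
            out.add(converter.convert(row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(i);
        }

        CostlyConverter.constructions = 0;
        convertPerRow(rows);
        System.out.println("per-row constructions: " + CostlyConverter.constructions);

        CostlyConverter.constructions = 0;
        convertWithSharedConverter(rows);
        System.out.println("shared constructions: " + CostlyConverter.constructions);
    }
}
```

   For N input rows the first method constructs N converters and the second 
constructs one, while both return the same records. Whether the real 
`AvroSerializer` can be hoisted this way is not something the sketch shows.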
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

