[GitHub] [hudi] YuweiXiao commented on issue #5107: [SUPPORT] High performance costs of AvroSerializer in Datasource writing

GitBox Thu, 24 Mar 2022 02:15:54 -0700


YuweiXiao commented on issue #5107:
URL: https://github.com/apache/hudi/issues/5107#issuecomment-1077404381



   @boneanxs 
   True, full Dataset support is the long-term solution. In my experiment, 
optimizing how `AvroSerializer` is used cut the cost of reading the source 
data by 80%. However, that optimization requires modifying the 
`AvroSerializer` source code on the Spark side.
   
   @qjqqyy Yes, a new `AvroSerializer` is initialized for every row (see the 
variables in the lambda named `converter`).
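
   To illustrate the per-row initialization cost in isolation, here is a 
minimal Python sketch. It is not Hudi or Spark code, and it is not necessarily 
the optimization mentioned above (which needs changes inside `AvroSerializer` 
itself). `ExpensiveSerializer`, `convert_per_row`, and `convert_per_partition` 
are hypothetical names, and the sketch assumes the serializer can safely be 
reused across rows that share one schema.

```python
class ExpensiveSerializer:
    """Hypothetical stand-in for a serializer that is costly to construct."""

    instances = 0  # number of times the constructor has run

    def __init__(self, schema):
        ExpensiveSerializer.instances += 1
        self.schema = schema

    def serialize(self, row):
        # Map positional row values to field names from the schema.
        return {name: row[i] for i, name in enumerate(self.schema)}


def convert_per_row(partition, schema):
    # Pattern described in the comment: one serializer is built per row.
    return [ExpensiveSerializer(schema).serialize(row) for row in partition]


def convert_per_partition(partition, schema):
    # Alternative: build the serializer once and reuse it for every row.
    serializer = ExpensiveSerializer(schema)
    return [serializer.serialize(row) for row in partition]


if __name__ == "__main__":
    schema = ["id", "name"]
    rows = [(1, "a"), (2, "b"), (3, "c")]

    ExpensiveSerializer.instances = 0
    convert_per_row(rows, schema)
    print("per row:", ExpensiveSerializer.instances)

    ExpensiveSerializer.instances = 0
    convert_per_partition(rows, schema)
    print("per partition:", ExpensiveSerializer.instances)
```

   Both functions return the same records; only the number of constructor 
calls differs (one per row versus one per partition).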
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
