YuweiXiao commented on issue #5107: URL: https://github.com/apache/hudi/issues/5107#issuecomment-1077404381
@boneanxs True, full support for Dataset is the long-term solution. In my experiment, optimizing the usage of `AvroSerializer` could save 80% of the cost of reading the source data. However, that optimization requires modifying the `AvroSerializer` source code on the Spark side.

@qjqqyy Yes, an `AvroSerializer` is initialized for each row (it is the variable named `converter` in the lambda).
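
To illustrate the per-row initialization cost mentioned above, here is a minimal sketch in plain Python. It is not the Hudi or Spark code: `ExpensiveConverter`, `convert_per_row`, and `convert_per_partition` are hypothetical names standing in for a serializer that is costly to construct, and the sketch only contrasts building it once per row with building it once per batch of rows.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


class ExpensiveConverter:
    """Hypothetical stand-in for a serializer that is costly to construct."""

    instances_created = 0  # class-level counter, used only for illustration

    def __init__(self, schema: List[str]) -> None:
        ExpensiveConverter.instances_created += 1
        # Assume real setup work (schema resolution, etc.) would happen here.
        self._fields = list(schema)

    def convert(self, row: Dict[str, object]) -> Tuple[object, ...]:
        # Project the row onto the schema's field order.
        return tuple(row[name] for name in self._fields)


def convert_per_row(
    rows: Iterable[Dict[str, object]], schema: List[str]
) -> List[Tuple[object, ...]]:
    # The shape described in the comment: a new converter for every row.
    return [ExpensiveConverter(schema).convert(row) for row in rows]


def convert_per_partition(
    rows: Iterable[Dict[str, object]], schema: List[str]
) -> Iterator[Tuple[object, ...]]:
    # Build the converter once, then reuse it for every row in the batch.
    converter = ExpensiveConverter(schema)
    for row in rows:
        yield converter.convert(row)


if __name__ == "__main__":
    schema = ["id", "name"]
    rows = [{"id": i, "name": f"n{i}"} for i in range(1000)]

    ExpensiveConverter.instances_created = 0
    convert_per_row(rows, schema)
    print("per row:", ExpensiveConverter.instances_created)

    ExpensiveConverter.instances_created = 0
    list(convert_per_partition(rows, schema))
    print("per partition:", ExpensiveConverter.instances_created)
```

In Spark terms the second function has the shape of a `mapPartitions` call rather than a per-row `map`. Whether the same reuse is possible for `AvroSerializer` without changing Spark itself is exactly the open question in this thread, so treat this as a picture of the general pattern only.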