qjqqyy commented on issue #5107: URL: https://github.com/apache/hudi/issues/5107#issuecomment-1077508202
> @qjqqyy Yes, each row will initialize `AvroSerializer` (variables in the lambda named `converter`)

The reason `converter` is re-initialized for each row is that an explicit `new AvroSerializer()` is executed for each row.

I ran some tests on the exact same dataset.

#### git master

<img width="1224" alt="Screenshot 2022-03-24 at 7 02 57 PM" src="https://user-images.githubusercontent.com/8439769/159903075-067f3271-1ac0-4919-be25-8429d515fbae.png">

#### with the patch from earlier comment

<img width="1218" alt="Screenshot 2022-03-24 at 7 03 28 PM" src="https://user-images.githubusercontent.com/8439769/159903248-06f45a7a-301e-40b3-a2ca-36cdf988766f.png">

It looks like a regression introduced in #4789:

* before: 1 invocation of `new AvroSerializer()` per partition
* after: 1 invocation of `new AvroSerializer()` per row
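To make the before/after difference concrete, here is a minimal plain-Java sketch. It is not Hudi or Spark code: `ExpensiveSerializer`, `perRow`, and `perPartition` are hypothetical names, partitions are modeled as nested lists, and the serializer does nothing except count how many times it is constructed. It assumes, as described above, that the cost in question is the per-row object creation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SerializerScopeDemo {

    // Hypothetical stand-in for an expensive-to-construct converter such as
    // AvroSerializer; here it only counts how often it is instantiated.
    static final class ExpensiveSerializer {
        static int instances = 0;

        ExpensiveSerializer() {
            instances++; // a real serializer might do costly setup here
        }

        String serialize(int row) {
            return "record-" + row;
        }
    }

    // Regression shape: the serializer is created inside the per-row lambda,
    // so every row pays the construction cost.
    static List<String> perRow(List<List<Integer>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            Function<Integer, String> convert = r -> new ExpensiveSerializer().serialize(r);
            for (int row : partition) {
                out.add(convert.apply(row));
            }
        }
        return out;
    }

    // Pre-regression shape: one serializer per partition, reused for every row.
    static List<String> perPartition(List<List<Integer>> partitions) {
        List<String> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            ExpensiveSerializer serializer = new ExpensiveSerializer();
            for (int row : partition) {
                out.add(serializer.serialize(row));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = List.of(List.of(1, 2, 3), List.of(4, 5, 6, 7));

        ExpensiveSerializer.instances = 0;
        perRow(partitions);
        System.out.println("per row: " + ExpensiveSerializer.instances);       // 7

        ExpensiveSerializer.instances = 0;
        perPartition(partitions);
        System.out.println("per partition: " + ExpensiveSerializer.instances); // 2
    }
}
```

With 2 partitions and 7 rows, the per-row variant constructs 7 serializers and the per-partition variant constructs 2, while both produce the same output. In Spark, the per-partition shape would typically be written by creating the serializer at the top of a `mapPartitions` function.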