qjqqyy commented on issue #5107:
URL: https://github.com/apache/hudi/issues/5107#issuecomment-1077508202


   > @qjqqyy Yes, each row will initialize `AvroSerializer` (variables in the lambda named `converter`)
   
   `converter` is re-initialized for each row because an explicit `new AvroSerializer()` call runs for each row.
   
   I ran some tests on the exact same dataset.
   
   #### git master
   <img width="1224" alt="Screenshot 2022-03-24 at 7 02 57 PM" src="https://user-images.githubusercontent.com/8439769/159903075-067f3271-1ac0-4919-be25-8429d515fbae.png">
   
   #### with the patch from earlier comment
   <img width="1218" alt="Screenshot 2022-03-24 at 7 03 28 PM" src="https://user-images.githubusercontent.com/8439769/159903248-06f45a7a-301e-40b3-a2ca-36cdf988766f.png">
   
   This looks like a regression introduced in #4789:
   
   * before: 1 invocation of `new AvroSerializer()` for each partition
   * after: 1 invocation of `new AvroSerializer()` for each row
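
To make the before/after difference concrete, here is a minimal sketch in plain Java. It is not Hudi's or Spark's code: `CountingSerializer` is a hypothetical stand-in for `AvroSerializer`, and a `List` stands in for one partition's rows. It only counts how many times the constructor runs depending on where the `new` sits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SerializerPlacement {
    // Hypothetical stand-in for a costly-to-build serializer (NOT the real AvroSerializer).
    static final class CountingSerializer {
        CountingSerializer(AtomicInteger constructed) {
            constructed.incrementAndGet(); // record every construction
        }

        String serialize(int row) {
            return "row-" + row;
        }
    }

    // "after" behaviour: the constructor call sits inside the per-row loop.
    static int constructionsPerRow(List<Integer> partition) {
        AtomicInteger constructed = new AtomicInteger();
        List<String> out = new ArrayList<>();
        for (int row : partition) {
            out.add(new CountingSerializer(constructed).serialize(row));
        }
        return constructed.get();
    }

    // "before" behaviour: one serializer is built up front and reused for every row.
    static int constructionsPerPartition(List<Integer> partition) {
        AtomicInteger constructed = new AtomicInteger();
        CountingSerializer serializer = new CountingSerializer(constructed);
        List<String> out = new ArrayList<>();
        for (int row : partition) {
            out.add(serializer.serialize(row));
        }
        return constructed.get();
    }

    public static void main(String[] args) {
        List<Integer> partition = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println("per row:       " + constructionsPerRow(partition));
        System.out.println("per partition: " + constructionsPerPartition(partition));
    }
}
```

With a five-row partition the first variant constructs five serializers and the second constructs one, so the construction cost scales with row count instead of partition count.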



