alexeykudinkin commented on PR #5629:
URL: https://github.com/apache/hudi/pull/5629#issuecomment-1194409817

   @danny0405 a few considerations we need to keep in mind here:
   
   1. RFC-46 is a stepping stone for transitioning from our current "modus 
operandi", which relies on an intermediate representation (Avro), to a state 
where we rely entirely on engine-specific containers (Dataset/RDD for Spark, 
for ex.) as well as engine-specific data representation formats (`InternalRow` 
for Spark, for ex.). This change is a critical first step toward decoupling 
Hudi from Avro.
   2. Given how dynamic our codebase is, we can't park this change for long. 
Even now, after 2 months of development, rebasing it onto the latest changes 
is going to be a humongous effort, given how much has landed in those 2 months.
   
   While I understand that we all expect radical improvements, we need to keep 
in mind that those will only come once we reach the final state.
   
   P.S. BTW, we won't see 5x improvements; it's going to be more like up to 
2x in the best case, simply because Hudi is already pretty tight in terms of 
performance across the board.