alexeykudinkin commented on PR #5629: URL: https://github.com/apache/hudi/pull/5629#issuecomment-1195937698
@danny0405

> I agree we should decouple Hudi from Avro, but that does not mean we should lean back to engine-specific data structures which is very hard to maintain as a engine neutral project, see how hard it is for hudi to integrate with a new engine now :), i kind of expect hudi's own reader/writer/data structures, which is the right direction we should elaborate with.

I don't think we are aligned on this one: Hudi is, and will stay, an engine-neutral project. However, for top-of-the-line performance on *any* engine (to stay competitive with other formats) we *have to* use engine-specific representations (think `Row`, `RowData`, `ArrayWritable`, Arrow, etc.). There's just no other way: any intermediate representation will be a tax on performance, and the general direction is that we want to provide the best possible performance in any supported workload, be it a read or a write.

> And another concern i always have in my mind is hudi needs a stable release tooo much ! We can not make huge changes to core reader/writers now at this moment before we do enough tests/practice, and we should not rush in the code for just the reason of code rebase effort.

Totally agree with you there, and it's one of the reasons why we decided that it's a good idea to take a more measured approach here and avoid pushing really hard (and compromising on quality testing) to meet the 0.12 deadline.
