wulei0302 commented on PR #5627: URL: https://github.com/apache/hudi/pull/5627#issuecomment-1169528775
> Made one pass, the changes themselves look reasonable. Need to read SparkRecord and its utils closely. My understanding of this PR is:
>
> * Wraps the payload completely using methods on the HoodieRecord abstraction
> * All code paths now just call HoodieRecord for merging; the old getInsertValue() is pushed into the Record APIs as well.
>
> I would love to understand:
>
> * What testing has been done? AFAICT the Spark datasource writer is still not working with SparkRecord?
> * Any perf comparisons for the new SparkRecord-based writes vs AvroRecord-based writes.
> * My wishful thinking is that we eliminate `preCombine` as well somehow. That would make the end system very simple to implement in various engines.
>
> I suggest @danny0405 look at the changes from the Flink lens once as well.

- Yes, the testing and the SparkRecordWriter/Reader are completed in [Step 3](https://github.com/apache/hudi/pull/5629).
- (Comparisons) This is in progress on the basis of Step 3.
- Currently, we think that `preCombine` acts between log files, while `combineAndGetUpdateValue` acts between log and base files, so there are some differences in their definitions. From a higher level of abstraction they could perhaps be merged into one API, but let's discuss this further; it may be completed in another PR.
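To illustrate the distinction drawn above, here is a toy, self-contained sketch of the two merge steps. The `Rec` type and both method bodies are hypothetical simplifications, not Hudi's actual `HoodieRecordPayload` signatures (which operate on Avro `IndexedRecord`s): `preCombine` resolves duplicates *among incoming log records* by an ordering field, while `combineAndGetUpdateValue` merges the surviving incoming record against the record already sitting in a base file.

```java
import java.util.Optional;

public class MergeSketch {
    // Toy record: key, ordering value (e.g. a timestamp), and a payload string.
    record Rec(String key, long ts, String val) {}

    // preCombine analogue: between two *incoming* records with the same key,
    // keep the one with the higher ordering value (log-vs-log).
    static Rec preCombine(Rec a, Rec b) {
        return a.ts() >= b.ts() ? a : b;
    }

    // combineAndGetUpdateValue analogue: merge the surviving incoming record
    // against the existing *base-file* record (log-vs-base).
    // Optional.empty() would signal a delete in the real API.
    static Optional<Rec> combineAndGetUpdateValue(Rec base, Rec incoming) {
        return Optional.of(incoming.ts() >= base.ts() ? incoming : base);
    }

    public static void main(String[] args) {
        Rec base = new Rec("k1", 1L, "base");
        Rec logA = new Rec("k1", 3L, "logA");
        Rec logB = new Rec("k1", 2L, "logB");

        // Step 1: dedupe the incoming batch among itself.
        Rec merged = preCombine(logA, logB);
        // Step 2: merge the winner against the base file's record.
        Rec out = combineAndGetUpdateValue(base, merged).orElseThrow();
        System.out.println(out.val());
    }
}
```

Because both steps reduce two versions of a record to one by the same ordering rule, unifying them behind a single merge API (as the review suggests) looks plausible at this level; the differences surface in the real API, where only `combineAndGetUpdateValue` sees the reader schema and can signal deletes.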
