Re: [PR] docs: [RFC-98] Design doc of DSv2 read support [hudi]

via GitHub Wed, 18 Mar 2026 11:01:58 -0700


vinothchandar commented on code in PR #18276:
URL: https://github.com/apache/hudi/pull/18276#discussion_r2955275942



##########
rfc/rfc-98/rfc-98.md:
##########
@@ -52,25 +54,260 @@ The current implementation of Spark Datasource V2 
integration is presented in th
 
 ## Implementation
 
-<!--  -->
+The main problem is that Hudi's write path involves indexing, precombining, upsert/insert routing, file sizing, and table services (compaction/clustering/cleaning).
+Also `HoodieSparkSqlWriter::write` handles schema evolution, partition encoding, metadata updates, and multi-writer concurrency.
+DSv2's `WriteBuilder` >> `BatchWrite` >> DataWriter API is too simplistic for this, and moving to this entirely would be high risk.
+
+The proposed approach is hybrid: DSv2 for reads, with a DSv1 fallback for writes (`V2TableWithV1Fallback`) in the current state.
+Later, if a DSv2 write path can be implemented without loss of performance or functionality, it may become possible to move to full DSv2 support.
+However, this migration should still be incremental, please check the "Future Work" chapter for details.
+
+Overall proposed architecture for the hybrid approach is shown in the following schema:
+
+![Proposed approach with hybrid V1 write and V2 read](integration_with_DSv2_read.jpg)
+
+### DataFrame API
+
+A new SPI short name, `"hudi_v2"`, activates the DSv2 read path when using the Spark DataFrame API.

Review Comment:
   yes. sounds sane.
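
For readers skimming the thread, here is a toy model of the routing the quoted hunk describes: the short name picks the read path, while writes stay on DSv1 in both cases. This is only a sketch and not Hudi code. `Route`, `ROUTES`, and `resolve` are hypothetical names made up for illustration, and the assumption that the existing `"hudi"` short name reads through DSv1 is mine, not something stated in the hunk.

```python
from dataclasses import dataclass


# Hypothetical model for illustration only; none of these names exist in Hudi.
@dataclass(frozen=True)
class Route:
    read_path: str   # which Spark datasource API serves reads
    write_path: str  # which Spark datasource API serves writes


# Assumption: the existing "hudi" short name keeps DSv1 for reads and writes.
# "hudi_v2" is the short name proposed in the RFC: DSv2 reads, DSv1 fallback writes.
ROUTES = {
    "hudi": Route(read_path="DSv1", write_path="DSv1"),
    "hudi_v2": Route(read_path="DSv2", write_path="DSv1"),
}


def resolve(short_name: str) -> Route:
    """Return the read/write routing for a datasource short name."""
    try:
        return ROUTES[short_name]
    except KeyError:
        raise ValueError(f"unknown data source short name: {short_name!r}") from None
```

On the Spark side, opting in would presumably be a matter of passing the new short name to the reader, e.g. `spark.read.format("hudi_v2").load(path)`, with `path` standing in for a table location.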



