parthchandra commented on PR #4339:
URL: 
https://github.com/apache/datafusion-comet/pull/4339#issuecomment-4480202130

   Thank you @schenksj for putting in this effort. Really thorough work here!
   However, I think this may be solving a harder problem than the one we 
actually need to solve right now.. 
   
   In 
[#3932](https://github.com/apache/datafusion-comet/pull/3932#issuecomment-4296101740),
 I suggested putting the Delta code in a `contrib` directory -- conditionally 
compiled, not included by default, with fewer guarantees for end users. That's 
primarily an organizational and build concern: keep the Delta code isolated so 
it doesn't become a maintenance burden for core committers who don't have Delta 
expertise.
   
   This PR takes that idea significantly further with a generic extension 
framework. I understand the appeal of building for the general case, but I'm 
not sure we need this level of abstraction yet.  
   
   The biggest reason for thinking so is that the API surface is substantial 
and we do not have enough concrete implementations to nail the API down. This 
PR introduces a stability commitment once it is merged - 
`CometScanRuleExtension` has 6 methods, `CometOperatorSerdeExtension` has 5, 
`ContribOperatorPlanner` has 2, `ContribPlannerContext` has 5, 
`ParquetDatasourceParams` has 15 fields. This is a lot to have to finalize.
   
   I think we can get the isolation we need by taking #3932's approach (which 
mirrors how Iceberg works today) and reorganizing it into a `contrib/delta/` 
directory with conditional compilation. No SPI framework is needed, instead we 
can have some core touchpoints that are behind a feature flag.
   
   *Directory structure:*
   ```
   contrib/delta/
     native/           # Rust: delta_kernel integration, DV filter, column 
mapping
     spark/            # Scala: CometDeltaNativeScanExec, DeltaReflection, serde
     proto/            # DeltaScan proto messages (or in the main proto, gated)
     tests/            # Delta-specific test suites
   ```
   
   *Core touchpoints (feature-gated):*
   - `operator.proto`: `DeltaScan` as a typed variant (like `IcebergScan` at 
field 112) -- compile-time type safety, better debuggability than opaque 
`ContribOp { kind, payload }`
   - `planner.rs`: one `#[cfg(feature = "delta")] OpStruct::DeltaScan => { ... 
}` match arm (~10 lines of dispatch to a function in the delta crate)
   - `CometScanRule.scala`: Delta table detection behind 
`COMET_DELTA_NATIVE_ENABLED` config check (~20 lines, same pattern as the 
existing Iceberg block at lines 304-430)
   - `CometExecRule.scala`: one match arm for `CometDeltaNativeScanExec` (~6 
lines)
   
   *Build:*
   - Rust: `#[cfg(feature = "delta")]` on the match arms + `delta_kernel` 
dependency
   - Scala: Maven profile `-Pcontrib-delta` (like `-Piceberg` today, or the 
`-Pcontrib-example` profile already in this PR)
   - **Not** compiled or shipped by default
   
   **Total core coupling:** ~40 lines behind feature gates, following the same 
pattern as Iceberg. No new abstractions, no new traits, no runtime dispatch.
   
   Iceberg already works this way in Comet -- typed `IcebergScan` proto 
variant, direct match arm in `planner.rs`, direct handling in `CometScanRule`. 
It's simple and it works. Delta can follow the same pattern.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to