LantaoJin opened a new pull request, #80:
URL: https://github.com/apache/datafusion-java/pull/80

   ## Which issue does this PR close?
   
   - Closes #75.
   
   ## Rationale for this change
   
   `SessionContext.fromProto(byte[])` accepts only DataFusion's *own* 
`LogicalPlanNode` proto. [Substrait](https://substrait.io/) — the cross-engine 
logical-plan standard that DataFusion already supports through the 
[`datafusion-substrait`](https://crates.io/crates/datafusion-substrait) crate — 
has had no Java-side entry point. Embedders that compile plans elsewhere 
(Calcite via [Isthmus](https://github.com/substrait-io/substrait-java), custom 
planners, federation hubs, integrations with other engines) had to round-trip 
through SQL to use the Java binding. That round-trip is lossy: source-side 
optimisations baked into the Substrait plan are discarded, and SQL is not 
always expressive enough to round-trip cleanly when plans reference extensions 
or function variants with no surface SQL form.
   
   ## What changes are included in this PR?
   
   This PR adds a single new entry point that mirrors the existing `fromProto` 
shape but consumes Substrait `Plan` bytes instead. The implementation is small 
(~50 LOC of JNI plus ~25 LOC on the Java side); the bulk of the diff is the 
test that round-trips a hand-built Substrait plan through the JNI bridge.
   
   New public Java API on `SessionContext`:
   
   ```java
   public DataFrame fromSubstrait(byte[] planBytes);
   ```
   
   `planBytes` is a serialised `substrait.proto.Plan`. The plan is translated 
against this context's catalog: any tables it references must already be 
registered. The returned `DataFrame` is lazy and composes with the rest of the 
API.
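   
   A minimal caller-side sketch of handing over a plan that was compiled 
elsewhere and saved to disk. `SubstraitPlanLoader` and `load` are hypothetical 
names used only for illustration and are not part of this PR; the entry point 
is passed as a function so the snippet compiles without the native library.
   
   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.function.Function;
   
   /** Hypothetical helper, not part of this PR. */
   public final class SubstraitPlanLoader {
       private SubstraitPlanLoader() {}
   
       /**
        * Reads serialised Substrait Plan bytes from planFile and passes them
        * to the supplied entry point (assumed to be ctx::fromSubstrait).
        */
       public static <T> T load(Path planFile, Function<byte[], T> fromSubstrait)
               throws IOException {
           byte[] planBytes = Files.readAllBytes(planFile);
           return fromSubstrait.apply(planBytes);
       }
   }
   ```
   
   Assuming `ctx` is a `SessionContext` with the referenced tables already 
registered, a call would look like 
`SubstraitPlanLoader.load(path, ctx::fromSubstrait)`.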
   
   Substrait support is **off by default**, so `cargo build` (and therefore 
`make`, `make test`, and every user who doesn't need Substrait) stays hermetic 
and gains no new build prerequisites. Enabling it is opt-in:
   
   | invocation | substrait support | build prereqs |
   |---|---|---|
   | `cargo build` (default) | off (stub handler) | none |
   | `cargo build --features substrait` | on | `protoc` on PATH |
   | `cargo build --features substrait,protoc` | on (vendored protoc) | `cmake` on PATH |
   
   The Java surface is unchanged either way — 
`SessionContext.fromSubstrait(...)` is always present; calls just throw a clear 
"datafusion-jni was built without the `substrait` Cargo feature; rebuild with 
`--features substrait`" error from the JVM if the feature was compiled off. 
`SessionContextSubstraitTest` detects this case and skips itself via JUnit's 
`Assumptions.assumeFalse(...)`, so `make test` stays green either way.
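   
   One way such a check could be written, as a sketch only: the helper below 
is hypothetical (the actual detection logic in `SessionContextSubstraitTest` 
is not shown here), and it assumes the error surfaces as a `Throwable` whose 
message, or that of one of its causes, contains the fragment quoted above.
   
   ```java
   /** Hypothetical helper, not part of this PR. */
   public final class SubstraitFeatureCheck {
       // Fragment of the error text quoted in this description; matching on
       // it is an assumption about how the failure surfaces on the JVM side.
       static final String MARKER = "built without the `substrait` Cargo feature";
   
       private SubstraitFeatureCheck() {}
   
       /** Returns true if the error or any of its causes carries the marker. */
       public static boolean isFeatureDisabled(Throwable error) {
           for (Throwable t = error; t != null; t = t.getCause()) {
               String message = t.getMessage();
               if (message != null && message.contains(MARKER)) {
                   return true;
               }
           }
           return false;
       }
   }
   ```
   
   A JUnit 5 test could then catch the exception from `fromSubstrait(...)` and 
call `Assumptions.assumeFalse(SubstraitFeatureCheck.isFeatureDisabled(e))` so 
the test is reported as skipped rather than failed.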
   
   This is intentionally different from PR #60's Avro handling, which is 
always on.
   
   ## Are these changes tested?
   
   Yes, 7 new tests in `SessionContextSubstraitTest`.
   
   ## Are there any user-facing changes?
   
   Yes, purely additive. New public API:
   
   - `SessionContext.fromSubstrait(byte[]) → DataFrame`
   
   No API removals, no deprecations, no behavior change for existing callers. 
The default `cargo build` does **not** pull in `datafusion-substrait` and adds 
no new build prerequisites; `SessionContext.fromSubstrait(...)` is present but 
throws "feature not enabled" at runtime. Users who need Substrait rebuild with 
`--features substrait` (and either install `protoc` or also enable the `protoc` 
helper feature). The native binary size is unchanged unless the feature is 
enabled.
   
   The new test-scope dependency `io.substrait:core:0.81.0` is added to the 
parent POM's `dependencyManagement` (with version property 
`substrait.java.version`) and to `core/pom.xml` in `test` scope only; it does 
not enter the runtime classpath of the published artifact.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

