mattcuento commented on PR #1360:
URL: https://github.com/apache/datafusion-ballista/pull/1360#issuecomment-3708756589
> there is one issue with docker build and looks like issue with disk space failing other (not sure how to fix)
Thanks! It looks like the `protoc` available when building `substrait` isn't recent enough to support proto3 optional fields by default:
```
error: failed to run custom build command for `substrait v0.62.2`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.
Caused by:
process didn't exit successfully: `/home/builder/workspace/target/release/build/substrait-339db7bdcba362ae/build-script-build` (exit status: 1)
--- stdout
cargo:rerun-if-env-changed=FORCE_REBUILD
cargo:rerun-if-changed=substrait
cargo:rerun-if-changed=substrait/text/dialect_schema.yaml
cargo:rerun-if-changed=substrait/text/simple_extensions_schema.yaml
cargo:rerun-if-changed=substrait/proto/substrait/plan.proto
cargo:rerun-if-changed=substrait/proto/substrait/extensions/extensions.proto
cargo:rerun-if-changed=substrait/proto/substrait/type.proto
cargo:rerun-if-changed=substrait/proto/substrait/parameterized_types.proto
cargo:rerun-if-changed=substrait/proto/substrait/algebra.proto
cargo:rerun-if-changed=substrait/proto/substrait/extended_expression.proto
cargo:rerun-if-changed=substrait/proto/substrait/capabilities.proto
cargo:rerun-if-changed=substrait/proto/substrait/function.proto
cargo:rerun-if-changed=substrait/proto/substrait/type_expressions.proto
--- stderr
Error: Custom { kind: Other, error: "protoc failed: substrait/type.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.\n" }
```
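Protobuf 3.15 is the release that made proto3 `optional` fields usable without `--experimental_allow_proto3_optional`. One way to catch this before `cargo build` is to check which `protoc` a build image will pick up. Below is a minimal sketch. It assumes `protoc --version` prints `libprotoc X.Y[.Z]`: older releases print `3.x.y`, newer ones print something like `25.1`. The function name is hypothetical and not part of Ballista or the `substrait` crate.

```rust
use std::process::Command;

/// Hypothetical helper: parses `protoc --version` output such as
/// "libprotoc 3.12.4" or "libprotoc 25.1" and reports whether that protoc
/// accepts proto3 `optional` fields without the experimental flag (>= 3.15).
/// Returns None if the output doesn't look like a protoc version string.
fn supports_proto3_optional_by_default(version_output: &str) -> Option<bool> {
    let version = version_output.trim().strip_prefix("libprotoc ")?;
    let mut parts = version.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    // A missing minor component is treated as 0.
    let minor: u32 = parts.next().unwrap_or("0").parse().ok()?;
    // Post-3.x releases (22, 23, ...) all include the feature.
    Some(major > 3 || (major == 3 && minor >= 15))
}

fn main() {
    // Ask the protoc on PATH for its version; stdout holds e.g. "libprotoc 3.12.4".
    match Command::new("protoc").arg("--version").output() {
        Ok(out) => {
            let text = String::from_utf8_lossy(&out.stdout);
            match supports_proto3_optional_by_default(&text) {
                Some(true) => println!("{}: proto3 optional supported", text.trim()),
                Some(false) => println!("{}: too old, need protoc >= 3.15", text.trim()),
                None => println!("could not parse protoc version from {text:?}"),
            }
        }
        Err(e) => println!("protoc not found on PATH: {e}"),
    }
}
```

The `protoc failed:` prefix in the log suggests the `substrait` build script uses prost-build. prost-build also honors a `PROTOC` environment variable pointing at a specific binary, which may be an alternative to compiling `protoc` from source.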
From this [issue](https://github.com/apache/datafusion/issues/13853) I gathered that we could compile `protoc` ourselves for the build, and that's what the latest commit does. However, it looks like a few distributions [don't ship with or download cmake for compilation](https://github.com/apache/datafusion-ballista/actions/runs/20702788219/job/59427713820?pr=1360). @milenkovicm, do you have any advice here? Should I just add cmake to the necessary Dockerfiles?
> Maybe as a follow up we should put a bit more documentation around this and example(s)
Agreed, happy to file an issue to track it. I'm still curious whether more changes would be desirable for user ergonomics. Do we have existing examples of connecting to a scheduler other than through the client? It might be useful to add some convenience/wrapper methods for creating scheduler gRPC clients. For example, one could combine `create_grpc_client_connection` and `SchedulerGrpcClient::new(connection)` as in `distributed_query.rs`:
```rust
info!("Connecting to Ballista scheduler at {scheduler_url}");
// TODO reuse the scheduler to avoid connecting to the Ballista scheduler again and again
let connection = create_grpc_client_connection(scheduler_url, &grpc_config)
    .await
    .map_err(|e| DataFusionError::Execution(format!("{e:?}")))?;
let mut scheduler = SchedulerGrpcClient::new(connection)
    .max_encoding_message_size(max_message_size)
    .max_decoding_message_size(max_message_size);
```
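On the TODO in that snippet and the idea of a convenience wrapper, here is a minimal, self-contained sketch of one possible shape. It is a small cache that connects to each scheduler URL once and hands out clones afterwards; tonic clients wrap a `Channel` that is cheap to clone. `SchedulerClient`, `SchedulerClientCache`, and `get_or_connect` are hypothetical names. The connection step is stubbed out so the example runs without Ballista or a live scheduler.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the tonic-generated `SchedulerGrpcClient`;
/// it only records what a real client would be configured with.
#[derive(Clone, Debug)]
struct SchedulerClient {
    url: String,
    max_message_size: usize,
}

/// Hypothetical helper: connects to each scheduler URL once and returns
/// clones of the cached client on later calls.
struct SchedulerClientCache {
    max_message_size: usize,
    clients: HashMap<String, SchedulerClient>,
    /// Number of times `connect` actually ran (for illustration only).
    connects: usize,
}

impl SchedulerClientCache {
    fn new(max_message_size: usize) -> Self {
        Self { max_message_size, clients: HashMap::new(), connects: 0 }
    }

    /// Returns the cached client for `url`, connecting only on first use.
    fn get_or_connect(&mut self, url: &str) -> Result<SchedulerClient, String> {
        if let Some(client) = self.clients.get(url) {
            return Ok(client.clone());
        }
        let client = self.connect(url)?;
        self.clients.insert(url.to_string(), client.clone());
        Ok(client)
    }

    /// Stub for the real work: in Ballista this is where the
    /// `create_grpc_client_connection` + `SchedulerGrpcClient::new` calls and
    /// the max message size settings from the snippet above would go.
    fn connect(&mut self, url: &str) -> Result<SchedulerClient, String> {
        if !(url.starts_with("http://") || url.starts_with("https://")) {
            return Err(format!("invalid scheduler URL: {url}"));
        }
        self.connects += 1;
        Ok(SchedulerClient { url: url.to_string(), max_message_size: self.max_message_size })
    }
}

fn main() {
    let mut cache = SchedulerClientCache::new(16 * 1024 * 1024);
    for _ in 0..3 {
        let client = cache.get_or_connect("http://localhost:50050").expect("valid URL");
        println!("using {} (max message size {} bytes)", client.url, client.max_message_size);
    }
    // Only the first call opened a connection; the others reused it.
    println!("connections opened: {}", cache.connects);
}
```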
I don't know much about Ibis, but I'll take a look at how it could integrate with this.
> Also, could we gate substrait with config option, which could be on by default?
> Users not needing it could disable it at compile time.
Done! I updated the PR description and moved the Substrait support and its now-conditional dependencies under a `substrait` feature.
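For anyone following along, here is a minimal sketch of how this kind of compile-time gating usually looks in Rust. The Cargo.toml layout in the comment is an assumption, not the PR's exact manifest, and `substrait_support` is a hypothetical function. With `substrait` among the default features, users who don't need it would build with `--no-default-features` and re-enable any other default features they still want.

```rust
// Assumed Cargo.toml layout (illustrative, not copied from the PR):
//
//   [dependencies]
//   substrait = { version = "...", optional = true }
//
//   [features]
//   default = ["substrait"]
//   substrait = ["dep:substrait"]

/// Compiled only when the `substrait` feature is enabled.
#[cfg(feature = "substrait")]
fn substrait_support() -> &'static str {
    "Substrait plans are accepted"
}

/// Fallback compiled when the `substrait` feature is disabled.
#[cfg(not(feature = "substrait"))]
fn substrait_support() -> &'static str {
    "Substrait support not compiled in (build with --features substrait)"
}

fn main() {
    // `cfg!` evaluates to a compile-time boolean for the active feature set.
    println!("substrait feature enabled: {}", cfg!(feature = "substrait"));
    println!("{}", substrait_support());
}
```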