I'd be interested in helping spec this out. It's especially tricky atm to track down issues when integrating DataFusion into the same binary as other medium/large dependencies.
Recently hit a really specific issue: DataFusion depends on Parquet, which supports various compression algorithms, including Brotli, and actix-web also depends on a slightly different Rust implementation of Brotli. Both of these Brotli libs package the same underlying C lib separately, resulting in multiply-defined symbols when compiling with MSVC (and maybe on other platforms? didn't test in CI in the end). Got a quick interim hack [1] in place for my use case, which doesn't really use Parquet, so it's not pressing, but it would be awesome to sort this properly upstream.

I guess the only major tradeoff of having a comprehensive feature setup is that it could make testing slightly harder, in terms of making sure no one breaks the build for specific feature combinations; this can always be mitigated with more CI though (yay, unlimited Actions minutes for public repos).

Also, unrelated, is there a schedule for the sync calls? Will try and carve out some free time for the next one :)

[1] https://github.com/reservoirdb/arrow/commit/e63e157927a552ecf1a6f63ec401f0b6157b5468

-----Original Message-----
From: Andrew Lamb <al...@influxdata.com>
Sent: 14 February 2021 11:14
To: dev <dev@arrow.apache.org>
Subject: [Rust] [DataFusion] Topic for next Rust Sync Call

I would like to add the following item to the agenda for the next Rust sync call: Dependencies

Background: As the dependency stack gets larger, it will be harder to use DataFusion as an embedded query engine and the compile / dev times will get higher. As we expand the supported functions of DataFusion this problem is likely to get worse. For example https://github.com/apache/arrow/pull/9243#discussion_r575716759 and https://github.com/apache/arrow/pull/9139

Proposal: Add Rust "features" to the datafusion crate and make many of the new dependencies optional (so that we had features like regex and unicode and hash which would only pull in the dependencies / have those functions if the features were enabled.)
This approach has worked well for Arrow (which has only chrono and num as required dependencies).
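
To make the proposal a bit more concrete, here is a minimal sketch of what feature-gating a group of functions could look like on the Rust side. It assumes a feature named `regex` declared under `[features]` in the crate's Cargo.toml, with the matching dependency marked `optional = true`. The function names and the registry shape are purely illustrative, not DataFusion's actual API.

```rust
// Hypothetical sketch: names and structure are assumptions, not DataFusion code.

/// Functions that need no optional dependency and are always compiled in.
fn builtin_functions() -> Vec<&'static str> {
    vec!["abs", "sqrt"]
}

/// Compiled only when the (assumed) `regex` feature is enabled, so the
/// optional dependency is only pulled in for users who ask for it.
#[cfg(feature = "regex")]
fn regex_functions() -> Vec<&'static str> {
    vec!["regexp_replace"]
}

/// Fallback when the feature is off: the regex functions simply do not exist.
#[cfg(not(feature = "regex"))]
fn regex_functions() -> Vec<&'static str> {
    Vec::new()
}

/// Everything available in this particular build of the crate.
fn available_functions() -> Vec<&'static str> {
    let mut names = builtin_functions();
    names.extend(regex_functions());
    names
}

fn main() {
    println!("{:?}", available_functions());
}
```

A downstream user who wants the smallest possible build would then depend on the crate with `default-features = false` and opt back in to only the features they need. The testing cost mentioned above shows up here too: CI would need to build both with and without each feature to exercise both `cfg` branches.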