I'd be interested in helping spec this out. It's especially tricky at the 
moment to track down issues when integrating DataFusion into the same binary 
as other medium/large dependencies.

I recently hit a really specific issue: DataFusion depends on Parquet, which 
supports various compression algorithms, including Brotli, and actix-web also 
depends on a slightly different Rust implementation of Brotli. Both of these 
Brotli libs package the same underlying C lib separately, resulting in 
multiply-defined symbols when compiling with MSVC (and maybe on other 
platforms; I didn't test in CI in the end).

I've got a quick interim hack [1] in place for my use case, which doesn't 
really use Parquet, so it's not pressing, but it would be awesome to sort this 
out properly upstream.

I guess the only major tradeoff of having a comprehensive feature setup is that 
it could make testing slightly harder, in terms of making sure no one breaks 
the build for specific feature combinations. This can always be mitigated with 
more CI though (yay, unlimited Actions minutes for public repos).
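
As a rough illustration of what "more CI" could mean here, the sketch below 
enumerates every combination of a small feature list and emits one 
`cargo check` invocation per combination. The feature names are just the ones 
floated in the proposal, and `feature_matrix` is a hypothetical helper, not 
something that exists in the repo.

```rust
// Sketch only: `feature_matrix` is a hypothetical helper for generating a CI
// build matrix. It assumes fewer than 32 features, since it uses a u32 bitmask.

/// Build one `cargo check` command line per subset of `features`.
fn feature_matrix(features: &[&str]) -> Vec<String> {
    let n = features.len();
    (0..(1u32 << n))
        .map(|mask| {
            // Pick the features whose bit is set in this mask.
            let enabled: Vec<&str> = features
                .iter()
                .enumerate()
                .filter(|&(i, _)| ((mask >> i) & 1) == 1)
                .map(|(_, f)| *f)
                .collect();
            if enabled.is_empty() {
                // The empty subset checks the crate with no optional features.
                "cargo check --no-default-features".to_string()
            } else {
                format!(
                    "cargo check --no-default-features --features {}",
                    enabled.join(",")
                )
            }
        })
        .collect()
}
```

The number of combinations doubles with every feature, so in practice a CI 
setup would probably only check a hand-picked subset once the list grows.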

Also, unrelated, is there a schedule for the sync calls? Will try and carve out 
some free time for the next one :)

[1] 
https://github.com/reservoirdb/arrow/commit/e63e157927a552ecf1a6f63ec401f0b6157b5468

-----Original Message-----
From: Andrew Lamb <al...@influxdata.com> 
Sent: 14 February 2021 11:14
To: dev <dev@arrow.apache.org>
Subject: [Rust] [DataFusion] Topic for next Rust Sync Call

I would like to add the following item to the agenda for the next Rust sync 
call:

Dependencies

Background: As the dependency stack gets larger, it will be harder to use 
DataFusion as an embedded query engine and the compile / dev times will get 
higher.

As we expand the supported functions of DataFusion this problem is likely to 
get worse. For example, see
https://github.com/apache/arrow/pull/9243#discussion_r575716759 and
https://github.com/apache/arrow/pull/9139

Proposal: Add Rust "features" to the datafusion crate and make many of the new 
dependencies optional, so that we would have features like regex, unicode, and 
hash, which would only pull in the dependencies / provide those functions if 
the features were enabled. This approach has worked well for Arrow (which has 
only chrono and num as required dependencies).
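
A minimal sketch of how a function could be gated behind one of these 
features, assuming a `unicode` feature is declared in the crate's Cargo.toml 
`[features]` table. `char_length` is a hypothetical function used for 
illustration only, not DataFusion's actual API.

```rust
// Sketch only: `unicode` is one of the feature names floated in the proposal,
// and `char_length` is a hypothetical function, not DataFusion's actual API.

/// Compiled only when the crate is built with the `unicode` feature enabled.
#[cfg(feature = "unicode")]
pub fn char_length(s: &str) -> Result<usize, String> {
    // A real implementation might call into an optional dependency here.
    Ok(s.chars().count())
}

/// Fallback compiled when the feature is off: the function still exists,
/// but reports that support was not built in.
#[cfg(not(feature = "unicode"))]
pub fn char_length(_s: &str) -> Result<usize, String> {
    Err("char_length requires the `unicode` feature".to_string())
}
```

Keeping a stub for the disabled case is one design choice among several; the 
function could also be left out of the build entirely, which would turn a 
missing feature into a compile error for callers rather than a runtime one.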
