As previously discussed [1], I took on the effort of trying to come up
with a demo for using Bazel as a build system for C++/Python.  The
results [2] are a bit of a mixed bag.

I was able to construct an example that runs on my Mac that can compile and
run most of the tests in "src/arrow" as well as the IPC read/write test,
and a python test (test_array.py).  I also have C++ Flight compiling.  A
demonstration for how different library locations can be selected is also
available [3].  This would need a lot more work to reach parity with the
functionality that CMake currently provides.

After going through this exercise, I put together the list of pros and
cons below.

I would like to hear from other devs:
1.  Their opinions on setting this up as an alternative system (I'm willing
to invest some more time in it).
2.  What they think the minimum bar for merging a PR like this should be.

Pros:
1.  Being able to run "bazel test python/..." and having compilation of all
python dependencies just work is a nice experience.
2.  Because of the granular compilation units, it can improve developer
velocity. Unit tests can depend only on the sub-components they are meant
to test. They don't need to compile and relink arrow.so.
3.  The built-in documentation it provides about visibility and
relationships between components is nice (it has uncovered some "interesting
dependencies").  I didn't make heavy use of it, but its concept of
"visibility" makes things more explicit about what external consumers
should be depending on, and what inter-project components should depend on
(e.g. explicitly limit the scope of vendored code).
4.  Extensions are essentially Python, which might be easier to work with
than CMake.
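
To make pro 2 a bit more concrete, here is a toy model in plain Python (not
Bazel itself, and not taken from the PR).  Targets are nodes in a dependency
graph, and a change to one target invalidates only the targets that
transitively depend on it.  All target names and both layouts below are
hypothetical and chosen only for illustration.

```python
from collections import deque


def affected_targets(deps, changed):
    """Return every target that must be rebuilt when `changed` changes.

    `deps` maps each target to the list of targets it depends on directly.
    """
    # Invert the graph: for each target, record who depends on it.
    rdeps = {target: set() for target in deps}
    for target, direct in deps.items():
        for dep in direct:
            rdeps.setdefault(dep, set()).add(target)

    # Breadth-first walk over reverse dependencies.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in rdeps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical layout 1: every test links one monolithic library.
monolithic = {
    "arrow": [],
    "buffer_test": ["arrow"],
    "ipc_test": ["arrow"],
    "csv_test": ["arrow"],
}

# Hypothetical layout 2: each test depends only on the component it exercises.
granular = {
    "buffer": [],
    "ipc": ["buffer"],
    "csv": ["buffer"],
    "buffer_test": ["buffer"],
    "ipc_test": ["ipc"],
    "csv_test": ["csv"],
}

# A change to CSV code touches "arrow" in layout 1 but only "csv" in layout 2.
print(sorted(affected_targets(monolithic, "arrow")))
print(sorted(affected_targets(granular, "csv")))
```

In the monolithic layout every test is invalidated by the change, while in
the granular layout only the CSV library and its test are.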

Cons:
1.  Bazel is opinionated on C++ layout.  In particular it requires some
workarounds to deal with circular .h/.cc dependencies.  The two main ways
of doing this are either increasing the size of compilable units [4] to
span all dependencies in the cycle, or creating separate
header/implementation targets; I've used both strategies in the PR.  One
could argue that it would be nice to reduce circular dependencies in
general.
2.  Bazel's Python support still seems lacking.  To make the test work, I
needed to explicitly list all transitive dependencies of the pip-installed
packages by hand.
3.  Bazel in general doesn't seem to have wide adoption, so any
customization probably won't have a whole lot of support (I've been told
there are some adapters for CMake that can leverage some of the existing
code).
4.  It is more verbose to configure than CMake (each compilation unit needs
to be spelled out with dependencies).
5.  The "packaging" story of different build artifacts still needs to be
explored.
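
As a rough illustration of the first workaround in con 1 (growing a
compilation unit until it spans the whole cycle), the sketch below groups
units that can reach each other into a single target.  It is a toy model in
plain Python, not what Bazel or the PR does internally, and the unit names
are hypothetical.

```python
def reachable(graph, start):
    """Return all nodes reachable from `start`, including `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def merge_cycles(graph):
    """Group units so that every dependency cycle falls inside one group.

    `graph` maps each unit to the units it depends on; every unit must
    appear as a key, even if it has no dependencies.
    """
    reach = {node: reachable(graph, node) for node in graph}
    groups = []
    assigned = set()
    for node in graph:
        if node in assigned:
            continue
        # Two units that can reach each other lie on a common cycle.
        group = {other for other in reach[node] if node in reach[other]}
        groups.append(sorted(group))
        assigned |= group
    return groups


# Hypothetical units: "foo" and "bar" include each other's headers.
units = {
    "foo": ["bar", "util"],
    "bar": ["foo"],
    "util": [],
    "foo_test": ["foo"],
}

print(merge_cycles(units))
```

Here "foo" and "bar" end up in one group and would have to be built as a
single target, while "util" and "foo_test" stay separate.  The alternative
workaround, separate header/implementation targets, breaks the cycle instead
of absorbing it.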

Thanks,
Micah


[1]
https://lists.apache.org/thread.html/26c2a9e7e35ffc6f6ff68fbbfb38a0a33002b8e7210e8d323566f447@%3Cdev.arrow.apache.org%3E
[2] https://github.com/apache/arrow/pull/5897/files
[3]
https://github.com/apache/arrow/pull/5897/files#diff-85ecc9fdaae4c714198a1c31c7748f2a
[4]
https://github.com/apache/arrow/pull/5897/files#diff-c23198ffa8af9adf6825cb9c6f6e135b
