Independent of the particulars of the discussion, the C++ project needs to be free to create a C API for itself. If you want to try to block the C++ contributors from doing this, we may be barreling toward a governance crisis in the project. I'm stepping back from this discussion for a time now to allow others to catch up on it and to weigh in as needed.

On Mon, Jan 20, 2020 at 1:00 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> I don't see this as an endogenous concern of the C++ project. I appreciate
> your goal with saying so but I think this has broader ramifications around
> fragmentation of the project.
>
> The core challenge that we're dealing with is we introduced foundational
> concepts in some implementations that go beyond the spec and then provided
> useful features based on them (in this case, the offset concept). Ideally,
> those concepts are first introduced at the specification level so there
> aren't inconsistent viewpoints of what Arrow is (which I believe is what is
> happening here). Having a cross-language specification for in-memory
> processing is a new concept so it isn't surprising that we're going to
> learn these things along the way.
>
> Without this, we create a slippery slope of fragmentation between the
> specifications and the implementations. I understand that the toothpaste is
> out of the tube in this particular case. We can respond in two ways: stop
> the slip or continue to slide down the slope. I'm inclined to stop the slip.
>
> As I said on the GitHub, I'm struggling with how much of this should be
> solved in the project. I'm going to pause a bit on responding to reflect
> further about this as well to reduce the likelihood that this devolves into
> a flame war (which is always a risk with complex issues such as these).
>
> On Mon, Jan 20, 2020 at 9:59 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Jacques,
> >
> > Taking a step back from the discussion, the original problem statement
> > was to enable third party projects to produce the data structure used
> > by C++ Array classes in C without depending on the C++ code.
> >
> > That's the ArrayData class here:
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L232
> >
> > It is important for us to simplify the programming interface with the C++
> > library, so I think that we should address this as an endogenous
> > concern of the C++ project, namely providing a "C API for the C++
> > project". The C API for the C++ library needs to mirror what's in the
> > C++ project (i.e. the ArrayData data structure). We should not
> > advertise this as being a part of the project specification.
> >
> > - Wes
> >
> > On Mon, Jan 20, 2020 at 11:51 AM Jacques Nadeau <jacq...@apache.org> wrote:
> > >
> > > As I noted on the pull request, I think fundamentally this work is at odds
> > > with the Arrow specification and being used to introduce a shadow
> > > specification.
> > >
> > > I don't think our intentions about how people should use something really
> > > influence how people will actually use or perceive it. They'll just find
> > > supported Arrow code and expose things based on it and call it "Arrow
> > > compatible". In other words, I don't think people in the outside world will
> > > be able to perceive the distinction between "Arrow C++ compatible" and
> > > "Arrow compatible".
> > >
> > > On Mon, Jan 20, 2020 at 9:28 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > > hi folks,
> > > >
> > > > I just made a comment in https://github.com/apache/arrow/pull/6026
> > > > that I wanted to surface here on the mailing list.
> > > >
> > > > It seems that to reach consensus for a C interface that is intended to
> > > > be broadly used by multiple programming languages, we may make some
> > > > compromises that harm or outright undermine some of the use cases that
> > > > motivated the creation of the C interface in the first place. That
> > > > does not seem good. I wonder if it would be more productive to reduce
> > > > the scope of the project to merely providing a C-header-based data
> > > > interface to the C++ project only. That was the original problem
> > > > statement and it seems that attempting to make it useful beyond C++ has
> > > > made it difficult to reach consensus.
> > > >
> > > > Thanks
> > > > Wes
> > > >
> > > > On Sat, Dec 21, 2019 at 4:38 PM Jacques Nadeau <jacq...@apache.org> wrote:
> > > > >
> > > > > Thanks for addressing my comments. I'm actively reviewing the proposal. It
> > > > > is taking me more time than I would like given the time of the year but I
> > > > > want to make sure that you know that I'm looking at it and hope to provide
> > > > > additional feedback beyond that which I've provided thus far on the PR.
> > > > > Will update soon.
> > > > >
> > > > > Thanks for your patience.
> > > > >
> > > > > On Tue, Dec 17, 2019 at 11:16 AM Antoine Pitrou <solip...@pitrou.net> wrote:
> > > > >
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Following Jacques's feedback, I drafted a new version of the C data
> > > > > > interface spec.
> > > > > >
> > > > > > The spec PR is here:
> > > > > > https://github.com/apache/arrow/pull/6040
> > > > > > Direct link to the RST file:
> > > > > > https://github.com/apache/arrow/blob/5d8669d371401f9db12326b079e13c0058ba972b/docs/source/format/CDataInterface.rst
> > > > > >
> > > > > > There is also a C++ implementation, together with a Python <-> R
> > > > > > bridge demonstrating the functionality:
> > > > > > https://github.com/apache/arrow/pull/6026
> > > > > >
> > > > > > The main change from the previous spec is that there are now two C
> > > > > > structures; one for the type or schema information, one for the
> > > > > > array or record batch data. This allows exchanging both kinds of
> > > > > > information independently (and so, potentially, to exchange schema once
> > > > > > and then multiple arrays or record batches).
> > > > > >
> > > > > > Comments and questions welcome.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.