Personally, I am not really in favor of ABI stability in the short
term, for a few reasons:
* We don't have enough maintainers as it is to keep up with the
development flow in the project

* It may harm forward progress in the project's design. Because the
development team is so small and there are so few maintainers, there
has not been a great deal of feedback on the general factoring of the
C++ code. When the size of the development team grows, it would be
valuable to be able to revisit design decisions based on the feedback
of new contributors yet to join the project.

Basically, many ABI decisions have been made hurriedly, and I think we
need the flexibility to fix our mistakes while the project is growing.

I think it would be more valuable to develop shared / reusable build
infrastructure to better accommodate an evolving ABI, so that
rebuilding packages is not too onerous for downstream dependencies. In
large companies like Google that maintain monorepos, this problem is
solved by requiring all call sites associated with an ABI to be fixed
all at once. We probably won't be able to create a monorepo for all
projects that use Arrow, but we could make Turbodbc package rebuilds
easier, for example.

In summary, until the Arrow developer group grows significantly
larger, I think we should expect the users of these libraries to "live
at HEAD". I do think we should make ABI changes transparent and
well-documented so the pain is minimized. For the moment, we still
have a lot of development work to do for more people to "care" about
Apache Arrow and invest in its success long term.

- Wes

On Thu, Apr 19, 2018 at 1:38 PM, Antoine Pitrou <anto...@python.org> wrote:
>
> Hi Uwe,
>
> On 19/04/2018 at 18:42, Uwe L. Korn wrote:
>>> 1) are we ok with paying the cost of pimpls? (mostly the indirection
>>> cost I guess, and the fact that we can't have inline methods/accessors
>>> anymore)
>>
>> I'm not sure about how much of the cost we're ready to pay. There is
>> a certain appeal to keeping a stable ABI (this is done fantastically
>> by the NumPy people): you can do patch releases without consumers
>> worrying about whether they need to rebuild their binaries.
>>
>> The indirection on paths that call expensive functions is certainly
>> no problem, e.g. if you have a table and select a column, this is an
>> operation you don't do often, so I think the overhead is acceptable.
>> On the other hand, accessing the null_count or the length of an
>> array is definitely an operation that is performed quite often.
>> These should be as fast as possible.
>>
>> I cannot give you a definite answer; once I have the time, I'll try
>> to implement and profile some of the possible approaches.
>>
>>> 2) what do we do for things like ArrayData, which seems publicly exposed
>>> by design?
>>
>> ArrayData is marked as internal, and thus I would feel OK breaking
>> its ABI between non-major releases. If people really depend on it,
>> then we should think of a clear way to make it public / non-internal.
>
> Perhaps we need a three-tiered approach?
>
> 1) a public and stable namespace ("arrow") with the goal to reach ABI
> stability post-1.0;
>
> 2) a public but still moving namespace ("arrow::unstable"?) where we
> generally try not to remove existing functionality and to honor API
> compatibility, but do not guarantee any sort of ABI stability;
> (this could have ArrayData, PrimitiveArray...)
>
> 3) an internal-use namespace ("arrow::internal"), which third-party
> projects can use at their own risk.
> (this should get all our internal helpers, including almost all CPython
> helpers)
>
> Regards
>
> Antoine.
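
To make the cost question in 1) above concrete, here is a minimal
C++ sketch of a hybrid pimpl; the names (Array, ArrayImpl) are
hypothetical illustrations, not Arrow's actual classes. Hot-path
accessors such as length() and null_count() stay inline by keeping
those two fields in the public class (freezing them into the ABI),
while everything else hides behind an opaque pointer whose layout can
change freely between releases:

    // array.h -- public header
    #include <cstdint>
    #include <memory>

    namespace arrow {

    class ArrayImpl;  // opaque; its layout is not part of the ABI

    class Array {
     public:
      Array(int64_t length, int64_t null_count,
            std::unique_ptr<ArrayImpl> impl);
      ~Array();  // defined in array.cc, where ArrayImpl is complete

      // Hot-path accessors stay inline: no pointer indirection, but
      // the two int64_t members below are now frozen into the ABI.
      int64_t length() const { return length_; }
      int64_t null_count() const { return null_count_; }

      // Cold-path operations pay one indirection through the pimpl,
      // and ArrayImpl can be refactored without rebuilding consumers.
      bool Equals(const Array& other) const;  // forwards to impl_

     private:
      int64_t length_;
      int64_t null_count_;
      std::unique_ptr<ArrayImpl> impl_;
    };

    }  // namespace arrow

The trade-off is exactly the one Uwe raises: whatever is kept inline
for speed becomes part of the stable ABI and can no longer be moved.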
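
And a sketch of what Antoine's three tiers could look like in the
headers; the placement of ArrayData and PrimitiveArray reflects his
proposal, not the current source tree, and BitmapReader is only a
stand-in for "an internal helper":

    namespace arrow {

    // Tier 1: public and stable; ABI frozen post-1.0.
    class Table;

    namespace unstable {
    // Tier 2: public but moving; API compatibility is honored,
    // ABI may break between non-major releases.
    class ArrayData;
    class PrimitiveArray;
    }  // namespace unstable

    namespace internal {
    // Tier 3: helpers that third-party projects use at their own risk.
    struct BitmapReader;
    }  // namespace internal

    }  // namespace arrow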