I've updated the document with an extra section to include code snippets for alternative designs to address concerns raised here on the mailing list.
MSVC 19 2017 supports covariant return types, and the code presented above successfully compiled on gcc.godbolt.org using that compiler. I would agree that the covariant_{shared, derived}_ptr classes themselves are a bit complicated, and I wasn't particularly happy with the fact that there's an extra pointer floating around. I've presented two alternatives in the document that would eliminate the need for those classes. The first solution makes the Array::type() function pure virtual and has sub-classes implement without covariant return types. Instead, the sub-classes provide a function with a different name that returns the concrete data type sub-class (e.g. - ListArray::list_type()). The second solution uses C++ name hiding. In that design, the Array::type() function is not virtual and calls through to a pure virtual function called get_type(). The subclass would implement get_type() and simply return a shared pointer to a DataType object, but would also name hide Array::type() with its own implementation that returns a shared pointer the sub-class data type. While the name hiding keeps the API cleaner (i.e. - you always call type()), I find that it's one of the less used and more obscure features of C++ that tends to confuse more novice to intermediate C++ programmers. The design does not suggest removing the ArrayData class, but instead removing the type_ member. It sounds like for the ArrayData class you are trying to eschew the C++ type hierarchy and move to a more C oriented design with type punning. I think the compromise here would be to have ArrayData store the type id but not the shared pointer to the DataType. You can take that approach one step further, and make the ArrayData a C struct with its own API that would successfully compile with a C compiler. You could then add a C++ wrapper that deals with that class and is used by the various C++ classes in the Array/DataType type hierarchy. The benefit of taking that approach is that you can easily inter-operate between C++ and C. In other words, a C only arrow API would be using the richer C++ API underneath, but the user of the API would only be able to obtain or pass ArrayData C structs from/to the API. This proposal was in no way trying to address the static / compile-time problem. It was specifically targeted at making it easier to obtain concrete data types in the dynamic schema setting when dealing with arbitrary nesting of types. I do like the idea of a compile-time hierarchy (e.g. - StaticListArray<StaticStructArray<StaticInt32Array, StaticListArray<StaticInt32Array> >), but that will require a lot of template meta-programming. If the covariant_{shared,derived}_ptr classes above seem too complicated, I'm pretty sure that a compile-time hierarchy will seem immensely more complicated. Bottom line: I would want to see a concrete use case before implementing a compile-time type hierarchy with templates. On Sat, May 19, 2018 at 9:06 AM, Wes McKinney <wesmck...@gmail.com> wrote: > hi Josh, > > Thank you for putting together this document. These changes are > interesting and worth giving some consideration. Some initial > reactions / questions to get a discussion going: > > * How is MSVC support for covariant return types? It doesn't look like > it's supported, which (I'm sorry to say) would be a deal breaker > https://msdn.microsoft.com/en-us/library/8z9feath.aspx > * I don't think creating a situation where arrow::DataType objects > have to be copied is a good idea. Schemas and types will be reused, > particular in a streaming data setting > * Adding a covariant shared_ptr seems very complicated; it seems the > primary benefit will be in code where we would currently have to > introduce a dynamic visitor pattern. Could the same benefit be reaped > by adding a _new_ virtual base method that returns a pointer or > const-ref to the concrete type? This is a less invasive change > * I think it is beneficial to have a simple internal data structure > (i.e. ArrayData) that describes any Arrow array object > > From reading this document, I am wondering if it would not be worth > thinking about creating a from-scratch set of array and type data > structures intended for use in a static / compile-time setting. Since > Arrow features "dynamic" schemas (in the sense that Avro is a more > dynamic version of Protocol Buffers or Thrift), we inevitably will be > dealing with the problem of dynamic dispatch from type information > known only at runtime. I agree that it would be useful to eliminate > usages of dynamic visitor pattern in the codebase where possible. > > - Wes > > On Sat, May 19, 2018 at 12:48 AM, joshuasto...@gmail.com > <joshuasto...@gmail.com> wrote: > > I've put together a proposal for using covariant return types in > Array::type() in the C++ library. I wanted to get some feedback before > putting together a PR in case it's too controversial or would require to > much re-factoring of the code: > > > > https://docs.google.com/document/d/14mLO9uNIcrb-yTj_ > byBxwJeYbGy88JS9fQ6aOa46XRc/edit?usp=sharing > > > > It would be nice to use Google Doc's comment feature, but I would > imagine it may be safer to memorialize the discussion here on the mailing > list. >