In the abstract (since I haven't written any code), let me see if I can make an argument for considering "nullable int" and "int" to both be worthwhile "primitive" types, as opposed to "Nullable<int>" being a constructed type over the primitive type "int", in the C++ arena.
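The structure-of-arrays layout argued for in the next paragraph can be sketched in C++ roughly as follows. This is a minimal sketch of the "separate nullable primitive" option only. It assumes a hypothetical tag type nullable_int32_t, with std::int32_t standing in for int_t. Neither name is an Arrow API. The primary Array template stores one contiguous buffer. A full specialization for the nullable primitive keeps validity flags and values in two separate buffers, which is the std::pair<std::vector<bool>, std::vector<int_t>> shape, instead of interleaving them as pairs.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical tag type: "nullable int32" treated as its own primitive.
struct nullable_int32_t {
  using value_type = std::int32_t;
};

// Primary template: a non-nullable array is just one contiguous buffer.
template <typename T>
class Array {
 public:
  void push_back(T v) { values_.push_back(v); }
  std::size_t size() const { return values_.size(); }
  T value(std::size_t i) const { return values_[i]; }

 private:
  std::vector<T> values_;
};

// Full specialization for the nullable primitive: the null flags are
// factored out into their own buffer, separate from the values.
template <>
class Array<nullable_int32_t> {
 public:
  void push_back(std::optional<std::int32_t> v) {
    valid_.push_back(v.has_value());
    values_.push_back(v.value_or(0));  // placeholder slot when null
  }
  std::size_t size() const { return values_.size(); }
  bool is_null(std::size_t i) const { return !valid_[i]; }
  std::int32_t value(std::size_t i) const { return values_[i]; }

  // The dense value buffer can be scanned without reading any flags.
  const std::vector<std::int32_t>& values() const { return values_; }

 private:
  std::vector<bool> valid_;           // bit-packed in typical implementations
  std::vector<std::int32_t> values_;  // contiguous values only
};
```

A loop over values() touches only the dense int32 buffer. By contrast, std::vector<std::pair<bool, std::int32_t>> typically pads each element to 8 bytes and mixes flag bytes into every cache line, which is one concrete reading of "CPUs are a Thing".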
Let's assume Arrow's use case is to manage arrays of numbers, i.e. Array<number_t>. We have two choices for nullability:

Array<nullable_int_t>
Array<Nullable<int_t>>

I think what we want from the data structure is for an array of nullable ints (ignoring special cases like an array of nullable ints where none of the ints, at runtime, happens to be null) to be laid out in memory as std::pair<std::vector<bool>, std::vector<int_t>>. We probably don't want std::vector<std::pair<bool, int_t>>, because CPUs are a Thing (let me know if this shorthand doesn't make sense; I can elaborate).

If we define separate nullable primitive types and non-nullable primitive types, then we can template specialize for the nullable types and factor out the null bits into their own array. If we require that Nullable be its own template, it's also possible to specialize the Array template for Array<Nullable<T>>, but I think the template code becomes a lot more complex. I'd be happy to be proven wrong here, but for now I'll assume that.

I don't think we have many other singleton typeclasses like Nullable that we want to apply to single primitive types. In fact, I can't think of any others that would be useful. Given that, we're only multiplying the number of primitive types by two, so we're not at risk of exploding the number of primitive types, and we're probably greatly simplifying the template implementations of container templates like Array.

If you can think of other useful primitive templates, or if you can demonstrate that Array<Nullable<T>> is simple in all languages, I would change my position on this.

On Fri, Feb 26, 2016 at 11:06 AM Wes McKinney <w...@cloudera.com> wrote:

> To paraphrase a great poet, "mo' templates, mo' problems". I agree
> that some theoretical benefits may be reaped in exchange for
> significantly higher code complexity / likely lower productivity for
> both developers and users of the library.
> We would need to see a pragmatic argument for why the whole library
> should be made much more complex in exchange for compile-time benefits
> in a small portion of the code.
>
> Probably the biggest issue I see with this is the combinatorial
> explosion of generated code. For example, let's consider the array
> function Take(T, Integer) -> T (for example, numpy.ndarray.take). If
> you introduce nullable types, rather than generating one variant for
> each type T and integer type, you need 4:
>
> Take(T, Int) -> T
> Take(T, NullableInt) -> NullableT (indices have nulls) or T (indices
> have no nulls)
> Take(NullableT, Int) -> NullableT
> Take(NullableT, NullableInt) -> NullableT
>
> If you add to this the fact that any nullable index type may not have
> any nulls, you actually have more than 4 branches of logic to
> consider.
>
> In Java, this would be less of a concern, because all functions are
> effectively virtual (dynamic dispatch overhead is something the JIT
> largely takes care of), but in C++ using virtual functions to make the
> arrays more "dynamic" (i.e. using NullableT or T in the same code
> path) would not yield acceptable performance.
>
> thanks,
> Wes
>
> On Fri, Feb 26, 2016 at 6:01 AM, Daniel Robinson
> <danrobinson...@gmail.com> wrote:
>
> > In C++ at least, I think making Nullable<T> a template (see here:
> > https://github.com/danrobinson/arrow-demo/blob/master/types.h) would make
> > it easier both to define arbitrary classes and to write parsers that take
> > advantage of template specialization.
> >
> > Re: 2 vs 3 code paths: handling the null-skipping case can be a single line
> > of code at the start of the function (which could be reduced to a macro):
> >
> >     if (arr.null_count() == 0) return ALGORITHM_NAME(arr.child_array());
> >
> > And it seems like good practice anyway to separate the null_count=0 code
> > into a separate function.

--
Cheers,
Leif
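For comparison, the runtime alternative raised in the quoted replies treats nullability as a property of the data (its null count), not of the type. It might look like the following sketch. Array, TakeDense and Take are hypothetical names for illustration, not Arrow's API. Here null_count() scans the validity buffer, where a real implementation would presumably cache it. The early return mirrors the one-line null_count() == 0 check suggested in the thread. The other three value/index combinations from the Take list share one general loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical array type: every array may carry a validity buffer, so
// "nullable or not" is decided at runtime rather than by the type.
template <typename T>
struct Array {
  std::vector<T> values;
  std::vector<bool> valid;  // empty means "no nulls"

  std::size_t size() const { return values.size(); }
  bool is_null(std::size_t i) const { return !valid.empty() && !valid[i]; }
  std::size_t null_count() const {  // O(n) here; a real one would cache it
    std::size_t n = 0;
    for (bool v : valid) n += v ? 0 : 1;
    return n;
  }
};

// Fast path: neither input has nulls, so no validity bits are touched.
template <typename T, typename I>
Array<T> TakeDense(const Array<T>& values, const Array<I>& indices) {
  Array<T> out;
  out.values.reserve(indices.size());
  for (I idx : indices.values)
    out.values.push_back(values.values[static_cast<std::size_t>(idx)]);
  return out;
}

// General path: an output slot is null if its index is null or if the
// value it points at is null.
template <typename T, typename I>
Array<T> Take(const Array<T>& values, const Array<I>& indices) {
  // The single-line early return suggested in the thread.
  if (values.null_count() == 0 && indices.null_count() == 0)
    return TakeDense(values, indices);

  Array<T> out;
  out.values.reserve(indices.size());
  out.valid.reserve(indices.size());
  for (std::size_t i = 0; i < indices.size(); ++i) {
    if (indices.is_null(i)) {
      out.values.push_back(T{});  // placeholder slot for a null index
      out.valid.push_back(false);
      continue;
    }
    auto idx = static_cast<std::size_t>(indices.values[i]);
    out.values.push_back(values.values[idx]);
    out.valid.push_back(!values.is_null(idx));
  }
  return out;
}
```

With this shape there is one Take template per (value type, index type) pair rather than four typed signatures. The cost is a runtime branch per call and a per-element null check on the general path.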