By "much more expensive", I mean that currently ArrayBuilder::type() costs
exactly as much as incrementing the reference count of ArrayBuilder::type_.
If it were virtual then a builder of struct_({field(child_name,
child_type)...}) incurs a virtual call for the struct builder +
sizeof...(child_type) virtual calls for the child builders + construction
of a vector to hold the child fields + more if any of the children are also
nested.

For an example of where this might be unacceptable, for ARROW-5711 I am
removing custom builders in favor of vanilla array builders and using
ArrayBuilder::type to track the JSON kind of that field. It sounds like
type() should be made virtual and cases like this should be handled by
tracking type separately. I'll update the PR to this approach.

On Mon, Aug 19, 2019 at 10:32 AM Antoine Pitrou <anto...@python.org> wrote:

>
> If it becomes much more expensive then calling it `type()` (rather than
> e.g. GetCurrentType()) is a bit misleading.
>
> Regards
>
> Antoine.
>
>
> Le 19/08/2019 à 16:27, Wes McKinney a écrit :
> > On Mon, Aug 19, 2019 at 9:16 AM Ben Kietzman <ben.kietz...@rstudio.com>
> wrote:
> >>
> >> Thanks for responding.
> >>
> >> I can certainly add VisitBuilder/VisitBuilderInline for ArrayBuilder,
> but
> >> there's a slight difficulty: some types don't have a single concrete
> >> builder class. For example, the builders of dictionary arrays are
> templated
> >> on the encoded type and also on an index builder, which is either
> adaptive
> >> or fixed at int32. I can make this transparent in the definition of
> >> VisitBuilder so that one can add Visit(StringDictionaryBuilder*) and
> >> Visit(DictionaryBuilder<Int64Type>*). Something similar is already
> required
> >> for IntervalType, which may be MonthIntervalType or DayTimeIntervalType
> >> depending on the value of IntervalType::interval_type().
> >>
> >> Another difficulty is the adaptive builders. Whatever they return for
> >> ArrayBuilder::type()/type_id()/..., it won't help to discriminate them
> from
> >> the non adaptive builders. Is there any application which uses the
> adaptive
> >> builders except the dictionary builders? If possible, I think it would
> be
> >> simplest to make their inheritance of ArrayBuilder protected.
> >>
> >
> > My gut feeling is that ArrayBuilder::type virtual is the simplest
> > thing and causes the complexity relating to changing types be
> > localized to classes like AdaptiveIntBuilder. I don't have a sense for
> > what use cases changing from an inline member (type_) to a virtual
> > function would cause a meaningful performance issue. You could even
> > play tricks by having something like
> >
> > sp<DataType> type() const {
> >   if (!is_fixed_type_) {
> >     // type_ is null, GetCurrentType is a protected virtual
> >     return GetCurrentType();
> >   } else {
> >     return type_;
> >   }
> > }
> >
> > You said "much more expensive for nested builders" -- where and how
> exactly?
> >
> >> Ben
>

Reply via email to