Indeed, I'd expect the `type()` method to not be called in the hot path.

François

On Mon, Aug 19, 2019 at 10:17 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi Ben,
>
> On this possibility
>
> - Make ArrayBuilder::type() virtual. This will be much more expensive for
> nested builders and for applications which need to branch on
> ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as
> well.
>
> Could you explain what would be a situation where these methods
> (type() and type_id()) would be on a hot path? Whether virtual or not,
> I would think that writing code like
>
> VisitTypeInline(*builder->type(), func)
>
> would not necessarily be advisable for cell-level granularity in general.
>
> - Wes
>
> On Sun, Aug 18, 2019 at 8:05 PM Sutou Kouhei <k...@clear-code.com> wrote:
> >
> > Hi,
> >
> > Thanks for working of this.
> >
> > This fix is very happy for Arrow GLib.
> >
> > Arrow GLib wants array builder's type ID to convert C++
> > object to GLib object. Here is a JIRA issue for this use case:
> >   https://issues.apache.org/jira/browse/ARROW-5355
> >
> > In Arrow GLib's use case, ArrayBuilder::type_id() is only
> > required. Arrow GLib doesn't need ArrayBuilder::type().
> >
> > Furthermore, Arrow GLib just wants to know the type of
> > ArrayBuilder (not the type of built Array). Arrow GLib
> > doesn't need ArrayBuilder::type() nor
> > ArrayBuilder::type_id() if ArrayBuilder::id() or something
> > (e.g. visitor API for ArrayBuilder) is provided. Arrow GLib
> > can use dynamic_cast and NULL check to detect the real
> > ArrowBuilder's class. But I don't want to use it as much as
> > possible.
> >
> >
> > Thanks,
> > --
> > kou
> >
> > In <CALPsrS2FdQWnZtEs=tnwl23+q+oyery92gcn6dy3gxdn3y8...@mail.gmail.com>
> >   "[DISCUSS] ArrayBuilders with mutable type" on Fri, 16 Aug 2019 16:40:15 
> > -0400,
> >   Ben Kietzman <ben.kietz...@rstudio.com> wrote:
> >
> > > For some array builders, ArrayBuilder::type() will be different from the
> > > type of array produced by ArrayBuilder::Finish(). These are:
> > > - AdaptiveIntBuilder will progress through {int8, int16, int32, int64}
> > > whenever a value is inserted which cannot be stored using the current
> > > integer type.
> > > - DictionaryBuilder will similarly increase the width of its indices if 
> > > its
> > > memo table grows too large.
> > > - {Dense,Sparse}UnionBuilder may append a new child builder
> > > - Any nested builder whose child builders include a builder with mutable
> > > type
> > >
> > > IMHO if ArrayBuilder::type is sporadically inaccurate then it's a user
> > > hostile API and needs to be fixed.
> > >
> > > The current solution solution is for mutable type to be marked by
> > > ArrayBuilder::type() == null. This results in significant loss of metadata
> > > from nested types; for example StructBuilder::FinishInternal currently 
> > > sets
> > > all field names to "" if constructed with null type. Null type is
> > > inconsistently applied; a builder of list(dictionary()) will currently
> > > finish to an invalid array if the dictionary builder widens its indices
> > > before finishing.
> > >
> > > Options:
> > > - Implement array builders such that ArrayBuilder::type() is always the
> > > type to which the builder would Finish. There is a PR for this
> > > https://github.com/apache/arrow/pull/4930 but it introduces performance
> > > regressions for the dictionary builders: 5% if the values are integer, 
> > > 1.8%
> > > if they are strings.
> > > - Provide ArrayBuilder::UpdateType(); type() is not guaranteed to be
> > > accurate unless immediately preceded by UpdateType().
> > > - Remove ArrayBuilder::type() in favor of ArrayBuilder::type_id(), which
> > > will be an immutable property of ArrayBuilders.
> > > - Make ArrayBuilder::type() virtual. This will be much more expensive for
> > > nested builders and for applications which need to branch on
> > > ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as
> > > well.

Reply via email to