Indeed, I'd expect the `type()` method to not be called in the hot path. François
On Mon, Aug 19, 2019 at 10:17 AM Wes McKinney <wesmck...@gmail.com> wrote: > > hi Ben, > > On this possibility > > - Make ArrayBuilder::type() virtual. This will be much more expensive for > nested builders and for applications which need to branch on > ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as > well. > > Could you explain what would be a situation where these methods > (type() and type_id()) would be on a hot path? Whether virtual or not, > I would think that writing code like > > VisitTypeInline(*builder->type(), func) > > would not necessarily be advisable for cell-level granularity in general. > > - Wes > > On Sun, Aug 18, 2019 at 8:05 PM Sutou Kouhei <k...@clear-code.com> wrote: > > > > Hi, > > > > Thanks for working of this. > > > > This fix is very happy for Arrow GLib. > > > > Arrow GLib wants array builder's type ID to convert C++ > > object to GLib object. Here is a JIRA issue for this use case: > > https://issues.apache.org/jira/browse/ARROW-5355 > > > > In Arrow GLib's use case, ArrayBuilder::type_id() is only > > required. Arrow GLib doesn't need ArrayBuilder::type(). > > > > Furthermore, Arrow GLib just wants to know the type of > > ArrayBuilder (not the type of built Array). Arrow GLib > > doesn't need ArrayBuilder::type() nor > > ArrayBuilder::type_id() if ArrayBuilder::id() or something > > (e.g. visitor API for ArrayBuilder) is provided. Arrow GLib > > can use dynamic_cast and NULL check to detect the real > > ArrowBuilder's class. But I don't want to use it as much as > > possible. > > > > > > Thanks, > > -- > > kou > > > > In <CALPsrS2FdQWnZtEs=tnwl23+q+oyery92gcn6dy3gxdn3y8...@mail.gmail.com> > > "[DISCUSS] ArrayBuilders with mutable type" on Fri, 16 Aug 2019 16:40:15 > > -0400, > > Ben Kietzman <ben.kietz...@rstudio.com> wrote: > > > > > For some array builders, ArrayBuilder::type() will be different from the > > > type of array produced by ArrayBuilder::Finish(). These are: > > > - AdaptiveIntBuilder will progress through {int8, int16, int32, int64} > > > whenever a value is inserted which cannot be stored using the current > > > integer type. > > > - DictionaryBuilder will similarly increase the width of its indices if > > > its > > > memo table grows too large. > > > - {Dense,Sparse}UnionBuilder may append a new child builder > > > - Any nested builder whose child builders include a builder with mutable > > > type > > > > > > IMHO if ArrayBuilder::type is sporadically inaccurate then it's a user > > > hostile API and needs to be fixed. > > > > > > The current solution solution is for mutable type to be marked by > > > ArrayBuilder::type() == null. This results in significant loss of metadata > > > from nested types; for example StructBuilder::FinishInternal currently > > > sets > > > all field names to "" if constructed with null type. Null type is > > > inconsistently applied; a builder of list(dictionary()) will currently > > > finish to an invalid array if the dictionary builder widens its indices > > > before finishing. > > > > > > Options: > > > - Implement array builders such that ArrayBuilder::type() is always the > > > type to which the builder would Finish. There is a PR for this > > > https://github.com/apache/arrow/pull/4930 but it introduces performance > > > regressions for the dictionary builders: 5% if the values are integer, > > > 1.8% > > > if they are strings. > > > - Provide ArrayBuilder::UpdateType(); type() is not guaranteed to be > > > accurate unless immediately preceded by UpdateType(). > > > - Remove ArrayBuilder::type() in favor of ArrayBuilder::type_id(), which > > > will be an immutable property of ArrayBuilders. > > > - Make ArrayBuilder::type() virtual. This will be much more expensive for > > > nested builders and for applications which need to branch on > > > ArrayBuilder::type()->id() ArrayBuilder::type_id() should be provided as > > > well.