Wes McKinney created ARROW-6042:
-----------------------------------
Summary: [C++] Implement alternative DictionaryBuilder that always
yields int32 indices
Key: ARROW-6042
URL: https://issues.apache.org/jira/browse/ARROW-6042
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 1.0.0
One problem with the current {{DictionaryBuilder<T>}} in some applications is
that, if it is used to produce a series of arrays to form a ChunkedArray, it
may yield constituent chunks having different index widths. For example:
{code}
chunk 0: int8 indices
chunk 1: int16 indices
chunk 2: int16 indices
chunk 3: int32 indices
chunk 4: int32 indices
chunk 5: int32 indices
chunk 6: int32 indices
{code}
This is problematic for applications that expect every chunk to have the same
index type, since chunks with different index widths have different dictionary
types. I'm running into this issue in the context of ARROW-3772, where we are
looking to decode Parquet data directly to {{DictionaryArray}} without stepping
through an intermediate dense decoded stage.
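To illustrate how the mixed widths come about, here is a toy model of adaptive index sizing. It is a sketch only, not Arrow code: {{MinIndexBits}} and {{ChunkIndexBits}} are hypothetical helper names, and the model assumes each chunk gets the smallest signed integer width that fits the largest index it contains.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Toy model, not Arrow code: smallest signed integer width (in bits) that can
// hold the largest dictionary index observed in a chunk.
int MinIndexBits(int64_t max_index) {
  assert(max_index >= 0);
  if (max_index <= std::numeric_limits<int8_t>::max()) return 8;
  if (max_index <= std::numeric_limits<int16_t>::max()) return 16;
  if (max_index <= std::numeric_limits<int32_t>::max()) return 32;
  return 64;
}

// Width chosen for each chunk, given the largest index each chunk contains.
std::vector<int> ChunkIndexBits(const std::vector<int64_t>& max_index_per_chunk) {
  std::vector<int> bits;
  bits.reserve(max_index_per_chunk.size());
  for (int64_t max_index : max_index_per_chunk) {
    bits.push_back(MinIndexBits(max_index));
  }
  return bits;
}
```

With hypothetical largest indices of 100, 5000, 30000, 40000 and 70000 for five successive chunks, {{ChunkIndexBits}} returns 8, 16, 16, 32, 32, which is the same kind of widening as in the listing above.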
I'm not sure what to call the class, whether {{DictionaryInt32Builder}} or
something similar, but it would have more or less the same API as
{{DictionaryBuilder}} while using {{Int32Builder}} for the indices rather
than {{AdaptiveIntBuilder}}.
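As a rough illustration of the intended behavior, the following standalone sketch dictionary-encodes strings with indices that are always {{int32_t}}. It does not use Arrow at all: the class name {{ToyDictionaryInt32Builder}}, its methods, and the choice to keep one dictionary across chunks are assumptions made for the example, not the proposed API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy class, not part of Arrow: dictionary-encodes strings and
// always stores indices as int32_t, regardless of dictionary size.
class ToyDictionaryInt32Builder {
 public:
  // Appends a value, adding it to the dictionary the first time it is seen.
  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      // The dictionary must stay addressable with a 32-bit signed index.
      assert(dictionary_.size() <
             static_cast<size_t>(std::numeric_limits<int32_t>::max()));
      const int32_t index = static_cast<int32_t>(dictionary_.size());
      it = memo_.emplace(value, index).first;
      dictionary_.push_back(value);
    }
    indices_.push_back(it->second);
  }

  // Returns the indices accumulated so far as one chunk and starts a new one.
  // The dictionary is kept across chunks (an assumption of this sketch).
  std::vector<int32_t> FinishChunk() {
    std::vector<int32_t> out;
    out.swap(indices_);
    return out;
  }

  const std::vector<std::string>& dictionary() const { return dictionary_; }

 private:
  std::unordered_map<std::string, int32_t> memo_;
  std::vector<std::string> dictionary_;
  std::vector<int32_t> indices_;
};
```

Because every chunk returned by {{FinishChunk}} is a {{std::vector<int32_t>}}, the index width is the same for all chunks no matter how large the dictionary grows, at the cost of using 4 bytes per index even for small dictionaries.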