Wes McKinney created ARROW-6042:
-----------------------------------

             Summary: [C++] Implement alternative DictionaryBuilder that always yields int32 indices
                 Key: ARROW-6042
                 URL: https://issues.apache.org/jira/browse/ARROW-6042
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Wes McKinney
             Fix For: 1.0.0


One problem with the current {{DictionaryBuilder<T>}} in some applications is 
that, if it is used to produce a series of arrays to form a ChunkedArray, it 
may yield constituent chunks having different index widths. For example:

{code}
chunk 0: int8 indices
chunk 1: int16 indices
chunk 2: int16 indices
chunk 3: int32 indices
chunk 4: int32 indices
chunk 5: int32 indices
chunk 6: int32 indices
{code}
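
The drift shown above can be reproduced with a small toy model. This is only a sketch and does not use the Arrow API: {{MinIndexBits}} and {{ChunkIndexBits}} are hypothetical helper names, and the model assumes the adaptive builder simply picks the narrowest signed integer type that can hold the largest index in a chunk.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Toy model only: these helpers are hypothetical and not part of Arrow.

// Narrowest signed integer width (in bits) that can hold the largest
// index of a dictionary with `dict_size` entries.
inline int MinIndexBits(int64_t dict_size) {
  assert(dict_size >= 0);
  const int64_t max_index = dict_size > 0 ? dict_size - 1 : 0;
  if (max_index <= std::numeric_limits<int8_t>::max()) return 8;
  if (max_index <= std::numeric_limits<int16_t>::max()) return 16;
  if (max_index <= std::numeric_limits<int32_t>::max()) return 32;
  return 64;
}

// Index width of each chunk, given the dictionary size seen by each chunk.
// With force_int32 set, every chunk gets 32-bit indices regardless of size.
inline std::vector<int> ChunkIndexBits(const std::vector<int64_t>& dict_sizes,
                                       bool force_int32) {
  std::vector<int> bits;
  bits.reserve(dict_sizes.size());
  for (int64_t size : dict_sizes) {
    bits.push_back(force_int32 ? 32 : MinIndexBits(size));
  }
  return bits;
}
```

In this model, chunks whose dictionaries hold 100, 1000, 30000 and 40000 entries get 8-, 16-, 16- and 32-bit indices in adaptive mode, and 32-bit indices throughout when the width is forced.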

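This is a problem because the index type is part of the dictionary type, so 
chunks with different index widths have different types and cannot form a 
single ChunkedArray without first casting the indices to a common width. I'm 
running into this issue in the context of ARROW-3772, where we are looking to 
decode Parquet data directly to {{DictionaryArray}} without stepping through 
an intermediate dense decoded stage.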

I'm not sure what to call the class ({{DictionaryInt32Builder}} or something 
similar), but it would have more or less the same API as 
{{DictionaryBuilder}} while using {{Int32Builder}} for the indices rather 
than {{AdaptiveIntBuilder}}.
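
One way to picture the proposal is a builder parameterized on its index storage. The sketch below is a minimal illustration under that assumption, not Arrow code: {{ToyInt32Indices}}, {{ToyAdaptiveIndices}} and {{ToyDictBuilder}} are hypothetical stand-ins for {{Int32Builder}}, {{AdaptiveIntBuilder}} and {{DictionaryBuilder}}, and only string values are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are Arrow classes.

// Index storage that always uses 32-bit indices.
struct ToyInt32Indices {
  std::vector<int32_t> values;
  void Append(int64_t v) {
    assert(v <= INT32_MAX);
    values.push_back(static_cast<int32_t>(v));
  }
  int bit_width() const { return 32; }
};

// Index storage that reports the narrowest signed width fitting its values.
struct ToyAdaptiveIndices {
  std::vector<int64_t> values;
  int64_t max_seen = 0;
  void Append(int64_t v) {
    values.push_back(v);
    if (v > max_seen) max_seen = v;
  }
  int bit_width() const {
    if (max_seen <= INT8_MAX) return 8;
    if (max_seen <= INT16_MAX) return 16;
    if (max_seen <= INT32_MAX) return 32;
    return 64;
  }
};

// Same Append() API for both variants; only the index storage differs.
template <typename Indices>
class ToyDictBuilder {
 public:
  void Append(const std::string& value) {
    // emplace() leaves an existing entry untouched and returns it.
    auto it = memo_.emplace(value, static_cast<int64_t>(memo_.size())).first;
    indices_.Append(it->second);
  }
  int index_bit_width() const { return indices_.bit_width(); }
  std::size_t dictionary_size() const { return memo_.size(); }

 private:
  std::unordered_map<std::string, int64_t> memo_;
  Indices indices_;
};
```

With this shape, {{ToyDictBuilder<ToyInt32Indices>}} reports a 32-bit index width no matter how many distinct values it has seen, while {{ToyDictBuilder<ToyAdaptiveIndices>}} widens as the dictionary grows.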



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
