Hi Li,

If it's practical for you to get an index array and a dictionary array out
of your source, you could use those to create a DictionaryArray, as seen
here [1].
Another option that might fit your situation is to use a dictionary
builder [2], which only needs the string values themselves.

Best,
Rok

[1]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_dict_test.cc#L1128-L1160
[2]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_dict_test.cc#L218-L249

On Thu, Nov 3, 2022 at 3:50 PM Li Jin <ice.xell...@gmail.com> wrote:

> Hello,
>
> I am working on converting some internal data sources to Arrow data. One
> particular set of data we have contains many string columns that can be
> dictionary-encoded (basically string enums).
>
> The current internal C++ API I am using gives me an iterator of "row"
> objects. For each string column, the row object exposes a method
> "getStringField(index)" that returns a "string_view", and I want to
> construct a dictionary-encoded Arrow string column from it.
>
> My questions are:
> (1) Is there a way to do this using the Arrow C++ API?
> (2) Does the internal C++ API need to return something other than a
> "string_view" to support this? Internally the string column is already
> dictionary-encoded (although not in Arrow format) and it might already know
> the dictionary and the encoded (int) value for each string field, but it
> doesn't expose it now.
>
> Thanks,
> Li
>
