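Hi Li,

If it's practical for you to get the index values and the dictionary values out of your source, you could build an indices array and a dictionary array from them and combine the two into a DictionaryArray, as seen here [1]. Another option that might fit your situation is to use a dictionary builder [2], which accepts the strings one at a time and does the encoding for you.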
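To make the first option more concrete, here is a minimal sketch of the encoding step using only the standard library, with no Arrow calls. The names `EncodedColumn` and `DictionaryEncode` are made up for illustration, and I'm assuming the values from your `getStringField(index)` can be collected into a vector of `string_view` first.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical holder for one dictionary-encoded column (not an Arrow type).
struct EncodedColumn {
  std::vector<std::string> dictionary;  // unique values, in first-seen order
  std::vector<int32_t> indices;         // one dictionary position per row
};

// Encodes the values in first-seen order. The strings are copied, so the
// string_views only need to stay valid for the duration of this call.
inline EncodedColumn DictionaryEncode(
    const std::vector<std::string_view>& values) {
  EncodedColumn out;
  std::unordered_map<std::string, int32_t> positions;
  out.indices.reserve(values.size());
  for (std::string_view value : values) {
    // If the value is new, it gets the next free dictionary position.
    auto [it, inserted] = positions.try_emplace(
        std::string(value), static_cast<int32_t>(out.dictionary.size()));
    if (inserted) {
      out.dictionary.emplace_back(value);
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```

From there, one way to get to Arrow would be to append `indices` to an `arrow::Int32Builder` and `dictionary` to an `arrow::StringBuilder`, finish both, and pass the two arrays to `arrow::DictionaryArray::FromArrays` along with the dictionary type. If your internal API could expose its existing dictionary and the encoded int values directly, you could skip this step and fill the two builders from those.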
Best,
Rok

[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_dict_test.cc#L1128-L1160
[2] https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/array_dict_test.cc#L218-L249

On Thu, Nov 3, 2022 at 3:50 PM Li Jin <ice.xell...@gmail.com> wrote:

> Hello,
>
> I am working on converting some internal data sources to Arrow data. One
> particular set of data we have contains many string columns that can be
> dictionary-encoded (basically string enums).
>
> The current internal C++ API I am using gives me an iterator of "row"
> objects. For each string column, the row object exposes a method
> "getStringField(index)" that returns a "string_view", and I want to
> construct a dictionary-encoded Arrow string column from it.
>
> My questions are:
> (1) Is there a way to do this using the Arrow C++ API?
> (2) Does the internal C++ API need to return something other than a
> "string_view" to support this? Internally the string column is already
> dictionary-encoded (although not in Arrow format), and it might already know
> the dictionary and the encoded (int) value for each string field, but it
> doesn't expose them now.
>
> Thanks,
> Li