Hi folks,

The IPC file format notes that it is "invalid to have more than one
non-delta dictionary batch per dictionary ID (i.e. dictionary replacement
is not supported)" [1], yet the "isDelta" flag suggests that replacement
dictionaries are supported. However, it isn't clear that replacement
applies only to streams.

I've tried to find context around this [2] [3] [4], but I think there is
another use case: I want to stream data in blocks to a file system and
then, on read, process each data block and its associated dictionary in
parallel. Dictionary replacement helps with the parallel read case because
each data block can load its own dictionary batch without first reading
every dictionary batch that precedes it.
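
To make the read-side difference concrete, here is a rough sketch in plain
Python. It is only a toy model: the block layout, the field names and the
helpers decode_block / decode_all are made up for illustration and are not
the Arrow IPC layout or any Arrow API. It assumes one dictionary batch is
written alongside each data block.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model: each entry pairs one dictionary batch with one data block.
# Field names and layout are hypothetical, not the Arrow IPC format.
DELTA_FILE = [
    {"dictionary": ["a", "b"], "is_delta": False, "indices": [0, 1, 0]},
    {"dictionary": ["c"], "is_delta": True, "indices": [2, 0]},
    {"dictionary": ["d"], "is_delta": True, "indices": [3, 2, 1]},
]

REPLACEMENT_FILE = [
    {"dictionary": ["a", "b"], "is_delta": False, "indices": [0, 1, 0]},
    {"dictionary": ["c", "a"], "is_delta": False, "indices": [0, 1]},
    {"dictionary": ["d", "c", "b"], "is_delta": False, "indices": [0, 1, 2]},
]


def decode_block(blocks, i):
    """Decode block i; return (values, number of dictionary batches read)."""
    # Walk back to the most recent non-delta (full) dictionary batch.
    start = i
    while blocks[start]["is_delta"]:
        start -= 1
    # Rebuild the dictionary by appending every batch from there up to i.
    dictionary = []
    for block in blocks[start : i + 1]:
        dictionary.extend(block["dictionary"])
    values = [dictionary[j] for j in blocks[i]["indices"]]
    return values, i - start + 1


def decode_all(blocks):
    """Decode every block independently on a thread pool."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: decode_block(blocks, i), range(len(blocks))))


if __name__ == "__main__":
    for name, blocks in [("delta", DELTA_FILE), ("replacement", REPLACEMENT_FILE)]:
        for values, batches_read in decode_all(blocks):
            print(name, values, "dictionary batches read:", batches_read)
```

Both layouts decode to the same values, but with deltas the worker for
block i has to read every dictionary batch back to the last full one, while
with replacement each worker reads exactly one.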

Given the delta flag, is there any reason not to support replacement
dictionaries in the file format?

[1] https://github.com/apache/arrow/pull/5585/files#diff-8b68cf6859e881f2357f5df64bb073135d7ff6eeb51f116418660b3856564c60R1027-R1030
[2] https://github.com/apache/arrow/issues/22842
[3] https://lists.apache.org/thread/2h3o1kbk0t9l16wxp51wdtnz16yqg03d
[4] https://lists.apache.org/thread/31910z7g64np3dmblokbh1llmxgt74y7
