Hi Joris,

I do believe this is missing. I believe we worked around this for testing by directly writing dictionary batches to the stream [1].

Thanks,
Micah

[1] https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java#L614

On Thu, Mar 4, 2021 at 4:06 AM Joris Peeters <joris.mg.peet...@gmail.com> wrote:

> Hello,
>
> For my use case I'm sending an Arrow IPC-stream from a server to a client,
> with some columns being dictionary-encoded. Dictionary-encoding happens on
> the fly, though, so the full dictionary isn't known yet at the beginning of
> the stream, but rather is computed for every batch, and DictionaryBatches
> are to be emitted prior to every RecordBatch.
>
> However, unless I am mistaken, this is not currently supported in the
> ArrowStreamWriter. The dictionary provider is passed in at construction
> time, the dicts are emitted once, and there is no hook for re-emitting
> these.
>
> I've locally hacked around this by basically copy-pasting ArrowStreamWriter
> and extending it with a `public void writeBatch(DictionaryProvider
> provider)` method, that re-emits the dictionaries prior to emitting the
> record batches.
>
> However, I'd of course much prefer if the provided ArrowStreamWriter
> supported this. If people agree that it's missing (i.e. maybe I'm
> overlooking something obvious) and that it would be useful to have, then
> I'm happy to contribute it myself (not necessarily by using the
> aforementioned `writeBatch(provider)` approach, but seems reasonable).
>
> Cheers,
> -J
>
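
The message ordering described in the quoted mail (a freshly computed dictionary emitted ahead of every record batch) can be sketched with a toy model. This is only an illustration in plain Java and does not use the Arrow library: the class `PerBatchDictionaryStream`, the method `encodeStream`, and the string message labels are hypothetical stand-ins, not Arrow APIs, and a single string column is assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a stream in which every record batch is preceded by the
// dictionary computed for that batch. Not an Arrow API; names are hypothetical.
public class PerBatchDictionaryStream {

    // Returns the sequence of messages as human-readable labels.
    public static List<String> encodeStream(List<List<String>> batches) {
        List<String> messages = new ArrayList<>();
        messages.add("SCHEMA"); // the schema is sent once, at the start

        for (List<String> batch : batches) {
            // Build the dictionary on the fly, in order of first appearance.
            Map<String, Integer> dictionary = new LinkedHashMap<>();
            List<Integer> indices = new ArrayList<>();
            for (String value : batch) {
                Integer index = dictionary.get(value);
                if (index == null) {
                    index = dictionary.size();
                    dictionary.put(value, index);
                }
                indices.add(index);
            }
            // Emit the dictionary first, then the batch of indices that refer to it.
            messages.add("DICTIONARY " + new ArrayList<>(dictionary.keySet()));
            messages.add("RECORD " + indices);
        }
        return messages;
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
                List.of("a", "b", "a"),
                List.of("c", "c", "b"));
        encodeStream(batches).forEach(System.out::println);
    }
}
```

In this model the second batch gets its own dictionary (`[c, b]`), so its indices are only meaningful relative to the dictionary message immediately before it. That is the hook the quoted mail says is missing when dictionaries are written once at construction time.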