Cheers. Made and self-assigned
https://issues.apache.org/jira/browse/ARROW-11869.
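
For anyone reading this in the archive, here is a rough sketch of the message ordering the quoted request below asks for: the schema once, then the dictionaries re-emitted ahead of every record batch. It is a toy model in plain Java, not the Arrow Java API. `ToyStreamWriter`, `DictionaryPerBatchDemo`, and the string "messages" are hypothetical names made up for illustration, and the real writer would serialize Arrow IPC messages rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical classes, NOT part of Arrow Java.
final class ToyStreamWriter {
    private final List<String> messages = new ArrayList<>();
    private boolean started = false;

    // Emit the schema exactly once, at the start of the stream.
    void start() {
        if (!started) {
            messages.add("SCHEMA");
            started = true;
        }
    }

    // Assumed shape of the proposed method: re-emit every dictionary
    // from the supplied provider (modelled here as a Map from dictionary
    // id to values), then emit the record batch that refers to them.
    void writeBatch(Map<Long, List<String>> dictionaries, int[] indices) {
        start();
        for (Map.Entry<Long, List<String>> e : dictionaries.entrySet()) {
            messages.add("DICTIONARY id=" + e.getKey() + " values=" + e.getValue());
        }
        messages.add("RECORD_BATCH rows=" + indices.length);
    }

    List<String> messages() {
        return messages;
    }
}

final class DictionaryPerBatchDemo {
    // Writes two batches whose dictionary (id 0) is computed per batch.
    static List<String> run() {
        ToyStreamWriter writer = new ToyStreamWriter();
        writer.writeBatch(
            Collections.singletonMap(0L, Arrays.asList("a", "b")),
            new int[] {0, 1, 0});
        writer.writeBatch(
            Collections.singletonMap(0L, Arrays.asList("c")),
            new int[] {0, 0});
        return writer.messages();
    }

    public static void main(String[] args) {
        for (String message : run()) {
            System.out.println(message);
        }
    }
}
```

Running `DictionaryPerBatchDemo` prints one schema line followed by a dictionary line and a record-batch line for each of the two batches. In this sketch the second dictionary with id 0 simply follows the first; whether the real writer should send it as a replacement or as a delta is a design question the sketch does not address.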

On Fri, Mar 5, 2021 at 1:44 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Joris,
> I do believe this is missing.  I believe we worked around this for testing
> by directly writing dictionary batches to the stream [1].
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java#L614
>
> On Thu, Mar 4, 2021 at 4:06 AM Joris Peeters <joris.mg.peet...@gmail.com>
> wrote:
>
> > Hello,
> >
> > For my use case I'm sending an Arrow IPC-stream from a server to a client,
> > with some columns being dictionary-encoded. Dictionary-encoding happens on
> > the fly, though, so the full dictionary isn't known yet at the beginning of
> > the stream, but rather is computed for every batch, and DictionaryBatches
> > are to be emitted prior to every RecordBatch.
> >
> > However, unless I am mistaken, this is not currently supported in the
> > ArrowStreamWriter. The dictionary provider is passed in at construction
> > time, the dicts are emitted once, and there is no hook for re-emitting
> > these.
> >
> > I've locally hacked around this by basically copy-pasting ArrowStreamWriter
> > and extending it with a `public void writeBatch(DictionaryProvider
> > provider)` method that re-emits the dictionaries prior to emitting the
> > record batches.
> >
> > However, I'd of course much prefer if the provided ArrowStreamWriter
> > supported this. If people agree that it's missing (I may well be
> > overlooking something obvious) and that it would be useful to have, then
> > I'm happy to contribute it myself (not necessarily using the
> > aforementioned `writeBatch(provider)` approach, though it seems reasonable).
> >
> > Cheers,
> > -J
> >
>
