I'll plan on starting a vote in the next day or two if there are no further
objections/comments.
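To make the open question about dictionaries at the start of a stream (quoted below) concrete, here is a minimal sketch of the two options as checks over the order of messages in a stream. The message classes (`DictionaryBatch`, `RecordBatch`) and the functions `valid_option_1`/`valid_option_2` are hypothetical simplifications for illustration, not Arrow's IPC API. A record batch is modeled only by the dictionary IDs that its non-null encoded values reference.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Union


# Hypothetical, simplified message model; not Arrow's actual IPC classes.
@dataclass(frozen=True)
class DictionaryBatch:
    dict_id: int
    is_delta: bool = False


@dataclass(frozen=True)
class RecordBatch:
    # Dictionary IDs whose encoded columns hold at least one non-null value.
    referenced_ids: FrozenSet[int] = frozenset()


Message = Union[DictionaryBatch, RecordBatch]


def valid_option_1(messages: List[Message]) -> bool:
    """Option #1: a dictionary only has to arrive before the first record
    batch that actually references it (an all-null column needs none)."""
    seen: Set[int] = set()
    for msg in messages:
        if isinstance(msg, DictionaryBatch):
            seen.add(msg.dict_id)
        elif not msg.referenced_ids <= seen:
            return False
    return True


def valid_option_2(schema_ids: Set[int], messages: List[Message]) -> bool:
    """Option #2: every dictionary in the schema gets a batch (possibly
    empty) before the first record batch. This sketch applies the same
    requirement to a stream that contains no record batches at all."""
    seen: Set[int] = set()
    for msg in messages:
        if isinstance(msg, RecordBatch):
            return schema_ids <= seen
        seen.add(msg.dict_id)
    return schema_ids <= seen
```

For example, a stream whose first record batch has an all-null column for dictionary 1 passes option #1 but fails option #2 unless the writer sends an (empty) batch for dictionary 1 up front. Option #1 lets a writer defer a dictionary until a batch needs it; option #2 gives readers a simpler invariant at the cost of sometimes emitting an empty dictionary batch.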

On Sun, Oct 13, 2019 at 11:06 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> I think the only point raised on the PR that is worth discussing is the
> assumption about dictionaries at the beginning of streams.
>
> There are two options:
> 1.  Based on the current wording, it does not seem that all dictionaries
> need to be at the beginning of the stream if they aren't made use of in the
> first record batch (i.e. a dictionary encoded column is all null in the
> first record batch).
> 2.  We require a dictionary batch for each dictionary at the beginning of
> the stream (and require implementations to send an empty batch if they
> don't have the dictionary available).
>
> The current proposal in the PR is option #1.
>
> Thanks,
> Micah
>
> On Sat, Oct 5, 2019 at 4:01 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> I've opened a pull request [1] to clarify some recent conversations about
>> semantics/edge cases for dictionary encoding [2][3] around interleaved
>> batches and the meaning of isDelta=False.
>>
>> Specifically, it proposes that isDelta=False indicates dictionary
>> replacement.  For the file format, only one isDelta=False batch is allowed
>> per file, and isDelta=true batches are applied in the order supplied in the
>> file footer.
>>
>> In addition, I've added a new enum to DictionaryEncoding to preserve
>> future compatibility in case we want to expand dictionary encoding to be an
>> explicit mapping from "ID" to "VALUE" as discussed in [4].
>>
>> Once people have had a chance to review and come to a consensus, I will
>> call a formal vote to approve and commit the change.
>>
>> Thanks,
>> Micah
>>
>> [1] https://github.com/apache/arrow/pull/5585
>> [2]
>> https://lists.apache.org/thread.html/9734b71bc12aca16eb997388e95105bff412fdaefa4e19422f477389@%3Cdev.arrow.apache.org%3E
>> [3]
>> https://lists.apache.org/thread.html/5c3c9346101df8d758e24664638e8ada0211d310ab756a89cde3786a@%3Cdev.arrow.apache.org%3E
>> [4]
>> https://lists.apache.org/thread.html/15a4810589b2eb772bce5b2372970d9d93badbd28999a1bbe2af418a@%3Cdev.arrow.apache.org%3E
>>
>>

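A rough sketch of the isDelta semantics proposed in the PR, assuming a reader that keeps one value list per dictionary ID. `DictionaryState` and `file_dictionaries_valid` are hypothetical names, not Arrow APIs. Two further assumptions: the "only one isDelta=False batch per file" rule is read as one per dictionary ID, and a delta that arrives before any base dictionary is treated as an error.

```python
from typing import Dict, Iterable, List, Set, Tuple


class DictionaryState:
    """Hypothetical reader-side state: dictionary ID -> current values."""

    def __init__(self) -> None:
        self.dictionaries: Dict[int, List[str]] = {}

    def apply(self, dict_id: int, values: List[str], is_delta: bool) -> None:
        if is_delta:
            # isDelta=true: append values; existing indices stay valid.
            # Assumption: a delta with no base dictionary is an error.
            if dict_id not in self.dictionaries:
                raise ValueError(f"delta for unknown dictionary {dict_id}")
            self.dictionaries[dict_id].extend(values)
        else:
            # isDelta=False: replace the dictionary outright.
            self.dictionaries[dict_id] = list(values)

    def decode(self, dict_id: int, indices: List[int]) -> List[str]:
        values = self.dictionaries[dict_id]
        return [values[i] for i in indices]


def file_dictionaries_valid(footer_order: Iterable[Tuple[int, bool]]) -> bool:
    """Check (dict_id, is_delta) pairs listed in file-footer order: at most
    one isDelta=False batch per ID (assumed reading of the rule), with
    deltas applied afterwards in footer order."""
    base_seen: Set[int] = set()
    for dict_id, is_delta in footer_order:
        if is_delta:
            if dict_id not in base_seen:  # assumption: base must come first
                return False
        elif dict_id in base_seen:  # a second base would be a replacement
            return False
        else:
            base_seen.add(dict_id)
    return True
```

In a stream, a later isDelta=False batch simply swaps in a new dictionary; in a file, the sketch rejects it, since replacement is what would make the footer order ambiguous for random access.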