On Wed, Aug 28, 2019 at 12:32 PM Fan Liya <liya.fa...@gmail.com> wrote:

> Dear all,
>
> In the discussion of this PR (https://github.com/apache/arrow/pull/5073),
> we are faced with a problem:
>
> Normally, in a VariableWidthVector (e.g. VarCharVector), a null value is
> supposed to take no space in the data buffer. In particular, for a null
> value, we have
>
> start index == end index
>
> where start index and end index are the start/end positions of the value in
> the data buffer. This problem is also related to the ListVector.
>
> However, it seems that for some scenarios, a null value can take non-empty
> space (please see this comment
> https://github.com/apache/arrow/pull/5073#pullrequestreview-274215491).
>
> Since this is an important issue, we should make it clear in the
> specification. Otherwise, some unexpected problems may occur in client
> code.
>
> It seems we are faced with 3 options:
>
> 1. a null value always takes no space.
> 2. a null value can take non-empty space, and the content of the non-empty
> space is always 0.
> 3. a null value can take non-empty space, and the content of the non-empty
> space is undefined.
>
> Option 1 makes the data buffer of a VariableWidthVector a contiguous region
> (not interleaved with undefined regions), so optimizations can be applied.
> However, it may lead to memory copy/move (as indicated in the above comment
> https://github.com/apache/arrow/pull/5073#pullrequestreview-274215491)
>
> Option 3 can address the above problem of memory copy/move. However, it
> splits memory into non-contiguous regions, so optimizations cannot be
> performed. In addition, it may cause unexpected problems in client code.
>

We could still apply the optimisation to the contiguous "valid regions".
E.g., if the entire vector (called an array in C++) is valid, compare the
data buffers directly. If there are only two null entries in the vector,
compare the three contiguous regions of the data buffer that they separate,
and so on.
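
To make the idea concrete, here is a minimal sketch in plain Java. It does
not use the Arrow vector classes: the VarWidth holder and the
equalsByValidRegions method are hypothetical names, and the validity buffer
is modelled as a boolean[] for readability. It assumes option 3, i.e. the
bytes behind a null entry may be non-empty and undefined, so only runs of
consecutive valid entries are compared.

```java
import java.util.Arrays;

public class ValidRegionCompare {

    /** Hypothetical stand-in for a variable-width vector; not the Arrow API. */
    static final class VarWidth {
        final boolean[] valid; // valid[i] == true means entry i is non-null
        final int[] offsets;   // entry i spans data[offsets[i] .. offsets[i + 1])
        final byte[] data;

        VarWidth(boolean[] valid, int[] offsets, byte[] data) {
            this.valid = valid;
            this.offsets = offsets;
            this.data = data;
        }
    }

    /**
     * Compares two vectors by walking maximal runs of valid entries and
     * comparing each run's bytes in one range comparison. Bytes that belong
     * to null entries are never read.
     */
    static boolean equalsByValidRegions(VarWidth a, VarWidth b) {
        int n = a.valid.length;
        if (n != b.valid.length || !Arrays.equals(a.valid, b.valid)) {
            return false;
        }
        int i = 0;
        while (i < n) {
            if (!a.valid[i]) {
                i++; // skip nulls, whatever space they occupy
                continue;
            }
            int runStart = i;
            while (i < n && a.valid[i]) {
                i++;
            }
            // Entries [runStart, i) are all valid, so their bytes are contiguous.
            for (int k = runStart; k < i; k++) {
                int lenA = a.offsets[k + 1] - a.offsets[k];
                int lenB = b.offsets[k + 1] - b.offsets[k];
                if (lenA != lenB) {
                    return false; // same bytes overall but split differently
                }
            }
            // Range overload of Arrays.equals (Java 9+).
            if (!Arrays.equals(a.data, a.offsets[runStart], a.offsets[i],
                               b.data, b.offsets[runStart], b.offsets[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With no nulls this degenerates to a single comparison of the two data
buffers; with two nulls it makes at most three range comparisons, as
described above.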



>
> Option 2 seems like a trade-off between the two. However, it is not
> suitable for ListVector.
>
> Please give your valuable feedback.
>
> Best,
> Liya Fan
>


-- 
Thanks and regards,
Ravindra.
