Hi Jorge,

The intent when authoring the specification was as follows:

* If two array slots being compared are both null, then they are equal
* If one is null and the other is not, they are not equal
* If they are both not null, then they are equal if the data
represented in the slot is equal (and if dictionary indices reference
the same dictionary value, even if the dictionaries are different,
then they are equal because the data they represent is the same)
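
As a rough illustration only, the three rules above could be sketched in
Python as follows. This is not code from any Arrow implementation: the
helper names (slots_equal, decode_dictionary, arrays_equal) are
hypothetical, a null slot is modelled as None, and a dictionary-encoded
array is modelled as a list of indices plus a list of dictionary values.

```python
from typing import Any, List, Optional, Sequence


def slots_equal(a: Optional[Any], b: Optional[Any]) -> bool:
    """Compare two logical slot values, with None standing in for null."""
    if a is None and b is None:
        return True   # both null: equal
    if a is None or b is None:
        return False  # exactly one null: not equal
    return a == b     # both non-null: compare the represented data


def decode_dictionary(indices: Sequence[Optional[int]],
                      dictionary: Sequence[Any]) -> List[Optional[Any]]:
    """Resolve indices to the values they reference; null indices stay null."""
    return [None if i is None else dictionary[i] for i in indices]


def arrays_equal(left: Sequence[Optional[Any]],
                 right: Sequence[Optional[Any]]) -> bool:
    """Two arrays are equal if they have equal length and equal slots."""
    return len(left) == len(right) and all(
        slots_equal(x, y) for x, y in zip(left, right)
    )


# Different dictionaries and different indices, but the same logical data.
left = decode_dictionary([0, None, 1], ["a", "b"])
right = decode_dictionary([1, None, 0], ["b", "a"])
same = arrays_equal(left, right)
```

Comparing the decoded values rather than the raw indices is what makes the
two dictionary-encoded arrays compare equal even though their dictionaries
differ. A real implementation would presumably avoid materializing the
decoded arrays and would also have to handle nested types.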

- Wes

On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão
<jorgecarlei...@gmail.com> wrote:
>
> Hi,
>
> Recently, I revisited the code for array equality in Rust. While going
> through it, I observed some assumptions about how we conclude that two
> elements of an arrow array are equal, and when two arrays are equal.
>
> The notion of equality is also used throughout the document e.g. when we
> offer examples using "unspecified", we are implicitly arguing that we
> should not care about that value when comparing arrays. It is also used
> when we use the wording "unique values" in the dictionary-encoded arrays.
>
> The notion of array equality is important when we want to verify
> interoperability between languages, where we often need to compare arrays
> (e.g. after a round-trip), as some implementations may change the data of
> the "unspecified" slots and e.g. offsets.
>
> More fundamentally, IMO the specification offers a physical representation
> (buffers, children, offsets, etc.) of a logical asset (lists, structs, int8,
> int32), but currently does not say when two logical assets are considered
> equal.
>
> Would it make sense to systematize the notion of equality in the
> specification, to align the different implementations into when they should
> consider two arrays to be equal?
>
> Best,
> Jorge
