Hi Jorge,
I think it would make sense to add some clarifications to the document per
Wes's comments. Would you like to open a PR?
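
For reference, the three rules in Wes's message (quoted below) could be sketched roughly as follows. This is only an illustration under simplifying assumptions: a slot is modeled as an `Option` (`None` meaning null), and `slot_eq` and `dict_slot_eq` are hypothetical names, not functions from any Arrow implementation.

```rust
/// Slot equality as described in the thread: null == null, null != non-null,
/// and two non-null slots are equal when the data they represent is equal.
fn slot_eq<T: PartialEq>(lhs: Option<&T>, rhs: Option<&T>) -> bool {
    match (lhs, rhs) {
        (None, None) => true,
        (Some(l), Some(r)) => l == r,
        _ => false,
    }
}

/// Dictionary-encoded slots: compare the values the keys point to,
/// not the keys themselves, so the two dictionaries may differ.
fn dict_slot_eq<T: PartialEq>(
    lhs_dict: &[T],
    lhs_key: Option<usize>,
    rhs_dict: &[T],
    rhs_key: Option<usize>,
) -> bool {
    slot_eq(
        lhs_key.map(|k| &lhs_dict[k]),
        rhs_key.map(|k| &rhs_dict[k]),
    )
}
```

With dictionaries `["x", "y"]` and `["y", "x"]`, key 0 on the left and key 1 on the right both resolve to "x", so the slots compare equal even though the keys and dictionaries differ.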

One small edge case to consider is how NaN float values are compared.
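
To make the NaN question concrete, here is a minimal sketch of three candidate policies for comparing two non-null float values. The function names are made up for illustration, and which policy the specification should adopt (if any) is exactly the open question.

```rust
/// IEEE 754 comparison: NaN is never equal to anything, including itself.
fn ieee_eq(a: f64, b: f64) -> bool {
    a == b
}

/// Bitwise comparison: NaNs with identical bit patterns are equal,
/// but 0.0 and -0.0 are not.
fn bitwise_eq(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}

/// Hypothetical "all NaNs are equal" policy layered on top of IEEE equality.
fn nan_aware_eq(a: f64, b: f64) -> bool {
    (a.is_nan() && b.is_nan()) || a == b
}
```

Under `ieee_eq`, an array containing NaN would not even be equal to itself, which matters for round-trip tests.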

-Micah

On Thu, Nov 12, 2020 at 8:44 PM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi Wes,
>
> Thanks a lot. I agree. My question is whether we should make it explicit in
> the specification. AFAIK, "if the data represented in the slot is equal"
> depends on the datatype: for variable-sized arrays with offsets (e.g.
> strings), the equality of slot i is something along the lines of:
>
> # buffer[0] holds the offsets; T is the offset type (e.g. i32)
> start = lhs.buffer[0][(lhs.offset + i) * size_of<T>] as T
> end = lhs.buffer[0][(lhs.offset + i + 1) * size_of<T>] as T
> # buffer[1] holds the value bytes
> lhs_value = lhs.buffer[1][start..end]
> # same for rhs
> lhs_value == rhs_value
>
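
A runnable version of the pseudocode above might look like the following. It is a sketch under simplifying assumptions: `VarSizeArray` is a hypothetical stand-in for the real array data, offsets are already typed as `i32` rather than read from a raw byte buffer, and validity (nulls) is ignored.

```rust
/// Hypothetical, simplified view of a variable-sized (string/binary) array.
struct VarSizeArray<'a> {
    offsets: &'a [i32], // start/end positions into `values`
    values: &'a [u8],   // concatenated value bytes
    offset: usize,      // logical start of this (possibly sliced) array
}

impl<'a> VarSizeArray<'a> {
    /// Bytes of slot `i`, taking the array's own offset into account.
    fn value(&self, i: usize) -> &'a [u8] {
        let start = self.offsets[self.offset + i] as usize;
        let end = self.offsets[self.offset + i + 1] as usize;
        &self.values[start..end]
    }
}

/// Two non-null slots are equal when their bytes are equal, regardless of
/// where those bytes sit in each array's buffers.
fn value_eq(lhs: &VarSizeArray, rhs: &VarSizeArray, i: usize, j: usize) -> bool {
    lhs.value(i) == rhs.value(j)
}
```

Note that the comparison goes through the offsets, so two arrays with different physical offsets buffers can still hold equal slots.
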
> This logic is also tricky for any type with children, where we need to
> compare the slot of the child through recursion.
> These things are not really implementation-specific, yet they are really
> important when implementations interoperate.
>
> Best,
> Jorge
>
>
>
>
> On Thu, Nov 5, 2020 at 3:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Jorge,
> >
> > The intent when authoring the specification was as follows:
> >
> > * If two array slots being compared are both null, then they are equal
> > * If one is null and the other is not, they are not equal
> > * If they are both not null, then they are equal if the data
> > represented in the slot is equal (and if dictionary indices reference
> > the same dictionary value, even if the dictionaries are different,
> > then they are equal because the data they represent is the same)
> >
> > - Wes
> >
> > On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão
> > <jorgecarlei...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > Recently, I revisited the code for array equality in Rust. While going
> > > through it, I observed some assumptions about how we conclude that two
> > > elements of an arrow array are equal, and when two arrays are equal.
> > >
> > > The notion of equality is also used throughout the document e.g. when we
> > > offer examples using "unspecified", we are implicitly arguing that we
> > > should not care about that value when comparing arrays. It is also used
> > > when we use the wording "unique values" in the dictionary-encoded arrays.
> > >
> > > The notion of array equality is important when we want to verify
> > > interoperability between languages, where we often need to compare arrays
> > > (e.g. after a round-trip), as some implementations may change the data of
> > > the "unspecified" slots and e.g. offsets.
> > >
> > > More fundamentally, IMO the specification offers a physical representation
> > > (buffers, children, offsets, etc.) of a logical asset (lists, structs, int8,
> > > int32), but currently does not say when two logical assets are considered
> > > equal.
> > >
> > > Would it make sense to systematize the notion of equality in the
> > > specification, to align the different implementations on when they should
> > > consider two arrays to be equal?
> > >
> > > Best,
> > > Jorge
> >
>
