Hi Jorge,

I think it would make sense to add some clarifications to the document per Wes's comments. Would you like to try making a PR?
One small edge case to consider is how NaN floating-point values are compared, since under IEEE 754 a NaN is not equal even to itself.

-Micah

On Thu, Nov 12, 2020 at 8:44 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi Wes,
>
> Thanks a lot. I agree. My question is whether we should make it explicit in
> the specification. AFAIK, "if the data represented in the slot is equal"
> depends on the data type: for variable-sized arrays with offsets (e.g.
> strings), the equality of slot i is something along the lines of:
>
>     start = lhs.buffer[0][(lhs.offset + i) * size_of<T>] as T
>     end = lhs.buffer[0][(lhs.offset + i + 1) * size_of<T>] as T
>     lhs_value = lhs.buffer[1][start..end]
>     # same for rhs
>     lhs_value == rhs_value
>
> This logic is also tricky for any type with children, where we need to
> compare the corresponding child slots recursively.
> These things are not really implementation-specific, yet they are really
> important when implementations interoperate.
>
> Best,
> Jorge
>
> On Thu, Nov 5, 2020 at 3:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Jorge,
> >
> > The intent when authoring the specification was as follows:
> >
> > * If two array slots being compared are both null, then they are equal
> > * If one is null and the other is not, they are not equal
> > * If they are both not null, then they are equal if the data
> >   represented in the slot is equal (and if dictionary indices reference
> >   the same dictionary value, even if the dictionaries are different,
> >   then they are equal because the data they represent is the same)
> >
> > - Wes
> >
> > On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão
> > <jorgecarlei...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Recently, I revisited the code for array equality in Rust. While going
> > > through it, I observed some assumptions about how we conclude that two
> > > elements of an Arrow array are equal, and when two arrays are equal.
> > >
> > > The notion of equality is also used throughout the document, e.g. when we
> > > offer examples using "unspecified", we are implicitly arguing that we
> > > should not care about that value when comparing arrays. It is also used
> > > when we use the wording "unique values" in the description of
> > > dictionary-encoded arrays.
> > >
> > > The notion of array equality is important when we want to verify
> > > interoperability between languages, where we often need to compare arrays
> > > (e.g. after a round trip), as some implementations may change the data in
> > > "unspecified" slots and, e.g., the offsets.
> > >
> > > More fundamentally, IMO the specification offers a physical representation
> > > (buffers, children, offsets, etc.) of a logical asset (lists, structs, int8,
> > > int32), but currently does not say when two logical assets are considered
> > > equal.
> > >
> > > Would it make sense to systematize the notion of equality in the
> > > specification, to align the different implementations on when they should
> > > consider two arrays to be equal?
> > >
> > > Best,
> > > Jorge
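The sketch below combines three pieces of the thread: Wes's three slot rules, Jorge's offset-based comparison for string slots, and Micah's NaN question. It is written in Rust against a hypothetical, simplified array struct. `BinaryArray`, `from_strs`, `slot_eq`, `array_eq`, `dict_array_eq` and `f64_slot_eq` are illustrative names, not the arrow-rs API. The NaN policy (any two NaNs compare equal) is an assumption; the thread does not settle it.

```rust
// A simplified, hypothetical model of an Arrow variable-size binary (utf8) array.
// This is NOT the arrow-rs API; the struct and function names are illustrative.
struct BinaryArray {
    validity: Option<Vec<bool>>, // None means "no nulls"; one flag per physical slot
    offsets: Vec<i32>,           // one entry per physical slot, plus one
    values: Vec<u8>,             // concatenated bytes of all physical slots
    offset: usize,               // logical start within the buffers (e.g. after slicing)
    len: usize,                  // number of logical slots
}

impl BinaryArray {
    fn is_valid(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(true, |v| v[self.offset + i])
    }

    // Bytes of logical slot i, located through the offsets buffer as in Jorge's pseudocode.
    fn value(&self, i: usize) -> &[u8] {
        let start = self.offsets[self.offset + i] as usize;
        let end = self.offsets[self.offset + i + 1] as usize;
        &self.values[start..end]
    }
}

// Builds a compact array from optional strings (helper for the examples).
fn from_strs(items: &[Option<&str>]) -> BinaryArray {
    let mut offsets = vec![0i32];
    let mut values = Vec::new();
    let mut validity = Vec::new();
    for item in items {
        if let Some(s) = item {
            values.extend_from_slice(s.as_bytes());
        }
        validity.push(item.is_some());
        offsets.push(values.len() as i32);
    }
    BinaryArray { validity: Some(validity), offsets, values, offset: 0, len: items.len() }
}

// Wes's rules for one slot: null == null, null != valid, valid slots compare their data.
fn slot_eq(lhs: &BinaryArray, i: usize, rhs: &BinaryArray, j: usize) -> bool {
    match (lhs.is_valid(i), rhs.is_valid(j)) {
        (false, false) => true,
        (true, true) => lhs.value(i) == rhs.value(j),
        _ => false,
    }
}

// Two arrays are equal if they have the same length and every slot is equal;
// the offsets, the slice offset and the bytes behind null slots are ignored.
fn array_eq(lhs: &BinaryArray, rhs: &BinaryArray) -> bool {
    lhs.len == rhs.len && (0..lhs.len).all(|i| slot_eq(lhs, i, rhs, i))
}

// Dictionary slots compare the values their keys point at, so arrays with
// different dictionaries can still be equal.
fn dict_array_eq(
    lhs_keys: &[Option<usize>], lhs_dict: &BinaryArray,
    rhs_keys: &[Option<usize>], rhs_dict: &BinaryArray,
) -> bool {
    lhs_keys.len() == rhs_keys.len()
        && lhs_keys.iter().zip(rhs_keys).all(|(l, r)| match (l, r) {
            (None, None) => true,
            (Some(a), Some(b)) => slot_eq(lhs_dict, *a, rhs_dict, *b),
            _ => false,
        })
}

// One possible policy for the NaN edge case (an assumption, not from the spec):
// treat any two NaNs as equal so that an array always equals itself.
// Plain IEEE `==` is used otherwise, so 0.0 and -0.0 still compare equal.
fn f64_slot_eq(a: Option<f64>, b: Option<f64>) -> bool {
    match (a, b) {
        (None, None) => true,
        (Some(x), Some(y)) => (x.is_nan() && y.is_nan()) || x == y,
        _ => false,
    }
}

fn main() {
    let lhs = from_strs(&[Some("a"), None, Some("bc")]);
    // Same logical values, stored as a slice starting at physical slot 1, with
    // different offsets and non-empty (unspecified) bytes behind the null slot.
    let rhs = BinaryArray {
        validity: Some(vec![true, true, false, true]),
        offsets: vec![0, 2, 3, 6, 8],
        values: b"zzaxyzbc".to_vec(),
        offset: 1,
        len: 3,
    };
    println!("strings equal: {}", array_eq(&lhs, &rhs));
    // ["y", null] encoded against two different dictionaries.
    let d1 = from_strs(&[Some("x"), Some("y")]);
    let d2 = from_strs(&[Some("y")]);
    println!("dictionaries equal: {}", dict_array_eq(&[Some(1), None], &d1, &[Some(0), None], &d2));
    println!("NaN slots equal: {}", f64_slot_eq(Some(f64::NAN), Some(f64::NAN)));
}
```

Under this definition, the sliced array compares equal to the compact one even though its offsets differ and it has garbage bytes behind its null slot. That is the property a cross-language round-trip check needs. A nested type would apply the same rules recursively to the child slots each parent slot refers to.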