Hi Wes,

Could you clarify? By "the logical data type", do you mean Arrow's logical data type? IMO the semantics of the logical data type are the only ones that could justify a clarification: in particular, given a data type, how do we agree that slot i from array "a" and slot j from array "b" are equal?
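
To make the question concrete, here is a minimal Python sketch of what slot equality could look like for a utf8 array, combining the null rules from your earlier message with the offsets lookup from my pseudocode (both quoted further down). Utf8Array, slot_equal and array_equal are hypothetical names for illustration only, not any implementation's API. I am assuming 32-bit little-endian offsets and a validity list with one boolean per physical slot instead of a packed bitmap.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utf8Array:
    """Hypothetical stand-in for a variable-size utf8 array (illustration only)."""
    validity: Optional[List[bool]]  # one flag per physical slot; None means no nulls
    offsets: bytes                  # little-endian int32 offsets (assumption)
    values: bytes                   # concatenated value bytes
    offset: int = 0                 # logical start of the array within its buffers
    length: int = 0                 # number of logical slots

    def is_null(self, i: int) -> bool:
        return self.validity is not None and not self.validity[self.offset + i]

    def value(self, i: int) -> bytes:
        # Read offsets[offset + i] and offsets[offset + i + 1] in one call.
        start, end = struct.unpack_from("<2i", self.offsets, 4 * (self.offset + i))
        return self.values[start:end]


def slot_equal(a: Utf8Array, i: int, b: Utf8Array, j: int) -> bool:
    a_null, b_null = a.is_null(i), b.is_null(j)
    if a_null or b_null:
        # Both null -> equal; exactly one null -> not equal.
        return a_null and b_null
    # Both valid -> compare the data the slots represent, not the raw buffers.
    return a.value(i) == b.value(j)


def array_equal(a: Utf8Array, b: Utf8Array) -> bool:
    return a.length == b.length and all(
        slot_equal(a, i, b, i) for i in range(a.length)
    )


# ["ab", null, "c"], where the null slot still covers the bytes "zz".
a = Utf8Array(
    validity=[True, False, True],
    offsets=struct.pack("<4i", 0, 2, 4, 5),
    values=b"abzzc",
    offset=0,
    length=3,
)

# A slice (offset=1) of ["x", "ab", null, "c"], where the null slot is empty.
b = Utf8Array(
    validity=[True, True, False, True],
    offsets=struct.pack("<5i", 0, 1, 3, 3, 4),
    values=b"xabc",
    offset=1,
    length=3,
)

print(array_equal(a, b))
```

Here "a" and "b" have different offsets and values buffers, and different bytes behind the null slot, yet every slot compares equal under these rules. That is the kind of agreement I think the specification could state explicitly.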
Best,
Jorge

On Fri, Nov 13, 2020 at 3:27 PM Wes McKinney <wesmck...@gmail.com> wrote:

> On Fri, Nov 13, 2020 at 1:19 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > Hi Jorge,
> > I think it would make sense to add some clarifications to the document per
> > Wes's comments. Do you want to maybe try to make a PR?
> >
> > One small edge case to consider is how NaN float values are compared.
>
> I think at the specification level, it should only be bit/byte-level
> binary equality without respect to the semantics of the logical data
> type.
>
> > -Micah
> >
> > On Thu, Nov 12, 2020 at 8:44 PM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> >
> > > Hi Wes,
> > >
> > > Thanks a lot. I agree. My question is whether we should make it explicit in
> > > the specification. AFAIK, "if the data represented in the slot is equal"
> > > depends on the datatype: for variable sized arrays with offsets (e.g.
> > > strings), the equality of slot i is something along the lines of:
> > >
> > > start = lhs.buffer[0][(lhs.offset + i) * size_of<T>] as T
> > > end = lhs.buffer[0][(lhs.offset + i + 1) * size_of<T>] as T
> > > lhs_value = lhs.buffer[1][start..end]
> > > # same for rhs
> > > lhs_value == rhs_value
> > >
> > > This logic is also tricky for any type with children, where we need to
> > > compare the slot of the child through recursion.
> > > These things are not really implementation specific, yet they are really
> > > important when implementations inter-operate.
> > >
> > > Best,
> > > Jorge
> > >
> > >
> > > On Thu, Nov 5, 2020 at 3:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > > hi Jorge,
> > > >
> > > > The intent when authoring the specification was as follows
> > > >
> > > > * If two array slots being compared are both null, then they are equal
> > > > * If one is null and the other is not, they are not equal
> > > > * If they are both not null, then they are equal if the data
> > > > represented in the slot is equal (and if dictionary indices reference
> > > > the same dictionary value, even if the dictionaries are different,
> > > > then they are equal because the data they represent is the same)
> > > >
> > > > - Wes
> > > >
> > > > On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão
> > > > <jorgecarlei...@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Recently, I revisited the code for array equality in Rust. While going
> > > > > through it, I observed some assumptions about how we conclude that two
> > > > > elements of an arrow array are equal, and when two arrays are equal.
> > > > >
> > > > > The notion of equality is also used throughout the document e.g. when we
> > > > > offer examples using "unspecified", we are implicitly arguing that we
> > > > > should not care about that value when comparing arrays. It is also used
> > > > > when we use the wording "unique values" in the dictionary-encoded arrays.
> > > > >
> > > > > The notion of array equality is important when we want to verify
> > > > > interoperability between languages, where we often need to compare arrays
> > > > > (e.g. after a round-trip), as some implementations may change the data of
> > > > > the "unspecified" slots and e.g. offsets.
> > > > >
> > > > > More fundamentally, IMO the specification offers a physical representation
> > > > > (buffers, children, offsets, etc) of a logical asset (lists, structs, int8,
> > > > > int32), but currently does not say when two logical assets are considered
> > > > > equal.
> > > > >
> > > > > Would it make sense to systematize the notion of equality in the
> > > > > specification, to align the different implementations on when they should
> > > > > consider two arrays to be equal?
> > > > >
> > > > > Best,
> > > > > Jorge
> > > > >
> > > >