Hi Wes,

Could you clarify? By "logical data type", do you mean Arrow's logical data
type? The semantics of the logical data type are, IMO, the only thing that
could justify a clarification: in particular, given a data type, how do we
agree that slot i of array "a" and slot j of array "b" are equal?
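
To make the question concrete, here is a minimal Rust sketch of what I mean by
slot equality for a variable-sized (string-like) array. The `StringArray`
struct, its fields and `slot_eq` are hypothetical names for illustration only,
not the API of any Arrow implementation, and validity is simplified to one
bool per slot instead of a bitmap:

```rust
/// Simplified, hypothetical stand-in for a variable-sized (string-like) array.
/// Validity is one bool per slot here; real Arrow uses a bitmap.
struct StringArray {
    validity: Vec<bool>, // true = slot is non-null
    offsets: Vec<i32>,   // length = number of slots + 1
    values: Vec<u8>,     // concatenated bytes of all slots
}

impl StringArray {
    /// Bytes represented by slot `i`, or None if the slot is null.
    fn slot(&self, i: usize) -> Option<&[u8]> {
        if !self.validity[i] {
            return None;
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        Some(&self.values[start..end])
    }
}

/// Slot equality following the three rules Wes listed:
/// null == null, null != non-null, otherwise compare the represented bytes.
fn slot_eq(a: &StringArray, i: usize, b: &StringArray, j: usize) -> bool {
    a.slot(i) == b.slot(j)
}
```

Under this definition two arrays can be equal slot by slot even though their
buffers differ byte for byte, e.g. when the offsets start at a different
position or a null slot spans different "unspecified" bytes.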

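On Micah's NaN edge case, bit-level and logical equality disagree even for a
fixed-width type. A small sketch (the function names are mine and purely
illustrative):

```rust
/// Bit-level equality: compares the raw 64-bit patterns of two f64 values.
fn bitwise_eq(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}

/// IEEE 754 equality, i.e. what `==` does on floats in Rust.
fn ieee_eq(a: f64, b: f64) -> bool {
    a == b
}
```

A NaN compared with itself is equal under `bitwise_eq` but not under
`ieee_eq`, while 0.0 and -0.0 behave the opposite way, so the two definitions
are not interchangeable.
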
Best,
Jorge




On Fri, Nov 13, 2020 at 3:27 PM Wes McKinney <wesmck...@gmail.com> wrote:

> On Fri, Nov 13, 2020 at 1:19 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >
> > Hi Jorge,
> > I think it would make sense to add some clarifications to the document per
> > Wes's comments. Do you want to maybe try to make a PR?
> >
> > One small edge case to consider is how NaN float values are compared.
>
> I think at the specification level, it should only be bit/byte-level
> binary equality without respect to the semantics of the logical data
> type.
>
> > -Micah
> >
> > On Thu, Nov 12, 2020 at 8:44 PM Jorge Cardoso Leitão <
> > jorgecarlei...@gmail.com> wrote:
> >
> > > Hi Wes,
> > >
> > > Thanks a lot. I agree. My question is whether we should make it explicit in
> > > the specification. AFAIK, "if the data represented in the slot is equal"
> > > depends on the datatype: for variable sized arrays with offsets (e.g.
> > > strings), the equality of slot i is something along the lines of:
> > >
> > > start = lhs.buffer[0][(lhs.offset + i) * size_of<T>] as T
> > > end = lhs.buffer[0][(lhs.offset + i + 1) * size_of<T>] as T
> > > lhs_value = lhs.buffer[1][start..end]
> > > # same for rhs
> > > lhs_value == rhs_value
> > >
> > > This logic is also tricky for any type with children, where we need to
> > > compare the slots of the children through recursion.
> > > These things are not really implementation specific, yet they are really
> > > important when implementations inter-operate.
> > >
> > > Best,
> > > Jorge
> > >
> > >
> > >
> > >
> > > On Thu, Nov 5, 2020 at 3:44 PM Wes McKinney <wesmck...@gmail.com>
> wrote:
> > >
> > > > hi Jorge,
> > > >
> > > > The intent when authoring the specification was as follows
> > > >
> > > > * If two array slots being compared are both null, then they are equal
> > > > * If one is null and the other is not, they are not equal
> > > > * If they are both not null, then they are equal if the data
> > > > represented in the slot is equal (and if dictionary indices reference
> > > > the same dictionary value, even if the dictionaries are different,
> > > > then they are equal because the data they represent is the same)
> > > >
> > > > - Wes
> > > >
> > > > On Thu, Nov 5, 2020 at 1:13 AM Jorge Cardoso Leitão
> > > > <jorgecarlei...@gmail.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Recently, I revisited the code for array equality in Rust. While going
> > > > > through it, I observed some assumptions about how we conclude that two
> > > > > elements of an arrow array are equal, and when two arrays are equal.
> > > > >
> > > > > The notion of equality is also used throughout the document e.g. when we
> > > > > offer examples using "unspecified", we are implicitly arguing that we
> > > > > should not care about that value when comparing arrays. It is also used
> > > > > when we use the wording "unique values" in the dictionary-encoded arrays.
> > > > >
> > > > > The notion of array equality is important when we want to verify
> > > > > interoperability between languages, where we often need to compare arrays
> > > > > (e.g. after a round-trip), as some implementations may change the data of
> > > > > the "unspecified" slots and e.g. offsets.
> > > > >
> > > > > More fundamentally, IMO the specification offers a physical representation
> > > > > (buffers, children, offsets, etc.) of a logical asset (lists, structs, int8,
> > > > > int32), but currently does not say when two logical assets are considered
> > > > > equal.
> > > > >
> > > > > Would it make sense to systematize the notion of equality in the
> > > > > specification, to align the different implementations on when they should
> > > > > consider two arrays to be equal?
> > > > >
> > > > > Best,
> > > > > Jorge
> > > >
> > >
>
