Having a way to encode sorting (and distribution) information is something
I'd also be very interested in. If provided in a standardized format, this
would enable optimizations across multiple Arrow-based systems. So I'd be
happy to get involved in this!

Best,
Hendrik

On Wed, 12 May 2021 at 00:25, Andrew Lamb <al...@influxdata.com> wrote:

> I see no reason each system that uses Arrow can't add their own notion of
> sortedness (and potentially distribution, as mentioned by Julian), but
> given how common the notion was, I felt having some sort of standard way to
> encode the information might make it more useful to the broader Arrow
> ecosystem.
>
> I don't have time in the near term to drive such a standardization effort,
> but would be happy to help with one if anyone else is interested.
>
> Andrew
>
> On Tue, May 11, 2021 at 3:19 PM Adam Hooper <a...@adamhooper.com> wrote:
>
> > Beware of collations: collation order is not fixed. As per TR10
> > <https://www.unicode.org/reports/tr10/>:
> >
> > > Over time, collation order will vary: there may be fixes needed as more
> > > information becomes available about languages; there may be new government
> > > or industry standards for the language that require changes; and finally,
> > > new characters added to the Unicode Standard will interleave with the
> > > previously-defined ones. This means that collations must be carefully
> > > versioned.
> >
> >
> > I don't know of any nice solutions.
> >
> > Postgres has plans <https://wiki.postgresql.org/wiki/Collations> to version
> > collations in v13/v14. I'm a Postgres user who experienced index corruption
> > between collation versions. To me, Postgres' effort seems both cutting-edge
> > and essential.
> >
> > Enjoy life,
> > Adam
> >
> > --
> > Adam Hooper
> > +1-514-882-9694
> > http://adamhooper.com
> >
>
