Having a way to encode sorting (and distribution) information is something I'd also be very interested in. If provided in a standardized format, this would enable optimizations across multiple Arrow-based systems. So I'd be happy to get involved in this!
Best,
Hendrik

On Wed, 12 May 2021 at 00:25, Andrew Lamb <al...@influxdata.com> wrote:

> I see no reason each system that uses Arrow can't add their own notion of
> sortedness (and potentially distribution, as mentioned by Julian), but
> given how common the notion was I felt having some sort of standard way to
> encode the information might make it more useful to the broader Arrow
> ecosystem.
>
> I don't have time in the near term to drive such a standardization effort,
> but would be happy to help with one if anyone else is interested.
>
> Andrew
>
> On Tue, May 11, 2021 at 3:19 PM Adam Hooper <a...@adamhooper.com> wrote:
>
> > Beware with collations: Collation order is not fixed. As per TR10
> > <https://www.unicode.org/reports/tr10/>:
> >
> > > Over time, collation order will vary: there may be fixes needed as more
> > > information becomes available about languages; there may be new government
> > > or industry standards for the language that require changes; and finally,
> > > new characters added to the Unicode Standard will interleave with the
> > > previously-defined ones. This means that collations must be carefully
> > > versioned.
> >
> > I don't know of any nice solutions.
> >
> > Postgres has plans <https://wiki.postgresql.org/wiki/Collations> to version
> > collations in v13/v14. I'm a Postgres user who experienced index corruption
> > between collation versions. To me, Postgres' effort seems both cutting-edge
> > and essential.
> >
> > Enjoy life,
> > Adam
> >
> > --
> > Adam Hooper
> > +1-514-882-9694
> > http://adamhooper.com
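To make the idea of a shared sortedness annotation a bit more concrete, here is a minimal, purely hypothetical Python sketch. It assumes the information is stored as a JSON string under a made-up key (`example.sort_order`) in a table's string-to-string custom metadata. Neither the key nor the field layout (`SortKey`, `column`, `ascending`, `nulls_first`, `collation`, `collation_version`) comes from the Arrow specification or from this thread. Following Adam's point about TR10, each string sort key also records the name and version of the collation it was sorted under, so a reader can decline to rely on an ordering produced by a different collation version.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional

# Hypothetical metadata key; nothing like it is defined by the Arrow format.
SORT_METADATA_KEY = "example.sort_order"


@dataclass(frozen=True)
class SortKey:
    """One key of a (hypothetical) lexicographic sort order."""
    column: str
    ascending: bool = True
    nulls_first: bool = False
    # Collation used to order string values, plus its version. Recorded
    # because collation order can change between versions (Unicode TR10).
    collation: Optional[str] = None
    collation_version: Optional[str] = None


def encode_sort_order(keys: List[SortKey]) -> Dict[str, str]:
    """Serialize sort keys into a string-to-string metadata entry."""
    return {SORT_METADATA_KEY: json.dumps([asdict(k) for k in keys])}


def decode_sort_order(metadata: Dict[str, str]) -> List[SortKey]:
    """Read sort keys back; an absent entry means 'no known order'."""
    raw = metadata.get(SORT_METADATA_KEY)
    if raw is None:
        return []
    return [SortKey(**d) for d in json.loads(raw)]


def usable_sort_order(keys: List[SortKey],
                      installed_collations: Dict[str, str]) -> List[SortKey]:
    """Return the leading prefix of sort keys this reader can trust.

    A key sorted under a collation is trusted only if the reader has the
    same version of that collation. Once one key is untrusted, all later
    keys are dropped too, since they only order rows within ties of the
    earlier keys.
    """
    trusted: List[SortKey] = []
    for key in keys:
        if key.collation is not None:
            if installed_collations.get(key.collation) != key.collation_version:
                break
        trusted.append(key)
    return trusted
```

Stopping at the first untrusted key is a deliberately conservative choice: in a multi-column sort, later keys only order rows within groups that compare equal on earlier keys, and a different collation version may change which strings compare equal.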