Awesome. Thanks for explaining that. I imagined it had good historical reasoning.
I've changed _all_docs on FDB to follow raw collation:
https://github.com/apache/couchdb/commit/9b325b75814418b85ffb3642a5115635416f56a8
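
To make the difference concrete, here is a rough Python sketch (not CouchDB code) of why the `startkey={}` query from the quoted thread below behaves so differently under the two collations. It models only the precedence between JSON types, as the collation docs linked below appear to describe it. The names `VIEW_ORDER`, `RAW_ORDER`, `json_kind` and `ids_from_startkey` are made up for illustration, and plain Python string comparison stands in for both ICU and raw byte comparison.

```python
# Type precedence only. Ordering *within* a type is not modelled faithfully:
# real view collation uses ICU for strings, raw collation compares raw terms.
VIEW_ORDER = ["null", "false", "true", "number", "string", "array", "object"]
RAW_ORDER = ["number", "false", "null", "true", "object", "array", "string"]


def json_kind(value):
    """Map a decoded JSON value to the type name used in the order lists."""
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def ids_from_startkey(doc_ids, startkey, order):
    """Return the doc ids that sort at or after startkey under `order`."""
    start_rank = order.index(json_kind(startkey))
    selected = []
    for doc_id in doc_ids:
        rank = order.index(json_kind(doc_id))
        if rank > start_rank:
            # A later type always sorts after the startkey.
            selected.append(doc_id)
        elif rank == start_rank and isinstance(doc_id, str) and doc_id >= startkey:
            # Same type: only string keys are compared in this sketch.
            selected.append(doc_id)
    return selected


doc_ids = ["a", "b", "c"]  # document ids are always strings
print(ids_from_startkey(doc_ids, {}, RAW_ORDER))   # objects sort before strings
print(ids_from_startkey(doc_ids, {}, VIEW_ORDER))  # objects sort after strings
```

Under the raw ordering every string id sorts after `{}`, so the query returns all documents; under the view ordering no string id does, so it returns none.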

On Tue, Mar 31, 2020 at 11:07 AM Jan Lehnardt <j...@apache.org> wrote:

> > On 26. Mar 2020, at 11:18, Garren Smith <gar...@apache.org> wrote:
> >
> > Oh interesting, reading the documentation more carefully I see we have
> > raw collation
> > https://docs.couchdb.org/en/stable/ddocs/views/collation.html#raw-collation
> > So _all_docs is using that and that explains why an object comes before a
> > string.
> > So do we want to keep raw collation for _all_docs?
>
> The reason for this is a simplified codepath and maybe even performance
> for regular database operations. _all_docs internally is the by-id index
> that performs any and all document reads and writes, so the original design
> tried to make this as lean as possible generally. Since we do Unicode
> collation in a NIF, that’s an extra step we did not want to take at the
> time.
>
> I can’t judge the impact of this for FDB since we already have to do
> key-mangling. Is another NIF call there that much of a problem? Has it ever
> been? NIFs have vastly improved since the original design, so I don’t
> really know. If it doesn’t make a performance difference, I would not
> object to changing the behaviour, if it would simplify our _all_docs code.
> That said, since we have the raw option and want to keep it, we’ll have two
> paths anyway and switching the default for one route doesn’t sound like a
> hard problem.
>
> That leaves compatibility. I’d wager that there are few cases which rely
> on raw collation in _all_docs, and for those, it’d be easy enough to adjust
> to the new world. That said, if there is no overwhelming reason to change
> the current behaviour, I’d say we keep things as-is.
>
> Best
> Jan
> --
>
> > On Thu, Mar 26, 2020 at 11:45 AM Glynn Bird <glynn.b...@gmail.com> wrote:
> >
> >> It's not something I was aware of, but it's certainly a known "feature",
> >> documented here:
> >> https://docs.couchdb.org/en/stable/ddocs/views/collation.html#all-docs
> >>
> >> (probably because all keys are strings in _all_docs, whereas they can be
> >> all sorts of mixed types with a view, and ASCII collation would be faster
> >> with that assumption)
> >>
> >> On Thu, 26 Mar 2020 at 07:12, Garren Smith <gar...@apache.org> wrote:
> >>
> >>> Hi Everyone,
> >>>
> >>> While working on the Mango implementation for FDB, I've noticed that
> >>> _all_docs has some weird ordering collation. If you do something like
> >>> GET /db/_all_docs?startkey={} it will return all the documents, even
> >>> though in view collation an object is ordered after strings. The reason
> >>> I've noticed this is that in the pouchdb-find tests we have a few tests
> >>> that check that {selector: {_id: {$gt: {}}}} returns all the docs in
> >>> the database [0].
> >>>
> >>> This ordering feels wrong to me, but I'm guessing it's been around for a
> >>> while. Currently, _all_docs on FDB follows the view collation ordering,
> >>> so the above startkey query would not return any documents.
> >>>
> >>> I want to know whether we should keep the old _all_docs ordering or
> >>> rather standardize on view collation ordering everywhere?
> >>>
> >>> I would prefer we change it, but I'm not sure of the implications of
> >>> that for client libraries and users.
> >>> Changing it would be a breaking change, but since 4.0 is going to have
> >>> a lot of breaking changes I think this would be our best chance to do
> >>> this.
> >>>
> >>> Cheers
> >>> Garren
> >>>
> >>> [0]
> >>> https://github.com/nolanlawson/pouchdb-find/commit/e1ca2e2d18041f05a3d19bce4254f4d7b349ad20