On Fri, Jan 18, 2019 at 12:15 PM James Coleman <jtc...@gmail.com> wrote: > I'll attempt to describe a more real world scenario: suppose we have a > schema like: > > users(id serial primary key) > orders(id serial primary key, user_id integer, created_at timestamp) > > And wanted to find the most recent N orders for a specific group of > users (e.g., in a report or search). Your query might look like: > > SELECT * > FROM orders > WHERE orders.user_id IN (1, 2, 3) > ORDER BY orders.created_at DESC > LIMIT 25 > > Currently an index on orders(user_id, created_at) will be used for > this query, but only to satisfy the scalar array op qual. Then all > matching orders (say, years worth) will be fetched, a sort node will > sort all of those results, and then a limit node will take the top N. > > Generalized the problem is something like "find the top N rows across > a group of foreign keys" (though saying foreign keys probably is too > specific). > > But under the scheme I'm proposing that same index would be able to > provide both the filter and guarantee ordering as well. > > Does that more real-world-ish example help place the usefulness of this?
Yes. It didn't make much sense back in 2019, but I understand what you meant now, I think. The latest version of my ScalarArrayOpExpr patch (v2) can execute queries like this efficiently: https://postgr.es/m/CAH2-WzkEyBU9UQM-5GWPcB=weshaukcjdvgfuqvhupuo-iy...@mail.gmail.com Note that your example is similar to the test case from the latest update on the thread. The test case from Benoit Tigeot, that appears here: https://gist.github.com/benoittgt/ab72dc4cfedea2a0c6a5ee809d16e04d?permalink_comment_id=4690491#gistcomment-4690491 You seemed to want to use an index that started with user_id/bar_fk. But I think you'd have to have an index on "created_at DESC, user_id". It could work the other way around with your suggested index, for a query written to match -- "ORDER BY user_id, created_at DESC". With an index on "created_at DESC, user_id", you'd be able to efficiently execute your limit query. The index scan could only terminate when it found (say) 25 matching tuples, so you might still have to scan quite a few index pages. But, you wouldn't have to do heap access to eliminate non-matches (with or without the VM being set) -- you could eliminate all of those non-matches using true SAOP index quals, that don't need to operate on known visible rows. This is possible with the patch, despite the fact that the user_id column is a low-order column (so this isn't one of the cases where it's useful to "skip"). Avoiding heap hits just to eliminate non-matching rows on user_id is what really matters here, though -- not skipping. It would be helpful if you could confirm this understanding, though. Thanks -- Peter Geoghegan