Re: [GENERAL] index problems (again)

2016-03-13 Thread Geoff Winkless
On 12 March 2016 at 22:00, Peter J. Holzer wrote: > I don't think most people's data is perfectly distributed. But as you > say most data is probably within some deviation of being perfectly > distributed and as long as that deviation isn't too big it doesn't > matter. Is that how what I wrote ca

Re: [GENERAL] index problems (again)

2016-03-12 Thread Jeff Janes
On Sat, Mar 12, 2016 at 1:00 PM, Geoff Winkless wrote: > > As Jeff points out I'd have a much larger win in this instance by > someone spending the time implementing skip index scans rather than > messing with the planner :) But I think the hardest part of implementing skip index scans would prob

Re: [GENERAL] index problems (again)

2016-03-12 Thread Jeff Janes
On Tue, Mar 8, 2016 at 2:16 AM, Geoff Winkless wrote: > On 7 March 2016 at 20:40, Peter J. Holzer wrote: >> As Tom wrote, the estimate of having to read only about 140 rows is only >> valid if sc_id and sc_date are uncorrelated. In reality your query has >> to read a lot more than 140 rows, so it

Re: [GENERAL] index problems (again)

2016-03-12 Thread Peter J. Holzer
On 2016-03-12 21:00:04 +, Geoff Winkless wrote: > On 12 March 2016 at 18:43, Peter J. Holzer wrote: > > The question is what can be done to improve the situation. > > > > Tom thinks that correlation statistics would help. That seems plausible > > to me. [...] > > You claim that no statistics a

Re: [GENERAL] index problems (again)

2016-03-12 Thread Geoff Winkless
On 12 March 2016 at 18:43, Peter J. Holzer wrote: > On 2016-03-08 10:16:57 +, Geoff Winkless wrote: >> On 7 March 2016 at 20:40, Peter J. Holzer wrote: >> > As Tom wrote, the estimate of having to read only about 140 rows is only >> > valid if sc_id and sc_date are uncorrelated. In reality yo

Re: [GENERAL] index problems (again)

2016-03-12 Thread Peter J. Holzer
On 2016-03-08 10:16:57 +, Geoff Winkless wrote: > On 7 March 2016 at 20:40, Peter J. Holzer wrote: > > As Tom wrote, the estimate of having to read only about 140 rows is only > > valid if sc_id and sc_date are uncorrelated. In reality your query has > > to read a lot more than 140 rows, so it

Re: [GENERAL] index problems (again)

2016-03-08 Thread Geoff Winkless
On 7 March 2016 at 20:40, Peter J. Holzer wrote: > As Tom wrote, the estimate of having to read only about 140 rows is only > valid if sc_id and sc_date are uncorrelated. In reality your query has > to read a lot more than 140 rows, so it is much slower. But as I've said previously, even if I do

Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 20:23, Jeff Janes wrote: > PostgreSQL does not (yet) implement "loose" index scans or "skip > scans", which is what you are asking for. You can roll your own using > the techniques described here: > https://wiki.postgresql.org/wiki/Loose_indexscan, which has the > benefit over

Re: [GENERAL] index problems (again)

2016-03-07 Thread Peter J. Holzer
On 2016-03-07 16:37:37 +, Geoff Winkless wrote: > On 7 March 2016 at 16:02, Tom Lane wrote: > > In English, what that plan is trying to do is scan the index > > in sc_id order until it hits a row with scdate in the target range. > > The first such row, by definition, has the correct min(sc_id)

Re: [GENERAL] index problems (again)

2016-03-07 Thread Jeff Janes
On Mon, Mar 7, 2016 at 9:35 AM, Geoff Winkless wrote: > On 7 March 2016 at 16:44, Tom Lane wrote: >> Geoff Winkless writes: >>> But as far as I can see, apart from the absolute extremes, the >>> index-only scan is _always_ going to be quicker than the index+table >>> scan. >> >> Well, that is a

Re: [GENERAL] index problems (again)

2016-03-07 Thread Jeff Janes
On Mon, Mar 7, 2016 at 8:37 AM, Geoff Winkless wrote: > > But as far as I can see, apart from the absolute extremes, the > index-only scan is _always_ going to be quicker than the index+table > scan. If relallvisible is zero, it thinks it gets zero benefit from an index only scan. It thinks that

Re: [GENERAL] index problems (again)

2016-03-07 Thread Jeff Janes
On Mon, Mar 7, 2016 at 5:01 AM, Geoff Winkless wrote: > On 7 March 2016 at 11:48, Victor Yegorov wrote: >> 2016-03-07 13:38 GMT+02:00 Geoff Winkless : >>> >>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate >>> BETWEEN 20160219 AND 20160221; >> >> >> Will it help if you'll ad

Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 16:44, Tom Lane wrote: > Geoff Winkless writes: >> But as far as I can see, apart from the absolute extremes, the >> index-only scan is _always_ going to be quicker than the index+table >> scan. > > Well, that is a different issue: what does the planner think of an > index-only

Re: [GENERAL] index problems (again)

2016-03-07 Thread Tom Lane
Geoff Winkless writes: > But as far as I can see, apart from the absolute extremes, the > index-only scan is _always_ going to be quicker than the index+table > scan. Well, that is a different issue: what does the planner think of an index-only scan as compared to a regular index scan. I suspect

Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 16:02, Tom Lane wrote: > In English, what that plan is trying to do is scan the index > in sc_id order until it hits a row with scdate in the target range. > The first such row, by definition, has the correct min(sc_id) value. > The problem is that we're guessing at how soon we'

Re: [GENERAL] index problems (again)

2016-03-07 Thread Tom Lane
Geoff Winkless writes: > On 7 March 2016 at 14:51, Tom Lane wrote: >> Because the other way is estimated to be cheaper. The estimate is >> wrong, because it's based on a statistical assumption that's wrong >> (ie that sc_id and scdate are uncorrelated), but it's what we have >> to work with at t

Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 14:51, Tom Lane wrote: > Geoff Winkless writes: >> So it seems that it should in fact be usable after all. So I'm still >> stumped as to why the (scdate,sc_id) index isn't used :( > > Because the other way is estimated to be cheaper. The estimate is > wrong, because it's based

Re: [GENERAL] index problems (again)

2016-03-07 Thread Tom Lane
Geoff Winkless writes: > So it seems that it should in fact be usable after all. So I'm still > stumped as to why the (scdate,sc_id) index isn't used :( Because the other way is estimated to be cheaper. The estimate is wrong, because it's based on a statistical assumption that's wrong (ie that s

Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 14:27, I wrote: > So it seems that it should in fact be usable after all. So I'm still > stumped as to why the (scdate,sc_id) index isn't used :( Also, while the index on sc_id will be sorted there's no guarantee that sc_id values will be in order in the table itself, so you're

Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 14:18, I wrote: > That assumes that I've not completely misunderstood, of course :) Always a dangerous assumption, I'm rapidly learning. The very next section: Constraints on columns to the right of these columns are checked in the index, so they save visits to the table

Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 13:23, Victor Yegorov wrote: > Your `sc_id` and `scdate` columns are correlated. Actually not necessarily, although in the majority case that's mostly true. > Planner has no such knowledge and assumes columns being independent. Your > `scdate` predicate is > estimate to return

Re: [GENERAL] index problems (again)

2016-03-07 Thread Victor Yegorov
2016-03-07 15:01 GMT+02:00 Geoff Winkless : > Unfortunately the cost of changing all the code that uses MIN() in > this way would be higher than just adding an extra index :( > > I suppose the thought is that for selecting just the MIN() value, by > traipsing through the index you immediately find

Re: [GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
On 7 March 2016 at 11:48, Victor Yegorov wrote: > 2016-03-07 13:38 GMT+02:00 Geoff Winkless : >> >> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate >> BETWEEN 20160219 AND 20160221; > > > Will it help if you'll add `count(*)` to your query like this: > > SELECT min(sc_id),

Re: [GENERAL] index problems (again)

2016-03-07 Thread Victor Yegorov
2016-03-07 13:38 GMT+02:00 Geoff Winkless : > # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate > BETWEEN 20160219 AND 20160221; > Will it help if you'll add `count(*)` to your query like this: SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND 2016022

[GENERAL] index problems (again)

2016-03-07 Thread Geoff Winkless
Hi all Firstly, I appreciate that my index problems are fairly difficult to debug given that I can't upload the data anywhere (it's commercially sensitive); I tried creating an equivalent dataset for my last problem using a lot of random() inserts, but unfortunately, even though the sizes and inde