On 12 March 2016 at 22:00, Peter J. Holzer wrote:
> I don't think most people's data is perfectly distributed. But as you
> say most data is probably within some deviation of being perfectly
> distributed and as long as that deviation isn't too big it doesn't
> matter.
Is that how what I wrote ca
On Sat, Mar 12, 2016 at 1:00 PM, Geoff Winkless wrote:
>
> As Jeff points out I'd have a much larger win in this instance by
> someone spending the time implementing skip index scans rather than
> messing with the planner :)
But I think the hardest part of implementing skip index scans would
prob
On Tue, Mar 8, 2016 at 2:16 AM, Geoff Winkless wrote:
> On 7 March 2016 at 20:40, Peter J. Holzer wrote:
>> As Tom wrote, the estimate of having to read only about 140 rows is only
>> valid if sc_id and sc_date are uncorrelated. In reality your query has
>> to read a lot more than 140 rows, so it
On 2016-03-12 21:00:04 +, Geoff Winkless wrote:
> On 12 March 2016 at 18:43, Peter J. Holzer wrote:
> > The question is what can be done to improve the situation.
> >
> > Tom thinks that correlation statistics would help. That seems plausible
> > to me.
[...]
> > You claim that no statistics a
On 12 March 2016 at 18:43, Peter J. Holzer wrote:
> On 2016-03-08 10:16:57 +, Geoff Winkless wrote:
>> On 7 March 2016 at 20:40, Peter J. Holzer wrote:
>> > As Tom wrote, the estimate of having to read only about 140 rows is only
>> > valid if sc_id and sc_date are uncorrelated. In reality yo
On 2016-03-08 10:16:57 +, Geoff Winkless wrote:
> On 7 March 2016 at 20:40, Peter J. Holzer wrote:
> > As Tom wrote, the estimate of having to read only about 140 rows is only
> > valid if sc_id and sc_date are uncorrelated. In reality your query has
> > to read a lot more than 140 rows, so it
On 7 March 2016 at 20:40, Peter J. Holzer wrote:
> As Tom wrote, the estimate of having to read only about 140 rows is only
> valid if sc_id and sc_date are uncorrelated. In reality your query has
> to read a lot more than 140 rows, so it is much slower.
But as I've said previously, even if I do
On 7 March 2016 at 20:23, Jeff Janes wrote:
> PostgreSQL does not (yet) implement "loose" index scans or "skip
> scans", which is what you are asking for. You can roll your own using
> the techniques described here:
> https://wiki.postgresql.org/wiki/Loose_indexscan, which has the
> benefit over
On 2016-03-07 16:37:37 +, Geoff Winkless wrote:
> On 7 March 2016 at 16:02, Tom Lane wrote:
> > In English, what that plan is trying to do is scan the index
> > in sc_id order until it hits a row with scdate in the target range.
> > The first such row, by definition, has the correct min(sc_id)
On Mon, Mar 7, 2016 at 9:35 AM, Geoff Winkless wrote:
> On 7 March 2016 at 16:44, Tom Lane wrote:
>> Geoff Winkless writes:
>>> But as far as I can see, apart from the absolute extremes, the
>>> index-only scan is _always_ going to be quicker than the index+table
>>> scan.
>>
>> Well, that is a
On Mon, Mar 7, 2016 at 8:37 AM, Geoff Winkless wrote:
>
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.
If relallvisible is zero, it thinks it gets zero benefit from an index
only scan. It thinks that
On Mon, Mar 7, 2016 at 5:01 AM, Geoff Winkless wrote:
> On 7 March 2016 at 11:48, Victor Yegorov wrote:
>> 2016-03-07 13:38 GMT+02:00 Geoff Winkless :
>>>
>>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>>> BETWEEN 20160219 AND 20160221;
>>
>>
>> Will it help if you'll ad
On 7 March 2016 at 16:44, Tom Lane wrote:
> Geoff Winkless writes:
>> But as far as I can see, apart from the absolute extremes, the
>> index-only scan is _always_ going to be quicker than the index+table
>> scan.
>
> Well, that is a different issue: what does the planner think of an
> index-only
Geoff Winkless writes:
> But as far as I can see, apart from the absolute extremes, the
> index-only scan is _always_ going to be quicker than the index+table
> scan.
Well, that is a different issue: what does the planner think of an
index-only scan as compared to a regular index scan. I suspect
On 7 March 2016 at 16:02, Tom Lane wrote:
> In English, what that plan is trying to do is scan the index
> in sc_id order until it hits a row with scdate in the target range.
> The first such row, by definition, has the correct min(sc_id) value.
> The problem is that we're guessing at how soon we'
Geoff Winkless writes:
> On 7 March 2016 at 14:51, Tom Lane wrote:
>> Because the other way is estimated to be cheaper. The estimate is
>> wrong, because it's based on a statistical assumption that's wrong
>> (ie that sc_id and scdate are uncorrelated), but it's what we have
>> to work with at t
On 7 March 2016 at 14:51, Tom Lane wrote:
> Geoff Winkless writes:
>> So it seems that it should in fact be usable after all. So I'm still
>> stumped as to why the (scdate,sc_id) index isn't used :(
>
> Because the other way is estimated to be cheaper. The estimate is
> wrong, because it's based
Geoff Winkless writes:
> So it seems that it should in fact be usable after all. So I'm still
> stumped as to why the (scdate,sc_id) index isn't used :(
Because the other way is estimated to be cheaper. The estimate is
wrong, because it's based on a statistical assumption that's wrong
(ie that s
On 7 March 2016 at 14:27, I wrote:
> So it seems that it should in fact be usable after all. So I'm still
> stumped as to why the (scdate,sc_id) index isn't used :(
Also, while the index on sc_id will be sorted there's no guarantee
that sc_id values will be in order in the table itself, so you're
On 7 March 2016 at 14:18, I wrote:
> That assumes that I've not completely misunderstood, of course :)
Always a dangerous assumption, I'm rapidly learning.
The very next section:
Constraints on columns to the right of these columns are checked
in the index, so they save visits to the table
On 7 March 2016 at 13:23, Victor Yegorov wrote:
> Your `sc_id` and `scdate` columns are correlated.
Actually not necessarily, although in the majority case that's mostly true.
> Planner has no such knowledge and assumes columns being independent. Your
> `scdate` predicate is
> estimate to return
2016-03-07 15:01 GMT+02:00 Geoff Winkless :
> Unfortunately the cost of changing all the code that uses MIN() in
> this way would be higher than just adding an extra index :(
>
> I suppose the thought is that for selecting just the MIN() value, by
> traipsing through the index you immediately find
On 7 March 2016 at 11:48, Victor Yegorov wrote:
> 2016-03-07 13:38 GMT+02:00 Geoff Winkless :
>>
>> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
>> BETWEEN 20160219 AND 20160221;
>
>
> Will it help if you'll add `count(*)` to your query like this:
>
> SELECT min(sc_id),
2016-03-07 13:38 GMT+02:00 Geoff Winkless :
> # EXPLAIN (ANALYZE,BUFFERS) SELECT MIN(sc_id) FROM legs WHERE scdate
> BETWEEN 20160219 AND 20160221;
>
Will it help if you'll add `count(*)` to your query like this:
SELECT min(sc_id), count(*) FROM legs WHERE scdate BETWEEN 20160219 AND
2016022
Hi all
Firstly, I appreciate that my index problems are fairly difficult to
debug given that I can't upload the data anywhere (it's commercially
sensitive); I tried creating an equivalent dataset for my last problem
using a lot of random() inserts, but unfortunately, even though the
sizes and inde
25 matches
Mail list logo