Hi,
On 06/22/2015 07:47 AM, Jeff Janes wrote:
On Sat, Jun 20, 2015 at 8:28 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
Hi Tomas,
I've lobotomized the sampling a bit to really produce a random set
of blocks first, and that produces way better estimates:
statistics target     estimate      random
[...]
On 06/22/2015 07:21 AM, Jeff Janes wrote:
On Sat, Jun 20, 2015 at 9:55 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
Hi,
On 06/20/2015 03:01 AM, Jeff Janes wrote:
I don't think we need to really assume the density to be constant,
maybe we can verify that while collecting the sample? I mean, we're
already reading the pages, s[...]
On Sat, Jun 20, 2015 at 8:28 AM, Tomas Vondra wrote:
Hi Tomas,
> I've lobotomized the sampling a bit to really produce a random set of
> blocks first, and that produces way better estimates:
>
>    statistics target     estimate      random
>    -------------------------------------------
[...]
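For illustration, the two-stage idea in the messages above (pick a uniformly
random set of blocks first, then draw the row sample only from those blocks)
can be sketched in a few lines of plain Python. The names and the
rows-per-block guess are illustrative assumptions, not PostgreSQL's actual
ANALYZE code:

import random

def two_stage_sample(blocks, sample_size, rows_per_block_guess=20):
    """Rough sketch of 'produce a random set of blocks first': stage one picks
    blocks uniformly at random, stage two draws the row sample from just those
    blocks.  `blocks` is a list of row-lists; the rows-per-block guess is an
    arbitrary assumption used only to size the block set."""
    # Stage 1: a uniformly random set of blocks, each block equally likely.
    n_blocks = min(len(blocks), max(1, sample_size // rows_per_block_guess))
    chosen = random.sample(range(len(blocks)), n_blocks)

    # Stage 2: pool the rows of the chosen blocks and sample rows uniformly
    # from that pool, so density differences only matter within the pool.
    pool = [row for b in chosen for row in blocks[b]]
    return random.sample(pool, min(sample_size, len(pool)))

The point of stage one is that every block is equally likely to be chosen
regardless of how many live rows it holds, which is the property the "way
better estimates" above depend on.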
On Sat, Jun 20, 2015 at 9:55 AM, Tomas Vondra wrote:
> Hi,
>
> On 06/20/2015 03:01 AM, Jeff Janes wrote:
>
>>
>>
>> I don't think we need to really assume the density to be constant,
>> maybe we can verify that while collecting the sample? I mean, we're
>> already reading the pages, s[...]
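Since ANALYZE is already reading the sampled pages, the "verify the density
while collecting the sample" idea could amount to something like the sketch
below: remember how many live rows each sampled page held and flag the table
when the spread is large relative to the mean. The function name and the
coefficient-of-variation threshold are illustrative assumptions, not part of
any actual patch:

import statistics

def density_looks_constant(live_rows_per_page, cv_threshold=0.5):
    """Sketch only: decide whether the constant-density assumption is
    plausible, using the per-page live-row counts gathered while sampling."""
    if not live_rows_per_page:
        return True
    mean = statistics.mean(live_rows_per_page)
    if mean == 0:
        return True          # only empty pages seen; nothing to weight
    cv = statistics.pstdev(live_rows_per_page) / mean
    # A large spread relative to the mean suggests density skew (clustered
    # bloat, variable-width rows, ...) and that per-page weighting or a
    # larger sample may be needed.
    return cv <= cv_threshold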
On 6/20/15 12:55 PM, Tomas Vondra wrote:
Well, actually I think it would be even more appropriate for very large
tables. With a 2.5TB table, you don't really care whether ANALYZE
collects a 5GB or an 8GB sample; the difference is rather minor compared
to the I/O generated by the other queries etc. The current [...]
Hi,
On 06/20/2015 03:01 AM, Jeff Janes wrote:
Hmmm, that's probably true. OTOH correlated columns are not all that
uncommon (e.g. table storing time-series data etc.), and this blowup
is quite bad ...
True, but we don't know how big of a problem the density-skew
problem might be ([...]
Hi,
On 06/20/2015 05:29 PM, Feng Tian wrote:
I have not read Jeff's patch, but here is how I think hash agg should work:
hash agg scans the lineitem table, performing the aggregation in memory.
Once work_mem is exhausted, it writes the intermediate state to disk,
bucket by bucket. When the lineitem table is finished [...]
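Very roughly, the bucket-spilling scheme described above could look like this
toy Python sketch (not Jeff Davis's actual patch; the group limit stands in
for work_mem and plain lists stand in for temp files):

from collections import defaultdict

def hash_agg(rows, key, combine, init, max_groups=1000, n_buckets=8, depth=0):
    """Toy sketch: aggregate in memory until the group table is 'full', then
    route rows for unseen keys into hash buckets (the 'intermediate state on
    disk'), and aggregate each spilled bucket in a later pass."""
    table = {}                      # in-memory groups, bounded by max_groups
    spilled = defaultdict(list)     # bucket number -> rows written "to disk"

    for row in rows:
        k = key(row)
        if k in table:
            table[k] = combine(table[k], row)
        elif len(table) < max_groups:
            table[k] = combine(init, row)
        else:
            # Salt the hash with the pass depth so an overfull bucket splits
            # further on the next pass instead of mapping to one bucket again.
            spilled[hash((k, depth)) % n_buckets].append(row)

    yield from table.items()
    # Input exhausted: every spilled key is absent from `table`, so finishing
    # the buckets one at a time never double-counts a group.
    for bucket_rows in spilled.values():
        yield from hash_agg(bucket_rows, key, combine, init,
                            max_groups, n_buckets, depth + 1)

For example, summing the second field per key with room for only two
in-memory groups:

rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(dict(hash_agg(rows, key=lambda r: r[0],
                    combine=lambda acc, r: acc + r[1],
                    init=0, max_groups=2)))
# -> {'a': 4, 'b': 7, 'c': 4}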
Feng Tian wrote:
> I have not read Jeff's patch, but here is how I think hash agg should work,
I think you should discuss that in Jeff's thread, not here.
--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jun 20, 2015 at 7:56 AM, Tomas Vondra wrote:
> Hi,
>
> On 06/20/2015 08:54 AM, Feng Tian wrote:
>
>>
>> While better sample/stats is important for choosing a good plan, in
>> this query, hash agg is really the right plan. If a sort agg is
>> chosen, the performance will be really really bad. [...]
On 06/20/2015 04:17 PM, Robert Haas wrote:
On Wed, Jun 17, 2015 at 1:52 PM, Tomas Vondra wrote:
>> I'm currently running some tests on a 3TB TPC-H data set, and I
>> tripped over a pretty bad n_distinct underestimate, causing OOM in
>> HashAgg (which somehow illustrates the importance of the
>> memory-bounded hashagg patch Jeff Davis is working on) [...]
Hi,
On 06/20/2015 08:54 AM, Feng Tian wrote:
While better sample/stats is important for choosing a good plan, in
this query, hash agg is really the right plan. If a sort agg is
chosen, the performance will be really really bad. The patch that
Jeff is working on is critical for a decent TPCH num[...]
On Wed, Jun 17, 2015 at 1:52 PM, Tomas Vondra wrote:
> I'm currently running some tests on a 3TB TPC-H data set, and I tripped over
> a pretty bad n_distinct underestimate, causing OOM in HashAgg (which somehow
> illustrates the importance of the memory-bounded hashagg patch Jeff Davis is
> working on) [...]
On Wed, Jun 17, 2015 at 10:52 AM, Tomas Vondra wrote:
> Hi,
>
> I'm currently running some tests on a 3TB TPC-H data set, and I tripped
> over a pretty bad n_distinct underestimate, causing OOM in HashAgg (which
> somehow illustrates the importance of the memory-bounded hashagg patch Jeff
> Davis is working on) [...]
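For context on where such an underestimate comes from: ANALYZE derives
n_distinct from the sample with the Haas & Stokes estimator,
n*d / (n - f1 + f1*n/N), where n is the number of sampled rows, N the total
row count, d the number of distinct values in the sample, and f1 the number
of values seen exactly once. A sample drawn from too few blocks tends to pick
up whole clusters of duplicates, which shrinks f1 and d and drags the
estimate down. A small sketch with made-up numbers (not values from the
3TB run):

def haas_stokes(n, N, d, f1):
    """n sampled rows out of N total, d distinct values in the sample,
    f1 of them seen exactly once (the estimator ANALYZE applies)."""
    if f1 >= n:
        return N    # all sampled values unique: roughly ANALYZE's "unique column" case
    return n * d / (n - f1 + f1 * n / N)

# Purely illustrative: a block-clustered sample sees few singletons, while a
# random-TID sample of the same size sees mostly singletons.
print(haas_stokes(n=30000, N=18_000_000_000, d=15000, f1=5000))    # ~18,000
print(haas_stokes(n=30000, N=18_000_000_000, d=29000, f1=28000))   # ~435,000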
On Fri, Jun 19, 2015 at 1:39 PM, Tomas Vondra wrote:
> On 06/19/2015 09:48 PM, Jeff Janes wrote:
>
>> On Fri, Jun 19, 2015 at 12:27 PM, Tomas Vondra
>> <tomas.von...@2ndquadrant.com> wrote:
>>
>> But I think you might be on to something, because I manually
>> collected a random
On 06/19/2015 09:48 PM, Jeff Janes wrote:
On Fri, Jun 19, 2015 at 12:27 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
But I think you might be on to something, because I manually
collected a random sample with 30k rows (by explicitly generating
30k random TIDs), and I get this: [...]
On Fri, Jun 19, 2015 at 12:27 PM, Tomas Vondra wrote:
> But I think you might be on to something, because I manually collected a
> random sample with 30k rows (by explicitly generating 30k random TIDs), and
> I get this:
>
> tpch=# select cnt, count(*) from (select l_orderkey, count(*) AS cnt from [...]
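The "explicitly generating 30k random TIDs" check can be reproduced from any
client along these lines. The sketch below uses psycopg2; the block count and
tuples-per-page bound are assumptions the caller has to supply (for example
from pg_class.relpages), and TIDs that don't point at a live tuple simply
return nothing:

import random
import psycopg2  # any driver works; psycopg2 is just the example here

def fetch_random_tids(dsn, table, n_blocks, max_offset, n_tids):
    """Sketch: build n_tids random (block, offset) pairs and fetch whatever
    live rows exist at those positions."""
    tids = ['(%d,%d)' % (random.randrange(n_blocks),
                         random.randint(1, max_offset))
            for _ in range(n_tids)]
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Table name is interpolated for brevity only; it is assumed trusted.
        cur.execute('SELECT * FROM %s WHERE ctid = ANY(%%s::tid[])' % table,
                    (tids,))
        return cur.fetchall()

Feeding the fetched rows into the same cnt/count(*) query as above then shows
the per-key count distribution a truly random row sample would see.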
On 06/19/2015 08:32 PM, Jeff Janes wrote:
On Wed, Jun 17, 2015 at 10:52 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
Hi,
I'm currently running some tests on a 3TB TPC-H data set, and I
tripped over a pretty bad n_distinct underestimate, causing OOM in
HashAgg