> Applied with some significant editorialization. The biggest problem I
> found was that the code for expression indexes didn't really work, and
> would leak memory like there's no tomorrow even when it did work.
Sorry I couldn't write it the way it was supposed to be... I'll look at the
differences.
On Fri, Oct 8, 2010 at 10:49 AM, Tom Lane wrote:
> Itagaki Takahiro writes:
>> I wrote a patch to improve CLUSTER VERBOSE (and VACUUM FULL VERBOSE).
>> The patch should be applied after sorted_cluster-20100721.patch .
>
> Applied with minor fixes; in particular, I think you got the effects of
> rewrite_heap_dead_tuple backwards. When a tuple is removed
Itagaki Takahiro writes:
> I wrote a patch to improve CLUSTER VERBOSE (and VACUUM FULL VERBOSE).
> The patch should be applied after sorted_cluster-20100721.patch .
Applied with minor fixes; in particular, I think you got the effects of
rewrite_heap_dead_tuple backwards. When a tuple is removed
Takahiro Itagaki writes:
> BTW, we could have LogicalTapeReadExact() as an alias of
> LogicalTapeRead() that checks the result, because we have
> much duplicated code for "unexpected end of data" errors.
Good idea, done.
regards, tom lane
--
Sent via pgsql-hackers mail
Itagaki Takahiro writes:
> I re-ordered some description in the doc. Does it look better?
> Comments and suggestions welcome.
Applied with some significant editorialization. The biggest problem I
found was that the code for expression indexes didn't really work, and
would leak memory like there's no tomorrow even when it did work.
Josh Kupershmidt writes:
> So I think there are definitely cases where this patch helps, but it
> looks like a seq. scan is being chosen in some cases where it doesn't
> help.
I've been poking through this patch, and have found two different ways
in which it underestimates the cost of the seqscan
On Mon, Oct 4, 2010 at 4:47 PM, Leonardo Francalanci wrote:
>> It sounds like the costing model might need a bit more work before we commit
>> this.
>
> I tried again the simple SQL tests I posted a while ago, and I still get the
> same ratios.
> I've tested the applied patch on a dual opteron + disk array Solaris machine.
On Mon, Oct 4, 2010 at 8:42 AM, Robert Haas wrote:
> Did you also adjust random_page_cost?
No, my seq_page_cost (1) and random_page_cost (4) are from the
defaults. Here are all my non-default settings:
test=# SELECT name, unit, setting FROM pg_settings WHERE source !=
'default'; nam
> It sounds like the costing model might need a bit more work before we commit
> this.
I tried again the simple SQL tests I posted a while ago, and I still get the
same ratios.
I've tested the applied patch on a dual opteron + disk array Solaris machine.
I really don't get how a laptop hard dr
On Oct 1, 2010, at 8:36 PM, Josh Kupershmidt wrote:
> On Fri, Oct 1, 2010 at 4:33 AM, Leonardo Francalanci wrote:
>>> I ran a few more performance tests on this patch. Here's what I got
>>> for the tests Leonardo posted originally:
>>> * 2M rows: 22 seconds for seq. scan, 24 seconds for index scan
On Sat, Oct 2, 2010 at 9:36 AM, Josh Kupershmidt wrote:
> Hrm, this is interesting. I set up a test table with 5M rows like so:
Such discussions are for the planner itself, right? The sorted cluster
patch uses the existing planner's costing model, so we can discuss the
clustering feature and the
On Fri, Oct 1, 2010 at 4:33 AM, Leonardo Francalanci wrote:
>> I ran a few more performance tests on this patch. Here's what I got
>> for the tests Leonardo posted originally:
>> * 2M rows: 22 seconds for seq. scan, 24 seconds for index scan
>> * 5M rows: 139 seconds for seq. scan, 97 seconds for index scan
On Fri, Oct 1, 2010 at 4:33 AM, Leonardo Francalanci wrote:
> I guess that if the planner makes a wrong choice in this case (that is,
> seq scan + sort instead of index scan) there's no way for "cluster" to
> behave in a different way. If, on the contrary, the "create table..." uses
> the right pl
> I ran a few more performance tests on this patch. Here's what I got
> for the tests Leonardo posted originally:
> * 2M rows: 22 seconds for seq. scan, 24 seconds for index scan
> * 5M rows: 139 seconds for seq. scan, 97 seconds for index scan
> * 10M rows: 256 seconds seq. scan, 61
On Sep 30, 2010, at 9:07 PM, Itagaki Takahiro wrote:
> Hi, Leonardo-san,
>
> On Fri, Oct 1, 2010 at 4:04 AM, Tom Lane wrote:
>> The wording should be something like "CLUSTER requires transient disk
>> space equal to about twice the size of the table plus its indexes".
>
> Could you merge those discussions into the final patch?
Hi, Leonardo-san,
On Fri, Oct 1, 2010 at 4:04 AM, Tom Lane wrote:
> The wording should be something like "CLUSTER requires transient disk
> space equal to about twice the size of the table plus its indexes".
Could you merge those discussions into the final patch?
Also, please check whether my mo
Simon Riggs writes:
> On Wed, 2010-09-29 at 08:44 -0400, Alvaro Herrera wrote:
>>> So, I think "twice disk space of the sum of table and indexes" would be
>>> the simplest explanation for safe margin.
>>
>> Agreed.
> Surely the peak space is x3?
> Old space + sort space + new space.
The wording should be something like "CLUSTER requires transient disk
space equal to about twice the size of the table plus its indexes".
On Wed, 2010-09-29 at 08:44 -0400, Alvaro Herrera wrote:
> > So, I think "twice disk space of the sum of table and indexes" would be
> > the simplest explanation for safe margin.
>
> Agreed.
Surely the peak space is x3?
Old space + sort space + new space.
--
Simon Riggs www.2ndQuad
On Wed, Sep 29, 2010 at 11:55 AM, Leonardo Francalanci wrote:
> Can someone else test the patch to see if what I found is still valid?
> I don't think it makes much sense if I'm the only one that says
> "this is faster" :)
I ran a few more performance tests on this patch. Here's what I got
for the tests Leonardo posted originally:
> > Here's my post with a (very simple) performance test:
> > http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php
> I think the 10M rows test is more in line with what we want (83s vs. 646).
Can someone else test the patch to see if what I found is still valid?
I don't think it makes much sense if I'm the only one that says
"this is faster" :)
On Wed, Sep 29, 2010 at 10:14 PM, Robert Haas wrote:
>> http://reorg.projects.postgresql.org/
>
> Can you reproduce that with this patch?
No, I can't use the machine anymore.
--
Itagaki Takahiro
On Wed, Sep 29, 2010 at 1:12 AM, Itagaki Takahiro wrote:
> On Wed, Sep 29, 2010 at 1:27 PM, Alvaro Herrera wrote:
>>> I see a consistent
>>> ~10% advantage for the sequential scan clusters.
>>
>> 10% is nothing. I was expecting this patch would give an order of
>> magnitude of improvement or something like that in the worst cases of
>> the current code (highly unsorted input)
Excerpts from Itagaki Takahiro's message of Wed Sep 29 01:25:38 -0400 2010:
> To be exact, it's very complex.
> During reconstructing tables, it requires about twice the disk space of
> the old table (for sort tapes and the new table).
> After sorting the table, CLUSTER performs REINDEX. We need
> {sa
Excerpts from Leonardo Francalanci's message of Wed Sep 29 03:17:07 -0400 2010:
> Here's my post with a (very simple) performance test:
>
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php
I think the 10M rows test is more in line with what we want (83s vs. 646).
--
Álvaro Herrera
> > 10% is nothing. I was expecting this patch would give an order of
> > magnitude of improvement or something like that in the worst cases of
> > the current code (highly unsorted input)
>
> Yes. It should be 10x faster than the ordinary method in the worst cases.
Here's my post with a (very simple) performance test:
On Wed, Sep 29, 2010 at 12:53 PM, Josh Kupershmidt wrote:
> I thought this paragraph was a little confusing:
Thanks for checking.
> ! In the second case, a full table scan is followed by a sort operation.
> ! The method is faster than the first one when the table is highly
> ! fragmented.
On Wed, Sep 29, 2010 at 1:27 PM, Alvaro Herrera wrote:
>> I see a consistent
>> ~10% advantage for the sequential scan clusters.
>
> 10% is nothing. I was expecting this patch would give an order of
> magnitude of improvement or something like that in the worst cases of
> the current code (highly unsorted input)
Excerpts from Josh Kupershmidt's message of mar sep 28 23:53:33 -0400 2010:
> I started looking at the performance impact of this patch based on
> Leonardo's SQL file. On the 2 million row table, I see a consistent
> ~10% advantage for the sequential scan clusters. I'm going to try to
> run the bi
On Mon, Sep 27, 2010 at 10:05 PM, Itagaki Takahiro wrote:
> I re-ordered some description in the doc. Does it look better?
> Comments and suggestions welcome.
I thought this paragraph was a little confusing:
! In the second case, a full table scan is followed by a sort operation.
! The method is faster than the first one when the table is highly
! fragmented.
On Tue, Aug 31, 2010 at 8:04 PM, Itagaki Takahiro wrote:
> I think the patch is almost ready to commit, but still
> have some comments for the usability and documentation.
> I hope native English speakers would help improve the docs.
I'm checking the latest patch for applying.
I found we actually
On Tue, Aug 31, 2010 at 8:04 PM, Itagaki Takahiro wrote:
> * How to determine which method was used?
> We can get some information from trace_sort logs,
> but CLUSTER VERBOSE would be better to log
> which CLUSTER method was used.
I wrote a patch to improve CLUSTER VERBOSE (and VACUUM FULL VERBOSE).
Sorry for the very delayed review.
On Wed, Jul 21, 2010 at 11:15 PM, Leonardo Francalanci wrote:
> 2) what other areas can I comment more?
I think the patch is almost ready to commit, but still
have some comments for the usability and documentation.
I hope native English speakers would help improve the docs.
> I think writetup_rawheap() and readtup_rawheap() are a little complex,
> but should work as long as there is no padding between t_len and t_self
> in the HeapTupleData struct.
>
> - It might be cleaner if you write the total item length
> and tuple data separately.
> - "(char *) tuple + sizeo
Excerpts from Takahiro Itagaki's message of Wed Jul 07 04:39:38 -0400 2010:
> BTW, we could have LogicalTapeReadExact() as an alias of
> LogicalTapeRead() that checks the result, because we have
> much duplicated code for "unexpected end of data" errors.
I'd just add a boolean "exact required" t
Leonardo F wrote:
> I saw that you also changed the writing:
(snip)
> Are we sure it's 100% equivalent?
I think writetup_rawheap() and readtup_rawheap() are a little complex,
but should work as long as there is no padding between t_len and t_self
in the HeapTupleData struct.
- It might be cleaner if you write the total item length
  and tuple data separately.
> I reviewed your patch. It seems to be in good shape, and worked as
> expected. I suppressed a compiler warning in the patch and cleaned up
> whitespaces in it.
> Patch attached.
Thanks for the review!
I saw that you also changed the writing:
LogicalTapeWrite(state->tapeset, tapenum,
Leonardo F wrote:
> Attached the updated patch (should solve a bug) and a script.
I reviewed your patch. It seems to be in good shape, and worked as
expected. I suppressed a compiler warning in the patch and cleaned up
whitespaces in it. Patch attached.
I think we need some documentation for t
>Perhaps you could supply a .sql file containing a testcase
> illustrating the performance benefits you tested with your patch
Sure.
Attached the updated patch (should solve a bug) and a script.
The sql scripts generates a 2M rows table ("orig"); then the
table is copied and the copy clustered
I think I've found the problem:
tuple->t_data wasn't at HEAPTUPLESIZE distance from tuple.
I guess the code makes that assumption somewhere, so I had
to do:
tuple->t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE);
Now that
> I think you're confusing HeapTuple and HeapTupleHeader. SortTuple->tuple
> field should point to a HeapTupleHeader, not a HeapTuple.
Mmh, I don't get it: that would mean I might be using more info than required,
but I still don't understand why it's not working... at the end, CLUSTER is
going
Leonardo F wrote:
> static void
> writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup)
> {
HeapTuple tuple = (HeapTuple) stup->tuple;
I think you're confusing HeapTuple and HeapTupleHeader. SortTuple->tuple
field should point to a HeapTupleHeader, not a HeapTuple.
The right
Hi all,
while testing the seq scan + sort CLUSTER implementation, I've found a
bug in write/readtup_rawheap.
The functions are needed by the sort part.
The code in
http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php
didn't copy the whole tuple, but only the HeapTuple "header": t
On Tue, Feb 9, 2010 at 5:49 AM, Leonardo F wrote:
> Not even a comment? As I said, performance results on my system
> were very good
>
Hi Leonardo,
Perhaps you could supply a .sql file containing a testcase illustrating the
performance benefits you tested with your patch -- as I understand,
Not even a comment? As I said, performance results on my system
were very good
> I know you're all very busy getting 9.0 out, but I think the results in
> heap scanning + sort instead of index scanning for CLUSTER are
> very good... I would like to know if I did something wrong/I should
> impr