[PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some profiling

2006-05-16 Thread Arjen van der Meijden

Hi List,

In the past few weeks we have been developing a read-heavy 
mysql-benchmark to get an alternative take on cpu/platform performance, 
rather than to look at how fast mysql itself can be.


This benchmark runs on mysql 4.1.x, 5.0.x and 5.1.x and is modelled 
after our website's production database; the load generated on it is 
modelled after a simplified version of our visitor behaviour.


Long story short, we think the test is a nice example of the relatively 
lightweight, read-heavy web applications out there, and therefore decided 
to have a go at postgresql as well.
Of course the queries and indexes have been adjusted to (to the best of 
our knowledge) best suit postgresql, while maintaining the same output to 
the application/interface layer. While the initial structure only got 
postgresql to about half the performance of mysql 4.1.x, the current 
version of our postgresql-benchmark has quite similar results to mysql 
4.1.x, but both are quite a bit slower than 5.0.x (I think it's about 
30-40% faster).


Since the results from those benchmarks are not yet public (they will be 
put together in a story on our website), I won't go into too much 
detail about this benchmark.


Currently we're having a look at a Sun T2000 and will be looking at 
other machines as well in the future. We are running the 
sun-release of postgresql 8.1.3 on that T2000 now, but are looking at 
compiling the cvs-head version (for its index-root-cache) somewhere this 
week.


My guess is there are a few people on this list who are interested in 
some dtrace results taken during our benchmarks on that T2000.
Although my knowledge of both Solaris and DTrace is very limited, I 
already took some samples of the system and user calls. I used Jignesh 
Shah's scripts for that: 
http://blogs.sun.com/roller/page/jkshah?entry=profiling_postgresql_using_dtrace_on


You can find the samples here:
http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log
http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log

And I also did the memcpy-scripts, here:
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpysize.log
http://achelois.tweakers.net/~acm/pgsql-t2000/memcpystack.log
(this last log is 3.5MB)

If anyone is interested in some more dtrace results, let me know (and 
tell me what commands to run ;-) ).


Best regards,

Arjen



Re: [PERFORM] IMMUTABLE?

2006-05-16 Thread Joachim Wieland
On Tue, May 16, 2006 at 12:31:41AM -0400, Tom Lane wrote:
> It's true that the system *could* memoize (or in our more usual
> parlance, cache function values) given the assumptions embodied in
> IMMUTABLE.  But we don't, and I don't see any statement in the docs
> that promises that we do.  For 99% of the functions that the planner
> deals with, memoization would be seriously counterproductive because
> the function evaluation cost is comparable to if not less than the
> lookup cost in a memo table.  (int4pl is a good case in point.)

This seems to change as soon as one takes into account user functions.

While most immutable functions really seem to be small and fast to
execute, stable functions often hide complex SQL (sometimes combined with
if-then-else or other program-flow logic).

So irrespective of caching to prevent evaluation across statements, within a
single statement, is there a strong reason why for example in
WHERE col = f(const) with f() declared as immutable or stable and without an
index on col, f() still gets called for every row? Or is this optimization
just not done yet?
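
To make the scenario concrete, here is a minimal sketch (the function body
and the table are hypothetical stand-ins for something more expensive):

  CREATE FUNCTION f(int) RETURNS int AS
  $$ SELECT $1 + 1 $$   -- stand-in for an expensive computation
  LANGUAGE sql IMMUTABLE;

  -- f(10) cannot change for the duration of the statement; the question is
  -- whether it is evaluated once, or once per row:
  SELECT * FROM mytable WHERE col = f(10);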


Joachim




Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some profiling

2006-05-16 Thread Qingqing Zhou

"Arjen van der Meijden" <[EMAIL PROTECTED]> wrote
>
> Long story short, we think the test is a nice example of the relatively
> lightweight, read-heavy web applications out there, and therefore decided
> to have a go at postgresql as well.
>

Some sorts of web query behavior are quite optimized in MySQL. For example,
the query below runs very fast due to the query result cache
implementation in MySQL.

Loop N times
SELECT * FROM A WHERE i = 1;
End loop.

> You can find the samples here:
> http://achelois.tweakers.net/~acm/pgsql-t2000/syscall.log
> http://achelois.tweakers.net/~acm/pgsql-t2000/usrcall.log
>

IMHO, without knowing the exact queries you sent, these logs are not very
useful :-(. I would suggest you compare the queries in pairs and then post
their dtrace/timing results here (just like the previous Firebird vs.
PostgreSQL comparison did).

Regards,
Qingqing





Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some profiling

2006-05-16 Thread Arjen van der Meijden

Qingqing Zhou wrote:

"Arjen van der Meijden" <[EMAIL PROTECTED]> wrote
Some sorts of web query behavior are quite optimized in MySQL. For example,
the query below runs very fast due to the query result cache
implementation in MySQL.

Loop N times
SELECT * FROM A WHERE i = 1;
End loop.


Yeah, I know. But our queries get random parameters for identifiers and 
the like, so it's not just a few queries being executed over and over. 
In a run for which I just logged all queries, almost 42k distinct queries 
were executed out of 128k in total (which may actually be more random 
than real life).
Besides that, they are not such extremely simple queries as your example. 
Most join at least two tables, and the rest often join three to five.


But I agree, MySQL has a big advantage with its query result cache. That 
makes the current performance of postgresql in this situation all the more 
impressive, since the query cache was enabled for the 4.1.x run as well.



IMHO, without knowing the exact queries you sent, these logs are not very
useful :-(. I would suggest you compare the queries in pairs and then post
their dtrace/timing results here (just like the previous Firebird vs.
PostgreSQL comparison did).


Well, I'm bound to some privacy and copyright laws, but I'll see if I 
can show some example plans of at least the top few queries later today 
(the top two are respectively 27% and 21% of the total time).
But those top queries aren't the only ones run during the benchmarks or 
in the production environment, nor are they run exclusively at any given 
time. So the overall load picture should be useful too, shouldn't it?


Best regards,

Arjen



Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some profiling

2006-05-16 Thread Jignesh K. Shah

Hi Arjen,

Looking at your outputs...of syscall and usrcall it looks like

* Spending too much time in semsys, which means you have too many 
connections and they are contending for a lock, potentially the 
WAL log lock.



* llseek is high, which means you can obviously gain a bit with the right 
file system/file tuning, by caching things right.



Have you set the Solaris tunables for the T2000 that are recommended for PostgreSQL?

Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try especially the /etc/system and postgresql.conf changes and see if 
they improve your performance.



Regards,
Jignesh




Re: [PERFORM] IMMUTABLE?

2006-05-16 Thread Tom Lane
Joachim Wieland <[EMAIL PROTECTED]> writes:
> So irrespective of caching to prevent evaluation across statements, within a
> single statement, is there a strong reason why for example in
> WHERE col = f(const) with f() declared as immutable or stable and without an
> index on col, f() still gets called for every row? Or is this optimization
> just not done yet?

The above statement is not correct, at least not for immutable functions.

regards, tom lane



Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some profiling

2006-05-16 Thread Arjen van der Meijden

Hi Jignesh,

Jignesh K. Shah wrote:

Hi Arjen,

Looking at your outputs...of syscall and usrcall it looks like

* Spending too much time in semsys, which means you have too many 
connections and they are contending for a lock, potentially the 
WAL log lock.


* llseek is high, which means you can obviously gain a bit with the 
right file system/file tuning, by caching things right.


Have you set the Solaris tunables for the T2000 that are recommended for PostgreSQL?


Not particularly; we got a "special T2000 Solaris dvd" from your 
colleagues here in the Netherlands and installed that (actually, one of 
your colleagues did). With that installation, all the "better default" 
/etc/system settings are supposed to be in place. I haven't really checked 
that they are, since two of your colleagues have been working on it for 
the mysql version of the benchmark and I assumed they'd have verified that.



Check out the tunables from the following URL

http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp

Try especially the /etc/system and postgresql.conf changes and see if 
they improve your performance.


I will see that those tunables are verified to be set.

I am a bit surprised though by your remarks, since they'd point to 
the I/O being in the way. But we only have about 600k/sec I/O according 
to vmstat, and the database easily fits in memory.
In total I logged about 500k queries, of which only 70k were altering 
queries, and of those almost all were inserts into log tables which aren't 
actively read in this benchmark.


But I'll give it a try.

Best regards,

Arjen





Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some profiling

2006-05-16 Thread Jignesh K. Shah

Hi Arjen,

Can you send me my colleagues' names in a private email?

One of the drawbacks of the syscall.d script is that its timings are 
relative: if system CPU usage is very low, it only shows what portion of 
that small amount is associated with which call. So even if you have, 
say, 1% system time, it will report that most of it was I/O related or 
semsys related. iostat output with the -c option to include CPU times 
helps to put it in the right perspective.



Also do check the tunables mentioned and make sure they are set.

Regards,
Jignesh



Re: [PERFORM] IMMUTABLE?

2006-05-16 Thread Joachim Wieland
On Tue, May 16, 2006 at 09:33:14AM -0400, Tom Lane wrote:
> Joachim Wieland <[EMAIL PROTECTED]> writes:
> > So irrespective of caching to prevent evaluation across statements, within a
> > single statement, is there a strong reason why for example in
> > WHERE col = f(const) with f() declared as immutable or stable and without an
> > index on col, f() still gets called for every row? Or is this optimization
> > just not done yet?

> The above statement is not correct, at least not for immutable functions.

So an immutable function gets evaluated once but a stable function still gets
called for every row? Wouldn't it make sense to call a stable function only
once as well?
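
One way to observe the difference being discussed is to compare plans
(function and table names are hypothetical):

  CREATE FUNCTION f_imm(int) RETURNS int AS
  $$ SELECT $1 + 1 $$ LANGUAGE sql IMMUTABLE;

  CREATE FUNCTION f_sta(int) RETURNS int AS
  $$ SELECT $1 + 1 $$ LANGUAGE sql STABLE;

  -- The immutable call should show up in EXPLAIN already folded to a
  -- constant in the filter condition, while the stable call remains a
  -- function call:
  EXPLAIN SELECT * FROM mytable WHERE col = f_imm(10);
  EXPLAIN SELECT * FROM mytable WHERE col = f_sta(10);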


Joachim



Re: [PERFORM] IMMUTABLE?

2006-05-16 Thread David Wheeler

On May 15, 2006, at 21:31, Tom Lane wrote:


Sure.  As I read it, that's talking about a static transformation:
planner sees 2 + 2 (or if you prefer, int4pl(2,2)), planner runs the
function and replaces the expression with 4.  Nothing there about
memoization.


Oh, I see. So it's more like a constant or C macro.


It's true that the system *could* memoize (or in our more usual
parlance, cache function values) given the assumptions embodied in
IMMUTABLE.  But we don't, and I don't see any statement in the docs
that promises that we do.  For 99% of the functions that the planner
deals with, memoization would be seriously counterproductive because
the function evaluation cost is comparable to if not less than the
lookup cost in a memo table.  (int4pl is a good case in point.)


Yes, but there are definitely programming cases where memoization/caching 
helps. And it's easy to tell for a given function whether or not it 
really helps by simply trying it with CACHED and without.


Would this be a simple thing to implement?

Best,

David



Re: [PERFORM] Pgsql (and mysql) benchmark on T2000/Solaris and some profiling

2006-05-16 Thread Arjen van der Meijden

Hi Jignesh,

The settings from that 'special T2000 dvd' differed from the recommended 
settings on the website you provided. But I don't see much difference in 
performance with any of the adjustments; it appears to be more or less 
the same.


Here are a few iostat lines, by the way:

           sd0           sd1           sd2          nfs1          cpu
  kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
    7   1   12   958  50   3    5   0   0    7   0   0     13  1  0 85
    0   0    0  2353 296   3    0   0   0    0   0   0     92  7  0  1
    0   0    0  2062 326   2    0   0   0    0   0   0     93  7  0  0
    1   1    1  1575 350   0    0   0   0    0   0   0     92  7  0  1
    0   0    0  1628 362   0    0   0   0    0   0   0     92  8  0  1

It appears to be doing a little less kps/tps on sd1 when I restore my 
own postgresql.conf settings (default wal/checkpoint settings, 20k 
buffers, 2k work mem).
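
In postgresql.conf terms, those settings roughly correspond to the
following (just a sketch, assuming 8.1 syntax with 8 kB pages; everything
else at its default):

  shared_buffers = 20000   # the "20k buffers" above, about 160 MB
  work_mem = 2048          # the "2k work mem" above, value in kB
  # wal_* and checkpoint_* settings left at their defaults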


Is it possible to trace the stacks for semsys, like the memcpy traces, 
or are those of no interest here?


Best regards,

Arjen



[PERFORM] Adding and filling new column on big table

2006-05-16 Thread Jonathan Blitz

I have a table of about 500,000 rows.

I need to add a new column and populate it.

So, I have tried to run the following command. The command never finishes
(I gave up after about an hour and a half!).

Note that none of the columns have indexes.

  UPDATE mytable SET new_column =
      CASE WHEN column_1 = column_2 THEN 1
           WHEN column_1 + column_3 = column_2 AND column_3 > 0 THEN 2
           WHEN column_1 + column_3 + column_4 = column_2 AND column_4 > 0 THEN 3
           WHEN column_1 + column_3 + column_4 + column_5 = column_2 AND column_5 > 0 THEN 4
           ELSE 0
      END;

My computer is a Pentium 4, 2.4 GHz with 1 GB RAM, so it should be fast
enough.

Any ideas?

Jonathan Blitz

 










Re: [PERFORM] IMMUTABLE?

2006-05-16 Thread Christopher Kings-Lynne
Yes, but there are definitely programming cases where memoization/caching 
helps. And it's easy to tell for a given function whether or not it 
really helps by simply trying it with CACHED and without.


Would this be a simple thing to implement?


It's called a "table" :)
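
For instance, a minimal sketch of that approach, memoizing a hypothetical
expensive function f(int) in a table (all names are placeholders):

  CREATE TABLE f_memo (arg int PRIMARY KEY, result int);

  CREATE FUNCTION f_cached(int) RETURNS int AS $$
  DECLARE
      r int;
  BEGIN
      SELECT result INTO r FROM f_memo WHERE arg = $1;
      IF NOT FOUND THEN
          r := f($1);   -- the expensive call being memoized
          INSERT INTO f_memo VALUES ($1, r);
      END IF;
      RETURN r;
  END;
  $$ LANGUAGE plpgsql;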




Re: [PERFORM] Speed Up Offset and Limit Clause

2006-05-16 Thread Christian Paul Cosinas
I am creating an application that gets the values of a large table and
writes them to a file.

The reason I want to use offset and limit is to create a threaded
application, so that the threads will not get the same results.

For example:

Thread 1 : gets offset 0 limit 5000
Thread 2 : gets offset 5000 limit 5000
Thread 3 : gets offset 10000 limit 5000

And so on...

Would there be any other, faster way than what I thought of?

-Original Message-
From: PFC [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 11, 2006 7:06 AM
To: Christian Paul Cosinas; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Speed Up Offset and Limit Clause


	Why do you want to use it this way?
	Explain what you want to do; there probably is another, faster
solution...

On Thu, 11 May 2006 16:45:33 +0200, Christian Paul Cosinas  
<[EMAIL PROTECTED]> wrote:

> Hi!
>
> How can I speed up my server's performance when I use offset and limit
> clause.
>
> For example I have a query:
> SELECT * FROM table ORDER BY id, name OFFSET 10 LIMIT 1
>
> This query takes a long time about more than 2 minutes.
>
> If my query is:
> SELECT * FROM table ORDER BY id, name OFFSET 5 LIMIT 1
> It takes about 2 seconds.
>
> Thanks
>
>





Re: [PERFORM] IMMUTABLE?

2006-05-16 Thread David Wheeler

On May 16, 2006, at 18:29, Christopher Kings-Lynne wrote:

Yes, but there are definitely programming cases where memoization/caching 
helps. And it's easy to tell for a given function whether or not it 
really helps by simply trying it with CACHED and without.

Would this be a simple thing to implement?


It's called a "table" :)


  http://www.justatheory.com/computers/databases/postgresql/higher_order_plpgsql.html


Yes, I know. :-P But it'd be easier to have a CACHED keyword, of course.

Best,

David



Re: [PERFORM] Speed Up Offset and Limit Clause

2006-05-16 Thread Craig A. James

Christian Paul Cosinas wrote:

I am creating an application that gets the values of a large table and
writes them to a file.

The reason I want to use offset and limit is to create a threaded
application, so that the threads will not get the same results.

For example:

Thread 1 : gets offset 0 limit 5000
Thread 2 : gets offset 5000 limit 5000
Thread 3 : gets offset 10000 limit 5000

And so on...

Would there be any other, faster way than what I thought of?


In order to return rows 10001 to 15000, it must select all rows from zero to 
15000 and then discard the first 10000 -- probably not what you were hoping for.

You might add a "thread" column.  Say you want to run ten threads:

  create sequence thread_seq
      increment by 1
      minvalue 1 maxvalue 10
      cycle
      start with 1;

  create table mytable(
     column1  integer,
     ... other columns ...,
     thread   integer default nextval('thread_seq')
  );

  create index i_mytable_thread on mytable(thread);

Now whenever you insert into mytable, you get a value in mytable.thread between 
1 and 10, and the thread column is indexed. (PostgreSQL has no bitmap index 
type to declare; a plain index works, and the planner can still use bitmap 
index scans on it.)  So your query becomes:

  Thread 1:  select ... from mytable where ... and thread = 1;
  Thread 2:  select ... from mytable where ... and thread = 2;
  ... and so forth.

Craig



Re: [PERFORM] Adding and filling new column on big table

2006-05-16 Thread Michael Fuhr
On Wed, May 17, 2006 at 03:19:26AM +0200, Jonathan Blitz wrote:
> I have a table of about 500,000 rows. 
> 
> I need to add a new column and populate it.
> 
> So, I have tried to run the following command. The command never finishes (I
> gave up after about an hour and a half!).

If you install contrib/pgstattuple you can figure out how fast the
update is running.  Run "SELECT * FROM pgstattuple('mytable')" a
few times and note the rate at which dead_tuple_count is increasing.
If it's not increasing at all then query pg_locks and look for locks
where "granted" is false.

I created a test table, populated it with 500,000 rows of random
data, and ran the update you posted.  On a 500MHz Pentium III with
512M RAM and a SCSI drive from the mid-to-late 90s, running PostgreSQL
8.1.3 on FreeBSD 6.1, the update finished in just over two minutes.
The table had one index (the primary key).

> Note that none of the columns have indexes.

Do you mean that no columns in the table have indexes?  Or that the
columns referenced in the update don't have indexes but that other
columns do?  What does "\d mytable" show?  Do other tables have
foreign key references to this table?  What non-default settings
do you have in postgresql.conf?  What version of PostgreSQL are you
running and on what platform?  How busy is the system?  What's the
output of "EXPLAIN UPDATE mytable ..."?

-- 
Michael Fuhr



[PERFORM] Performance incorporate with JReport

2006-05-16 Thread kah_hang_ang




Hi,

Currently I'm using postgresql v8.1.3 and the latest JDBC driver.

I tried to open a JReport report; the time taken to completely open the
report was 137453 ms.
Then I opened the same report connected to postgresql v7.2.2 instead, and
it completed much faster than with postgresql v8.1.3: it took 15516 ms to
finish.

I tried many times and the result was still the same.

So I thought it might be a compatibility problem between JReport and
postgresql 8.1.3, and added 'protocolVersion=2' to the connection string.
Then I opened the same report again and this time, just as I expected,
the execution time for the report became only 6000 ms,
about 20x faster than the previous test without the 'protocolVersion=2'
option.
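
For reference, that option goes into the JDBC connection URL, for example
(host and database name are placeholders):

  jdbc:postgresql://localhost:5432/mydb?protocolVersion=2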

May I know what the reason for this is?
Is it a compatibility issue between the JDBC driver and JReport?

Thanks!

