[PERFORM] Parallel Vacuum
Folks, are there any constraints/problems/etc. with running several vacuum processes in parallel, each one vacuuming a different table? Example: vacuumdb -d db1 -t table1 & vacuumdb -d db1 -t table2 & vacuumdb -d db1 -t table3 & wait (sorry if this was already asked, but I did not find an explicit answer in the archives) Thanks for any input! Rgds, -Dimitri
Re: [PERFORM] Parallel Vacuum
On Thursday 22 March 2007 14:52, Alvaro Herrera wrote: > Dimitri wrote: > > Folks, > > > > are there any constraints/problems/etc. with running several vacuum processes in > > parallel, each one vacuuming a different table? > > No, no problem. Keep in mind that if one of them takes a very long > time, the others will not be able to remove dead tuples that were > killed while the long vacuum was running -- unless you are in 8.2. Yes, I'm using the latest 8.2.3 version. So, will they *really* process in parallel, or will they block each other step by step? Rgds, -Dimitri
Re: [PERFORM] Parallel Vacuum
On Thursday 22 March 2007 16:12, Alvaro Herrera wrote: > Dimitri wrote: > > [...] So, will they *really* process in parallel, or will they block each other step by step? > > They won't block. Wow! Excellent! :) So, in that case, why not add a 'parallel' option integrated directly into the 'vacuumdb' command? In my case I have several CPUs on the server and quite a powerful storage box which is not really busy with a single vacuum. So, my idea is quite simple - speed up vacuum with parallel execution (just an algorithm, with the biggest tables vacuumed first; a runnable sketch of this loop follows below):
-- PLL=parallel_degree
select tab_size, tabname, dbname from ... order by tab_size desc;
for each table, biggest first:
    vacuumdb -d $dbname -t $tabname > /tmp/vac.$dbname.$tabname.log 2>&1 &
    while (pgrep vacuumdb | wc -l) >= $PLL
        sleep 1
    end
end
wait
But of course it would be much cooler to have something like: vacuumdb -a -P parallel_degree What do you think? ;) Rgds, -Dimitri
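A runnable version of that driver loop might look like this (a minimal sketch: the database name, parallel degree and log paths are placeholders, and the table list is taken from pg_class ordered by size, so system catalogs are included too):

    #!/bin/sh
    # Hypothetical parallel-vacuum driver: biggest tables first,
    # never more than $PLL vacuumdb processes at a time.
    PLL=4
    DB=db1

    TABLES=`psql -At -d $DB -c "SELECT relname FROM pg_class WHERE relkind = 'r' ORDER BY relpages DESC"`

    for T in $TABLES; do
        # throttle until a slot is free
        while [ `pgrep -x vacuumdb | wc -l` -ge $PLL ]; do
            sleep 1
        done
        vacuumdb -d $DB -t "$T" > /tmp/vac.$DB.$T.log 2>&1 &
    done
    wait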
Re: [PERFORM] Parallel Vacuum
Mike, you're right as long as you're using a single disk :) Now, imagine you have more disks - the more I/O operations you can perform, the more CPU time you'll also need to process them :) until you fully use one CPU per 'vacuumdb' - and only then do you stop... Also, even when the CPU is not heavily used by vacuumdb, a single process is still not able to get the maximum performance out of the storage array, simply because you need several concurrent I/Os running in the system to reach maximum throughput. And even if the filesystem helps you here - it's not everything... The more concurrent writers you have, the higher the performance you reach (up to the real limit)... In my case I have a small storage array capable of more than 500MB/sec and say 5000 op/s. All my data are striped across all the array disks. A single 'vacuumdb' process here becomes more CPU-bound than I/O-bound, as it cannot fully load the storage array... So, the more vacuum processes I start in parallel, the faster I'll finish vacuuming the database. Best regards! -Dimitri On Thursday 22 March 2007 18:10, Michael Stone wrote: > On Thu, Mar 22, 2007 at 04:55:02PM +0100, Dimitri wrote: > >In my case I have several CPU on the server and quite powerful storage box > >which is not really busy with a single vacuum. So, my idea is quite simple > > - speed-up vacuum with parallel execution (just an algorithm): > > Vacuum is I/O intensive, not CPU intensive. Running more of them will > probably make things slower rather than faster, unless each thing you're > vacuuming has its own (separate) disks. The fact that your CPU isn't > pegged while vacuuming suggests that your disk is already your > bottleneck--and doing multiple sequential scans on the same disk will > definitely be slower than doing one. > > Mike Stone
Re: [PERFORM] Sunfire X4500 recommendations
On Friday 23 March 2007 03:20, Matt Smiley wrote: > My company is purchasing a Sunfire x4500 to run our most I/O-bound > databases, and I'd like to get some advice on configuration and tuning. > We're currently looking at: - Solaris 10 + zfs + RAID Z > - CentOS 4 + xfs + RAID 10 > - CentOS 4 + ext3 + RAID 10 > but we're open to other suggestions. > Matt, for Solaris + ZFS you may find answers to all your questions here: http://blogs.sun.com/roch/category/ZFS http://blogs.sun.com/realneel/entry/zfs_and_databases Remember to measure the log (WAL) activity and use a separate pool for the logs if needed. Also, RAID-Z is oriented more toward data protection than performance; RAID-10 should be a better choice... Rgds, -Dimitri
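To make the RAID-10-style recommendation concrete, the corresponding zpool layout is simply a stripe over mirror vdevs; a sketch with placeholder disk names (the commented line shows the RAID-Z2 alternative for comparison):

    # striped mirrors ("RAID-10" style): best random I/O, half the raw capacity
    zpool create dbpool mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0 mirror c0t2d0 c1t2d0

    # RAID-Z2 alternative: more usable space, fewer IOPS per vdev
    # zpool create dbpool raidz2 c0t0d0 c1t0d0 c0t1d0 c1t1d0 c0t2d0 c1t2d0

    # separate pool for the WAL if log activity warrants it
    zpool create logpool mirror c0t3d0 c1t3d0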
Re: [PERFORM] Parallel Vacuum
On Thursday 22 March 2007 19:46, Michael Stone wrote: > On Thu, Mar 22, 2007 at 07:24:38PM +0100, Dimitri wrote: > >you're right until you're using a single disk :) > >Now, imagine you have more disks > > I do have more disks. I maximize the I/O performance by dedicating > different sets of disks to different tables. YMMV. I do suggest watching > your I/O rates and wallclock time if you try this to see if your > aggregate is actually substantially faster than the single case. (I > assume that you haven't yet gotten far enough to actually do performance > testing.) You may also want to look into tuning your sequential I/O > performance. > > Mike Stone

Mike, especially for you :)

Parallel Vacuum Test
====================

- Database 'db_OBJ': PgSQL 8.2.3
  tables: object1, object2, ... object8 (all the same)
  volume: 10,000,000 rows in each table, 22GB in total

- Script Mono Vacuum
$ cat vac_mono.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8
$

- Script Parallel Vacuum
$ cat vac_pll.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8 &
wait
$

Test 1: Cold Clean database (already previously vacuumed)
=========================================================
Scenario:
- stop database
- flush FS cache (umount/mount)
- start database
- execute vacuum script

$ time sh vac_mono.sh
real 4m24.23s
user 0m0.00s
sys  0m0.01s

$ time sh vac_pll.sh
real 1m9.36s
user 0m0.00s
sys  0m0.01s

Test 2: Hot Dirty database (modified and not vacuumed)
======================================================
Scenario:
- stop database
- flush FS cache (umount/mount)
- start database
- execute 200,000 updates against each of the 8 object tables
- execute vacuum script

$ time sh vac_mono.sh
real 9m36.90s
user 0m0.00s
sys  0m0.01s

$ time sh vac_pll.sh
real 2m10.41s
user 0m0.00s
sys  0m0.02s

The x4 speed-up is obtained simply because a single vacuum process reaches at most 80MB/sec of throughput, while with 8 parallel vacuum processes I'm jumping to 360MB/sec... And speaking about sequential I/O: while you're doing reads, the file system may prefetch the incoming data, so that by the time you request the next read your data are already in the FS cache. However, the file system cannot 'pre-write' data for you - so having more concurrent writers helps a lot! (Of course, only if you have storage configured to handle concurrent I/O :)) Well, why all this stuff?... Let's imagine you need more performance and you buy a storage box that is 10x more powerful - will you still be able to saturate it with single-process I/O activity? No... :) To scale well you need to be able to split your work into several tasks executed in parallel. And personally, I'm very happy we can now do this with vacuum - one of the most critical parts of PostgreSQL... Best regards! -Dimitri
Re: [PERFORM] Sunfire X4500 recommendations
On Friday 23 March 2007 14:32, Matt Smiley wrote: > Thanks Dimitri! That was very educational material! I'm going to think > out loud here, so please correct me if you see any errors. Your mail is so long that I was unable to answer all the questions the same day :)) > > The section on tuning for OLTP transactions was interesting, although my > OLAP workload will be predominantly bulk I/O over large datasets of > mostly-sequential blocks. I suppose mostly READ operations, right? > > The NFS+ZFS section talked about the zil_disable control for making zfs > ignore commits/fsyncs. Given that Postgres' executor does single-threaded > synchronous I/O like the tar example, it seems like it might benefit > significantly from setting zil_disable=1, at least in the case of > frequently flushed/committed writes. However, zil_disable=1 sounds unsafe > for the datafiles' filesystem, and would probably only be acceptible for > the xlogs if they're stored on a separate filesystem and you're willing to > loose recently committed transactions. This sounds pretty similar to just > setting fsync=off in postgresql.conf, which is easier to change later, so > I'll skip the zil_disable control. Yes, you don't need it for PostgreSQL; it may be useful for other database vendors, but not here. > > The RAID-Z section was a little surprising. It made RAID-Z sound just like > RAID 50, in that you can customize the trade-off between iops versus usable > diskspace and fault-tolerance by adjusting the number/size of > parity-protected disk groups. The only difference I noticed was that > RAID-Z will apparently set the stripe size across vdevs (RAID-5s) to be as > close as possible to the filesystem's block size, to maximize the number of > disks involved in concurrently fetching each block. Does that sound about > right? Well, look at RAID-Z simply as a wide RAID solution. The more disks you have in your system, the higher the probability of losing 2 disks at the same time, and in that case a wide RAID-10 will simply lose you the whole data set (that is, if you lose both disks of the same mirror pair). So, RAID-Z brings you more safety, as you may use wider parity, but the price for it is I/O performance... > > So now I'm wondering what RAID-Z offers that RAID-50 doesn't. I came up > with 2 things: an alleged affinity for full-stripe writes and (under > RAID-Z2) the added fault-tolerance of RAID-6's 2nd parity bit (allowing 2 > disks to fail per zpool). It wasn't mentioned in this blog, but I've heard > that under certain circumstances, RAID-Z will magically decide to mirror a > block instead of calculating parity on it. I'm not sure how this would > happen, and I don't know the circumstances that would trigger this > behavior, but I think the goal (if it really happens) is to avoid the > performance penalty of having to read the rest of the stripe required to > calculate parity. As far as I know, this is only an issue affecting small > writes (e.g. single-row updates in an OLTP workload), but not large writes > (compared to the RAID's stripe size). Anyway, when I saw the filesystem's > intent log mentioned, I thought maybe the small writes are converted to > full-stripe writes by deferring their commit until a full stripe's worth of > data had been accumulated. Does that sound plausible? The problem here is that within the same workload you can do fewer I/O operations with RAID-Z than with RAID-10. So, whether your I/O block size is bigger or smaller, you'll still obtain lower throughput, no? :) > > Are there any other noteworthy perks to RAID-Z, rather than RAID-50? If > not, I'm inclined to go with your suggestion, Dimitri, and use zfs like > RAID-10 to stripe a zpool over a bunch of RAID-1 vdevs. Even though many > of our queries do mostly sequential I/O, getting higher seeks/second is > more important to us than the sacrificed diskspace. There is still one point to check: if you mostly READ from your database, RAID-Z will probably not be *too* bad and will give you more usable space. However, if you need to update your data or load frequently, RAID-10 will be better... > > For the record, those blogs also included a link to a very helpful ZFS Best > Practices Guide: > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide Oh yes, it's a constantly growing wiki, a good starting point for any Solaris question as well as performance points :) > > To sum up, so far the short list of tuning suggestions for ZFS includes: > - Use a separate zpool and filesystem for xlogs if your apps write often. > - Consider setting zil_disable=1 on the xlogs' dedicated filesystem. ZIL > is the intent log, and it sounds like disabling it may be like d
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
>> > Erik, > > using 'forcedirectio' simply brings your write operations to the > *real* volume - means while you need to write 10 bytes you'll write 10 > bytes (instead of UFS block size (8K)). So it explains me why your > write volume became slower. I meant 'lower' (not slower). Sorry, that's not true. Google "ufs forcedirectio", go to the first link and you will find: "forcedirectio The forcedirectio (read "force direct IO") UFS option causes data to be buffered in kernel address whenever data is transferred between user address space and the disk. In other words, it bypasses the file system cache. For certain types of applications -- primarily database systems -- this option can dramatically improve performance. In fact, some database experts have argued that a file using the forcedirectio option will outperform a raw partition, though this opinion seems fairly controversial. The forcedirectio improves file system performance by eliminating double buffering, providing a small, efficient code path for file system reads and writes and removing pressure on memory." Erik, please don't take me wrong, but reading Google (or better, the man pages) doesn't replace brains and basic practice... The Direct IO option is not a silver bullet which will solve all your problems (try to do a 'cp' on a filesystem mounted with 'forcedirectio', or keep your mailbox on it - you'll quickly understand the impact)... However, what this does mean is that writes will be at the actual filesystem block size and not the cache block size (8K v. 512K). While a UFS filesystem is mounted normally, it uses its own cache for all operations (read and write) and saves data modifications on a per-page basis, meaning: when a process writes 200 bytes there will be 200 bytes modified in the cache, then the whole page (8K) is written once the data is required to be flushed (and the WAL is written on each commit)... Now, mounted with the 'forcedirectio' option, UFS is free of the page size constraint and will write, like a raw device, exactly the demanded amount of data, meaning: when a process writes 200 bytes it'll write exactly 200 bytes to the disk. For the WAL it may be a real benefit, because you'll be able to perform more I/O operations/sec, which means more commits/sec. But at the same time it may dramatically slow down all your read operations (no more data prefetch nor dynamic cache)... The best solution is probably to separate the WAL from the data (BTW, it would be nice to have an option such as WAL_PATH in the conf file); it may be solved by simple use of a tablespace, or at least directory links, etc. But if your major activity is writing - it's probably already OK for you. However, to understand the TX number mystery I think the only possible solution is to reproduce a small live test: (I'm sure you're aware you can mount/unmount forcedirectio dynamically?) during a stable workload do: # mount -o remount,logging /path_to_your_filesystem and check if the I/O volume is increasing as well as the TX numbers, then come back: # mount -o remount,forcedirectio /path_to_your_filesystem and see if the I/O volume is decreasing as well as the TX numbers... Best regards! -Dimitri > > Now, why TX number is reduced - is a small mystery :) > > Options: > - you really do 10 times less commits, means you work 10 times > slower? ;) > what about users? how do you measure your work performance? We are an email marketing service provider with a web front end application. We measure work performance via web requests (counts, types, etc...), mailer activity and the resulting database activity. We are doing as much or more work now than previously, and faster. > > - TX reported in pg_* tables are not exact, but I don't believe > at all :) Even if they aren't exact, being off by a factor of 10 wouldn't be believable. The forcedirectio mount setting for ufs can definitely explain the drop in data written volume, but doesn't do much to explain the difference in xact commits. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
You are right in that the page size constraint is lifted in that directio cuts out the VM filesystem cache. However, the Solaris kernel still issues io ops in terms of its logical block size (which we have at the default 8K). It can issue io ops for fragments as small as 1/8th of the block size, but Postgres issues its io requests in terms of the block size which means that io ops from Postgres will be in 8K chunks which is exactly what we see when we look at our system io stats. In fact, if any io request is made that isn't a multiple of 512 bytes (the disk sector size), the file system switches back to the buffered io. Oh, yes, of course! You still need to respect the multiple-of-512-bytes block size on read and write - sorry, I was tired :) Then it seems to be true - the default XLOG block size is 8K, which means that for even the smallest auto-committed transaction we write 8K?... Is there any reason to use such a big default block size?... Perhaps it would be a good idea to make it an 'initdb' parameter, so that the value could be set per database server? Rgds, -Dimitri > > However, to understand TX number mystery I think the only possible > solution > is to reproduce a small live test: > > (I'm sure you're aware you can mount/unmount forcedirectio > dynamically?) > > during stable workload do: > > # mount -o remount,logging /path_to_your_filesystem > > and check if I/O volume is increasing as well TX numbers > than come back: > > # mount -o remount,forcedirectio /path_to_your_filesystem > > and see if I/O volume is decreasing as well TX numbers... That's an excellent idea and I'll run it by the rest of our team tomorrow. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
The problem is that while your goal is to commit as fast as possible, it's a pity to waste I/O speed just to keep a common block size... Let's say your transaction's modifications fit into 512 bytes - you'd be able to write many more 512-byte blocks per second than 8K blocks per second (for the same amount of data)... Even if we probably rewrite the same block several times with incoming transactions, it still costs traffic, and we will process more slowly even though the H/W can do better. I don't think that's good, no? ;) Rgds, -Dimitri On 3/30/07, Erik Jones <[EMAIL PROTECTED]> wrote: On Mar 30, 2007, at 8:14 AM, Dimitri wrote: >> [...] > Then it's seems to be true - default XLOG block size is 8K, means for > every even small auto-committed transaction we should write 8K?... Is > there any reason to use so big default block size?... > > Probably it may be a good idea to put it as 'initdb' parameter? and > have such value per database server? I believe it's because that is a pretty normal Unix kernal block size and you want the two to match. erik jones <[EMAIL PROTECTED]> software developer 615-296-0838 emma(r)
Re: [PERFORM] Shared buffers, db transactions commited, and write IO on Solaris
Folks, to close the topic of the "LOG block size=1K" idea - I took the time to test it (yes), and in the best case there is only a 15% gain compared to 8K - the storage protocol is quite heavy by itself, so sending less or more data within it doesn't reduce the service time much... Moreover, even this gain decreases quickly with a growing workload! So, yes, 8K is good enough and probably the optimal choice for the LOG (as well as the data) block size. Rgds, -Dimitri Well, to check whether there is a real potential gain, all we need is a small comparative test using PgSQL compiled with a LOG block size of, say, 1K, and direct IO enabled. Rgds, -Dimitri On 3/30/07, Kenneth Marshall <[EMAIL PROTECTED]> wrote: > On Fri, Mar 30, 2007 at 04:25:16PM +0200, Dimitri wrote: > > [...] > > With block sizes you are always trading off overhead versus space > efficiency. Most OS write only in 4k/8k to the underlying hardware > regardless of the size of the write you issue. Issuing 16 512byte > writes has much more overhead than 1 8k write. On the light transaction > end, there is no real benefit to a small write and it will slow > performance for high throughput environments. It would be better to, > and I think that someone is looking into, batching I/O. > > Ken
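For anyone wanting to reproduce this kind of test: the WAL block size is a compile-time constant, not an initdb or runtime setting, so a rebuild is needed. A rough sketch, assuming an 8.2-era source tree where the constant lives in src/include/pg_config_manual.h (verify the name and location in your own sources, and note that a cluster initialized with one WAL block size cannot be reused with another):

    # hypothetical rebuild with a 1K WAL block size
    cd postgresql-8.2.3
    # change "#define XLOG_BLCKSZ 8192" to "#define XLOG_BLCKSZ 1024"
    vi src/include/pg_config_manual.h
    ./configure --prefix=/usr/local/pgsql-xlog1k
    make && make install
    /usr/local/pgsql-xlog1k/bin/initdb -D /data/pgtest_xlog1k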
[PERFORM] Equivalents in PostgreSQL of MySQL's "ENGINE=MEMORY" "MAX_ROWS=1000"
Probably another helpful solution would be to implement: ALTER TABLE LOGGING OFF/ON; just to disable/enable the WAL? First, it would help in all cases of intensive data loading, where you currently slow down other sessions with the increased WAL activity. Then you'd have a way to implement MEMORY-like tables on a RAM-disk tablespace (well, you still need to take care of dropping them yourself :)) However, if we talk about the performance of a MEMORY table - it should be much better with Tom's solution of big temp buffers rather than a RAM disk... The strong point of a MEMORY table implementation is that it *knows* it sits in RAM! and that completely changes all the I/O logic... BTW, before NDB was bought by MySQL we ran a benchmark to reach the highest possible TPS numbers with it. We got 1,500,000 TPS(!) (yes, one and a half million per second!) - knowing that all current TPC records are measured in thousands of transactions per minute, you see the impact... And of course, for my own education, I tried to do the same with other database vendors, running only SELECT queries and placing the tablespaces on a RAM disk... After trying all possible combinations I was still *very* far away :)) MEMORY databases are something like a parallel world - very interesting, but very different :)) Rgds, -Dimitri On 4/3/07, A.M. <[EMAIL PROTECTED]> wrote: On Apr 3, 2007, at 16:00 , Alan Hodgson wrote: > On Tuesday 03 April 2007 12:47, "A.M." > <[EMAIL PROTECTED]> wrote: >> On Apr 3, 2007, at 15:39 , C. Bergström wrote: >> I would like to use transactional semantics over tables that can >> disappear whenever the server fails. memcached does not offer that. > > How would temporary tables? The only difference between temporary tables and standard tables is the WAL. Global temporary tables would be accessible by all sessions and would be truncated on postmaster start. For a further potential speed boost, global temp tables could be put in a ramdisk tablespace. Well, that's at least how I envision them. Cheers, M
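For illustration, this is roughly what the RAM-disk tablespace workaround mentioned above looks like today (a sketch with hypothetical paths and table names; a tmpfs/ramdisk mount stands in for "memory", and anything placed there must be recreated by hand after a crash or reboot):

    # a directory on a tmpfs / RAM-disk mount, owned by the postgres user
    mkdir /ramdisk/pg_ram_ts
    chown postgres /ramdisk/pg_ram_ts

    # tablespace plus a throw-away table living entirely in RAM
    psql -d db1 -c "CREATE TABLESPACE ram_ts LOCATION '/ramdisk/pg_ram_ts'"
    psql -d db1 -c "CREATE TABLE hot_cache (k int PRIMARY KEY, v text) TABLESPACE ram_ts"

    # the temp-buffer alternative discussed above (per session, so shown here in one call)
    psql -d db1 -c "SET temp_buffers = '256MB'; CREATE TEMP TABLE hot_tmp AS SELECT * FROM some_table"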
Re: [PERFORM] Equivalents in PostgreSQL of MySQL's "ENGINE=MEMORY" "MAX_ROWS=1000"
Wow, that's excellent! :)) Probably the next step is: ALTER TABLE CACHE ON/OFF; just to force keeping a given table in the cache. What do you think?... Rgds, -Dimitri On 4/5/07, Josh Berkus wrote: Dimitri, > Probably another helpful solution may be to implement: > >ALTER TABLE LOGGING OFF/ON; > > just to disable/enable WAL? Actually, a patch similar to this is currently in the queue for 8.3. -- Josh Berkus PostgreSQL @ Sun San Francisco
Re: [PERFORM] Bad RAID1 read performance
As there is no 'contiguous space' option on ext3/ext2 (or perhaps "-f fragment_size" may do the trick?) - I think that after some filesystem activity you simply lose contiguous space allocation, and the expected sequential reads may be transformed into random seeks over 'logically' sequential blocks... Rgds, -Dimitri On 5/30/07, Albert Cervera Areny <[EMAIL PROTECTED]> wrote: Hardware isn't very good I believe, and it's about 2-3 years old, but the RAID is Linux software, and though not very good the difference between reading and writing should probably be greater... (?) Would you set 512Kb readahead on both drives and RAID? I tried various configurations and none seemed to make a big difference. It seemed correct to me to set 512kb per drive and 1024kb for md0. On Wednesday, 30 May 2007 16:09, Luke Lonergan wrote: > This sounds like a bad RAID controller - are you using a built-in hardware > RAID? If so, you will likely want to use Linux software RAID instead. > > Also - you might want to try a 512KB readahead - I've found that is optimal > for RAID1 on some RAID controllers. > > - Luke > > On 5/30/07 2:35 AM, "Albert Cervera Areny" <[EMAIL PROTECTED]> wrote: > > Hi, > > after doing the "dd" tests for a server we have at work I obtained: > > Read: 47.20 Mb/s > > Write: 39.82 Mb/s > > Some days ago read performance was around 20Mb/s due to no readahead in > > md0 so I modified it using hdparm. However, it seems to me that being it > > a RAID1 read speed could be much better. These are SATA disks with 3Gb of > > RAM so I did 'time bash -c "dd if=/dev/zero of=bigfile bs=8k count=786432 > > && sync"'. File system is ext3 (I've read many times in the list that XFS > > is faster), but I don't want to change the file system right now. > > Modifing the readahead from the current 1024k to 2048k doesn't make any > > difference. Are there any other tweaks I can make? -- Albert Cervera Areny Dept. Informàtica Sedifa, S.L. Av. Can Bordoll, 149 08202 - Sabadell (Barcelona) Tel. 93 715 51 11 Fax. 93 715 51 12
[PERFORM] Some info to share: db_STRESS Benchmark results
Folks, I just wanted to share some benchmark results from one long performance study comparing MySQL, PostgreSQL and Oracle transaction throughput and engine scalability on a T2000 and a V890 (under Solaris). The Oracle results are removed (of course :), but the others are quite interesting... The findings are presented as they are, following the step-by-step learning and tuning curve :) So, you may find: - http://dimitrik.free.fr/db_STRESS.html - Benchmark kit description - http://dimitrik.free.fr/db_STRESS_BMK_Part1.html -- first main part - http://dimitrik.free.fr/db_STRESS_BMK_Part2_ZFS.html -- second part, including ZFS-specific tuning Tests were executed in Mar/Apr 2007 with the latest version at that time, 8.2.3. Due to limited spare time I was able to publish the results only now... Any comments are welcome! :) Best regards! -Dimitri
Re: [PERFORM] Some info to share: db_STRESS Benchmark results
Well, let's say I want to have compact graphs :) So, a few comments on the graphs: - Title: compact name of the test and execution conditions - X-axis: always represents the time scale - Y-axis: shows a value level (whatever it is) - Legend: gives you the value name and its metric (KB/s, op/s, TPS, etc.) TPS (transactions per second): - ALL-tps TR_all: all transactions (READ+WRITE) per second - ALL-tps TR_Read: READ-only tps level - ALL-tps TR_Write: WRITE-only tps level I must say I was more interested in tuning the databases than in documenting each of my steps... But well, without documenting there is no result :) Also, I did not initially intend to compare the databases (I don't know why, but it always starts a small war between DB vendors :)), and the results were so surprising that I just continued for as long as it was possible :)) Rgds, -Dimitri On 5/31/07, Alexander Staubo <[EMAIL PROTECTED]> wrote: On 5/31/07, Dimitri <[EMAIL PROTECTED]> wrote: > just wanted to share some benchmark results from one long performance > study comparing MySQL, PostgreSQL and Oracle transactions throughput > and engine scalability on T2000 and V890 (under Solaris). Interesting, if awfully cryptic. The lack of axis labels, the lack of axis normalization, and the fact that you put the graphs for different databases and parameters on separate pages makes it rather hard to compare the various results. Alexander.
Re: [PERFORM] Autodetect of software RAID1+0 fails
Craig, to make things work properly here you need to create a config file holding both the raid1 and raid0 information (/etc/mdadm/mdadm.conf). However, if your root filesystem gets corrupted, or you lose this file, or you move the disks somewhere else - you are back to the same initial issue :)) So, the solution I've found to work 100% in every case is: use mdadm to create the raid1 devices (as you do already) and then use LVM to create the raid0 volume on top of them - LVM writes its own labels on every MD device and will find its volume pieces automatically! I tested it through several crashes and was surprised by its robustness :)) (A sketch of this layout follows after Craig's message.) Rgds, -Dimitri On 6/1/07, Craig James <[EMAIL PROTECTED]> wrote: Apologies for a somewhat off-topic question, but... The Linux kernel doesn't properly detect my software RAID1+0 when I boot up. It detects the two RAID1 arrays, the partitions of which are marked properly. But it can't find the RAID0 on top of that, because there's no corresponding device to auto-detect. The result is that it creates /dev/md0 and /dev/md1 and assembles the RAID1 devices on bootup, but /dev/md2 isn't created, so the RAID0 can't be assembled at boot time. Here's what it looks like: $ cat /proc/mdstat Personalities : [raid0] [raid1] md2 : active raid0 md0[0] md1[1] 234436224 blocks 64k chunks md1 : active raid1 sde1[1] sdc1[2] 117218176 blocks [2/2] [UU] md0 : active raid1 sdd1[1] sdb1[0] 117218176 blocks [2/2] [UU] $ uname -r 2.6.12-1.1381_FC3 After a reboot, I always have to do this: mknod /dev/md2 b 9 2 mdadm --assemble /dev/md2 /dev/md0 /dev/md1 mount /dev/md2 What am I missing here? Thanks, Craig
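A minimal sketch of that mdadm + LVM layout (device names follow Craig's /proc/mdstat; the volume-group/LV names, stripe size and mount point are placeholders):

    # two RAID1 pairs, as before
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdd1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sde1

    # LVM stripe ("RAID0") across the two mirrors; the LVM labels written on
    # md0/md1 let the volume reassemble automatically at boot
    pvcreate /dev/md0 /dev/md1
    vgcreate pgvg /dev/md0 /dev/md1
    lvcreate -i 2 -I 64 -l 100%FREE -n pgdata pgvg    # 2 stripes, 64KB stripe size

    mkfs -t ext3 /dev/pgvg/pgdata
    mount /dev/pgvg/pgdata /postgres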
Re: [PERFORM] Data transfer very slow when connected via DSL
Hi Rainer, did you try to execute your query directly from 'psql'?... Why I'm asking: it seems to me your case probably just depends on network latency, and what I noticed during recent benchmarks with PostgreSQL is that a SELECT query becomes very traffic-hungry if you are using a CURSOR. The 'psql' program is implemented not to use a CURSOR by default, so it'll be easy to check whether you're hitting this issue just by executing your query remotely from 'psql'... Rgds, -Dimitri On 6/21/07, Rainer Bauer <[EMAIL PROTECTED]> wrote: Hello Tom, >I seem to recall that we've seen similar reports before, always >involving Windows :-(. Check whether you have any nonstandard >components hooking into the network stack on that machine. I just repeated the test by booting into "Safe Mode with Network Support", but the results are the same. So I don't think that's the cause. Apart from that, what response times could I expect? Rainer
Re: [PERFORM] Data transfer very slow when connected via DSL
Let's stay optimistic - at least now you know the main source of your problem! :)) Let's see now with a CURSOR... First, try this:
munnin=> \timing
munnin=> \set FETCH_COUNT 1
munnin=> select * from "tblItem";
What's the time you see here? (I think your application is working in this manner.) Now, change FETCH_COUNT to 10, then 50, then 100 - your query execution time should get better (at least I hope so :)) And if it's better - you simply need to modify your FETCH clause with an adapted "FORWARD #" value (the best example is the psql source code itself: see the ExecQueryUsingCursor function implementation in the file common.c)... Rgds, -Dimitri On 6/22/07, Rainer Bauer <[EMAIL PROTECTED]> wrote: Hello Dimitri, >but did you try to execute your query directly from 'psql' ?... munnin=>\timing munnin=>select * from "tblItem"; (50 rows) Time: 391,000 ms >Why I'm asking: seems to me your case is probably just network latency >dependent, and what I noticed during last benchmarks with PostgreSQL >the SELECT query become very traffic hungry if you are using CURSOR. >Program 'psql' is implemented to not use CURSOR by default, so it'll >be easy to check if you're meeting this issue or not just by executing >your query remotely from 'psql'... Yes, see also my other post. Unfortunatelly this means that using my program to connect via DSL to the Postgres database is not possible. Rainer
Re: [PERFORM] Data transfer very slow when connected via DSL
Tom, it seems to me the problem here is rather simple: the current issue depends completely on the low-level 'implementation' of the SELECT query in the application. In case it's implemented with "DECLARE ... CURSOR ..." and then "FETCH NEXT" by default (the most common case), it brings the application into a "ping-pong" condition with the database server: each next FETCH is possible only once the previous one has finished and the server has received feedback from the client with an explicit fetch-next order. In this condition the query response time becomes completely network-latency dependent:
- each packet send/receive has a significant cost
- you cannot reduce this cost, as you cannot group more data within a single packet, and you waste your traffic
- that's why TCP_NODELAY becomes so important here
- with 150ms network latency the cost is ~300ms per FETCH (15 sec(!) for 50 rows)
You may think that if you're working on a LAN and your network latency is 0.1ms you're not concerned by this issue - but in reality, yes, you're impacted! Each network card/driver has its own maximum packets/sec capability (independent of volume), and once you hit it your response time can only degrade with more concurrent sessions (even if your CPU usage is still low)... The solution here is simple:
- don't use a CURSOR in simple cases when you're just reading/printing a SELECT result
- in case it's too late to adapt your code, or you absolutely need a CURSOR for some reason: replace the default "FETCH" or "FETCH NEXT" with "FETCH 100" (100 rows will generally be enough); normally it'll work straight away (otherwise check that you're verifying the PQntuples() value correctly and looping to read all tuples)
To keep the default network workload more optimal, I think we need to make "FETCH N" more popular with developers and enable it (even hidden) by default in ODBC/JDBC and other generic modules... Rgds, -Dimitri On 6/22/07, Tom Lane <[EMAIL PROTECTED]> wrote: Rainer Bauer <[EMAIL PROTECTED]> writes: > Fetching the 50 rows takes 12 seconds (without logging 8 seconds) and > examining the log I found what I suspected: the performance is directly > related to the ping time to the server since fetching one tuple requires a > round trip to the server. Hm, but surely you can get it to fetch more than one row at once? This previous post says that someone else solved an ODBC performance problem with UseDeclareFetch=1: http://archives.postgresql.org/pgsql-odbc/2006-08/msg00014.php It's not immediately clear why pgAdmin would have the same issue, though, because AFAIK it doesn't rely on ODBC. I just finished looking through our archives for info about Windows-specific network performance problems. There are quite a few threads, but the ones that were solved seem not to bear on your problem (unless the one above does). I found one pretty interesting thread suggesting that the problem was buffer-size dependent: http://archives.postgresql.org/pgsql-performance/2006-12/msg00269.php but that tailed off with no clear resolution. I think we're going to have to get someone to watch the problem with a packet sniffer before we can get much further. regards, tom lane
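As a small illustration of the batched-cursor pattern described above (a sketch against Rainer's "tblItem" table; in a real application the same statements would be sent through libpq or the ODBC driver, repeating the FETCH until it returns no more rows):

    psql -d munnin <<'EOF'
    BEGIN;
    DECLARE cur_items CURSOR FOR SELECT * FROM "tblItem";
    FETCH FORWARD 100 FROM cur_items;   -- one network round trip per 100 rows
    FETCH FORWARD 100 FROM cur_items;   -- repeat until an empty result comes back
    CLOSE cur_items;
    COMMIT;
    EOF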
Re: [PERFORM] Data transfer very slow when connected via DSL
Rainer, but did you try the initial query with FETCH_COUNT equal to 100?... Rgds, -Dimitri On 6/22/07, Rainer Bauer <[EMAIL PROTECTED]> wrote: Hello Dimitri, >Rainer, but did you try initial query with FETCH_COUNT equal to 100?... Yes I tried it with different values and it's like you suspected:
FETCH_COUNT 1    Time: 8642,000 ms
FETCH_COUNT 5    Time: 2360,000 ms
FETCH_COUNT 10   Time: 1563,000 ms
FETCH_COUNT 25   Time: 1329,000 ms
FETCH_COUNT 50   Time: 1140,000 ms
FETCH_COUNT 100  Time: 969,000 ms
\unset FETCH_COUNT  Time: 390,000 ms
Rainer
Re: [PERFORM] Data transfer very slow when connected via DSL
Hello Rainer, initially I was surprised that you did not match the non-CURSOR time with FETCH 100, but after thinking about it a little the explanation is very simple - let's analyze what's going on in both cases:
Without CURSOR:
1.) app calls PQexec() with "Query" and waits for the result
2.) PG sends the result to the app; the data arrive grouped into the biggest possible packets, so the network latency is hidden by the huge amount of data per single send
With CURSOR and FETCH 100:
1.) app calls PQexec() with "BEGIN" and waits
2.) PG sends ok
3.) app calls PQexec() with "DECLARE cursor FOR Query" and waits
4.) PG sends ok
5.) app calls PQexec() with "FETCH 100" and waits
6.) PG sends the result of 100 rows to the app; the data arrive grouped into the biggest possible packets, so the network latency is hidden by the large data amount per single send
7.) no more data (as you have only 50 rows in the output), so the app calls PQexec() with "CLOSE cursor" and waits
8.) PG sends ok
9.) app calls PQexec() with "COMMIT" and waits
10.) PG sends ok
As you see, the difference is huge, and each step adds a network-latency delay. So, with "FETCH 100" we save only the cost of steps 5 and 6 (the default "FETCH 1" would loop here over all 50 rows, adding the 50x latency delay again). But we cannot avoid the cost of the other steps, as they need to be executed one by one to keep the execution logic and clean error handling... I hope it's clearer now, and at least there is a choice :)) Also, if your query result were 500 rows (for example), I think the difference between the non-CURSOR and "FETCH 500" executions would be less important... Rgds, -Dimitri On 6/22/07, Rainer Bauer <[EMAIL PROTECTED]> wrote: Hello Dimitri, >Rainer, but did you try initial query with FETCH_COUNT equal to 100?... Yes I tried it with different values and it's like you suspected: FETCH_COUNT 1 Time: 8642,000 ms [...] FETCH_COUNT 100 Time: 969,000 ms \unset FETCH_COUNT Time: 390,000 ms Rainer
Re: [PERFORM] Data transfer very slow when connected via DSL
Rainer, looking through the psqlODBC source code, it seems to work in a similar way and has an option "SQL_ROWSET_SIZE" to execute the FETCH query in the same way as "FETCH_COUNT" does in psql. Try setting it to 100 and let's see if it gets better... Rgds, -Dimitri On 6/22/07, Rainer Bauer <[EMAIL PROTECTED]> wrote: Hello Joshua, >That opens up some questions. What ODBC driver are you using (with exact >version please). psqlODBC 8.2.4.2 (build locally). I have restored the 8.2.4.0 from the official msi installer, but the results are the same. Rainer
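For reference, with psqlODBC this kind of batching is usually switched on through the DSN or connection string; a sketch (driver name, host and credentials are placeholders - the relevant options are UseDeclareFetch and Fetch, i.e. the number of rows pulled per round trip; worth double-checking against the installed driver version):

    Driver={PostgreSQL Unicode};Server=db.example.com;Port=5432;Database=munnin;Uid=user;Pwd=secret;UseDeclareFetch=1;Fetch=100;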
[PERFORM] Filesystem Direct I/O and WAL sync option
All, I'm very curious to know whether we may expect or guarantee any data consistency with WAL sync=off but with the file system mounted in Direct I/O mode (meaning every write() system call issued by PG really writes to disk before returning)... So may we expect data consistency:
- none?
- per checkpoint?
- full?...
Thanks a lot for any info! Rgds, -Dimitri
Re: [PERFORM] Filesystem Direct I/O and WAL sync option
Yes, the disk drives also have their caches disabled, or the cache sits on the controllers and is battery-protected (in the case of higher-end storage) - but is that enough to expect data consistency?... (I was surprised about the checkpoint sync - but doesn't it always call write() anyway? Because in that case it should work even without fsync)... On 7/3/07, Heikki Linnakangas <[EMAIL PROTECTED]> wrote: Dimitri wrote: > I'm very curious to know if we may expect or guarantee any data > consistency with WAL sync=OFF but using file system mounted in Direct > I/O mode (means every write() system call called by PG really writes > to disk before return)... You'd have to turn that mode on on the data drives as well to get consistency, because fsync=off disables checkpoint fsyncs of the data files as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Filesystem Direct I/O and WAL sync option
Yes Gregory, that's why I'm asking - because from 1800 transactions/sec I'm jumping to 2800 transactions/sec, and that's a more than significant performance increase :)) Rgds, -Dimitri On 7/4/07, Gregory Stark <[EMAIL PROTECTED]> wrote: "Dimitri" <[EMAIL PROTECTED]> writes: > Yes, disk drives are also having cache disabled or having cache on > controllers and battery protected (in case of more high-level > storage) - but is it enough to expect data consistency?... (I was > surprised about checkpoint sync, but does it always calls write() > anyway? because in this way it should work without fsync)... Well if everything is mounted in sync mode then I suppose you have the same guarantee as if fsync were called after every single write. If that's true then surely that's at least as good. I'm curious how it performs though. Actually it seems like in that configuration fsync should be basically zero-cost. In other words, you should be able to leave fsync=on and get the same performance (whatever that is) and not have to worry about any risks. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Filesystem Direct I/O and WAL sync option
Gregory, thanks for the good questions! :)) They shed more light on my throughput here :)) The running OS is Solaris 9 (the customer is still not ready to upgrade to Solaris 10), and I think the main "sync" issue comes from the old UFS implementation... UFS mounted with the 'forcedirectio' option uses different "sync" logic and also accepts concurrent writes to the same file, which gives a higher performance level here. I did not expect such a big gain, so I had not thought to replay the same test with direct I/O on and fsync=on too. To my big surprise - it also reached 2800 tps, just as with fsync=off !!! So the initial question is no longer valid :)) Also, my tests are executed just to validate the server + storage capabilities, and honestly it's a real pity to see them used under an old Solaris version :)) but well, at least we know what kind of performance they may expect currently, and they can think about a migration before the end of this year... Seeing at least 10,000 random writes/sec on the storage sub-system during the live database test was very pleasant for the customer and made them feel comfortable about their production... Thanks a lot for all your help! Best regards! -Dimitri On 7/4/07, Gregory Stark <[EMAIL PROTECTED]> wrote: "Dimitri" <[EMAIL PROTECTED]> writes: > Yes Gregory, that's why I'm asking, because from 1800 transactions/sec > I'm jumping to 2800 transactions/sec! and it's more than important > performance level increase :)) wow. That's kind of suspicious though. Does the new configuration take advantage of the lack of the filesystem cache by increasing the size of shared_buffers? Even then I wouldn't expect such a big boost unless you got very lucky with the size of your working set compared to the two sizes of shared_buffers. It seems likely that somehow this change is not providing the same guarantees as fsync. Perhaps fsync is actually implementing IDE write barriers and the sync mode is just flushing buffers to the hard drive cache and then returning. What transaction rate do you get if you just have a single connection streaming inserts in autocommit mode? What kind of transaction rate do you get with both sync mode on and fsync=on in Postgres? And did you say this with a battery backed cache? In theory fsync=on/off and shouldn't make much difference at all with a battery backed cache. Stranger and stranger. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Filesystem Direct I/O and WAL sync option
Yes, I tried all the WAL sync methods, but there was no difference... However, there was a huge difference when I ran the same tests under Solaris 10 - the 'fdatasync' option gave the best performance level. At the same time, direct I/O did not make any difference on Solaris 10 :) So the main rule is - there is no universal rule :) Just adapt your system options to your workload... Direct I/O will generally speed up write operations by avoiding the buffer-flushing overhead and by allowing concurrent writes (breaking the POSIX limitation of a single writer per given file at any time). But at the same time it may slow down your read operations, and you may need a 64-bit PG version to use a big cache and still keep the same performance level on SELECT queries. And again, there are other file systems, like QFS (for example), which may give you the best of both worlds: direct writes and buffered reads at the same time! etc. etc. etc. :) Rgds, -Dimitri On 7/9/07, Jonah H. Harris <[EMAIL PROTECTED]> wrote: On 7/9/07, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > BTW, it might be worth trying the different wal_sync_methods. IIRC, > Jonah's seen some good results from open_datasync. On Linux, using ext3, reiser, or jfs, I've seen open_sync perform quite better than fsync/fdatasync in most of my tests. But, I haven't done significant testing with direct I/O lately. -- Jonah H. Harris, Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED] Iselin, New Jersey 08830| http://www.enterprisedb.com/
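To make the knobs discussed in this thread concrete, here is roughly where they live (a sketch: device names and the mount point are placeholders, and as noted above the best wal_sync_method value is workload- and platform-specific, so re-test rather than copy):

    # postgresql.conf - pick one method and measure
    wal_sync_method = fdatasync      # alternatives: open_datasync, fsync, open_sync

    # /etc/vfstab - UFS filesystem holding the WAL, mounted with direct I/O
    # device to mount    device to fsck      mount point  FS   pass  boot  options
    /dev/dsk/c1t0d0s5  /dev/rdsk/c1t0d0s5  /pgwal  ufs  2  yes  forcedirectio,logging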
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
Luke, the ZFS tuning advice does not come from general suggestions, but from real practice... So:
- limiting the ARC is a MUST for the moment to keep your database running comfortably (especially a DWH!)
- the 8K recordsize is chosen so that ZFS reads exactly one page when PG asks to read one page - don't mix this up with prefetch! When a prefetch pattern is detected, ZFS will read the next blocks without any demand from PG; but otherwise, why read more pages each time PG is asking for only one?...
- prefetch is of course not needed for OLTP, but it helps on OLAP/DWH, agreed :)
Rgds, -Dimitri On 7/22/07, Luke Lonergan <[EMAIL PROTECTED]> wrote: > Josh, > > On 7/20/07 4:26 PM, "Josh Berkus" <[EMAIL PROTECTED]> wrote: > > > There are some specific tuning parameters you need for ZFS or performance > > is going to suck. > > > > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide > > (scroll down to "PostgreSQL") > > http://www.sun.com/servers/coolthreads/tnb/applications_postgresql.jsp > > http://bugs.opensolaris.org/view_bug.do?bug_id=6437054 > > > > You also don't say anything about what kind of workload you're running. > > > I think we're assuming that the workload is OLTP when putting these tuning > guidelines forward. Note that the ZFS tuning guidance referred to in this > bug article recommend "turning vdev prefetching off" for "random I/O > (databases)". This is exactly the opposite of what we should do for OLAP > workloads. > > Also, the lore that setting recordsize on ZFS is mandatory for good database > performance is similarly not appropriate for OLAP work. > > If the workload is OLAP / Data Warehousing, I'd suggest ignoring all of the > tuning information from Sun that refers generically to "database". The > untuned ZFS performance should be far better in those cases. Specifically, > these three should be ignored: > - (ignore this) limit ARC memory use > - (ignore this) set recordsize to 8K > - (ignore this) turn off vdev prefetch > > - Luke
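For reference, the two main tunables mentioned above look roughly like this on Solaris 10 (the dataset name and ARC cap are placeholders; the recordsize only applies to files written after the change, so it is best set before loading data):

    # match the ZFS recordsize to the PostgreSQL page size on the data filesystem
    zfs set recordsize=8K tank/pgdata

    # cap the ARC (example: 4GB) so it does not fight the PG buffers for memory -
    # this line goes into /etc/system and takes effect after a reboot
    set zfs:zfs_arc_max = 0x100000000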
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
Hi Luke, on the same page of the Solaris Internals wiki you may find links to the study with the db_STRESS benchmark (done on UFS and ZFS with PostgreSQL, MySQL and Oracle (well, the Oracle results are removed, but at least I may say it went through the same tuning as PgSQL)). Tests were done on Sol10u3 (and you may find all other platform details in the report document)... Also, while the block size adjustment is more or less transparent (don't read 32K if you need only 8K - with a huge data volume you'll simply waste your cache; and in case you're doing a full scan, let the prefetch algorithm work for you), the ARC (cache) limitation probably needs more light. Well, I cannot even say there is a problem with it - its implementation is just too aggressive :)) If all your running programs fit into 1GB of RAM, you may leave the ARC size at its default (it leaves 1GB of system RAM free). Otherwise, you should limit the ARC to keep your workload execution comfortable: the ARC allocates memory very quickly, and every time your program needs more RAM it enters into competition with the ARC... In my tests I observed short workload freezes during such periods, and I did not like it too much :)) especially with high connection numbers :)) Well, we could spend hours discussing this :) (sorry to be short, I have very limited mail access at the moment)... However, ZFS is improving all the time and works better and better with every Solaris release, so probably all the current tuning will be different or obsolete by the end of this year :)) BTW, I forgot to mention: you'll need Solaris 10u4, or at least 10u3 with all recent patches applied, to run the M8000 at full power. Best regards! -Dimitri On 7/30/07, Luke Lonergan <[EMAIL PROTECTED]> wrote: > Hi Dimitri, > > Can you post some experimental evidence that these settings matter? > > At this point we have several hundred terabytes of PG databases running on > ZFS, all of them setting speed records for data warehouses. > > We did testing on these settings last year on S10U2, perhaps things have > changed since then. > > - Luke > > Msg is shrt cuz m on ma treo > > [...]
Re: [PERFORM] optimize query with a maximum(date) extraction
BTW, would it help if you changed your index to "my_table( id, the_date )"? Rgds, -Dimitri On 9/5/07, JS Ubei <[EMAIL PROTECTED]> wrote: > Hi all, > > I need to improve a query like: > > SELECT id, min(the_date), max(the_date) FROM my_table GROUP BY id; > > Stupidly, I created a B-tree index on my_table(the_date), which is logically > not used in my query, because it isn't compared against a constant - isn't it? > > I know that I can't create a functional index with an aggregate function. > > What can I do? > > thanks, > > jsubei > > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings > ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
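For what it's worth, here is a sketch of that composite-index idea combined with a query form that can actually exploit it on 8.x: min()/max() with an equality condition on id can be turned into small index scans on my_table(id, the_date). The index name below is made up, and whether the planner really uses the index for each subquery should be verified with EXPLAIN ANALYZE rather than assumed.

CREATE INDEX my_table_id_date_idx ON my_table (id, the_date);
ANALYZE my_table;

SELECT t.id,
       (SELECT min(the_date) FROM my_table m WHERE m.id = t.id) AS min_date,
       (SELECT max(the_date) FROM my_table m WHERE m.id = t.id) AS max_date
  FROM (SELECT DISTINCT id FROM my_table) t;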
Re: [PERFORM] 8K recordsize bad on ZFS?
Josh, it'll be great if you explain how did you change the records size to 128K? - as this size is assigned on the file creation and cannot be changed later - I suppose that you made a backup of your data and then process a full restore.. is it so? Rgds, -Dimitri On 5/8/10, Josh Berkus wrote: > Jignesh, All: > > Most of our Solaris users have been, I think, following Jignesh's advice > from his benchmark tests to set ZFS page size to 8K for the data zpool. > However, I've discovered that this is sometimes a serious problem for > some hardware. > > For example, having the recordsize set to 8K on a Sun 4170 with 8 drives > recently gave me these appalling Bonnie++ results: > > Version 1.96 --Sequential Output-- --Sequential Input- > --Random- > Concurrency 4 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- > --Seeks-- > MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP > /sec %CP > db111 24G 260044 33 62110 17 89914 15 > 1167 25 > Latency6549ms4882ms 3395ms > 107ms > > I know that's hard to read. What it's saying is: > > Seq Writes: 260mb/s combined > Seq Reads: 89mb/s combined > Read Latency: 3.3s > > Best guess is that this is a result of overloading the array/drives with > commands for all those small blocks; certainly the behavior observed > (stuttering I/O, latency) is in line with that issue. > > Anyway, since this is a DW-like workload, we just bumped the recordsize > up to 128K and the performance issues went away ... reads up over 300mb/s. > > -- > -- Josh Berkus > PostgreSQL Experts Inc. > http://www.pgexperts.com > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 8K recordsize bad on ZFS?
As I said, the record size is applied on the file creation :-) so by copying your data from one directory to another one you've made the new record size applied on the newly created files :-) (equal to backup restore if there was not enough space).. Did you try to redo the same but still keeping record size equal 8K ? ;-) I think the problem you've observed is simply related to the copy-on-write nature of ZFS - if you bring any modification to the data your sequential order of pages was broken with a time, and finally the sequential read was transformed to the random access.. But once you've re-copied your files again - the right order was applied again. BTW, 8K is recommended for OLTP workloads, but for DW you may stay with 128K without problem. Rgds, -Dimitri On 5/10/10, Josh Berkus wrote: > On 5/9/10 1:45 AM, Dimitri wrote: >> Josh, >> >> it'll be great if you explain how did you change the records size to >> 128K? - as this size is assigned on the file creation and cannot be >> changed later - I suppose that you made a backup of your data and then >> process a full restore.. is it so? > > You can change the recordsize of the zpool dynamically, then simply copy > the data directory (with PostgreSQL shut down) to a new directory on > that zpool. This assumes that you have enough space on the zpool, of > course. > > We didn't test how it would work to let the files in the Postgres > instance get gradually replaced by "natural" updating. > > -- > -- Josh Berkus > PostgreSQL Experts Inc. > http://www.pgexperts.com > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query with planner row strange estimation
It's probably one of the cases when having HINTS in PostgreSQL may be very helpful.. SELECT /*+ enable_nestloop=off */ ... FROM ... will just fix this query without impacting other queries and without adding any additional instructions into the application code.. So, why there is a such resistance to implement hints withing SQL queries in PG?.. Rgds, -Dimitri On 7/9/10, Robert Haas wrote: > On Fri, Jul 9, 2010 at 6:13 AM, damien hostin > wrote: >>> Have you tried running ANALYZE on the production server? >>> >>> You might also want to try ALTER TABLE ... SET STATISTICS to a large >>> value on some of the join columns involved in the query. >> >> Hello, >> >> Before comparing the test case on the two machines, I run analyse on the >> whole and look at pg_stats table to see if change occurs for the columns. >> but on the production server the stats never became as good as on the >> desktop computer. I set statistic at 1 on column used by the join, run >> analyse which take a 300 row sample then look at the stats. The stats >> are not as good as on the desktop. Row number is nearly the same but only >> 1 >> or 2 values are found. >> >> The data are not balanced the same way on the two computer : >> - Desktop is 12000 rows with 6000 implicated in the query (50%), >> - "Production" (actually a dev/test server) is 6 million rows with 6000 >> implicated in the query (0,1%). >> Columns used in the query are nullable, and in the 5994000 other rows that >> are not implicated in the query these columns are null. >> >> I don't know if the statistic target is a % or a number of value to >> obtain, > > It's a number of values to obtain. > >> but event set at max (1), it didn't managed to collect good stats (for >> this particular query). > > I think there's a cutoff where it won't collect values unless they > occur significantly more often than the average frequency. I wonder > if that might be biting you here: without the actual values in the MCV > table, the join selectivity estimates probably aren't too good. > >> As I don't know what more to do, my conclusion is that the data need to be >> better balanced to allow the analyse gather better stats. But if there is >> a >> way to improve the stats/query with this ugly balanced data, I'm open to >> it >> ! >> >> I hope that in real production, data will never be loaded this way. If >> this >> appened we will maybe set enable_nestloop to off, but I don't think it's a >> good solution, other query have a chance to get slower. > > Yeah, that usually works out poorly. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
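Since PostgreSQL has no in-query hints, the closest thing to the /*+ enable_nestloop=off */ idea above is to scope the setting to a single transaction, so other queries are not affected. A minimal sketch (the problem query itself is not shown in the thread, so the comment is a placeholder):

BEGIN;
SET LOCAL enable_nestloop = off;   -- applies only inside this transaction
-- run the mis-estimated query here; the setting reverts at COMMIT/ROLLBACK
COMMIT;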
Re: [PERFORM] 32 vs 64 bit build on Solaris Sparc
Hi Joe, the general rule on Solaris SPARC is: - if you need to address a big size of memory (over 4G): compile in 64bit - otherwise: compile in 32bit ;-) It's true that 32bit code will run faster comparing to 64bit ont the 64bit SPARC - you'll operate with 2 times shorter addresses, and in some cases SPARC will be able to execute 2 operations in parallel on 32bit code, while it'll be still one operation on 64bit code.. - But it's all about the code, because once you start to do I/O requests all kind of optimization on the instructions will be lost due I/O latency ;-)) So, as usual, a real answer in each case may be obtained only by a real test.. Just test both versions and you'll see yourself what is a valid in your case :-)) Same problem regarding compilers: in some cases GCC4 will give a better result, in some cases Sun Studio will be better (there are many posts in blogs about optimal compiler options to use).. - don't hesitate to try and don't forget to share here with others :-)) Rgds, -Dimitri On 8/11/10, Joseph Conway wrote: > With a 16 CPU, 32 GB Solaris Sparc server, is there any conceivable > reason to use a 32 bit build rather than a 64 bit build? Apparently the > Sun PostgreSQL package includes a README that indicates you might want > to think twice about using 64 bit because it is slower -- this seems > like outdated advice, but I was looking for confirmation one way or the > other. > > Also semi-related question: when building from source, using gcc, > enabling debug (but *not* assert) is normally not much of a performance > hit. Is the same true if you build with the Sun CC? > > Thanks in advance for any thoughts/experiences. > > Joe > > > > -- > Joe Conway > credativ LLC: http://www.credativ.us > Linux, PostgreSQL, and general Open Source > Training, Service, Consulting, & 24x7 Support > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: Fwd: [PERFORM] Vacuum Full + Cluster + Vacuum full = non removable dead rows
So, does it mean that VACUUM will never clean dead rows if you have a non-stop transactional activity in your PG database???... (24/7 OLTP for ex.) Rgds, -Dimitri On 8/19/10, Kevin Grittner wrote: > Alexandre de Arruda Paes wrote: >> 2010/8/18 Tom Lane > >>> There's an open transaction somewhere that VACUUM is preserving >>> the tuples for. This transaction need not ever have touched the >>> table, or ever intend to touch the table --- but VACUUM cannot >>> know that, so it saves any tuples that the transaction might be >>> entitled to see if it looked. >>> >>> > carmen=# select * from vlocks where relname='tp93t'; select * >>> > from pg_stat_activity where usename='webpa'; >>> >>> You keep on showing us only subsets of pg_stat_activity :-( > >> select * from pg_stat_activity where usename='webpa'; > > You keep on showing us only subsets of pg_stat_activity :-( > > *ANY* open transaction, including "idle in transaction" including > transactions by other users in other databases will prevent vacuum > from cleaning up rows, for the reasons Tom already gave you. > > What do you get from?: > > select * from pg_stat_activity where current_query <> '' > order by xact_start limit 10; > > -Kevin > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: Fwd: [PERFORM] Vacuum Full + Cluster + Vacuum full = non removable dead rows
Great! - it's what I expected until now :-) but discussion in this thread put my mind in trouble :-)) So, the advice for Alexandre here is just to check the age of the oldest running transaction and the last time when the table in question was modified.. - if modification time is older than the oldest transaction = we have a problem in PG.. Otherwise it works as expected to match MVCC. Rgds, -Dimitri On 8/21/10, Scott Marlowe wrote: > No, it means it can't clean rows that are younger than the oldest > transaction currently in progress. if you started a transaction 5 > hours ago, then all the dead tuples created in the last 5 hours are > not recoverable. Dead tuples created before that transaction are > recoverable. If you run transactions for days or weeks, then you're > gonna have issues. > > On Sat, Aug 21, 2010 at 2:25 AM, Dimitri wrote: >> So, does it mean that VACUUM will never clean dead rows if you have a >> non-stop transactional activity in your PG database???... (24/7 OLTP >> for ex.) >> >> Rgds, >> -Dimitri >> >> >> On 8/19/10, Kevin Grittner wrote: >>> Alexandre de Arruda Paes wrote: >>>> 2010/8/18 Tom Lane >>> >>>>> There's an open transaction somewhere that VACUUM is preserving >>>>> the tuples for. This transaction need not ever have touched the >>>>> table, or ever intend to touch the table --- but VACUUM cannot >>>>> know that, so it saves any tuples that the transaction might be >>>>> entitled to see if it looked. >>>>> >>>>> > carmen=# select * from vlocks where relname='tp93t'; select * >>>>> > from pg_stat_activity where usename='webpa'; >>>>> >>>>> You keep on showing us only subsets of pg_stat_activity :-( >>> >>>> select * from pg_stat_activity where usename='webpa'; >>> >>> You keep on showing us only subsets of pg_stat_activity :-( >>> >>> *ANY* open transaction, including "idle in transaction" including >>> transactions by other users in other databases will prevent vacuum >>> from cleaning up rows, for the reasons Tom already gave you. >>> >>> What do you get from?: >>> >>> select * from pg_stat_activity where current_query <> '' >>> order by xact_start limit 10; >>> >>> -Kevin >>> >>> -- >>> Sent via pgsql-performance mailing list >>> (pgsql-performance@postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-performance >>> >> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >> > > > > -- > To understand recursion, one must first understand recursion. > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
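A sketch of the kind of check being suggested here, written for the 8.x pg_stat_activity layout (the LIMIT is arbitrary); the oldest xact_start is what holds back VACUUM:

SELECT procpid, usename, datname,
       now() - xact_start AS xact_age,
       current_query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 10;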
Re: Fwd: [PERFORM] Vacuum Full + Cluster + Vacuum full = non removable dead rows
The problem here is that we're trying to keep an image of a whole world for any transaction which is in most cases will need to get a look on few streets around.. ;-) I understand well that it's respecting the standard and so on, but the background problem that you may see your table bloated just because there is a long running transaction appeared in another database, and if it's maintained/used/etc by another team - the problem very quickly may become human rather technical :-)) So, why simply don't add a FORCE option to VACUUM?.. - In this case if one executes "VACUUM FORCE TABLE" will be just aware about what he's doing and be sure no one of the active transactions will be ever access this table. What do you think?.. ;-) Rgds, -Dimitri On 8/22/10, Robert Haas wrote: > On Sat, Aug 21, 2010 at 9:49 AM, Alexandre de Arruda Paes > wrote: >> Only for discussion: the CLUSTER command, in my little knowledge, is a >> intrusive command that's cannot recover the dead tuples too. >> >> Only TRUNCATE can do this job, but obviously is not applicable all the >> time. > > Either VACUUM or CLUSTER will recover *dead* tuples. What you can't > recover are tuples that are still visible to some running transaction. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to achieve sustained disk performance of 1.25 GB write for 5 mins
You may also try the Sun's F5100 (flash storage array) - you may easily get 700 MB/s just with a single I/O stream (single process), so just with 2 streams you'll get your throughput.. - The array has 2TB total space and max throughput should be around 4GB/s.. Rgds, -Dimitri On 11/18/10, Greg Smith wrote: > Eric Comeau wrote: >> Ideally 1 large file, but it may have to be multiple. We find that if >> we send multiple files it just causes the disk to thrash more so we >> get better throughput by sending one large file. > > If it's really one disk, sure. The problem you're facing is that your > typical drive controller is going to top out at somewhere between 300 - > 500MB/s of sequential writes before it becomes the bottleneck. Above > somewhere between 6 and 10 drives attached to one controller on current > hardware, adding more to a RAID-0 volume only increases the ability to > handle seeks quickly. If you want to try and do this with traditional > hard drives, I'd guess you'd need 3 controllers with at least 4 > short-stroked drives attached to each to have any hope of hitting > 1.25GB/s. Once you do that, you'll run into CPU time as the next > bottleneck. At that point, you'll probably need one CPU per controller, > all writing out at once, to keep up with your target. > > The only popular hardware design that comes to mind aimed at this sort > of thing was Sun's "Thumper" design, most recently seen in the Sun Fire > X4540. That put 8 controllers with 6 disks attached to each, claiming > "demonstrated up to 2 GB/sec from disk to network". It will take a > design like that, running across multiple controllers, to get what > you're looking for on the disk side--presuming everything else keeps up. > > One of the big SSD-on-PCI-e designs mentioned here already may very well > end up being a better choice for you here though, as those aren't going > to require quite as much hardware all get wired up. > > -- > Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD > PostgreSQL Training, Services and Supportwww.2ndQuadrant.us > "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
Seems to me there is more thread model implementation problem on FreeBSD, and databases just reflecting it... Most of the test I done on Solaris show the same performance level on the same short READ-only queries for MySQL and PostgreSQL. And to be honest till the end, thread model should be far faster (context switching between threads is way faster vs processes), but - as I say usually - even a very good idea may be just wasted by a poor implementation... And in case of MySQL they have too much locking to manage concurrency between threads which kills all thread model benefits... Also, to compare apples to apples, they should run this test from remote client rather locally on the same host - however in this case the result for PostgreSQL will mostly depends on client implementation: if client implements reading via CURSOR (quite often), reading will generate 4x times more intensive network traffic than necessary and final PostgreSQL result will be worse... Reading this article I'm just happy for them to see progress done on FreeBSD :-) As well to demonstrate OS parallelism it's not so impressive to see 4CPU server results rather 8CPU or 32threaded Niagara... Don't know why they did not present similar performance graphs for these platform, strange no?... Rgds, -Dimitri On 11/9/07, Ron Mayer <[EMAIL PROTECTED]> wrote: > Bill Moran wrote: > > On Fri, 9 Nov 2007 11:11:18 -0500 (EST) > > Greg Smith <[EMAIL PROTECTED]> wrote: > >> On Fri, 9 Nov 2007, Sebastian Hennebrueder wrote: > >>> If the queries are complex, this is understable. > >> The queries used for this comparison are trivial. There's only one table > >> involved and there are no joins. It's testing very low-level aspects of > >> performance. > > > > Actually, what it's really showing is parallelism, and I've always > > expected PostgreSQL to come out on top in that arena. > > Isn't it showing Postgres winning even without parallelism. > > At 1 threads, Postgres looks like 800TPS where MysQL comes > in at about 600TPS on their Opteron charts. > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to >choose an index scan if your joining column's datatypes do not >match > ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] Any better plan for this query?..
Hi,

any idea if there is a more optimal execution plan possible for this query:

select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
       H.END_DATE as hend, H.NOTE as hnote
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT
   and H.REF_OBJECT = '01'
 order by H.HORDER ;

EXPLAIN ANALYZE output on 8.4:

                                                              QUERY PLAN
 Sort  (cost=4549.75..4555.76 rows=2404 width=176) (actual time=1.341..1.343 rows=20 loops=1)
   Sort Key: h.horder
   Sort Method:  quicksort  Memory: 30kB
   ->  Hash Join  (cost=33.50..4414.75 rows=2404 width=176) (actual time=1.200..1.232 rows=20 loops=1)
         Hash Cond: (h.ref_stat = s.ref)
         ->  Index Scan using history_ref_idx on history h  (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 rows=20 loops=1)
               Index Cond: (ref_object = '01'::bpchar)
         ->  Hash  (cost=21.00..21.00 rows=1000 width=45) (actual time=1.147..1.147 rows=1000 loops=1)
               ->  Seq Scan on stat s  (cost=0.00..21.00 rows=1000 width=45) (actual time=0.005..0.325 rows=1000 loops=1)
 Total runtime: 1.442 ms
(10 rows)

Table HISTORY contains 200M rows, only 20 needed
Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY values.

Table definitions:
"""""""""""""""""""""""""""""""""""""""""""""""""""
create table STAT
(
    REF         CHAR(3)    not null,
    NAME        CHAR(40)   not null,
    NUMB        INT        not null
);

create table HISTORY
(
    REF_OBJECT  CHAR(10)   not null,
    HORDER      INT        not null,
    REF_STAT    CHAR(3)    not null,
    BEGIN_DATE  CHAR(12)   not null,
    END_DATE    CHAR(12)           ,
    NOTE        CHAR(100)
);

create unique index stat_ref_idx on STAT( ref );
create index history_ref_idx on HISTORY( ref_object, horder );
"""""""""""""""""""""""""""""""""""""""""""""""""""

NOTE: The same query runs 2 times faster on MySQL.

Any idea?..

Rgds,
-Dimitri

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Craig, yes, you detailed very well the problem! :-) all those CHAR columns are so just due historical issues :-) as well they may contains anything else and not only numbers, that's why.. Also, all data inside are fixed, so VARCHAR will not save place, or what kind of performance issue may we expect with CHAR vs VARCHAR if all data have a fixed length?.. Any way to force nested loop without additional index?.. It's 2 times faster on InnoDB, and as it's just a SELECT query no need to go in transaction details :-) Rgds, -Dimitri On 5/6/09, Craig Ringer wrote: > Dimitri wrote: >> Hi, >> >> any idea if there is a more optimal execution plan possible for this >> query: >> >> select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as >> hbeg, >> H.END_DATE as hend, H.NOTE as hnote >> from HISTORY H, STAT S >> where S.REF = H.REF_STAT >> and H.REF_OBJECT = '01' >> order by H.HORDER ; > > OK, so you're taking a simple: > > history INNER JOIN stat ON (stat.ref = history.ref_stat) > > then filtering for records with a particular value of history.ref_object > and finally performing a sort. > > If I'm reading it right, the plan below does a sequential scan on the > `stat' table. The stat table only has 1000 rows, so this isn't > necessarily an unreasonable choice even if there is an appropriate index > and even if not many of the rows will be needed. > > It then does an index scan of the history table looking for tuples with > ref_object = '01' (text match). It hash joins the hashed results > of the initial seq scan to the results of the index scan, and sorts the > result. > > To me, that looks pretty reasonable. You might be able to avoid the hash > join in favour of a nested loop scan of stat_ref_idx (looping over > records from history.ref_stat where ref_object = '001') by > providing a composite index on HISTORY(ref_stat, ref_object). I'm really > not too sure, though; plan optimization isn't my thing, I'm just seeing > if I can offer a few ideas. > >> Table definitions: > > While not strictly necessary, it's a *REALLY* good idea to define a > suitable PRIMARY KEY. > > Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for > bounded-length values, or `text' for unbounded fields, unless you REALLY > want the crazy behaviour of `CHAR(n)'. > > I'm a little bit puzzled about why you seem to be doing lots of things > with integer values stored in text strings, but that probably doesn't > matter too much for the issue at hand. > >> NOTE: The same query runs 2 times faster on MySQL. > > With InnoDB tables and proper transactional safety? Or using scary > MyISAM tables and a "just pray" approach to data integrity? If you're > using MyISAM tables I'm not surprised; MySQL with MyISAM is stunningly > fast, but oh-my-god dangerous. > > -- > Craig Ringer > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
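For reference, a sketch of the composite index Craig mentions, written against the schema from the first message (the index name is invented; whether the planner then prefers a nested loop should be confirmed with EXPLAIN ANALYZE rather than assumed):

create index history_stat_obj_idx on HISTORY( ref_stat, ref_object );
analyze HISTORY;

explain analyze
select S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT
   and H.REF_OBJECT = '01'
 order by H.HORDER;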
Re: [PERFORM] Any better plan for this query?..
Hi Heikki, I've already tried a target 1000 and the only thing it changes comparing to the current 100 (default) is instead of 2404 rows it says 240 rows, but the plan remaining the same.. Rgds, -Dimitri On 5/6/09, Heikki Linnakangas wrote: > Dimitri wrote: >> any idea if there is a more optimal execution plan possible for this >> query: >> >> select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as >> hbeg, >> H.END_DATE as hend, H.NOTE as hnote >> from HISTORY H, STAT S >> where S.REF = H.REF_STAT >> and H.REF_OBJECT = '01' >> order by H.HORDER ; >> >> EXPLAIN ANALYZE output on 8.4: >>QUERY >> PLAN >> >> Sort (cost=4549.75..4555.76 rows=2404 width=176) (actual >> time=1.341..1.343 rows=20 loops=1) >>Sort Key: h.horder >>Sort Method: quicksort Memory: 30kB >>-> Hash Join (cost=33.50..4414.75 rows=2404 width=176) (actual >> time=1.200..1.232 rows=20 loops=1) >> Hash Cond: (h.ref_stat = s.ref) >> -> Index Scan using history_ref_idx on history h >> (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 >> rows=20 loops=1) >>Index Cond: (ref_object = '01'::bpchar) >> -> Hash (cost=21.00..21.00 rows=1000 width=45) (actual >> time=1.147..1.147 rows=1000 loops=1) >>-> Seq Scan on stat s (cost=0.00..21.00 rows=1000 >> width=45) (actual time=0.005..0.325 rows=1000 loops=1) >> Total runtime: 1.442 ms >> (10 rows) >> >> Table HISTORY contains 200M rows, only 20 needed >> Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY >> values. > > The bad doesn't look too bad to me, although the planner is > over-estimating the number of matches in the history table (2404 vs 20). > That's a bit surprising given how simple the predicate is. Make sure > you've ANALYZEd the table. If that's not enough, you can try to increase > the statistics target for ref_object column, ie. ALTER TABLE history > ALTER COLUMN ref_object SET STATISTICS 500. That might give you a > different plan, maybe with a nested loop join instead of hash join, > which might be faster in this case. > > -- >Heikki Linnakangas >EnterpriseDB http://www.enterprisedb.com > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
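For anyone reproducing Heikki's suggestion end to end, a minimal sketch (500 is simply the value from his mail; pg_stats shows what the last ANALYZE actually collected for the column):

alter table HISTORY alter column REF_OBJECT set statistics 500;
analyze HISTORY;

select attname, n_distinct, correlation
  from pg_stats
 where tablename = 'history' and attname = 'ref_object';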
Re: [PERFORM] Any better plan for this query?..
Hi Chris, the only problem I see here is it's 2 times slower vs InnoDB, so before I'll say myself it's ok I want to be sure there is nothing else to do.. :-) Rgds, -Dimitri On 5/6/09, Chris wrote: > Dimitri wrote: >> Hi Craig, >> >> yes, you detailed very well the problem! :-) >> all those CHAR columns are so just due historical issues :-) as well >> they may contains anything else and not only numbers, that's why.. >> Also, all data inside are fixed, so VARCHAR will not save place, or >> what kind of performance issue may we expect with CHAR vs VARCHAR if >> all data have a fixed length?.. > > None in postgres, but the char/varchar thing may or may not bite you at > some point later - sounds like you have it covered though. > >> It's 2 times faster on InnoDB, and as it's just a SELECT query no need >> to go in transaction details :-) > > Total runtime: 1.442 ms > (10 rows) > > You posted a query that's taking 2/1000's of a second. I don't really > see a performance problem here :) > > -- > Postgresql & php tutorials > http://www.designmagick.com/ > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Richard, no, of course it's not based on explain :-) I ran several tests before and am now going in depth to understand whether anything is wrong. Because of this single-query time difference, InnoDB is reaching a 2-3 times better TPS level than PostgreSQL.. Rgds, -Dimitri On 5/6/09, Richard Huxton wrote: > Dimitri wrote: >> Hi Chris, >> >> the only problem I see here is it's 2 times slower vs InnoDB > > How do you know? This isn't just based on the explain values reported, > is it? > > -- >Richard Huxton >Archonet Ltd > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
I'll try to answer all mails at once :-)) - query is running fully in RAM, no I/O, no network, only CPU time - looping 100 times the same query gives 132ms total time (~1.32ms per query), while it's 44ms on InnoDB (~0.44ms per query) - disabling seq scan forcing a planner to use an index scan, and finally it worse as gives 1.53ms per query.. - prepare the query helps: prepare statement takes 16ms, but execute runs in 0.98ms = which make me think it's not only a planner overhead... And it's still 2 times lower vs 0.44ms. Also, generally prepare cannot be used in this test case as we suppose any query may be of any kind (even if it's not always true :-)) - char or varchar should be used here because the reference code is supposed to accept any characters (alphanumeric) - it also reminds me that probably there are some extra CPU time due locale setting - but all my "lc_*" variables are set to "C"... Rgds, -Dimitri On 5/6/09, Merlin Moncure wrote: > On Wed, May 6, 2009 at 7:46 AM, Merlin Moncure wrote: >> prepare history_stat(char(10) as > > typo: > prepare history_stat(char(10)) as > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
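Since prepared statements keep coming up in this thread, here is a sketch of the PREPARE/EXECUTE form of the test query (the statement name follows Merlin's example; parse and plan happen once at PREPARE time, EXECUTE only runs the stored plan):

prepare history_stat (char(10)) as
   select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as hbeg,
          H.END_DATE as hend, H.NOTE as hnote
     from HISTORY H, STAT S
    where S.REF = H.REF_STAT
      and H.REF_OBJECT = $1
    order by H.HORDER;

execute history_stat('01');
deallocate history_stat;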
Re: [PERFORM] Any better plan for this query?..
The story is simple: for the launching of MySQL 5.4 I've done a testing comparing available on that time variations of InnoDB engines, and at the end by curiosity started the same test with PostgreSQL 8.3.7 to see if MySQL performance level is more close to PostgreSQL now (PG was a strong true winner before). For my big surprise MySQL 5.4 outpassed 8.3.7... However, analyzing the PostgreSQL processing I got a feeling something goes wrong on PG side.. So, now I've installed both 8.3.7 and 8.4beta1 to see more in depth what's going on. Currently 8.4 performs much better than 8.3.7, but there is still a room for improvement if such a small query may go faster :-) Rgds, -Dimitri On 5/6/09, Albe Laurenz wrote: > Dimitri wrote: >> I've run several tests before and now going in depth to understand if >> there is nothing wrong. Due such a single query time difference InnoDB >> is doing 2-3 times better TPS level comparing to PostgreSQL.. > > Why don't you use MySQL then? > Or tune PostgreSQL? > > Yours, > Laurenz Albe > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Folks, first of all: - I used a fixed reference value just to simplify the case analyzing and isolate it as max as possible, of course during my tests all values are random :-) - final goal of the test is to analyze scalability, so yes, concurrent sessions with random keys are growing from 1 to 256 (I run it on 32cores server, no think time, just stressing), and the result is still not yet better comparing to InnoDB - I'm analyzing this query running in memory to understand what's blocking while all main bottlenecks are avoided (no I/O anymore nor network, etc.) - initial explain analyze and table details were posted in the first message Now, let's go more further: - so "as it" query execution took 1.50ms - after removing "order by" it took 1.19ms - select count(*) instead of columns and with removed "order by" took 0.98ms - execute of the same prepared "select count(*) ..." took 0.68ms So, where the time is going?... Rgds, -Dimitri On 5/6/09, Ries van Twisk wrote: > > On May 6, 2009, at 7:53 AM, Richard Huxton wrote: > >> Dimitri wrote: >>> I'll try to answer all mails at once :-)) >>> - query is running fully in RAM, no I/O, no network, only CPU time >>> - looping 100 times the same query gives 132ms total time (~1.32ms >>> per >>> query), while it's 44ms on InnoDB (~0.44ms per query) >> >> Well, assuming you're happy that PG is tuned reasonably for your >> machine and that MySQL's query cache isn't returning the results >> here it looks like MySQL is faster for this particular query. >> >> The only obvious place there could be a big gain is with the hashing >> algorithm. If you remove the ORDER BY and the query-time doesn't >> fall by much then it's the hash phase. >> >> The other thing to try is to alter the query to be a SELECT count(*) >> rather than returning rows - that will let you measure the time to >> transfer the result rows. >> >> -- >> Richard Huxton >> Archonet Ltd >> > > > Do you expect to run this query 100 times per second during your > application? > or is this just a test to see how fast the query is for optimalisation. > > I always get scared myself with such a test as 'runs out of memory', > reason > given is that usually this is not really the case in a production > environment. > > Try to make a little test case where you give the query random > parameters > so different result sets are returned. This will give you a better > idea on how > fast the query really is and might give you better comparison results. > > instead of count(*) I isusallt do explain analyze to see how fast > PostgreSQL handles to query. > > Ries > > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
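To make those measurement steps reproducible, here is a sketch of the three variants being timed (same literal as in the original post; running each under EXPLAIN ANALYZE, or with \timing in psql, gives the per-query numbers):

-- the query "as it" is
explain analyze
select S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT and H.REF_OBJECT = '01'
 order by H.HORDER;

-- same join, sort step removed
explain analyze
select S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT and H.REF_OBJECT = '01';

-- rows counted but not returned, so the result transfer cost disappears
explain analyze
select count(*)
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT and H.REF_OBJECT = '01';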
Re: [PERFORM] Any better plan for this query?..
I supposed in case with prepare and then execute a query optimizer is no more coming in play on "execute" phase, or did I miss something?.. Forget to say: query cache is disabled on MySQL side. Rgds, -Dimitri On 5/6/09, Craig Ringer wrote: > Dimitri wrote: >> Hi Chris, >> >> the only problem I see here is it's 2 times slower vs InnoDB, so >> before I'll say myself it's ok I want to be sure there is nothing else >> to do.. :-) > > Can the genetic query optimizer come into play on small queries? > > -- > Craig Ringer > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Ken, yes, I may do it, but I did not expect to come into profiling initially :-) I expected there is just something trivial within a plan that I just don't know.. :-) BTW, is there already an integrated profiled within a code? or do I need external tools?.. Rgds, -Dimitri On 5/6/09, Kenneth Marshall wrote: > On Wed, May 06, 2009 at 09:38:59AM +0200, Dimitri wrote: >> Hi, >> >> any idea if there is a more optimal execution plan possible for this >> query: >> >> select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as >> hbeg, >> H.END_DATE as hend, H.NOTE as hnote >> from HISTORY H, STAT S >> where S.REF = H.REF_STAT >> and H.REF_OBJECT = '01' >> order by H.HORDER ; >> >> EXPLAIN ANALYZE output on 8.4: >>QUERY >> PLAN >> >> Sort (cost=4549.75..4555.76 rows=2404 width=176) (actual >> time=1.341..1.343 rows=20 loops=1) >>Sort Key: h.horder >>Sort Method: quicksort Memory: 30kB >>-> Hash Join (cost=33.50..4414.75 rows=2404 width=176) (actual >> time=1.200..1.232 rows=20 loops=1) >> Hash Cond: (h.ref_stat = s.ref) >> -> Index Scan using history_ref_idx on history h >> (cost=0.00..4348.20 rows=2404 width=135) (actual time=0.042..0.052 >> rows=20 loops=1) >>Index Cond: (ref_object = '01'::bpchar) >> -> Hash (cost=21.00..21.00 rows=1000 width=45) (actual >> time=1.147..1.147 rows=1000 loops=1) >>-> Seq Scan on stat s (cost=0.00..21.00 rows=1000 >> width=45) (actual time=0.005..0.325 rows=1000 loops=1) >> Total runtime: 1.442 ms >> (10 rows) >> >> Table HISTORY contains 200M rows, only 20 needed >> Table STAT contains 1000 rows, only 20 needed to be joined to HISTORY >> values. >> >> Table definitions: >> """"""""""""""""""""""""""""""""""""""""""""""""""" >> create table STAT >> ( >> REF CHAR(3)not null, >> NAMECHAR(40) not null, >> NUMBINTnot null >> ); >> >> create table HISTORY >> ( >> REF_OBJECT CHAR(10) not null, >> HORDER INT not null, >> REF_STATCHAR(3) not null, >> BEGIN_DATE CHAR(12) not null, >> END_DATECHAR(12) , >> NOTECHAR(100) >> ); >> >> create unique index stat_ref_idx on STAT( ref ); >> create index history_ref_idx on HISTORY( ref_object, horder ); >> """"""""""""""""""""""""""""""""""""""""""""""""""" >> >> NOTE: The same query runs 2 times faster on MySQL. >> >> Any idea?.. >> >> Rgds, >> -Dimitri >> > Dimitri, > > Is there any chance of profiling the postgres backend to see > where the time is used? > > Just an idea, > Ken > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Simon, may you explain why REINDEX may help here?.. - database was just created, data loaded, and then indexes were created + analyzed.. What may change here after REINDEX?.. With hashjoin disabled was a good try! Running this query "as it" from 1.50ms we move to 0.84ms now, and the plan is here: QUERY PLAN -- Sort (cost=4562.83..4568.66 rows=2329 width=176) (actual time=0.225..0.229 rows=20 loops=1) Sort Key: h.horder Sort Method: quicksort Memory: 30kB -> Merge Join (cost=4345.89..4432.58 rows=2329 width=176) (actual time=0.056..0.205 rows=20 loops=1) Merge Cond: (s.ref = h.ref_stat) -> Index Scan using stat_ref_idx on stat s (cost=0.00..49.25 rows=1000 width=45) (actual time=0.012..0.079 rows=193 loops=1) -> Sort (cost=4345.89..4351.72 rows=2329 width=135) (actual time=0.041..0.043 rows=20 loops=1) Sort Key: h.ref_stat Sort Method: quicksort Memory: 30kB -> Index Scan using history_ref_idx on history h (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.013..0.024 rows=20 loops=1) Index Cond: (ref_object = '01'::bpchar) Total runtime: 0.261 ms (12 rows) Curiously planner expect to run it in 0.26ms Any idea why planner is not choosing this plan from the beginning?.. Any way to keep this plan without having a global or per sessions hashjoin disabled?.. Rgds, -Dimitri On 5/6/09, Simon Riggs wrote: > > On Wed, 2009-05-06 at 10:31 +0200, Dimitri wrote: > >> I've already tried a target 1000 and the only thing it changes >> comparing to the current 100 (default) is instead of 2404 rows it says >> 240 rows, but the plan remaining the same.. > > Try both of these things > * REINDEX on the index being used in the query, then re-EXPLAIN > * enable_hashjoin = off, then re-EXPLAIN > > You should first attempt to get the same plan, then confirm it really is > faster before we worry why the optimizer hadn't picked that plan. > > We already know that MySQL favors nested loop joins, so turning up a > plan that on this occasion is actually better that way is in no way > representative of general performance. Does MySQL support hash joins? > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
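On the last question, keeping this plan without disabling hash joins globally or for the whole session: a transaction-scoped setting is one option worth trying (just a sketch; the setting reverts automatically at COMMIT or ROLLBACK, so concurrent sessions stay untouched):

begin;
set local enable_hashjoin = off;

select S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
  from HISTORY H, STAT S
 where S.REF = H.REF_STAT and H.REF_OBJECT = '01'
 order by H.HORDER;

commit;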
Re: [PERFORM] Any better plan for this query?..
The problem with "gprof" - it'll profile all stuff from the beginning to the end, and a lot of function executions unrelated to this query will be profiled... As well when you look on profiling technology - all such kind of solutions are based on the system clock frequency and have their limits on time resolution. On my system this limit is 0.5ms, and it's too big comparing to the query execution time :-) So, what I've done - I changed little bit a reference key criteria from = '01' to < '51', so instead of 20 rows I have 1000 rows on output now, it's still slower than InnoDB (12ms vs 9ms), but at least may be profiled (well, we also probably moving far from the problem as time may be spent mostly on the output traffic now, but I've tried anyway) - I've made a loop of 100 iterations of this query which is reading but not printing data. The total execution time of this loop is 1200ms, and curiously under profiling was not really changed. Profiler was able to catch 733ms of total execution time (if I understand well, all functions running faster than 0.5ms are remain un-profiled). The top profiler output is here: Excl. Incl. Name User CPU User CPU sec. sec. 0.733 0.733 0.103 0.103 memcpy 0.045 0.045 slot_deform_tuple 0.037 0.040 AllocSetAlloc 0.021 0.021 AllocSetFree 0.018 0.037 pfree 0.018 0.059 appendBinaryStringInfo 0.017 0.031 heap_fill_tuple 0.017 0.017 _ndoprnt 0.016 0.016 nocachegetattr 0.015 0.065 heap_form_minimal_tuple 0.015 0.382 ExecProcNode 0.015 0.015 strlen 0.014 0.037 ExecScanHashBucket 0.014 0.299 printtup 0.013 0.272 ExecHashJoin 0.011 0.011 enlargeStringInfo 0.011 0.086 index_getnext 0.010 0.010 hash_any 0.009 0.076 FunctionCall1 0.009 0.037 MemoryContextAlloc 0.008 0.008 LWLockAcquire 0.007 0.069 pq_sendcountedtext 0.007 0.035 ExecProject 0.007 0.127 ExecScan ... Curiously "memcpy" is in top. Don't know if it's impacted in many cases, but probably it make sense to see if it may be optimized, etc.. Rgds, -Dimitri On 5/7/09, Euler Taveira de Oliveira wrote: > Dimitri escreveu: >> BTW, is there already an integrated profiled within a code? or do I >> need external tools?.. >> > Postgres provides support for profiling. Add --enable-profiling flag. Use > gprof to get the profile. > > > -- > Euler Taveira de Oliveira > http://www.timbira.com/ > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
I've simply restarted a full test with hashjoin OFF. Until 32 concurrent users things are going well. Then since 32 users response time is jumping to 20ms, with 64 users it's higher again, and with 256 users reaching 700ms, so TPS is dropping from 5.000 to ~200.. With hashjoin ON it's not happening, and I'm reaching at least 11.000 TPS on fully busy 32 cores. I should not use prepare/execute as the test conditions should remain "generic". About scalability issue - there is one on 8.3.7, because on 32 cores with such kind of load it's using only 50% CPU and not outpassing 6.000 TPS, while 8.4 uses 90% CPU and reaching 11.000 TPS.. On the same time while I'm comparing 8.3 and 8.4 - the response time is 2 times lower in 8.4, and seems to me the main gain for 8.4 is here. I'll publish all details, just need a time :-) Rgds, -Dimitri On 5/7/09, Merlin Moncure wrote: > On Thu, May 7, 2009 at 4:20 AM, Dimitri wrote: >> Hi Simon, >> >> may you explain why REINDEX may help here?.. - database was just >> created, data loaded, and then indexes were created + analyzed.. What >> may change here after REINDEX?.. >> >> With hashjoin disabled was a good try! >> Running this query "as it" from 1.50ms we move to 0.84ms now, >> and the plan is here: >> >> QUERY >> PLAN >> -- >> Sort (cost=4562.83..4568.66 rows=2329 width=176) (actual >> time=0.225..0.229 rows=20 loops=1) >> Sort Key: h.horder >> Sort Method: quicksort Memory: 30kB >> -> Merge Join (cost=4345.89..4432.58 rows=2329 width=176) (actual >> time=0.056..0.205 rows=20 loops=1) >> Merge Cond: (s.ref = h.ref_stat) >> -> Index Scan using stat_ref_idx on stat s >> (cost=0.00..49.25 rows=1000 width=45) (actual time=0.012..0.079 >> rows=193 loops=1) >> -> Sort (cost=4345.89..4351.72 rows=2329 width=135) (actual >> time=0.041..0.043 rows=20 loops=1) >> Sort Key: h.ref_stat >> Sort Method: quicksort Memory: 30kB >> -> Index Scan using history_ref_idx on history h >> (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.013..0.024 >> rows=20 loops=1) >> Index Cond: (ref_object = '01'::bpchar) >> Total runtime: 0.261 ms >> (12 rows) >> >> Curiously planner expect to run it in 0.26ms >> >> Any idea why planner is not choosing this plan from the beginning?.. >> Any way to keep this plan without having a global or per sessions >> hashjoin disabled?.. > > can you work prepared statements into your app? turn off hash join, > prepare the query, then turn it back on. > > merlin > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Folks, it's completely crazy, but here is what I found:

- if the HISTORY table is analyzed with target 1000, my SELECT response time jumps to 3ms, and the max throughput is limited to 6.000 TPS (this is what happened to 8.3.7)

- if the HISTORY table is analyzed with target 5, my SELECT response time decreases to 1.2ms (!) and then my max TPS level is ~12.000! and CPU is used up to 95% even by 8.3.7 :-) and 8.4 performed better just because I left its analyze target at the default value of 100.

Can anyone explain why the analyze target may have such a huge negative secondary effect?..

Next point: SCALABILITY ISSUE

Now both 8.3.7 and 8.4 have similar performance levels, but 8.3.7 is always slightly better than 8.4. The problem I have:
- on 8 cores: ~5.000 TPS / 5.500 MAX
- on 16 cores: ~10.000 TPS / 11.000 MAX
- on 32 cores: ~10.500 TPS / 11.500 MAX

What else may limit concurrent SELECTs here?..

Yes, I forgot: MySQL is reaching 17.500 TPS here.

Rgds, -Dimitri On 5/7/09, Simon Riggs wrote: > > On Thu, 2009-05-07 at 20:36 +0200, Dimitri wrote: > >> I've simply restarted a full test with hashjoin OFF. Until 32 >> concurrent users things are going well. Then since 32 users response >> time is jumping to 20ms, with 64 users it's higher again, and with 256 >> users reaching 700ms, so TPS is dropping from 5.000 to ~200.. >> >> With hashjoin ON it's not happening, and I'm reaching at least 11.000 >> TPS on fully busy 32 cores. > > Much better to stick to the defaults. > > Sounds like a problem worth investigating further, but not pro bono. > >> About scalability issue - there is one on 8.3.7, because on 32 cores >> with such kind of load it's using only 50% CPU and not outpassing >> 6.000 TPS, while 8.4 uses 90% CPU and reaching 11.000 TPS.. > > Yeh, small changes make a big difference. Thanks for the info. > > How does MySQL perform? > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
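For anyone who wants to reproduce the two configurations being compared, a sketch, assuming the target was changed through default_statistics_target rather than per column (ANALYZE uses the session value for columns without an explicit per-column setting; the per-column form is shown as an alternative):

-- target 1000: richer statistics, more planning work per query
set default_statistics_target = 1000;
analyze HISTORY;

-- target 5: cruder estimates, much cheaper planning
set default_statistics_target = 5;
analyze HISTORY;

-- per-column alternative, if only REF_OBJECT matters
alter table HISTORY alter column REF_OBJECT set statistics 5;
analyze HISTORY;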
[PERFORM] What is the most optimal config parameters to keep stable write TPS ?..
Hi, what would you suggest as the most optimal postgresql.conf to keep writes as stable as possible?.. What I want is to avoid "throughput waves" - I want to keep my response times stable without any activity holes. I've tried reducing the checkpoint timeout from 5min to 30sec - it helped, and throughput is more stable now, but instead of big waves I now have short waves anyway.. What is the best combination of options here?.. Rgds, -Dimitri -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Tom, it was not deliberate :-) it just stayed that way after various probes of the query plan. Anyway, on 8.4 the default target is 100, and just by moving it to 5 I reached 10.500 TPS on 16 cores instead of 8.000 initially. And I suppose you have a good reason to keep it equal to 100 by default, don't you? ;-) And what about scalability on 32 cores?.. Any idea? Rgds, -Dimitri On 5/11/09, Tom Lane wrote: > Dimitri writes: >> Anyone may explain me why analyze target may have so huge negative >> secondary effect?.. > > If these are simple queries, maybe what you're looking at is the > increase in planning time caused by having to process 10x as much > statistical data. Cranking statistics_target to the max just because > you can is not necessarily a good strategy. > > regards, tom lane > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..
Hi Kevin,

PostgreSQL: 8.3.7 & 8.4
Server: Sun M5000 32cores
OS: Solaris 10

current postgresql.conf:

#
max_connections = 2000                   # (change requires restart)
effective_cache_size = 48000MB
shared_buffers = 12000MB
temp_buffers = 200MB
work_mem = 100MB                         # min 64kB
maintenance_work_mem = 600MB             # min 1MB
max_fsm_pages = 2048000
fsync = on                               # turns forced synchronization on or off
synchronous_commit = off                 # immediate fsync at commit
wal_sync_method = fdatasync
wal_buffers = 2MB
wal_writer_delay = 400ms                 # 1-10000 milliseconds
checkpoint_segments = 128
checkpoint_timeout = 30s
archive_mode = off
track_counts = on
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_max_workers = 4
autovacuum_naptime = 20                  # time between autovacuum runs
autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.001
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
#

Rgds, -Dimitri On 5/11/09, Kevin Grittner wrote: > Dimitri wrote: > >> what may you suggest as the most optimal postgresql.conf to keep >> writing as stable as possible?.. >> >> What I want is to avoid "throughput waves" - I want to keep my >> response times stable without any activity holes. I've tried to >> reduce checkpoint timeout from 5min to 30sec - it helped, throughput >> is more stable now, but instead of big waves I have now short waves >> anyway.. >> >> What is the best options combination here?.. > > What version of PostgreSQL? What operating system? What hardware? > > The answers are going to depend on the answers to those questions. > > It would also be good to show all lines from postgresql.conf which are > not commented out. > > -Kevin > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..
Thanks a lot, I'll try them all! Yes, I have a good external storage box with battery backed cache enabled. There are 64GB of RAM so I expected it'll help little bit to increase a buffer cache, but ok, will see if with 256MB it'll be better :-) What about "full_page_writes" ? seems it's "on" by default. Does it makes sense to put if off?.. Rgds, -Dimitri On 5/11/09, Kevin Grittner wrote: > Dimitri wrote: > >> PostgreSQL: 8.3.7 & 8.4 >> Server: Sun M5000 32cores >> OS: Solaris 10 > > Does that have a battery backed RAID controller? If so, is it > configured for write-back? These both help a lot with smoothing > checkpoint I/O gluts. > > We've minimized problems by making the background writer more > aggressive. 8.3 and later does a better job in general, but we've > still had to go with: > > bgwriter_lru_maxpages = 1000 > bgwriter_lru_multiplier = 4.0 > >> shared_buffers = 12000MB > > You might want to test with that set to something much lower, to see > what the checkpoint delays look like. We've found it best to use a > small (256MB) setting, and leave caching to the OS; in our > environment, it seems to do a better job of scheduling the disk I/O. > YMMV, of course. > > -Kevin > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..
Hi Scott, good point - the current checkpoint completion target is a default 0.5, and it makes sense to set it to 0.8 to make writing more smooth, great! yes, data and xlog are separated, each one is sitting on an independent storage LUN RAID1, and storage box is enough performant Thank you! Rgds, -Dimitri On 5/11/09, Scott Marlowe wrote: > On Mon, May 11, 2009 at 10:31 AM, Dimitri wrote: >> Hi Kevin, >> >> PostgreSQL: 8.3.7 & 8.4 >> Server: Sun M5000 32cores >> OS: Solaris 10 >> >> current postgresql.conf: >> >> # >> max_connections = 2000 # (change requires restart) >> effective_cache_size = 48000MB >> shared_buffers = 12000MB >> temp_buffers = 200MB >> work_mem = 100MB# min 64kB >> maintenance_work_mem = 600MB# min 1MB >> >> max_fsm_pages = 2048000 >> fsync = on # turns forced synchronization on >> or off >> synchronous_commit = off# immediate fsync at commit >> wal_sync_method = fdatasync >> wal_buffers = 2MB >> wal_writer_delay = 400ms# 1-1 milliseconds >> >> checkpoint_segments = 128 >> checkpoint_timeout = 30s > > What's your checkpoint completion target set to? Crank that up a bit > ot 0.7, 0.8 etc and make the timeout more, not less. That should > help. > > Also, look into better hardware (RAID controller with battery backed > cache) and also putting pg_xlog on a separate RAID-1 set (or RAID-10 > set if you've got a lot of drives under the postgres data set). > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
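One way to check whether the new checkpoint settings really smooth out the write waves is to sample pg_stat_bgwriter between test runs (available since 8.3); a sketch:

select checkpoints_timed,    -- checkpoints triggered by checkpoint_timeout
       checkpoints_req,      -- checkpoints forced by filling checkpoint_segments
       buffers_checkpoint,   -- buffers written during checkpoints
       buffers_clean,        -- buffers written by the background writer
       buffers_backend       -- buffers written directly by backends
  from pg_stat_bgwriter;

If checkpoints_req grows quickly relative to checkpoints_timed, checkpoint_segments is still forcing checkpoints earlier than the timeout.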
Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..
OK, it'll be better to avoid such an "improvement" then :-) Performance - yes, but not at any price :-) Thank you! Rgds, -Dimitri On 5/11/09, Kevin Grittner wrote: > Dimitri wrote: > >> What about "full_page_writes" ? seems it's "on" by default. Does it >> makes sense to put if off?.. > > It would probably help with performance, but the description is a > little disconcerting in terms of crash recovery. We tried running > with it off for a while (a year or so back), but had problems with > corruption. I think the specific cause of that has since been fixed, > it's left us a bit leery of the option. > > Maybe someone else can speak to how safe (or not) the current > implementation of that option is. > > -Kevin > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Simon, it's still too early to speak about MySQL scalability... :-) It's only in the last few months that there is *no more* regression in MySQL performance when moving from 8 to 16 cores. But looking at how quickly it's progressing now, things may change very quickly :-) For the moment my tests give: - on 8 cores: 14.000 TPS - on 16 cores: 17.500 TPS - on 32 cores: 15.000 TPS (regression) Rgds, -Dimitri On 5/11/09, Simon Riggs wrote: > > On Mon, 2009-05-11 at 17:18 +0200, Dimitri wrote: > >> Yes, forget, MySQL is reaching 17.500 TPS here. > > Please share your measurements of MySQL scalability also. > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Aidan, thanks a lot for this detailed summary! So, why don't I use PREPARE here: let's say I'm testing the worst stress case :-) Imagine you have thousands of different queries of this kind - you cannot prepare all of them! :-) Or maybe you prepare each one once, but as I showed earlier in this thread the PREPARE itself takes 16ms, so for a single execution there is no gain! :-) Stressing with this kind of short and simple query (and note they even contain joins - they could be made simpler still :-)) gives me a result that is guaranteed to show my worst case - then I know that if I have to deploy a bombarding OLTP-like application, my database engine will be able to sustain such a workload, and if I have performance problems they are inside the application! :-) (well, it's very simplistic, but it's not far from the truth :-)) Now, as you see from your explanation, Part #2 is the dominant one - so instead of blaming this query, why not implement a QUERY PLANNER CACHE??? - so that if any *similar* query is recognized by the parser we simply *reuse* the same plan?.. Rgds, -Dimitri On 5/11/09, Aidan Van Dyk wrote: > * Dimitri [090511 11:18]: >> Folks, it's completely crazy, but here is what I found: >> >> - if HISTORY table is analyzed with target 1000 my SELECT response >> time is jumping to 3ms, and the max throughput is limited to 6.000 TPS >> (it's what happenned to 8.3.7) >> >> -if HISTORY table is analyzed with target 5 - my SELECT response time >> is decreasing to 1.2ms (!) and then my max TPS level is ~12.000 ! >> and CPU is used up to 95% even by 8.3.7 :-) and 8.4 performed better >> just because I left its analyze target to default 100 value. >> >> Anyone may explain me why analyze target may have so huge negative >> secondary effect?.. > > It's actually pretty straight forward. > > The PostgreSQL query planner is a "smart planner". It takes into > consideration all the statistics available on the columns/tables, > expected outputs based on inputs, etc, to choose what it thinks will be > the best plan. The more data you have in statistics (the larger > statistics target you have), the more CPU time and longer it's going to > take to "plan" your queries. The tradeoff is hopefully better plans. > > But, in your scenario, where you are hitting the database with the > absolute worst possible way to use PostgreSQL, with small, repeated, > simple queries, you're not getting the advantage of "better" plans. In > your case, you're throwing absolutely simple queries at PG as fast as > you can, and for each query, PostgreSQL has to: > > 1) Parse the given "query string" > 2) Given the statistics available, plan the query and pick the best one > 3) Actually run the query. > > Part 2 is going to dominate the CPU time in your tests, more so the more > statistics it has to evaluate, and unless the data has to come from the > disks (i.e. not in shared buffers or cache) is thus going to dominate the > time before you get your results. More statistics means more time > needed to do the planning/picking of the query. > > If you were to use prepared statements, the cost of #1 and #2 is done > once, and then every time you throw a new execution of the query to > PostgreSQL, you get to just do #3, the easy quick part, especially for > small simple queries where all the data is in shared buffers or the cache. > > a. > > -- > Aidan Van Dyk Create like a god, > ai...@highrise.ca command like a king, > http://www.highrise.ca/ work like a slave. 
> -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
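To make Aidan's three steps concrete, here is a minimal sketch of paying the parse/plan cost (steps 1 and 2) once per session with PREPARE/EXECUTE; the column list and database name are assumptions, only the table and column names visible in the plans later in the thread are reused.

# Sketch only: parse + plan once, then execute repeatedly in the same session.
psql -d db_stress <<'EOF'
PREPARE get_history (char) AS
    SELECT h.horder, h.ref_stat, s.ref
      FROM history h
      JOIN stat s ON s.ref = h.ref_stat
     WHERE h.ref_object = $1
     ORDER BY h.horder;
-- steps 1 and 2 happened above, once; each EXECUTE below is only step 3
EXECUTE get_history('01');
EXECUTE get_history('01');
EOF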
Re: [PERFORM] Any better plan for this query?..
>> So, why I don't use prepare here: let's say I'm testing the worst >> stress case :-) Imagine you have thousands of such kind of queries - >> you cannot prepare all of them! :-) > > Thousands? Surely there'll be a dozen or three of most common queries, > to which you pass different parameters. You can prepare those. OK, and what if each client just connects to the database, executes each kind of query just *once* and then disconnects?.. - the cost of PREPARE will kill performance here if the statement is not reused at least 10 times within the same session. Well, I know, we can always do better, and even use stored procedures, etc. etc. > >> Now, as you see from your explanation, the Part #2 is the most >> dominant - so why instead to blame this query not to implement a QUERY >> PLANNER CACHE??? - in way if any *similar* query is recognized by >> parser we simply *reuse* the same plan?.. > > This has been discussed in the past, but it turns out that a real > implementation is a lot harder than it seems. OK. If I remember correctly, Oracle has one and it helps a lot, but it's certainly not easy to implement.. Rgds, -Dimitri -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Nice to know. But again, if this cache is kept only on the client side it'll be always lost on disconnect. And if clients are "short-lived" it'll not help. BTW, is there an option to say "do execution plan as simple as possible"? If you're sure about your data and your indexes - don't need to spend so much time. Rgds, -Dimitri On 5/12/09, Heikki Linnakangas wrote: > Dimitri wrote: >> Now, as you see from your explanation, the Part #2 is the most >> dominant - so why instead to blame this query not to implement a QUERY >> PLANNER CACHE??? - in way if any *similar* query is recognized by >> parser we simply *reuse* the same plan?.. > > At least in JDBC, there's several open source prepared statement cache > implementations out there that people use. I don't know about other > client libraries, but it certainly is possible to do in the client. > > -- >Heikki Linnakangas >EnterpriseDB http://www.enterprisedb.com > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..
It's just one of the test conditions - "what if there 2000 users?" - I know I may use pgpool or others, but I also need to know the limits of the database engine itself.. For the moment I'm limiting to 256 concurrent sessions, but config params are kept like for 2000 :-) Rgds, -Dimitri On 5/12/09, Laurent Laborde wrote: > On Mon, May 11, 2009 at 6:31 PM, Dimitri wrote: >> Hi Kevin, >> >> PostgreSQL: 8.3.7 & 8.4 >> Server: Sun M5000 32cores >> OS: Solaris 10 >> >> current postgresql.conf: >> >> # >> max_connections = 2000 # (change requires restart) > > Are you sure about the 2000 connections ? > Why don't you use a pgbouncer or pgpool instead ? > > > -- > F4FQM > Kerunix Flan > Laurent Laborde > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
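As an illustration of Laurent's suggestion, a minimal pgbouncer setup in transaction-pooling mode could look roughly like the sketch below; the database name, addresses and pool sizes are assumptions, and the authentication settings are deliberately left out (see the pgbouncer documentation).

# Sketch only: accept up to 2000 client sessions but keep only ~64 backends busy.
cat > pgbouncer.ini <<'EOF'
[databases]
db_stress = host=127.0.0.1 port=5432 dbname=db_stress

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 64
; auth_type / auth_file omitted here - see the pgbouncer docs
EOF
pgbouncer -d pgbouncer.ini
# clients then connect to port 6432 instead of 5432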
Re: [PERFORM] Any better plan for this query?..
Folks, before you start to think "what a dumb guy doing a dumb thing" :-)) I'll explain you few details: it's for more than 10 years I'm using a db_STRESS kit (http://dimitrik.free.fr/db_STRESS.html) to check databases performance and scalability. Until now I was very happy with results it gave me as it stress very well each database engine internals an put on light some things I should probably skip on other workloads. What do you want, with a time the "fast" query executed before in 500ms now runs within 1-2ms - not only hardware was improved but also database engines increased their performance a lot! :-)) In 2007 I've published the first public results with PostgreSQL, and it was 2 times faster on that time comparing to MySQL (http://dimitrik.free.fr/db_STRESS_BMK_Part1.html) Last month for the launching of MySQL 5.4 I've done a long series of tests and at the end for my curiosity I've executed the same load against PostgreSQL 8.3.7 to see if MySQL is more close now. For my big surprise, MySQL was faster! As well observations on PG processing bring me a lot of questions - I supposed something was abnormal on PG side, but I did not have too much time to understand what it was exactly (http://dimitrik.free.fr/db_STRESS_MySQL_540_and_others_Apr2009.html#note_5443) What I'm trying to do now is to understand what exactly is the problem. What I discovered so far with all your help: - the impact of a planner - the impact of the analyze target - the impact of prepare / execute - scalability limit on 32 cores I'll also try to adapt prepare/execute solution to see how much it improves performance and/or scalability. As well helping from the other thread I was able to improve a lot the TPS stability on read+write workload! :-) Any other comments are welcome! Rgds, -Dimitri On 5/12/09, Dimitri Fontaine wrote: > Hi, > > Dimitri writes: > >>>> So, why I don't use prepare here: let's say I'm testing the worst >>>> stress case :-) Imagine you have thousands of such kind of queries - >>>> you cannot prepare all of them! :-) >>> >>> Thousands? Surely there'll be a dozen or three of most common queries, >>> to which you pass different parameters. You can prepare thoseu >> >> Ok, and if each client just connect to the database, execute each kind >> of query just *once* and then disconnect?.. - cost of prepare will >> kill performance here if it's not reused at least 10 times within the >> same session. > > In a scenario which looks like this one, what I'm doing is using > pgbouncer transaction pooling. Now a new connection from client can be > served by an existing backend, which already has prepared your > statement. > > So you first SELECT name FROM pg_prepared_statements; to know if you > have to PREPARE or just EXECUTE, and you not only maintain much less > running backends, lower fork() calls, but also benefit fully from > preparing the statements even when you EXECUTE once per client > connection. > >> Well, I know, we always can do better, and even use stored procedures, >> etc. etc. > > Plain SQL stored procedure will prevent PostgreSQL to prepare your > queries, only PLpgSQL functions will force transparent plan caching. But > calling this PL will cost about 1ms per call in my tests, so it's not a > good solution. > > It's possible to go as far as providing your own PostgreSQL C module > where you PREPARE at _PG_init() time and EXECUTE in a SQL callable > function, coupled with pgbouncer it should max out the perfs. But maybe > you're not willing to go this far. 
> > Anyway, is hammering the server with always the same query your real > need or just a simplified test-case? If the former, you'll see there are > good ways to theorically obtain better perfs than what you're currently > reaching, if the latter I urge you to consider some better benchmarking > tools, such as playr or tsung. > > https://area51.myyearbook.com/trac.cgi/wiki/Playr > http://tsung.erlang-projects.org/ > http://pgfouine.projects.postgresql.org/tsung.html > http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php > > Regards, > -- > dim > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
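The pg_prepared_statements check Dimitri Fontaine describes can be illustrated with a short sketch; the statement name is hypothetical, and note that prepared statements are per backend, so the check only makes sense inside the session (or pooled backend) that will run the EXECUTE.

# Sketch only: within ONE session, see what this backend has already prepared.
psql -d db_stress <<'EOF'
SELECT name FROM pg_prepared_statements;
-- if 'get_history' is not in the list, the client issues its PREPARE once;
-- afterwards it only sends: EXECUTE get_history('01');
EOF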
Re: [PERFORM] Any better plan for this query?..
Wow, Simon! :-)) Yes, I work in the Sun Benchmark Center :-)) (I don't use my Sun email on public lists only to avoid spam), and the fact that I came here asking questions probably proves my intention to show PostgreSQL in its best light, no?.. - I never liked "not honest" comparisons :-)) Regarding your bet: from a very young age I learned one thing - take any two people who bet on anything and you'll find one idiot and one bastard :-)) The idiot, because he bets while lacking knowledge, and the bastard, because knowing the truth it's not honest to profit from idiots :-)) That's why I've never bet in my life, but in such situations I always tell this same story... Did you like it? ;-)) However, no problem to give credit to you as well as to the whole pg-perf list, as it provides very valuable help! :-)) Rgds, -Dimitri On 5/12/09, Simon Riggs wrote: > > On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote: > >> For my big surprise, MySQL was faster! > > Ours too. > > ** I bet you $1000 that I can improve the performance of your benchmark > results with PostgreSQL. You give me $1000 up-front and if I can't > improve your high end numbers I'll give you $2000 back. Either way, you > name me and link to me from your blog. Assuming you re-run the tests as > requested and give me reasonable access to info and measurements. ** > > I note your blog identifies you as a Sun employee. Is that correct? If > you do not give us the opportunity to improve upon the results then > reasonable observers might be persuaded you did not wish to show > PostgreSQL in its best light. You up for it? > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
For the moment I'm even not considering any scalability issues on the Read+Write workload - it may always be related to the storage box, and storage latency or controller/cache efficiency may play a lot. As problem I'm considering a scalability issue on Read-Only workload - only selects, no disk access, and if on move from 8 to 16 cores we gain near 100%, on move from 16 to 32 cores it's only 10%... I think I have to replay Read-Only with prepare/execute and check how much it'll help (don't know if there are some internal locking used when a planner is involved).. And yes, I'll try to profile on 32 cores, it makes sense. Rgds, -Dimitri On 5/12/09, Heikki Linnakangas wrote: > Dimitri wrote: >> What I discovered so far with all your help: >> - the impact of a planner >> - the impact of the analyze target >> - the impact of prepare / execute >> - scalability limit on 32 cores > > You've received good advice on how to minimize the impact of the first > three points, and using those techniques should bring a benefit. But I'm > pretty surprised by the bad scalability you're seeing and no-one seems > to have a good idea on where that limit is coming from. At a quick > glance, I don't see any inherent bottlenecks in the schema and workload. > > If you could analyze where the bottleneck is with multiple cores, that > would be great. With something like oprofile, it should be possible to > figure out where the time is spent. > > My first guess would be the WALInsertLock: writing to WAL is protected > by that and it an become a bottleneck with lots of small > UPDATE/DELETE/INSERT transactions. But a profile would be required to > verify that. > > -- >Heikki Linnakangas >EnterpriseDB http://www.enterprisedb.com > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Wait, wait - currently I'm playing the "stress scenario", so there are only 256 sessions max, but think time is zero (full stress). The scenario with 1600 users is to test how solid the database is at simply holding a huge number of users, each doing only one transaction per second (a very low global TPS compared to what the database is able to do, but it tests how well its internals manage the user sessions). I did not plan to run the 1600-user test this time (it all depends on time :-)) So, do I need to increase WAL buffers for 256 users? My LOG and DATA have been placed on separate storage LUNs and controllers from the beginning. I've now changed checkpoint_completion_target from the default 0.5 to 0.8 - should I go up to 0.95?.. Also, to avoid TPS "waves" and bring stability to the Read+Write workload I followed the advice from a parallel thread: bgwriter_lru_maxpages = 1000 bgwriter_lru_multiplier = 4.0 shared_buffers = 1024MB I've also tried shared_buffers=256MB as advised, but then the Read-Only workload loses performance, as PG's own caching helps anyway. Also, checkpoint_timeout is 30s now, and of course a huge difference came from moving default_statistics_target to 5! - but this one I found myself :-)) Probably checkpoint_timeout can be bigger now with the current settings? - the goal here is to keep Read+Write TPS as stable as possible and also to avoid a long recovery in case of a system/database/other crash (in theory). Rgds, -Dimitri On 5/12/09, Simon Riggs wrote: > > On Tue, 2009-05-12 at 12:19 +0200, Dimitri wrote: > >> What I'm trying to do now is to understand what exactly is the >> problem. > > You're running with 1600 users, which is above the scalability limit > uncovered (by Sun...) during earlier benchmarking. The scalability > issues are understood but currently considered above the > reasonable-setting limit and so nobody has been inclined to improve > matters. > > You should use a connection concentrator to reduce the number of > sessions down to say 400. > > Your WAL buffers setting is also too low and you will be experiencing > contention on the WALWriteLock. Increase wal_buffers to about x8 where > you have it now. > > You can move pg_xlog to its own set of drives. > > Set checkpoint_completion_target to 0.95. > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
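A minimal sketch of the WAL/checkpoint changes Simon suggests, assuming the wal_buffers = 2MB shown earlier in the thread (so x8 = 16MB); the data directory path and the checkpoint_timeout value are illustrative assumptions.

# Sketch only: larger WAL buffers, spread-out checkpoints.
PGDATA=/pgdata/8.4                    # assumption - adjust to your layout
cat >> $PGDATA/postgresql.conf <<'EOF'
wal_buffers = 16MB                    # was 2MB; x8 as suggested (needs restart)
checkpoint_completion_target = 0.95
checkpoint_timeout = 15min            # longer than 30s => fewer, smoother checkpoints
EOF
pg_ctl -D $PGDATA restart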
Re: [PERFORM] Any better plan for this query?..
Hi Stefan, sorry, I did not have a time to bring all details into the toolkit - but at least I published it instead to tell a "nice story" about :-) The client process is a binary compiled with libpq. Client is interpreting a scenario script and publish via SHM a time spent on each SQL request. I did not publish sources yet as it'll also require to explain how to compile them :-)) So for the moment it's shipped as a freeware, but with time everything will be available (BTW, you're the first who asking for sources (well, except IBM guys who asked to get it on POWER boxes, but it's another story :-)) What is good is each client is publishing *live* its internal stats an we're able to get live data and follow any kind of "waves" in performance. Each session is a single process, so there is no contention between clients as you may see on some other tools. The current scenario script contains 2 selects (representing a Read transaction) and delete/insert/update (representing Write transaction). According a start parameters each client executing a given number Reads per Write. It's connecting on the beginning and disconnecting at the end of the test. It's also possible to extend it to do other queries, or simply give to each client a different scenario script - what's important is to able to collect then its stats live to understand what's going wrong (if any).. I'm planning to extend it and give an easy way to run it against any database schema, it's only question of time.. Rgds, -Dimitri On 5/12/09, Stefan Kaltenbrunner wrote: > Dimitri wrote: >> Folks, before you start to think "what a dumb guy doing a dumb thing" :-)) >> I'll explain you few details: >> >> it's for more than 10 years I'm using a db_STRESS kit >> (http://dimitrik.free.fr/db_STRESS.html) to check databases >> performance and scalability. Until now I was very happy with results >> it gave me as it stress very well each database engine internals an >> put on light some things I should probably skip on other workloads. >> What do you want, with a time the "fast" query executed before in >> 500ms now runs within 1-2ms - not only hardware was improved but also >> database engines increased their performance a lot! :-)) > > I was attempting to look into that "benchmark" kit a bit but I find the > information on that page a bit lacking :( a few notices: > > * is the sourcecode for the benchmark actually available? the "kit" > seems to contain a few precompiled binaries and some source/headfiles > but there are no building instructions, no makefile or even a README > which makes it really hard to verify exactly what the benchmark is doing > or if the benchmark client might actually be the problem here. > > * there is very little information on how the toolkit talks to the > database - some of the binaries seem to contain a static copy of libpq > or such? > > * how many queries per session is the toolkit actually using - some > earlier comments seem to imply you are doing a connect/disconnect cycle > for every query ist that actually true? > > > Stefan > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Robert, what I'm testing now is 256 users max. The workload grows progressively from 1, 2, 4, 8 ... to 256 users. Of course the max throughput is reached at a number of users equal to 2 * the number of cores, but what's important for me here is that the database should continue to sustain the workload! - response time regresses, but the throughput should remain about the same. So, do I really need a pooler to keep 256 users working?? - I don't think so, but please correct me. BTW, I was not looking to put PostgreSQL in bad conditions - the test is the test, and as I said, 2 years ago PostgreSQL outperformed MySQL on the same test case, and nothing was done within the MySQL code to improve it explicitly for db_STRESS.. And I stay pretty honest when I'm testing something. Rgds, -Dimitri On 5/12/09, Robert Haas wrote: > On Tue, May 12, 2009 at 8:59 AM, Dimitri wrote: >> Wait wait, currently I'm playing the "stress scenario", so there are >> only 256 sessions max, but thing time is zero (full stress). Scenario >> with 1600 users is to test how database is solid just to keep a huge >> amount of users, but doing only one transaction per second (very low >> global TPS comparing to what database is able to do, but it's testing >> how well its internals working to manage the user sessions). > > Didn't we beat this to death in mid-March on this very same list? > Last time I think it was Jignesh Shah. AIUI, it's a well-known fact > that PostgreSQL doesn't do very well at this kind of workload unless > you use a connection pooler. > > *goes and checks the archives* Sure enough, 116 emails under the > subject line "Proposal of tunable fix for scalability of 8.4". > > So, if your goal is to find a scenario under which PostgreSQL performs > as badly as possible, congratulations - you've discovered the same > case that we already knew about. Obviously it would be nice to > improve it, but IIRC so far no one has had any very good ideas on how > to do that. If this example mimics a real-world workload that you > care about, and if using a connection pooler is just not a realistic > option in that scenario for whatever reason, then you'd be better off > working on how to fix it than on measuring it, because it seems to me > we already know it's got problems, per previous discussions. > > ...Robert > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the most optimal config parameters to keep stable write TPS ?..
Good point! I missed it.. - will 20MB be enough? Rgds, -Dimitri On 5/12/09, Julian v. Bock wrote: > Hi > >>>>>> "D" == Dimitri writes: > > D> current postgresql.conf: > > D> # > D> max_connections = 2000 # (change requires restart) > D> temp_buffers = 200MB > > temp_buffers are kept per connection and not freed until the session > ends. If you use some kind of connection pooling this can eat up a lot > of ram that could be used for caching instead. > > Regards, > Julian > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On 5/12/09, Stefan Kaltenbrunner wrote: > Dimitri wrote: >> Hi Stefan, >> >> sorry, I did not have a time to bring all details into the toolkit - >> but at least I published it instead to tell a "nice story" about :-) > > fair point and appreciated. But it seems important that benchmarking > results can be verified by others as well... until now there were only people running Solaris or Linux :-)) > >> >> The client process is a binary compiled with libpq. Client is >> interpreting a scenario script and publish via SHM a time spent on >> each SQL request. I did not publish sources yet as it'll also require >> to explain how to compile them :-)) So for the moment it's shipped as >> a freeware, but with time everything will be available (BTW, you're >> the first who asking for sources (well, except IBM guys who asked to >> get it on POWER boxes, but it's another story :-)) > > well there is no licence tag(or a copyright notice) or anything als > associated with the download which makes it a bit harder than it really > needs to be. > The reason why I was actually looking for the source is that all my > available benchmark platforms are none of the ones you are providing > binaries for which kinda reduces its usefulness. > agree, will improve this point >> >> What is good is each client is publishing *live* its internal stats an >> we're able to get live data and follow any kind of "waves" in >> performance. Each session is a single process, so there is no >> contention between clients as you may see on some other tools. The >> current scenario script contains 2 selects (representing a Read >> transaction) and delete/insert/update (representing Write >> transaction). According a start parameters each client executing a >> given number Reads per Write. It's connecting on the beginning and >> disconnecting at the end of the test. > > well I have seen clients getting bottlenecked internally (like wasting > more time in getting rid/absorbing of the actual result than it took the > server to generate the answer...). > How sure are you that your "live publishing of data" does not affect the > benchmark results(because it kinda generates an artifical think time) > for example? On all my test tools client are publishing their data via shared memory segment (ISM), all they do is just *incrementing* their current stats values and continuing their processing. Another dedicated program should be executed to print these stats - it's connecting to the same SHM segment and printing a *difference* between values for the current and the next interval. Let me know if you need more details. > But what I get from your answer is that you are basically doing one > connect/disconnect per client and the testcase you are talking about has > 256 clients? Exactly, only one connect/disconnect per test, and number of clients is growing progressively from 1, 2, 4, 8, 16, .. to 256 Rgds, -Dimitri > > > Stefan > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
No, they keep connections till the end of the test. Rgds, -Dimitri On 5/12/09, Joshua D. Drake wrote: > On Tue, 2009-05-12 at 17:22 +0200, Dimitri wrote: >> Robert, what I'm testing now is 256 users max. The workload is growing >> progressively from 1, 2, 4, 8 ... to 256 users. Of course the Max >> throughput is reached on the number of users equal to 2 * number of >> cores, but what's important for me here - database should continue to >> keep the workload! - response time regressing, but the troughput >> should remain near the same. >> >> So, do I really need a pooler to keep 256 users working?? - I don't >> think so, but please, correct me. > > If they disconnect and reconnect yes. If they keep the connections live > then no. > > Joshua D. Drake > > -- > PostgreSQL - XMPP: jdr...@jabber.postgresql.org >Consulting, Development, Support, Training >503-667-4564 - http://www.commandprompt.com/ >The PostgreSQL Company, serving since 1997 > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On 5/12/09, Robert Haas wrote: > On Tue, May 12, 2009 at 1:00 PM, Dimitri wrote: >> On MySQL there is no changes if I set the number of sessions in the >> config file to 400 or to 2000 - for 2000 it'll just allocate more >> memory. > > I don't care whether the setting affects the speed of MySQL. I want > to know if it affects the speed of PostgreSQL. The problem is that they both have a "max_connections" parameter, so since you asked about MySQL I answered for MySQL; I have not tested it yet for PostgreSQL - that will be in the next series.. > >> After latest fix with default_statistics_target=5, version 8.3.7 is >> running as fast as 8.4, even 8.4 is little little bit slower. >> >> I understand your position with a pooler, but I also want you think >> about idea that 128 cores system will become a commodity server very >> soon, and to use these cores on their full power you'll need a >> database engine capable to run 256 users without pooler, because a >> pooler will not help you here anymore.. > > So what? People with 128-core systems will not be running trivial > joins that return in 1-2ms and have one second think times between > them. And if they are, and if they have nothing better to do than > worry about whether MySQL can process those queries in 1/2000th of the > think time rather than 1/1000th of the think time, then they can use > MySQL. If we're going to worry about performance on 128-core system, > we would be much better advised to put our efforts into parallel query > execution than how many microseconds it takes to execute very simple > queries. Do you really think that nowadays, for example, a web forum application with PG as its backend will have queries slower than 1-2ms to print a thread message in your browser??? Or banking transactions?? > > Still, I have no problem with making PostgreSQL faster in the case > you're describing. I'm just not interested in doing it on my own time > for free. I am sure there are a number of people who read this list > regularly who would be willing to do it for money, though. Maybe even > me. :-) > > ...Robert > You don't need to believe me, but I am doing it for free - I still have my own work to finish in parallel :-)) And at the same time I don't see any other way to learn and improve my knowledge, but nobody is perfect :-)) Rgds, -Dimitri -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Scott, On 5/12/09, Scott Carey wrote: > Although nobody wants to support it, he should try the patch that Jignesh K. > Shah (from Sun) proposed that makes ProcArrayLock lighter-weight. If it > makes 32 cores much faster, then we have a smoking gun. > > Although everyone here is talking about this as an 'unoptimal' solution, the > fact is there is no evidence that a connection pooler will fix the > scalability from 16 > 32 cores. > Certainly a connection pooler will help most results, but it may not fix the > scalability problem. > > A question for Dimitri: > What is the scalability from 16 > 32 cores at the 'peak' load that occurs > near 2x the CPU count? Is it also poor? If this is also poor, IMO the > community here should not be complaining about this unopimal case -- a > connection pooler at that stage does little and prepared statements will > increase throughput but not likely alter scalability. I'm attaching a small graph showing a TPS level on PG 8.4 depending on number of cores (X-axis is a number of concurrent users, Y-axis is the TPS number). As you may see TPS increase is near linear while moving from 8 to 16 cores, while on 32cores even it's growing slightly differently, what is unclear is why TPS level is staying limited to 11.000 TPS on 32cores. And it's pure read-only workload. > > If that result scales, then the short term answer is a connection pooler. > > In the tests that Jingesh ran -- making the ProcArrayLock faster helped the > case where connections = 2x the CPU core count quite a bit. > > The thread about the CPU scalability is "Proposal of tunable fix for > scalability of 8.4", originally posted by "Jignesh K. Shah" > , March 11 2009. > > It would be very useful to see results of this benchmark with: > 1. A Connection Pooler will not help, as each client is *not* disconnecting/reconnecting during the test, as well PG is keeping well even 256 users. And TPS limit is reached already on 64 users, don't think pooler will help here. > 2. Jignesh's patch I've already tested it and it did not help in my case because the real problem is elsewhere.. (however, I did not test it yet with my latest config params) > 3. Prepared statements > yes, I'm preparing this test. > #3 is important, because prepared statements are ideal for queries that > perform well with low statistics_targets, and not ideal for those that > require high statistics targets. Realistically, an app won't have more than > a couple dozen statement forms to prepare. Setting the default statistics > target to 5 is just a way to make some other query perform bad. Agree, but as you may have a different statistic target *per* table it should not be a problem. What is sure - all time spent on parse and planner will be removed here, and the final time should be a pure execution. Rgds, -Dimitri > > > On 5/12/09 10:53 AM, "Alvaro Herrera" wrote: > >> Andres Freund escribió: >> >>> Naturally it would still be nice to be good in this not optimal >>> workload... >> >> I find it hard to justify wasting our scarce development resources into >> optimizing such a contrived workload. >> >> -- >> Alvaro Herrera >> http://www.CommandPrompt.com/ >> The PostgreSQL Company - Command Prompt, Inc. 
>> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >> > > <> -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
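Dimitri's remark about per-table statistics can be made concrete: the statistics target can be lowered for just the column whose overly detailed statistics slow down planning, leaving the global default alone. A minimal sketch, where the choice of history.ref_object is an assumption based on the plans shown later in the thread:

# Sketch only: lower the per-column target instead of default_statistics_target.
psql -d db_stress <<'EOF'
ALTER TABLE history ALTER COLUMN ref_object SET STATISTICS 5;
ANALYZE history;      -- rebuild the statistics with the new target
EOF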
Re: [PERFORM] Any better plan for this query?..
I'm also confused, but it seems the discussion is producing other ideas as well :-) But yes, each client connects to the database server only *once*. To be precise, the test runs like this: - 1 client is started => 1 in total - sleep ... - 1 more client is started => 2 in total - sleep .. - 2 more clients are started => 4 in total - sleep .. ... ... ===> 256 in total - sleep ... - kill clients So I'm even able to monitor how each new client impacts all the others. The test kit is flexible enough to prepare any kind of stress situation (a sketch of this ramp-up loop is shown after this message). Rgds, -Dimitri On 5/12/09, Glenn Maynard wrote: > I'm sorry, but I'm confused. Everyone keeps talking about connection > pooling, but Dimitri has said repeatedly that each client makes a > single connection and then keeps it open until the end of the test, > not that it makes a single connection per SQL query. Connection > startup costs shouldn't be an issue. Am I missing something here? > test(N) starts N clients, each client creates a single connection and > hammers the server for a while on that connection. test(N) is run for > N=1,2,4,8...256. This seems like a very reasonable test scenario. > > -- > Glenn Maynard > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
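A rough sketch of such a ramp-up driver is below; db_stress_client and its options are hypothetical placeholders for the real kit, and the sleep intervals are arbitrary.

# Sketch only: progressive ramp-up 1, 2, 4, ... up to 256 concurrent clients.
TOTAL=0
for STEP in 1 1 2 4 8 16 32 64 128; do
    for i in $(seq 1 $STEP); do
        db_stress_client --scenario scenario.txt --db db_stress &
    done
    TOTAL=$((TOTAL + STEP))
    echo "clients running: $TOTAL"
    sleep 60                       # observe how the new clients impact the others
done
sleep 300                          # hold the full 256-client load for a while
pkill db_stress_client             # kill clients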
Re: [PERFORM] Any better plan for this query?..
On MySQL there is no changes if I set the number of sessions in the config file to 400 or to 2000 - for 2000 it'll just allocate more memory. After latest fix with default_statistics_target=5, version 8.3.7 is running as fast as 8.4, even 8.4 is little little bit slower. I understand your position with a pooler, but I also want you think about idea that 128 cores system will become a commodity server very soon, and to use these cores on their full power you'll need a database engine capable to run 256 users without pooler, because a pooler will not help you here anymore.. Rgds, -Dimitri On 5/12/09, Robert Haas wrote: > On Tue, May 12, 2009 at 11:22 AM, Dimitri wrote: >> Robert, what I'm testing now is 256 users max. The workload is growing >> progressively from 1, 2, 4, 8 ... to 256 users. Of course the Max >> throughput is reached on the number of users equal to 2 * number of >> cores, but what's important for me here - database should continue to >> keep the workload! - response time regressing, but the troughput >> should remain near the same. >> >> So, do I really need a pooler to keep 256 users working?? - I don't >> think so, but please, correct me. > > Not an expert on this, but there has been a lot of discussion of the > importance of connection pooling in this space. Is MySQL still faster > if you lower max_connections to a value that is closer to the number > of users, like 400 rather than 2000? > >> BTW, I did not look to put PostgreSQL in bad conditions - the test is >> the test, and as I said 2 years ago PostgreSQL outperformed MySQL on >> the same test case, and there was nothing done within MySQL code to >> improve it explicitly for db_STRESS.. And I'm staying pretty honest >> when I'm testing something. > > Yeah but it's not really clear what that something is. I believe you > said upthread that PG 8.4 beta 1 is faster than PG 8.3.7, but PG 8.4 > beta 1 is slower than MySQL 5.4 whereas PG 8.3.7 was faster than some > older version of MySQL. So PG got faster and MySQL got faster, but > they sped things up more than we did. If our performance were getting > WORSE, I'd be worried about that, but the fact that they were able to > make more improvement on this particular case than we were doesn't > excite me very much. Sure, I'd love it if PG were even faster than it > is, and if you have a suggested patch please send it in... or if you > want to profile it and send the results that would be great too. But > I guess my point is that the case of a very large number of > simultaneous users with pauses-for-thought between queries has already > been looked at in the very recent past in a way that's very similar to > what you are doing (and by someone who works at the same company you > do, no less!) so I'm not quite sure why we're rehashing the issue. > > ...Robert > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
The idea is good, but *only* pooling will be not enough. I mean if all what pooler is doing is only keeping no more than N backends working - it'll be not enough. You never know what exactly your query will do - if you choose your N value to be sure to not overload CPU and then some of your queries start to read from disk - you waste your idle CPU time because it was still possible to run other queries requiring CPU time rather I/O, etc... I wrote some ideas about an "ideal" solution here (just omit the word "mysql" - as it's a theory it's valable for any db engine): http://dimitrik.free.fr/db_STRESS_MySQL_540_and_others_Apr2009.html#note_5442 Rgds, -Dimitri On 5/13/09, Kevin Grittner wrote: > Glenn Maynard wrote: >> I'm sorry, but I'm confused. Everyone keeps talking about >> connection pooling, but Dimitri has said repeatedly that each client >> makes a single connection and then keeps it open until the end of >> the test, not that it makes a single connection per SQL query. >> Connection startup costs shouldn't be an issue. Am I missing >> something here? > > Quite aside from the overhead of spawning new processes, if you have > more active connections than you have resources for them to go after, > you just increase context switching and resource contention, both of > which have some cost, without any offsetting gains. That would tend > to explain why performance tapers off after a certain point. A > connection pool which queues requests prevents this degradation. > > It would be interesting, with each of the CPU counts, to profile > PostgreSQL at the peak of each curve to see where the time goes when > it is operating with an optimal poolsize. Tapering after that point > is rather uninteresting, and profiles would be less useful beyond that > point, as the noise from the context switching and resource contention > would make it harder to spot issues that really matter.. > > -Kevin > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Folks, sorry, I'm a bit overtaken by the events :-)) I've finished the tests with PREPARE/EXECUTE - it's much faster of course, and the max TPS is now 15.000 on 24 cores! - I've run various tests to see where the limiting bottleneck may be - it's more likely something timer- or interrupt-based, etc. Nothing special via DTrace, or maybe it will tell you more than it tells me, but for a 10sec period the wait time is quite small:

# lwlock_wait_8.4.d `pgrep -n postgres`

             Lock Id            Mode    Count
 FirstBufMappingLock       Exclusive        1
    FirstLockMgrLock       Exclusive        1
     BufFreelistLock       Exclusive        3
 FirstBufMappingLock          Shared        4
    FirstLockMgrLock          Shared        4

             Lock Id            Mode    Combined Time (ns)
    FirstLockMgrLock       Exclusive                803700
     BufFreelistLock       Exclusive               3001600
    FirstLockMgrLock          Shared               4586600
 FirstBufMappingLock       Exclusive               6283900
 FirstBufMappingLock          Shared              21792900

At the same time these lock waits appear only on 24 or 32 cores. I plan to replay this case on a bigger server (64 cores or more) - it will be much more evident whether the problem is in the locks. Currently I'm finishing my report with all the data you asked for (system graphs, pgsql, and other). I'll publish it on my web site and send you a link. Rgds, -Dimitri On 5/14/09, Simon Riggs wrote: > > On Tue, 2009-05-12 at 14:28 +0200, Dimitri wrote: > >> As problem I'm considering a scalability issue on Read-Only workload - >> only selects, no disk access, and if on move from 8 to 16 cores we >> gain near 100%, on move from 16 to 32 cores it's only 10%... > > Dimitri, > > Will you be re-running the Read-Only tests? > > Can you run the Dtrace script to assess LWlock contention during the > run? > > Would you re-run the tests with a patch? > > Thanks, > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
It's absolutely great! it'll not help here because a think time is 0. but for any kind of solution with a spooler it's a must to try! Rgds, -Dimitri On 5/13/09, Dimitri Fontaine wrote: > Hi, > > Le 13 mai 09 à 18:42, Scott Carey a écrit : >>> will not help, as each client is *not* disconnecting/reconnecting >>> during the test, as well PG is keeping well even 256 users. And TPS >>> limit is reached already on 64 users, don't think pooler will help >>> here. >> >> Actually, it might help a little. Postgres has a flaw that makes >> backends >> block on a lock briefly based on the number of total backends -- >> active or >> completely passive. Your tool has some (very small) user-side delay >> and a >> connection pooler would probably allow 64 of your users to >> efficiently 'fit' >> in 48 or so connection pooler slots. > > It seems you have think time, and I'm only insisting on what Scott > said, but having thinktime means a connection pool can help. Pgbouncer > is a good choice because it won't even attempt to parse the queries, > and it has a flexible configuration. > >>>> 3. Prepared statements >>> yes, I'm preparing this test. > > It's possible to use prepared statement and benefit from pgbouncer at > the same time, but up until now it requires the application to test > whether its statements are already prepared at connection time, > because the application is not controlling when pgbouncer is reusing > an existing backend or giving it a fresh one. > > As I think I need this solution too, I've coded a PG module to scratch > that itch this morning, and just published it (BSD licenced) on > pgfoundry: >http://preprepare.projects.postgresql.org/README.html >http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/preprepare/preprepare/ > > With this module and the proper pgbouncer setup (connect_query='SELECT > prepare_all();') the application has no more to special case the fresh- > backend-nothing-prepared case, it's all transparent, just replace your > SELECT query with its EXECUTE foo(x, y, z) counter part. > > I've took the approach to setup the prepared statements themselves > into a table with columns name and statement, this latter one > containing the full PREPARE SQL command. There's a custom variable > preprepare.relation that has to be your table name (shema qualified). > Each statement that you then put in there will get prepared when you > SELECT prepare_all(); > > Hope this helps, regards, > -- > dim -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Hi Scott, let me now finish my report and regroup all data together, and then we'll continue discussion as it'll come more in debug/profile phase.. - I'll be not polite from my part to send some tons of attachments to the mail list :-) Rgds, -Dimitri On 5/13/09, Scott Carey wrote: > > On 5/13/09 3:22 AM, "Dimitri" wrote: > >> Hi Scott, >> >> On 5/12/09, Scott Carey wrote: >>> Although nobody wants to support it, he should try the patch that Jignesh >>> K. >>> Shah (from Sun) proposed that makes ProcArrayLock lighter-weight. If it >>> makes 32 cores much faster, then we have a smoking gun. >>> >>> Although everyone here is talking about this as an 'unoptimal' solution, >>> the >>> fact is there is no evidence that a connection pooler will fix the >>> scalability from 16 > 32 cores. >>> Certainly a connection pooler will help most results, but it may not fix >>> the >>> scalability problem. >>> >>> A question for Dimitri: >>> What is the scalability from 16 > 32 cores at the 'peak' load that occurs >>> near 2x the CPU count? Is it also poor? If this is also poor, IMO the >>> community here should not be complaining about this unopimal case -- a >>> connection pooler at that stage does little and prepared statements will >>> increase throughput but not likely alter scalability. >> >> I'm attaching a small graph showing a TPS level on PG 8.4 depending on >> number of cores (X-axis is a number of concurrent users, Y-axis is the >> TPS number). As you may see TPS increase is near linear while moving >> from 8 to 16 cores, while on 32cores even it's growing slightly >> differently, what is unclear is why TPS level is staying limited to >> 11.000 TPS on 32cores. And it's pure read-only workload. >> > > Interesting. What hardware is this, btw? Looks like the 32 core system > probably has 2x the CPU and a bit less interconnect efficiency versus the 16 > core one (which would be typical). > Is the 16 core case the same, but with fewer cores per processor active? Or > fewer processors total? > Understanding the scaling difference may require a better understanding of > the other differences besides core count. > >>> >>> If that result scales, then the short term answer is a connection pooler. >>> >>> In the tests that Jingesh ran -- making the ProcArrayLock faster helped >>> the >>> case where connections = 2x the CPU core count quite a bit. >>> >>> The thread about the CPU scalability is "Proposal of tunable fix for >>> scalability of 8.4", originally posted by "Jignesh K. Shah" >>> , March 11 2009. >>> >>> It would be very useful to see results of this benchmark with: >>> 1. A Connection Pooler >> >> will not help, as each client is *not* disconnecting/reconnecting >> during the test, as well PG is keeping well even 256 users. And TPS >> limit is reached already on 64 users, don't think pooler will help >> here. >> > > Actually, it might help a little. Postgres has a flaw that makes backends > block on a lock briefly based on the number of total backends -- active or > completely passive. Your tool has some (very small) user-side delay and a > connection pooler would probably allow 64 of your users to efficiently 'fit' > in 48 or so connection pooler slots. > > It is not about connecting and disconnecting in this case, its about > minimizing Postgres' process count. If this does help, it would hint at > certain bottlenecks. If it doesn't it would point elsewhere (and quiet some > critics). 
> > However, its unrealistic for any process-per-connection system to have less > backends than about 2x the core count -- else any waiting on I/O or network > will just starve CPU. So this would just be done for research, not a real > answer to making it scale better. > > For those who say "but, what if its I/O bound! You don't need more > backends then!": Well you don't need more CPU either if you're I/O bound. > By definition, CPU scaling tests imply the I/O can keep up. > > >>> 2. Jignesh's patch >> >> I've already tested it and it did not help in my case because the real >> problem is elsewhere.. (however, I did not test it yet with my latest >> config params) >> > > Great to hear that! -- That means this case is probably not ProcArrayLock. > If its Solaris, could we get: > 1. What is the CPU stats when it is in the inefficient
Re: [PERFORM] Any better plan for this query?..
Folks, I've just published a full report including all results here: http://dimitrik.free.fr/db_STRESS_PostgreSQL_837_and_84_May2009.html >From my point of view it needs first to understand where the time is wasted on a single query (even when the statement is prepared it runs still slower comparing to MySQL). Then to investigate on scalability issue I think a bigger server will be needed here (I'm looking for 64cores at least :-)) If you have some other ideas or patches (like Simon) - don't hesitate to send them - once I'll get an access to the server again the available test time will be very limited.. Best regards! -Dimitri On 5/18/09, Simon Riggs wrote: > > On Thu, 2009-05-14 at 20:25 +0200, Dimitri wrote: > >> # lwlock_wait_8.4.d `pgrep -n postgres` > >>Lock IdMode Combined Time (ns) >> FirstLockMgrLock Exclusive 803700 >>BufFreelistLock Exclusive 3001600 >> FirstLockMgrLock Shared 4586600 >> FirstBufMappingLock Exclusive 6283900 >> FirstBufMappingLock Shared 21792900 > > I've published two patches to -Hackers to see if we can improve the read > only numbers on 32+ cores. > > Try shared_buffer_partitions = 256 > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
Thanks Dave for correction, but I'm also curious where the time is wasted in this case?.. 0.84ms is displayed by "psql" once the result output is printed, and I got similar time within my client (using libpq) which is not printing any output.. Rgds, -Dimitri On 5/18/09, Dave Dutcher wrote: > > What I don't understand is the part where you talking about disabling hash > joins: > >>* result: planner replaced hash join is replaced by merge join >>* execution time: 0.84ms ! >>* NOTE: curiously planner is expecting to execute this query in 0.29ms > - so it's supposed from its logic to be faster, so why this plan is not used > from the beginning???... >> >> Sort (cost=4562.83..4568.66 rows=2329 width=176) (actual > time=0.237..0.237 rows=20 loops=1) >> Sort Key: h.horder >> Sort Method: quicksort Memory: 30kB >> -> Merge Join (cost=4345.89..4432.58 rows=2329 width=176) > (actual time=0.065..0.216 rows=20 loops=1) >> Merge Cond: (s.ref = h.ref_stat) >> -> Index Scan using stat_ref_idx on stat s > (cost=0.00..49.25 rows=1000 width=45) (actual time=0.018..0.089 rows=193 > loops=1) >> -> Sort (cost=4345.89..4351.72 rows=2329 width=135) > (actual time=0.042..0.043 rows=20 loops=1) >> Sort Key: h.ref_stat >> Sort Method: quicksort Memory: 30kB >> -> Index Scan using history_ref_idx on history h > (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.012..0.025 rows=20 > loops=1) >> Index Cond: (ref_object = '01'::bpchar) >> Total runtime: 0.288 ms >>(12 rows) > > The explain analyze ran the query in 0.288 ms. That is the actual time it > took to run the query on the server. It is not an estimate of the time. > You measured 0.84 ms to run the query, which seems to imply either a problem > in one of the timing methods or that 66% of your query execution time is > sending the results to the client. I'm curious how you did you execution > time measurements. > > Dave > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
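One way to see how much of the 0.84ms is client/protocol overhead rather than execution is to compare psql's \timing (full round trip) with the "Total runtime" reported by EXPLAIN ANALYZE (server-side only, plus some instrumentation overhead); a minimal sketch, with an assumed column list built from the names in the plan above:

# Sketch only: client-observed time vs. server-side execution time.
psql -d db_stress <<'EOF'
\timing on
SELECT h.horder, h.ref_stat, s.ref
  FROM history h JOIN stat s ON s.ref = h.ref_stat
 WHERE h.ref_object = '01' ORDER BY h.horder;     -- \timing shows the round trip
EXPLAIN ANALYZE
SELECT h.horder, h.ref_stat, s.ref
  FROM history h JOIN stat s ON s.ref = h.ref_stat
 WHERE h.ref_object = '01' ORDER BY h.horder;     -- "Total runtime" is server-side
EOF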
Re: [PERFORM] Any better plan for this query?..
On 5/18/09, Scott Carey wrote: > Great data Dimitri!' Thank you! :-) > > I see a few key trends in the poor scalability: > > The throughput scales roughly with %CPU fairly well. But CPU used doesn't > go past ~50% on the 32 core tests. This indicates lock contention. > You should not look on #1 STATs, but on #2 - they are all with the latest "fixes" - on all of them CPU is used well (90% in pic on 32cores). Also, keep in mind these cores are having 2 threads, and from Solaris point of view they are seen as CPU (so 64 CPU) and %busy is accounted as for 64 CPU > Other proof of lock contention are the mutex locks / sec graph which climbs exactly, except no locking was seen on processes while I tried to trace them.. What I think will be needed here is a global and corelated tracing of all PG processes - I did not expect to do it now, but next time > rapidly as the system gets more inefficient (along with context switches). > > Another trend is the system calls/sec which caps out with the test, at about > 400,000 per sec on the peak (non-prepared statement) result. Note that when > the buffer size is 256MB, the performance scales much worse and is slower. > And correlated with this the system calls/sec per transaction is more than > double, at slower throughput. of course, because even the data were cached by filesystem to get them you still need to call a read() system call.. > > Using the OS to cache pages is not as fast as pages in shared_buffers, by a > more significant amount with many cores and higher concurrency than in the > low concurrency case. exactly, it's what I also wanted to demonstrate because I often hear "PG is delegating caching to the filesystem" - and I don't think it's optimal :-) > > The system is largely lock limited in the poor scaling results. This holds > true with or without the use of prepared statements -- which help a some, > but not a lot and don't affect the scalability. we are agree here, but again - 20K mutex spins/sec is a quite low value, that's why I hope on the bigger server it'll be more clear where is a bottleneck :-) Rgds, -Dimitri > > > 4096MB shared buffers, 32 cores, 8.4, read only: > http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_70.html > > 256MB cache, 32 cores, 8.4, read-only: > http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_52.html > > 4096MB shared buffs, 32 cores, 8.4, read only, prepared statements > http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_70.html > > On 5/18/09 11:00 AM, "Dimitri" wrote: > >> Folks, I've just published a full report including all results here: >> http://dimitrik.free.fr/db_STRESS_PostgreSQL_837_and_84_May2009.html >> >> From my point of view it needs first to understand where the time is >> wasted on a single query (even when the statement is prepared it runs >> still slower comparing to MySQL). >> >> Then to investigate on scalability issue I think a bigger server will >> be needed here (I'm looking for 64cores at least :-)) >> >> If you have some other ideas or patches (like Simon) - don't hesitate >> to send them - once I'll get an access to the server again the >> available test time will be very limited.. >> >> Best regards! 
>> -Dimitri >> >> >> On 5/18/09, Simon Riggs wrote: >>> >>> On Thu, 2009-05-14 at 20:25 +0200, Dimitri wrote: >>> >>>> # lwlock_wait_8.4.d `pgrep -n postgres`
>>>> Lock Id                 Mode        Combined Time (ns)
>>>> FirstLockMgrLock        Exclusive               803700
>>>> BufFreelistLock         Exclusive              3001600
>>>> FirstLockMgrLock        Shared                 4586600
>>>> FirstBufMappingLock     Exclusive              6283900
>>>> FirstBufMappingLock     Shared                21792900
>>> >>> I've published two patches to -Hackers to see if we can improve the read >>> only numbers on 32+ cores. >>> >>> Try shared_buffer_partitions = 256 >>> >>> -- >>> Simon Riggs www.2ndQuadrant.com >>> PostgreSQL Training, Services and Support >>> >>> >> >> -- >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-performance >> > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On 5/18/09, Simon Riggs wrote: > > On Mon, 2009-05-18 at 20:00 +0200, Dimitri wrote: > >> >From my point of view it needs first to understand where the time is >> wasted on a single query (even when the statement is prepared it runs >> still slower comparing to MySQL). > > There is still a significant number of things to say about these numbers > and much tuning still to do, so I'm still confident of improving those > numbers if we needed to. > > In particular, running the tests repeatedly using > H.REF_OBJECT = '01' > rather than varying the value seems likely to benefit MySQL. The let me repeat again - the reference is *random*, the '01' value I've used just to show a query execution plan. also, what is important - the random ID is chosen in way that no one user use the same to avoid deadlocks previously seen with PostgreSQL (see the "Deadlock mystery" note 2 years ago http://dimitrik.free.fr/db_STRESS_BMK_Part1.html#note_4355 ) > distribution of values is clearly non-linear; while Postgres picks a > strange plan for that particular value, I would guess there are also > values for which the MySQL plan is sub-optimal. Depending upon the > distribution of selected data we might see the results go either way. > > What I find worrying is your result of a scalability wall for hash > joins. Is that a repeatable issue? I think yes (but of course I did not try to replay it several times) Rgds, -Dimitri > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
No, Tom, the query cache was off. I always explicitly turn it off on MySQL as it has scalability issues. Rgds, -Dimitri On 5/19/09, Tom Lane wrote: > Simon Riggs writes: >> In particular, running the tests repeatedly using >> H.REF_OBJECT = '01' >> rather than varying the value seems likely to benefit MySQL. > > ... mumble ... query cache? > > regards, tom lane > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On 5/19/09, Scott Carey wrote: > > On 5/18/09 3:32 PM, "Dimitri" wrote: > >> On 5/18/09, Scott Carey wrote: >>> Great data Dimitri!' >> >> Thank you! :-) >> >>> >>> I see a few key trends in the poor scalability: >>> >>> The throughput scales roughly with %CPU fairly well. But CPU used >>> doesn't >>> go past ~50% on the 32 core tests. This indicates lock contention. >>> >> >> You should not look on #1 STATs, but on #2 - they are all with the >> latest "fixes" - on all of them CPU is used well (90% in pic on >> 32cores). >> Also, keep in mind these cores are having 2 threads, and from Solaris >> point of view they are seen as CPU (so 64 CPU) and %busy is accounted >> as for 64 CPU >> > > Well, if the CPU usage is actually higher, then it might not be lock waiting > -- it could be spin locks or context switches or cache coherency overhead. > Postgres may also not be very SMT friendly, at least on the hardware tested > here. do you mean SMP or CMT? ;-) however both should work well with PostgreSQL. I also think about CPU affinity - probably it may help to avoid CPU cache misses - but makes sense mostly if pooler will be added as a part of PG. > > (what was the context switch rate? I didn't see that in the data, just > mutex spins). increasing with a load, as this ex.: http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_100.html#bmk_CPU_CtxSwitch_100 > > The scalability curve is definitely showing something. Prepared statements > were tried, as were most of the other suggestions other than one: > > What happens if the queries are more complicated (say, they take 15ms server > side with a more complicated plan required)? That is a harder question to > answer What I observed is: if planner takes more long time (like initially with 8.3.7 and analyze target 1000) the scalability problem is appearing more strange - http://dimitrik.free.fr/Report_20090505/5521_dim_STAT_18.html - as you see CPU even not used more than 60% , and as you may see spin locks are lowering - CPUs are not spinning for locks, there is something else.. I'm supposing a problem of some kind of synchronization - background processes are not waking up on time or something like this... Then, if more time spent on the query execution itself and not planner: - if it'll be I/O time - I/O will hide everything else until you increase a storage performance and/or add more RAM, but then you come back to the initial issue :-) - if it'll be a CPU time it may be interesting! :-) Rgds, -Dimitri -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On 5/19/09, Simon Riggs wrote: > > On Tue, 2009-05-19 at 00:33 +0200, Dimitri wrote: >> > >> > In particular, running the tests repeatedly using >> >H.REF_OBJECT = '01' >> > rather than varying the value seems likely to benefit MySQL. The >> >> let me repeat again - the reference is *random*, >> the '01' value I've used just to show a query execution >> plan. >> >> also, what is important - the random ID is chosen in way that no one >> user use the same to avoid deadlocks previously seen with PostgreSQL >> (see the "Deadlock mystery" note 2 years ago >> http://dimitrik.free.fr/db_STRESS_BMK_Part1.html#note_4355 ) > > OK, didn't pick up on that. > > (Like Tom, I was thinking query cache) > > Can you comment on the distribution of values for that column? If you > are picking randomly, this implies distribution is uniform and so I am > surprised we are mis-estimating the selectivity. yes, the distribution of reference values is uniform between '01' to '001000' (10M), only one OBJECT row by one reference, and only 20 rows with the same reference in HISTORY table. > >> I think yes (but of course I did not try to replay it several times) > > If you could that would be appreciated. We don't want to go chasing > after something that is not repeatable. I'll retry and let you know. Rgds, -Dimitri -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
I may confirm the issue with hash join - it's repeating both with prepared and not prepared statements - it's curious because initially the response time is lowering near ~1ms (the lowest seen until now) and then once workload growing to 16 sessions it's jumping to 2.5ms, then with 32 sessions it's 18ms, etc.. I've retested on 24 isolated cores, so any external secondary effects are avoided. Rgds, -Dimitri On 5/19/09, Dimitri wrote: > On 5/19/09, Simon Riggs wrote: >> >> On Tue, 2009-05-19 at 00:33 +0200, Dimitri wrote: >>> > >>> > In particular, running the tests repeatedly using >>> > H.REF_OBJECT = '01' >>> > rather than varying the value seems likely to benefit MySQL. The >>> >>> let me repeat again - the reference is *random*, >>> the '01' value I've used just to show a query execution >>> plan. >>> >>> also, what is important - the random ID is chosen in way that no one >>> user use the same to avoid deadlocks previously seen with PostgreSQL >>> (see the "Deadlock mystery" note 2 years ago >>> http://dimitrik.free.fr/db_STRESS_BMK_Part1.html#note_4355 ) >> >> OK, didn't pick up on that. >> >> (Like Tom, I was thinking query cache) >> >> Can you comment on the distribution of values for that column? If you >> are picking randomly, this implies distribution is uniform and so I am >> surprised we are mis-estimating the selectivity. > > yes, the distribution of reference values is uniform between > '01' to '001000' (10M), only one OBJECT row by one > reference, and only 20 rows with the same reference in HISTORY table. > >> >>> I think yes (but of course I did not try to replay it several times) >> >> If you could that would be appreciated. We don't want to go chasing >> after something that is not repeatable. > > I'll retry and let you know. > > Rgds, > -Dimitri > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
The response time is not degrading progressively, it simply jumps; it looks like from 16 sessions onwards there is some sort of serialization happening somewhere.. Also, at 16 sessions the throughput in TPS is nearly the same as at 8 (the response time is only twice as big for the moment), but at 32 it drops dramatically.. Rgds, -Dimitri On 5/19/09, Simon Riggs wrote: > > On Tue, 2009-05-19 at 14:00 +0200, Dimitri wrote: > >> I may confirm the issue with hash join - it's repeating both with >> prepared and not prepared statements - it's curious because initially >> the response time is lowering near ~1ms (the lowest seen until now) >> and then once workload growing to 16 sessions it's jumping to 2.5ms, >> then with 32 sessions it's 18ms, etc.. > > Is it just bad all the time, or does it get worse over time? > > Do you get the same behaviour as 32 sessions if you run 16 sessions for > twice as long? > > -- > Simon Riggs www.2ndQuadrant.com > PostgreSQL Training, Services and Support > > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Any better plan for this query?..
On 5/19/09, Merlin Moncure wrote: > On Mon, May 18, 2009 at 6:32 PM, Dimitri wrote: >> Thanks Dave for the correction, but I'm also curious where the time is >> wasted in this case?.. >> >> 0.84ms is displayed by "psql" once the result output is printed, and I >> got similar time within my client (using libpq) which is not printing >> any output.. > > Using libpq? What is the exact method you are using to execute > queries...PQexec? exactly > If you are preparing queries against libpq, the > best way to execute queries is via PQexecPrepared. the query is prepared *once* via PQexec, then the loop executes it with "execute" via PQexec. Why would PQexecPrepared be better in my case?.. > Also, it's > interesting to see if you can get any benefit from asynchronous > queries (PQsendQueryPrepared), but this might involve more changes to your > application than you are willing to make. > > Another note: I would like to point out again that there are possible > negative side effects in using char(n) vs. varchar(n) that IIRC do not > exist in mysql. When you repeat your test I strongly advise switching > to varchar. if it's true in any case, why not just replace the CHAR implementation with VARCHAR directly within the PG code?.. > > Another question: how exactly are you connecting to the database? > local machine? if so, domain socket or tcp/ip? local TCP/IP, same as MySQL > What are you doing > with the results...immediately discarding? on the libpq side they are immediately discarded once all rows are fetched > > One last thing: when you get access to the server, can you run a > custom format query test from pgbench and compare the results to your > test similarly configured (same number of backends, etc) in terms of > tps? I'll try Rgds, -Dimitri -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
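As a rough sketch of the PQprepare/PQexecPrepared path Merlin is suggesting (as opposed to preparing via SQL PREPARE and looping EXECUTE through PQexec): the statement name, connection string, loop count and parameter format below are hypothetical, loosely modeled on the benchmark query against the history table, and error checking is omitted:

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");        /* placeholder connection string */
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* prepare once; NULL parameter types let the server infer them */
    PGresult *res = PQprepare(conn, "hist_q",
        "SELECT * FROM history WHERE ref_object = $1", 1, NULL);
    PQclear(res);

    const char *params[1];
    char ref[16];

    for (int i = 0; i < 1000; i++) {
        snprintf(ref, sizeof(ref), "%010d", i);        /* random reference in the real test */
        params[0] = ref;

        /* parameters are sent separately from the query text, as text (last arg 0) */
        res = PQexecPrepared(conn, "hist_q", 1, params, NULL, NULL, 0);
        PQclear(res);                                  /* results discarded, as in the test */
    }

    PQfinish(conn);
    return 0;
}

Besides skipping the parse step on each execution, this keeps the parameter values out of the SQL string entirely, which is the point Merlin develops in the follow-up message.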
Re: [PERFORM] Any better plan for this query?..
On 5/19/09, Scott Carey wrote: > > On 5/19/09 3:46 AM, "Dimitri" wrote: > >> On 5/19/09, Scott Carey wrote: >>> >>> On 5/18/09 3:32 PM, "Dimitri" wrote: >>> >>>> On 5/18/09, Scott Carey wrote: >>>>> Great data Dimitri!' >>>> >>>> Thank you! :-) >>>> >>>>> >>>>> I see a few key trends in the poor scalability: >>>>> >>>>> The throughput scales roughly with %CPU fairly well. But CPU used >>>>> doesn't >>>>> go past ~50% on the 32 core tests. This indicates lock contention. >>>>> >>>> >>>> You should not look on #1 STATs, but on #2 - they are all with the >>>> latest "fixes" - on all of them CPU is used well (90% in pic on >>>> 32cores). >>>> Also, keep in mind these cores are having 2 threads, and from Solaris >>>> point of view they are seen as CPU (so 64 CPU) and %busy is accounted >>>> as for 64 CPU >>>> >>> >>> Well, if the CPU usage is actually higher, then it might not be lock >>> waiting >>> -- it could be spin locks or context switches or cache coherency >>> overhead. >>> Postgres may also not be very SMT friendly, at least on the hardware >>> tested >>> here. >> >> do you mean SMP or CMT? ;-) >> however both should work well with PostgreSQL. I also think about CPU >> affinity - probably it may help to avoid CPU cache misses - but makes >> sense mostly if pooler will be added as a part of PG. > > Symmetric Multi Threading (HyperThreading in Intels marketing terms, other > marketing terms for Sun or IBM). One CPU core that can handle more than one > concurrently executing thread. > Technically, 'SMT' allows instructions in flight from multiple threads at > once in a superscalar Cpu core while some implementations differ and might > technically CMT (one thread or the other, but can switch fast, or a > non-superscalar core). > > For many implementations of 'multiple threads on one CPU core' many of the > processor resources are reduced per thread when it is active -- caches get > split, instruction re-order buffers are split, etc. That is rather hardware > implementation dependant. > > For Intel's SMT (and other similar), spin-locks hurt scalability if they > aren't using new special instructions for the spin to yield pipeline slots > to the other thread. > > Generally, code that stresses common processor resources more than CPU > execution will scale poorly with SMT/CMT etc. All application are scaling well anyway, except if you have any kind of lock contention inside of the application itself or meet any kind of system resource become hot. But well, here we may spend days to discuss :-) > > So I'm not sure about the Postgres details, but the general case of an > application that doesn't benefit from these technologies exists, and there > is a non-zero chance that Postgres has some characteristics of such an app. > >>> >>> (what was the context switch rate? I didn't see that in the data, just >>> mutex spins). >> >> increasing with a load, as this ex.: >> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_100.html#bmk_CPU_CtxSwit >> ch_100 >> > > Well, on most systems over 100K context switches/sec is a lot. And those > reach 18 /sec. > However, this is 'only' 10 context switches per transaction and less than > 20% system CPU, so maybe those numbers aren't quite as big as they seem. > > Out of curiosity, what was the context switch rate for MySql at its peak > throughput? 
the main MySQL problem is a mutex locking like here: http://dimitrik.free.fr/Report_20090504/5465_dim_STAT_31.html#bmk_SpinMtx_31 so you have to limit a number of active threads to lower this contention (similar to pooler idea folks told here) and the context switch is even higher (~200K/sec) >> >>> >>> The scalability curve is definitely showing something. Prepared >>> statements >>> were tried, as were most of the other suggestions other than one: >>> >>> What happens if the queries are more complicated (say, they take 15ms >>> server >>> side with a more complicated plan required)? That is a harder question >>> to >>> answer >> >> What I observed is: if planner takes more long time (like initially >> with 8.3.7 and analyze target 1000) the scalability problem is >> appearing more strange - >> http://dimitrik.free.fr/Report_2009
Re: [PERFORM] Any better plan for this query?..
On 5/19/09, Merlin Moncure wrote: > On Tue, May 19, 2009 at 11:53 AM, Dimitri wrote: >> On 5/19/09, Merlin Moncure wrote: >>> On Mon, May 18, 2009 at 6:32 PM, Dimitri wrote: >>>> Thanks Dave for the correction, but I'm also curious where the time is >>>> wasted in this case?.. >>>> >>>> 0.84ms is displayed by "psql" once the result output is printed, and I >>>> got similar time within my client (using libpq) which is not printing >>>> any output.. >>> >>> Using libpq? What is the exact method you are using to execute >>> queries...PQexec? >> >> exactly >> >>> If you are preparing queries against libpq, the >>> best way to execute queries is via PQexecPrepared. >> >> the query is prepared *once* via PQexec, >> then the loop executes it with "execute" via PQexec. >> Why would PQexecPrepared be better in my case?.. > > It can be better or worse (usually better). the parameters are > separated from the query string. Regardless of performance, the > parametrized interfaces are superior for any queries taking arguments > and should be used when possible. you're probably right, but I don't like it either when the solution becomes so complicated - PG has such an elegant way to execute a prepared query! > >>> Another note: I would like to point out again that there are possible >>> negative side effects in using char(n) vs. varchar(n) that IIRC do not >>> exist in mysql. When you repeat your test I strongly advise switching >>> to varchar. >> >> if it's true in any case, why not just replace the CHAR implementation with >> VARCHAR directly within the PG code?.. > > First, let me explain the difference. char(n) is padded out to 'n' on > disk and when returned. despite this, the length is still stored so > there is no real advantage to using the char(n) type except that the > returned string is of a guaranteed length. mysql, at least the > particular version and storage engine that I am logged into right now, > does not do this for char(n). In other words, select cast('abc' as > char(50)) returns a string of 50 chars on pgsql and 3 chars on mysql. > I will leave it as an exercise to the reader to figure out who is > following the standard. pg's handling of the situation is not > necessarily optimal, but we just tell everyone to quit using the 'char(n)' > type. > > Unless for example your 'NOTE' column is mostly full or mostly null, > your query is not fair because postgres has to both store and return a > proportionally greater amount of data. This makes the comparison > hardly apples to apples. This stuff counts when we are measuring at > microsecond level. Good point! I can only confirm that, at least at the beginning, all fields are fully filled within the database. I will test both engines with VARCHAR next time to be sure it's not an issue. > >>> Another question: how exactly are you connecting to the database? >>> local machine? if so, domain socket or tcp/ip? >> >> local TCP/IP, same as MySQL > > would be curious to see if you get different results from domain socket. at least for PG there was no difference, if I remember well. However, earlier when I tested over a real network I ended up completely changing my code to reduce network traffic (initially I had used cursors), and in the end PG traffic was lower than or similar to MySQL's, which was interesting stuff too :-) Rgds, -Dimitri > > merlin > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
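To illustrate the char(n) padding Merlin describes from the client side, a small sketch (placeholder connection string) using PQgetlength() to look at the length of the value actually returned for char(50) versus varchar(50):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");    /* placeholder connection string */
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    PGresult *res = PQexec(conn,
        "SELECT CAST('abc' AS char(50)), CAST('abc' AS varchar(50))");

    /* PQgetlength reports the size of the field value shipped to the client */
    printf("char(50):    %d bytes\n", PQgetlength(res, 0, 0));
    printf("varchar(50): %d bytes\n", PQgetlength(res, 0, 1));

    PQclear(res);
    PQfinish(conn);
    return 0;
}

On PostgreSQL the first value comes back space padded to 50 bytes and the second as 3 bytes, which is the extra data volume Merlin is warning about when a benchmark mixes the two types.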
Re: [PERFORM] Scalability in postgres
A few weeks ago I tested a customer application on 16 cores with Oracle: - 20,000 sessions in total - 70,000 queries/sec without any problem on a mid-range Sun box + Solaris 10.. Rgds, -Dimitri On 6/3/09, Kevin Grittner wrote: > James Mansion wrote: > >> I'm sure most of us evaluating Postgres from a background in Sybase >> or SQLServer would regard 5000 connections as no big deal. > > Sure, but the architecture of those products is based around all the > work being done by "engines" which try to establish affinity to > different CPUs, and loop through the various tasks to be done. You > don't get a context switch storm because you normally have the number > of engines set at or below the number of CPUs. The down side is that > they spend a lot of time spinning around queue access to see if > anything has become available to do -- which causes them not to play > nice with other processes on the same box. > > If you do connection pooling and queue requests, you get the best of > both worlds. If that could be built into PostgreSQL, it would > probably reduce the number of posts requesting support for bad > configurations, and help with benchmarks which don't use proper > connection pooling for the product; but it would actually not add any > capability which isn't there if you do your homework. > > -Kevin > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Scalability in postgres
Just to say that you don't need a mega server to keep thousands of connections with Oracle, it's just trivial, nor the CPU affinity and other stuff you may or may not need with Sybase :-) Regarding PostgreSQL, I think it can only benefit from having an integrated connection pooler, as it'll make all populations happy anyway: - those who don't like the idea may always disable it :-) - those who have a lot of, but mostly inactive, sessions will be happy to simplify session pooling - those who are really seeking the most optimal workload on their servers will be happy twice: if there are any PG scalability limits, an integrated pooler will in most cases be more performant than an external one; if there are no PG scalability limits - it'll still help to size PG most optimally according to HW or OS capacities.. Rgds, -Dimitri On 6/3/09, Kevin Grittner wrote: > Dimitri wrote: >> A few weeks ago I tested a customer application on 16 cores with Oracle: >> - 20,000 sessions in total >> - 70,000 queries/sec >> >> without any problem on a mid-range Sun box + Solaris 10.. > > I'm not sure what point you are trying to make. Could you elaborate? > > (If it's that Oracle doesn't need an external connection pool, then > are you advocating that PostgreSQL include that in the base product?) > > -Kevin > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] VX_CONCURRENT flag on vxfs( 5.1 or later) for performance for postgresql?
You should rather consider VxFS tuning - it has auto-discovery of DIRECT I/O according to the block size. Just change this setting to 8K or 16-32K depending on your workload - then all I/O operations with a bigger block size will be executed in DIRECT mode and bypass the FS cache (which is logical, as usually it'll correspond to a full scan or a seq scan of some data), while I/O requests with smaller blocks will remain cached, which is very useful as it'll mainly cache random I/O (mainly index access).. With such tuning I've got over 35% performance improvement compared to any other state (fully DIRECT or fully cached). Rgds, -Dimitri On 5/5/11, Robert Haas wrote: > On Sat, Apr 30, 2011 at 4:51 AM, Hsien-Wen Chu > wrote: >> since the block size is 8k by default, and it consists of many >> tuples/lines; as I understand it, if any tuple/line is changed (maybe >> by update, insert, delete), the block will be marked as a dirty block, and >> then it will be flushed to disk by the bgwriter. > > True... > >> so my question is whether the data block (8k) is aligned with the file >> system block? if it is aligned with the file system block, what's the >> potential issue that makes it not safe for direct I/O? (please assume >> vxfs, vxvm and the disk sector are aligned). please correct me if anything is >> incorrect. > > It's not about safety - it's about performance. On a machine with > 64GB of RAM, a typical setting for shared_buffers is 8GB. If you > start reading blocks into the PostgreSQL cache - or writing them out > of the cache - in a way that bypasses the filesystem cache, you're > going to have only 8GB of cache, instead of some much larger amount. > More cache = better performance. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
Hi list, On Friday 15 December 2006 at 11:50, Martijn van Oosterhout wrote: > BTW, doing gettimeofday() without kernel entry is not really possible. > You could use the cycle counter but it has the problem that if you have > multiple CPUs you need to calibrate the result. If the CPU goes to > sleep, there is no way for the userspace process to know. Only the > kernel has all the relevant information about what "time" is to get a > reasonable result. I remember having played with the Intel RDTSC (time stamp counter) for some timing measurements, but I just read from several sources (including Linux kernel hackers considering its usage for the gettimeofday() implementation) that the TSC is not an accurate way to get elapsed time information. Maybe some methods other than gettimeofday() are available (Lamport timestamps, as PGDG may have to consider a distributed-processing-ready EXPLAIN ANALYZE at some point in the future), cheaper and accurate? After all, the discussion, as far as I understand it, is about having an accurate measure of the duration of events; knowing when in the day they occurred does not seem to be the point. My 2¢, hoping this could be somehow helpful, -- Dimitri Fontaine http://www.dalibo.com/
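For what it's worth, a rough sketch of the two timing sources being compared: gettimeofday() (a kernel entry, but a calibrated wall clock) versus reading the raw x86 cycle counter (cheap, but per-CPU and frequency dependent, hence the accuracy concerns above). This is x86/GCC-specific inline assembly, and converting cycles to time still needs exactly the calibration the kernel folks worry about:

#include <stdio.h>
#include <stdint.h>
#include <sys/time.h>

/* read the time stamp counter (x86, GCC-style inline asm) */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    struct timeval t0, t1;
    uint64_t c0, c1;

    gettimeofday(&t0, NULL);
    c0 = read_tsc();

    /* ... the event being measured goes here ... */

    c1 = read_tsc();
    gettimeofday(&t1, NULL);

    long usec = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
    printf("gettimeofday: %ld us, tsc: %llu cycles\n",
           usec, (unsigned long long)(c1 - c0));
    return 0;
}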
Re: [PERFORM] Scaling concerns
On Saturday 16 December 2006 at 18:32, Steinar H. Gunderson wrote: > Instead, try real queries on real data, > and post here if some are too slow for you. To quickly find the subset of slow queries on your production system, you can use the pgfouine tool: http://pgfouine.projects.postgresql.org/ If you then want to measure PostgreSQL performance with different settings and compare them, consider using the tsung tool (and maybe the tsung-ploter companion tool, to graph several benchmarks onto the same charts for comparison purposes): http://pgfouine.projects.postgresql.org/tsung.html http://tsung.erlang-projects.org/ http://debian.dalibo.org/unstable/ This latter link also contains a .tar.gz archive of tsung-ploter in case you're not running a Debian system. Dependencies are python and matplotlib. Regards, -- Dimitri Fontaine http://www.dalibo.com/
Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux
On Thursday 4 January 2007 at 00:18, Magnus Hagander wrote: > But to get a good answer on whether the difference is > significant enough to matter, you really need to run some kind of simple > benchmark on *your* workload. To easily stress test a couple of servers and compare results on *your* workload, please consider using both pgfouine[1,2] and tsung[3]. The companion tool tsung-ploter[4] (for plotting several results on a common graph, hence with common scales) may also be useful. [1]: http://pgfouine.projects.postgresql.org/ [2]: http://pgfouine.projects.postgresql.org/tsung.html [3]: http://tsung.erlang-projects.org/ [4]: http://debian.dalibo.org/unstable/tsung-ploter_0.1-1.tar.gz Regards, -- Dimitri Fontaine http://www.dalibo.com/