Re: [PERFORM] Caching by Postgres
On Tue, Aug 23, 2005 at 02:41:39PM -0400, Donald Courtney wrote:
> I mean well with this comment -
> This whole issue of data caching is a troubling issue with postgreSQL
> in that even if you ran postgreSQL on a 64 bit address space
> with larger number of CPUs you won't see much of a scale up
> and possibly even a drop. I am not alone in having the *expectation*
> that a database should have some cache size parameter and
> the option to skip the file system. If I use oracle, sybase, mysql
> and maxdb they all have the ability to size a data cache and move
> to 64 bits.
> Is this a crazy idea - that a project be started to get this adopted?
> Is it too big and structural to contemplate?
> From one who likes postgreSQL

Hey Donald. :-)

This is an operating system issue, not a PostgreSQL issue. If you have more physical memory than fits in 32-bit addresses, and your operating system isn't using this extra memory to cache files (or anything else), then your OS is what I would consider to be broken (or at the very least, not designed for a 64-bit host).

The only questions that can be asked here are: 1) can PostgreSQL do a better job than the OS at making best use of system RAM, and 2) if so, is the net gain worth the added complexity to PostgreSQL?

I happen to think that yes, PostgreSQL can do a better job than most OS's, as it has better information for deciding which pages are worth keeping and which are not, but no, it isn't worth the effort until PostgreSQL developers start running out of things to do.

Buy your 64-bit platforms - but if page caching is your concern, 1) ensure that you really have more physical memory than can fit in 32 bits, and 2) ensure that your operating system is comfortable caching data pages from files above the 32-bit mark.

Cheers,
mark
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 09:21:12AM -0400, Donald Courtney wrote:
> I built postgreSQL 8.1 64 bit on solaris 10 a few months ago
> and side by side with the 32 bit postgreSQL build saw no improvement.
> In fact the 64 bit result was slightly lower.

I've had this sort of argument with a friend of mine who works at a retail computer sales company who always tries to pitch 64-bit platforms to me (I don't have one yet). There are a few issues in here that are hard to properly untangle to allow for a fair comparison.

The first thing to always remember is that the move from 32 bits to 64 bits doesn't come for free. In a real 64-bit system with a 64-bit operating system and 64-bit applications, pointers are now double their 32-bit size. This means more bytes to copy around memory, and in an extreme case has the potential to approach halving both the effective memory bandwidth when accessing many such pointers from RAM, and the effective amount of RAM. In real world cases, not everything is a pointer, so this sort of performance degradation is doubtful - but it is something to keep in mind.

In response to this, it appears that, at least on the Intel/AMD side of things, they've increased the bandwidth on the motherboard, and allowed for faster memory to be connected to the motherboard. They've increased the complexity of the chip, to allow 64-bit register operations to be equivalent in speed to 32-bit register operations. I have no idea what else they've done... :-)

So, it may be difficult to properly compare a 32-bit system to a 64-bit system. Even if the GHz on the chip appears equal, it isn't the same chip, and unless it is the exact same make, product and version of the motherboard, it may not be a fair comparison. Turning support for 32-bit on or off, and using a kernel that is only 32-bit, may give good comparisons - but with the above explanation, I would expect the 32-bit application + kernel to out-perform the 64-bit application.

So then we move on to what 64-bit is really useful for. Obviously, there is the arithmetic. If you were previously doing 64-bit arithmetic through software, you will notice an immediate speed improvement when doing it through hardware instead. If you have a program that is scanning memory in any way, it may benefit from 64-bit instructions (for example - copying data 64-bit words at a time instead of 32-bit words at a time). PostgreSQL might benefit slightly from either of these, slightly balancing the performance degradation of using more memory to store the pointers, and more memory bandwidth to access the pointers.

The real benefit of 64-bit is address space. From the kernel perspective, it means that more programs, or bigger programs, can run at once. From the application perspective, it means your application can use more than 32 bits of address space. For programs that make extensive use of mmap(), this can be a necessity. They are mapping very large files into their own address space. This isn't a performance boost so much as a 'you can't do it' limit, if the files mmap()'ed at the same time will not fit within 32 bits of address space. This also becomes, potentially, a performance degradation, as the system is now having to manage applications that have very large page tables. Page faults may become expensive.

PostgreSQL uses read(), instead of mmap(), and uses <2 Gbyte files. PostgreSQL doesn't require the additional address space for normal operation.
If, however, you happen to have a very large amount of physical memory - more memory than is supported by a 32-bit system, but supported by your 64-bit system - then the operating system should be able to use this additional physical memory to cache file system data pages. This will benefit PostgreSQL if used with tables that are larger than the memory supported by your 32-bit system, and which have queries that require more pages than the memory supported by your 32-bit system to be frequently accessed. If you have a huge database, with many clients accessing the data, this would be a definite yes. With anything less, it is a maybe, or a probably not.

I've been looking at switching to 64-bit, mostly to benefit from the better motherboard bandwidth, and just to play around. I'm not expecting to require the 64-bit instructions.

Hope this helps,
mark
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 02:47:09PM -0400, Alan Stange wrote:
> At least on Sparc processors, v8 and newer, any double precision math
> (including longs) is performed with a single instruction, just like for
> a 32 bit datum. Loads and stores of 8 byte datums are also handled via
> a single instruction. The urban myth that 64bit math is
> different/better on a 64 bit processor is just that; yes, some lower
> end processors would emulate/trap those instructions but that's an
> implementation detail, not architecture.

It isn't an urban myth that 64-bit math on a 64-bit processor is faster, at least if done using registers. It definitely is faster. It may be an urban myth, though, that most applications perform a sufficient amount of 64-bit arithmetic to warrant the upgrade. The benefit may be lost in the noise for an application such as PostgreSQL. It takes, effectively, infinitely longer to access a disk page than to increment a 64-bit integer in software.

For the lower end processors that emulate/trap these instructions, they are being performed in software, along with the overhead of a trap, and are therefore not a single instruction any more.

We are coming at this from different sides (which is good - perspective is always good :-) ). From the Intel/AMD side of things, ALL non-64-bit platforms are 'lower end processors', and don't emulate/trap the instructions as they didn't exist (at least not yet - who knows what clever and sufficiently motivated people will do :-) ).

> > If, however, you happen to have a very large amount of physical memory
> > - more memory than is supported by a 32-bit system, but is supported
> > by your 64-bit system, then the operating system should be able to use
> > this additional physical memory to cache file system data pages, which
> > will benefit PostgreSQL if used with tables that are larger than the
> > memory supported by your 32-bit system, and which have queries which
> > require more pages than the memory supported by your 32-bit system to
> > be frequently accessed. If you have a huge database, with many clients
> > accessing the data, this would be a definite yes. With anything less,
> > it is a maybe, or a probably not.
> Solaris, at least, provided support for far more than 4GB of physical
> memory on 32 bit kernels. A newer 64 bit kernel might be more
> efficient, but that's just because the time was taken to support large
> page sizes and more efficient data structures. It's nothing intrinsic
> to a 32 vs 64 bit kernel.

Hehe. That's why I was so careful to qualify my statements. :-)

But yeah, I agree. It's a lot of hype, for not much gain (and some loss, depending on what it is being used for). I only want one because they're built better, and because I want to play around.

Cheers,
mark
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 05:09:04PM -0400, Alan Stange wrote:
> The older 32bit RISC processors do have 64 bit registers, ALUs and
> datapaths, and they are marketed toward high end scientific computing,
> and you're claiming that such a processor is slower than one which has
> the addition of 64 bit pointers added to it?

No. I'm claiming that you are talking about a hybrid 64/32 processor, and that it isn't fair to use it to declare that 64-bit arithmetic units don't provide benefit for 64-bit math. :-)

> As an example, an UltraSparc running a 32 bit kernel+application will
> have the same double precision floating point performance as one
> running a 64bit kernel+application (except for the additional FP
> registers in the 64bit API). For a function like daxpy, it's the exact
> same hardware running the exact same instructions! So why do you think
> the performance would be different?

Double precision floating point isn't 64-bit integer arithmetic.

I think this is all a little beside the point. If your point is that the SPARC was designed well - I agree with you. I won't agree that a SPARC with 64-bit registers should be considered a 32-bit machine. The AMD 64-bit machines come in two forms as well - the ones that allow you to use the 64-bit integer registers (not floating point! those are already 80-bit!), and the ones that allow you to address more memory. I wouldn't consider either to be a 32-bit CPU, although they will allow 32-bit applications to run fine.

> I believe the IBM Power processors also upped everything to double
> precision internally because of some details of the "multiply-add fused"
> instructions. It's been a few years since I taught H&P to CS
> undergrads, but I'm fairly sure the details are all the same for MIPS
> processors as well.

Smart design, that obscures the difference - but doesn't make the difference a myth. If it's already there, then it's already there, and we can't talk as if it isn't.

Cheers,
mark
Re: [PERFORM] When to do a vacuum for highly active table
On Tue, Aug 30, 2005 at 05:29:17PM -0400, Tom Lane wrote:
> Markus Benne <[EMAIL PROTECTED]> writes:
> > We have a highly active table that has virtually all
> > entries updated every 5 minutes. Typical size of the
> > table is 50,000 entries, and entries have grown fat.
> ...
> > We are thinking of splitting the table in two: the
> > part that updates often, and the part that updates
> > infrequently, as we suspect that record size impacts
> > vacuum.
> You just said that virtually all rows update constantly --- where's
> the "infrequent" part?

I think he means splitting it vertically, instead of horizontally, and it sounds like an excellent idea, if a large enough portion of each record is in fact mostly fixed. Otherwise, PostgreSQL is copying data multiple times, only to have the data expire as part of a dead row.

I've already started to notice such issues with PostgreSQL - but more because I'm using low-end hardware, and I'm projecting the effect for when our database becomes much larger with much higher demand on the database.

This is the sort of scenario where a database without transactional integrity would significantly out-perform one designed around it. If records are fixed sized, and updated in place, these problems would occur far less often. Is it heresy to suggest MySQL in here? :-) I switched from MySQL to PostgreSQL several months ago, and haven't looked back - but they do work differently, and for certain uses, one can destroy the other. Using a MyISAM table would be the way I would go with this sort of problem.

Cheers,
mark
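As a rough illustration of the vertical split being suggested - a minimal sketch only, with hypothetical table and column names (item_static/item_hot, status, last_checked are not from the original thread):

    -- The wide, rarely changing columns stay in one table...
    CREATE TABLE item_static (
        item_id     integer PRIMARY KEY,
        description text,
        created_at  timestamp
    );

    -- ...and the columns rewritten every 5 minutes move to a narrow table.
    CREATE TABLE item_hot (
        item_id      integer PRIMARY KEY REFERENCES item_static (item_id),
        status       text,
        last_checked timestamp
    );

    -- The frequent update now produces small dead rows instead of fat ones,
    -- so each update (and each vacuum pass) churns through far fewer bytes.
    UPDATE item_hot SET status = 'ok', last_checked = now() WHERE item_id = 42;

    -- Queries that need everything join the two halves back together.
    SELECT s.description, h.status, h.last_checked
      FROM item_static s JOIN item_hot h USING (item_id);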
Re: [PERFORM] 'Real' auto vacuum?
On Wed, Aug 31, 2005 at 10:21:20AM +1200, Ralph Mason wrote:
> This is a wild and crazy thought which I am sure is invalid for some
> good reason.
>
> But why can't postgres just vacuum itself as it goes along?
>
> When a row is orphaned it's added to a list of possibly available rows.
> When a new row is needed the list of possible rows is examined and the
> first one with a transaction id less than the lowest running transaction
> id is chosen to be the new row? These rows can be in a heap so it's
> really fast to find one.
>
> Like magic - no more vacuuming. No more holes for people to fall into.

Yes please. :-)

> Is this an oversimplification of the problem?

But, yeah. It's probably not that easy, especially with really big databases. Where is this free list stored? How efficient is it to keep track of the lowest running transaction at all times? How does one synchronize access to this free list, to ensure that processes don't block up waiting for access to the free list? Is the free list journalled to prevent corruption, and the accidental re-use of a still in-use row? And, there would be a cost to scanning this list on every insert or update.

As an outsider (like you?) I see the current model as a design flaw as well. A neat and tidy model on paper. Not so nice in real life. The need to vacuum in batch mode, to keep the database from dying, seems intuitively bad.

I think there must be answers to this problem. Even simple optimizations, such as defining a table such that any delete or update within it, upon commit, will attempt to vacuum just the rows that should not be considered free for any new transactions. If a row is in use by an active transaction, oh well - it can be picked up by a batch run of vacuum. If it's free, though - let's do it now.

I think any optimizations we come up with will be more happily accepted with a working patch that causes no breakage... :-)

Cheers,
mark
Re: [PERFORM] Is There Any Way ....
Memory management, or disk management, is "good enough" as provided by decent operating systems, and the itch just isn't bad enough to scratch yet. They remain unconvinced that the gain in performance would be worth the cost of maintaining this extra complexity in the code base. If you believe the case can be made, it is up to you to make it.

Cheers!
mark
Re: [HACKERS] [PERFORM] A Better External Sort?
On Fri, Oct 07, 2005 at 04:55:28PM -0700, Luke Lonergan wrote:
> On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:
> > What? strlen is definitely not in the kernel, and thus won't count as
> > system time.
> System time on Linux includes time spent in glibc routines.

Do you have a reference for this?

I believe this statement to be 100% false.

Cheers,
mark
Re: [HACKERS] [PERFORM] A Better External Sort?
On Fri, Oct 07, 2005 at 09:20:59PM -0700, Luke Lonergan wrote:
> On 10/7/05 5:17 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> > On Fri, Oct 07, 2005 at 04:55:28PM -0700, Luke Lonergan wrote:
> > > On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:
> > > > What? strlen is definitely not in the kernel, and thus won't count as
> > > > system time.
> > > System time on Linux includes time spent in glibc routines.
> > Do you have a reference for this?
> > I believe this statement to be 100% false.
> How about 99%? OK, you're right, I had this confused with the profiling
> problem where glibc routines aren't included in dynamic linked profiles.

Sorry to emphasize the 100%. It wasn't meant to judge you. It was meant to indicate that I believe 100% of system time is accounted for while the system call is actually active, which is not possible while glibc is active.

I believe the way it works is that a periodic timer interrupt increments a specific integer every time it wakes up. If it finds itself within the kernel, it increments the system time for the active process; if it finds itself outside the kernel, it increments the user time for the active process.

> Back to the statements earlier - the output of time had much of time for a
> dd spent in system, which means kernel, so where in the kernel would that be
> exactly?

Not really an expert here. I only play around. At a minimum, there is a cost to switching from user context to system context and back, and then filling in the zero bits. There may be other inefficiencies, however. Perhaps /dev/zero always fills in a whole block (8192 usually), before allowing the standard file system code to read only one byte. I dunno. But, I see this oddity too:

    $ time dd if=/dev/zero of=/dev/zero bs=1 count=1000
    1000+0 records in
    1000+0 records out
    dd if=/dev/zero of=/dev/zero bs=1 count=1000  4.05s user 11.13s system 94% cpu 16.061 total

    $ time dd if=/dev/zero of=/dev/zero bs=10 count=100
    100+0 records in
    100+0 records out
    dd if=/dev/zero of=/dev/zero bs=10 count=100  0.37s user 1.37s system 100% cpu 1.738 total

From my numbers, it looks like 1-byte reads are hard in both the user context and the system context. It scales almost linearly, even:

    $ time dd if=/dev/zero of=/dev/zero bs=100 count=10
    10+0 records in
    10+0 records out
    dd if=/dev/zero of=/dev/zero bs=100 count=10  0.04s user 0.15s system 95% cpu 0.199 total

    $ time dd if=/dev/zero of=/dev/zero bs=1000 count=1
    1+0 records in
    1+0 records out
    dd if=/dev/zero of=/dev/zero bs=1000 count=1  0.01s user 0.02s system 140% cpu 0.021 total

At least some of this gets into the very in-depth discussions as to whether kernel threads or user threads are more efficient. Depending on the application, user threads can switch many times faster than kernel threads. Other parts of this may just mean that /dev/zero isn't implemented optimally.

Cheers,
mark
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
On Fri, Oct 07, 2005 at 12:48:16PM +0200, Steinar H. Gunderson wrote:
> On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> > Isn't it possible (and reasonable) for these environments to keep track of
> > whether there is a transaction in progress with update to given table and
> > if not, use an index scan (count(*) where) or cached value (count(*)) to
> > perform this kind of query?
> Even if there is no running update, there might still be dead rows in the
> table. In any case, of course, a new update could always be occurring while
> your counting query was still running.

I don't see this being different from count(*) as it is today.

Updating a count column is certainly clever. If using a trigger, perhaps it would allow the equivalent of:

    select count(*) from table for update;

:-)

Cheers,
mark

(not that this is necessarily a good thing!)
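A minimal sketch of the trigger-maintained count being hinted at above - the table name t and the single-row counter table are hypothetical, not from the original thread, and note that the counter row becomes a serialization point for concurrent writers:

    CREATE TABLE t_rowcount (n bigint NOT NULL);
    INSERT INTO t_rowcount SELECT count(*) FROM t;

    CREATE OR REPLACE FUNCTION t_maintain_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE t_rowcount SET n = n + 1;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE t_rowcount SET n = n - 1;
        END IF;
        RETURN NULL;  -- AFTER trigger: the return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER t_count AFTER INSERT OR DELETE ON t
        FOR EACH ROW EXECUTE PROCEDURE t_maintain_count();

    -- Readers now pay for a single-row lookup instead of a full scan.
    SELECT n FROM t_rowcount;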
Re: [PERFORM] Best way to get all different values in a column
On Fri, Oct 14, 2005 at 06:02:56PM +0200, [EMAIL PROTECTED] wrote:
> Does anyone here know what is the most efficient way to list all
> different values of a given column with low cardinality? For instance
> I have a table with columns DAY, NAME, ID, etc. The table is updated
> about each week with thousands of records with the same (current) date.
> Now I would like to list all values for DAY, only if possible without
> scanning all the table each time I submit the request.
> I can think of:
> ...
> Solution 6: Store the values in a separate table, recreated each time
> TABLE is updated.

I've found a variant on 6 to work well for this problem domain. Why not insert into the separate table when you insert into the main table? Either as a trigger, or in your application.

Cheers,
mark
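A minimal sketch of that variant, using the DAY column from the question and a hypothetical side table (big_table_days is illustrative); concurrent inserts of a brand-new day would additionally need duplicate-key handling:

    CREATE TABLE big_table_days (day date PRIMARY KEY);

    CREATE OR REPLACE FUNCTION remember_day() RETURNS trigger AS $$
    BEGIN
        -- Record the day the first time it is seen; do nothing otherwise.
        PERFORM 1 FROM big_table_days WHERE day = NEW.day;
        IF NOT FOUND THEN
            INSERT INTO big_table_days (day) VALUES (NEW.day);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER big_table_day_trg BEFORE INSERT ON big_table
        FOR EACH ROW EXECUTE PROCEDURE remember_day();

    -- Listing the distinct days is now a scan of a handful of rows.
    SELECT day FROM big_table_days ORDER BY day;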
[PERFORM] prepared transactions that persist across sessions?
Hey all. Please point me to a place I should be looking if this is a common question that has been debated periodically and at great length already. :-)

I have a complex query. It's a few Kbytes large, and yes, I've already worked on reducing it to be efficient in terms of database design, and minimizing the expressions used to join the tables. Running some timing tests, I'm finding that the query itself, when issued in full, takes around 60 milliseconds to complete on modest hardware. If prepared, and then executed, however, it appears to take around 60 milliseconds to prepare, and 20 milliseconds to execute. I'm not surprised. PostgreSQL is very likely calculating the costs of many, many query plans.

This is telling me that the quickest way for me to accelerate these queries is to have them pre-select a query plan, and to use it. Unfortunately, I'll only be executing this query once per session, so "PREPARE" seems to be out of the question.

I am using PHP's PDO PGSQL interface - I haven't read up enough on it to determine whether a persistent connection can re-use server-side prepared queries as an option. Anybody know?

My read of the PL/pgSQL documentation seems to suggest that it will do some sort of query plan caching. Is there better documentation on this that would explain exactly how it works? What is the best way to define a PL/pgSQL function that will return a set of records? Is RETURNS SETOF the only option in this regard? It seems inefficient to me. Am I doing it wrong? Not understanding it? For very simple queries, it seems that using PL/pgSQL with SELECT INTO, RETURN, and then "SELECT * FROM f(arg)" actually slows down the query slightly. It wasn't giving me much faith, and I wanted to pick up some people's opinions before going further.

What is the reason that SQL and/or PostgreSQL have not added server-defined prepared statements? As in, one defines a server-defined prepared statement, and all sessions that have permission can execute the prepared statement. Is this just an issue of nobody implementing it? Or was there some deeper explanation as to why this would be a bad thing?

My reading of views is that views would not accelerate the queries. Perhaps the bytes sent to the server would reduce, however, the cost to prepare and execute the statement would be similar, or possibly even longer?

I'm thinking I need some way of defining a server-side query, that takes arguments, that will infrequently prepare the query, such that the majority of the time that it is executed, it will not have to choose a query plan.

Am I missing something obvious? :-)

Thanks,
mark
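For reference, a minimal sketch of a PL/pgSQL SETOF function of the kind asked about - the type, table and column names below are hypothetical, not the actual query. PL/pgSQL prepares and caches the plan for the embedded query the first time it runs in a backend, which is the plan-caching behaviour in question:

    CREATE TYPE report_row AS (id integer, name text, total numeric);

    CREATE OR REPLACE FUNCTION report(p_customer integer)
    RETURNS SETOF report_row AS $$
    DECLARE
        r report_row;
    BEGIN
        FOR r IN
            SELECT o.id, c.name, sum(o.amount)
              FROM orders o JOIN customers c ON c.id = o.customer_id
             WHERE o.customer_id = p_customer
             GROUP BY o.id, c.name
        LOOP
            RETURN NEXT r;  -- rows are accumulated and sent when the function ends
        END LOOP;
        RETURN;
    END;
    $$ LANGUAGE plpgsql STABLE;

    SELECT * FROM report(42);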
Re: [PERFORM] 8.x index insert performance
On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> if that index is causing the problem, you may want to consider setting
> up partial index to exclude null values.

Hey all. Pardon my ignorance. :-)

I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index. Occasionally, though, I then see a suggestion such as the above, which seems to indicate to me that null values *are* stored in the index, allowing the 'exclude null values' to have an effect?

Which is it? :-)

Thanks,
mark
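For illustration, the kind of partial index being suggested, with hypothetical table and column names: rows where the column IS NULL are simply left out of the index, keeping it smaller and cheaper to maintain, and the query's WHERE clause has to imply the index predicate for the planner to consider it.

    CREATE INDEX orders_shipped_idx
        ON orders (shipped_date)
        WHERE shipped_date IS NOT NULL;

    -- A candidate for the partial index: the condition implies the predicate.
    SELECT * FROM orders
     WHERE shipped_date IS NOT NULL
       AND shipped_date >= '2005-10-01';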
Re: [PERFORM] 8.x index insert performance
On Mon, Oct 31, 2005 at 03:27:31PM -0500, Merlin Moncure wrote:
> > On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> > > if that index is causing the problem, you may want to consider setting
> > > up partial index to exclude null values.
> > Hey all.
> > Pardon my ignorance. :-)
> > I've been trying to figure out whether null values are indexed or not from
> > the documentation. I was under the impression that null values are not
> > stored in the index. Occasionally, though, I then see a suggestion such
> > as the above, that seems to indicate to me that null values *are* stored
> > in the index, allowing for the 'exclude null values' to have effect?
> > Which is it? :-)
> I think I'm the ignorant one...do explain on any lookup on an indexed
> field where the field value is null and you get a seqscan.

Nahhh... I think the documentation could use a more explicit or obvious explanation. Or, I could have checked the source code to see. In any case, I expect we aren't the only ones that lacked confidence.

Tom was kind enough to point out that null values are stored in the index. I expect that the seqscan is used if the null values are not selective enough, the same as for any other value that isn't selective enough.

Now we can both have a little more confidence! :-)

Cheers,
mark
Re: [PERFORM] WAL sync behaviour
On Thu, Nov 10, 2005 at 11:39:34AM -0500, Tom Lane wrote:
> No, Mike is right: for WAL you shouldn't need any journaling. This is
> because we zero out *and fsync* an entire WAL file before we ever
> consider putting live WAL data in it. During live use of a WAL file,
> its metadata is not changing. As long as the filesystem follows
> the minimal rule of syncing metadata about a file when it fsyncs the
> file, all the live WAL files should survive crashes OK.

Yes, with emphasis on the zero out... :-)

> You do need metadata journaling for all non-WAL PG files, since we don't
> fsync them every time we extend them; which means the filesystem could
> lose track of which disk blocks belong to such a file, if it's not
> journaled.

I think there may be theoretical problems with regard to the ordering of the fsync operation, for files that are not pre-allocated. For example, if a new block is allocated, there are two blocks that need to be updated: the indirect reference block (or inode block, if the block references fit into the inode entry), and the block itself. If the indirect reference block is written first, before the data block, the state of the disk is inconsistent. This would be a crash during the fsync() operation. Metadata journalling can ensure that the data block is allocated first, and then all the necessary references updated, allowing the operation to be incomplete and rolled back, or committed in full.

Or, that is my understanding, anyways, and this is why I would not use ext2 for the database, even if it was claimed that fsync() was used. For WAL, with pre-allocated zero blocks? Sure. Ext2... :-)

mark
Re: [PERFORM] Opinions on Raid
Hope you don't mind, Ron. This might be splitting hairs.

On Tue, Feb 27, 2007 at 11:05:39AM -0500, Ron wrote:
> The real CPU overhead when using SW RAID is when using any form of SW
> RAID that does XOR operations as part of writes (RAID 5, 6, 50, ...,
> etc). At that point, you are essentially hammering on the CPU just
> as hard as you would on a dedicated RAID controller... ...and the
> dedicated RAID controller probably has custom HW helping it do this
> sort of thing more efficiently.
> That being said, SW RAID 5 in this sort of scenario can be reasonable
> if you =dedicate= a CPU core to it. So in such a system, your "n"
> core box is essentially a "n-1" core box because you have to lock a
> core to doing nothing but RAID management.

I have an issue with the above explanation. XOR is cheap. It's one of the cheapest CPU instructions available. Even with high bandwidth, the CPU should always be able to XOR very fast.

This leads me to the belief that the RAID 5 problem has to do with getting the data ready to XOR. With RAID 5, the L1/L2 cache is never large enough to hold multiple stripes of data under regular load, and the system may not have the blocks in RAM. Reading from RAM to find the missing blocks shows up as CPU load. Reading from disk to find the missing blocks shows up as system load. Dedicating a core to RAID 5 focuses on the CPU - which I believe to be mostly idle, waiting for a memory read. Dedicating a core reduces the impact, but can't eliminate it, and the cost of a whole core sitting mostly idle waiting for memory reads is high. Also, any reads scheduled by this core will affect the bandwidth/latency for other cores.

Hardware RAID 5 solves this by using its own memory modules - like a video card using its own memory modules. The hardware RAID can read from its own memory or disk all day and not affect system performance. Hopefully it has plenty of memory dedicated to holding the most frequently required blocks.

> SW RAID 5 etc in usage scenarios involving far more reads than writes
> and light write loads can work quite well even if you don't dedicate
> a core to RAID management, but you must be careful about workloads
> that are, or that contain parts that are, examples of the first
> scenario I gave. If you have any doubts about whether you are doing
> too many writes, dedicate a core to RAID stuff as in the first scenario.

I found software RAID 5 to suck such that I only use it for backups now. It seemed that Linux didn't care to read-ahead or hold blocks in memory for long, and preferred to read and then write. It was awful. RAID 5 doesn't seem like a good option even with hardware RAID. The issues are masked behind a black box (dedicated hardware), but they still exist.

Most of my system is RAID 1+0 now. I have it broken up: rarely read or written files (long term storage) on RAID 5, the main system data on RAID 1+0, the main system on RAID 1, and a larger build partition on RAID 0. For a crappy server in my basement, I'm very happy with my software RAID performance now. :-)

Cheers,
mark
Re: [PERFORM] Performance of count(*)
On Thu, Mar 22, 2007 at 10:18:10AM -0400, Michael Stone wrote:
> IIRC, that's basically what you get with the mysql count anyway, since
> there are corner cases for results in a transaction. Avoiding those
> cases is why the postgres count takes so long; sometimes that's what's
> desired and sometimes it is not.

Adding to this point: in any production system, the count presented to the user is usually wrong very shortly after it is displayed anyway. Transactions in the background or from other users are adding or removing items, perhaps even before the count reaches the user's display. The idea of transaction-safety for counts doesn't apply in this case. Both the transaction and the number are complete before the value is displayed.

In my own systems, I rarely use count(*) for anything except user-visible results. For the PostgreSQL system I use, I keep a table of counts, and lock the row for update when adding or removing items. This turns out to be best in this system anyway, as I need my new rows to be ordered, and locking the 'count' row lets me assign a new sequence number for the row. (I don't want to use SEQUENCE objects, as the rows are [key, sequence, data] with thousands or more keys, which would mean thousands of sequences.)

Cheers,
mark
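A minimal sketch of the locked counter row described above - the table and key names are hypothetical, and the literal 42 stands in for "the value the SELECT returned, plus one":

    BEGIN;

    -- Take the per-key counter row lock; concurrent writers on the same key queue here.
    SELECT n FROM item_counts WHERE key = 'inbox' FOR UPDATE;
    -- Suppose it returned n = 41: the new row gets sequence number 42.

    INSERT INTO items (key, sequence, data) VALUES ('inbox', 42, '...');
    UPDATE item_counts SET n = 42 WHERE key = 'inbox';

    COMMIT;

    -- Showing the count to the user is now a cheap single-row read.
    SELECT n FROM item_counts WHERE key = 'inbox';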
Re: [PERFORM] SCSI vs SATA
On Wed, Apr 04, 2007 at 08:50:44AM -0700, Joshua D. Drake wrote:
> > difference. OTOH, the SCSI discs were way less reliable than the SATA
> > discs, that might have been bad luck.
> Probably bad luck. I find that SCSI is very reliable, but I don't find
> it any more reliable than SATA. That is assuming correct ventilation etc...

Perhaps a basic question - but why does the interface matter? :-)

I find the subject interesting to read about - but I am having trouble understanding why SATA II is technically superior or inferior to SCSI as an interface, in any place that counts.

Is the opinion being expressed that manufacturers who have decided to move to SATA II are not designing for the enterprise market, yes? I find myself doubting this...

Cheers,
mark
Re: [PERFORM] SCSI vs SATA
On Sat, Apr 07, 2007 at 08:46:33PM -0400, Ron wrote:
> The Google and CMU studies are =not= based on data drawn from
> businesses where the lesser consequences of an outage are losing
> $10Ks or $100K per minute... ...and where the greater consequences
> include the chance of loss of human life.
> Nor are they based on businesses that must rely exclusively on highly
> skilled and therefore expensive labor.

Google up time seems to be quite good. Reliability can be had from trusting more reputable (and usually more expensive) manufacturers and product lines, or it can be had through redundancy. The "I" in RAID.

I recall reading the Google study before, and believe I recall it lacking in terms of how much it costs to pay the employees to maintain the system. It would be interesting to know whether the inexpensive drives require more staff time to be spent on them. Staff time can easily become more expensive than the drives themselves.

I believe there are factors that exist that are not easy to calculate. Somebody else mentioned how Intel was not the cleanest architecture, and yet how Intel architecture makes up the world's fastest machines, and the cheapest machines per unit of work completed. There is a game of numbers being played. A manufacturer that sells 10+ million units has the resources, the profit margin, and the motivation to ensure that their drives are better than those of a manufacturer that sells 100 thousand units. Even if the manufacturer of the 100K units spends double in development per unit, they would only be spending 1/50 as much as the manufacturer who makes 10+ million units.

As for your experience - no disrespect - but if your experience is over the last 25 years, then you should agree that most of those years are no longer relevant in terms of experience. SATA has only existed for 5 years or less, and is only now stabilizing in terms of having the different layers of a solution supporting features like command queuing. The drives of today have already broken all sorts of rules that people assumed were not possible to break 5 years ago, 10 years ago, or 20 years ago. The playing field is changing. Even if your experience is correct or valid today - it may not be true tomorrow.

The drives of today I consider to be incredible in terms of quality, reliability, speed, and density. All of the major brands, for desktops or servers, IDE, SATA, or SCSI, are amazing compared to only 10 years ago. To say that they don't meet a standard - which standard?

Everything has a cost. Having a drive never break would have a very large cost. It will cost more to turn 99.9% into 99.99%. Given that the products will never be perfect, perhaps it is valid to invest in a low-cost, fast-to-implement recovery solution that assumes some number of drives will fail in 6 months, 1 year, 2 years, and 5 years. Assume they will fail, because regardless of what you buy - there is a good chance that they *will* fail. Paying double price for hardware, with a hope that they will not fail, may not be a good strategy.

I don't have a conclusion here - only things to consider.

Cheers,
mark
Re: [PERFORM] LIKE search and performance
On Thu, May 24, 2007 at 02:02:40PM -0700, Mark Lewis wrote:
> PG could scan the index looking for matches first and only load the
> actual rows if it found a match, but that could only be a possible win
> if there were very few matches, because the difference in cost between a
> full index scan and a sequential scan would need to be greater than the
> cost of randomly fetching all of the matching data rows from the table
> to look up the visibility information.
> So yes it would be possible, but the odds of it being faster than a
> sequential scan are small enough to make it not very useful.
> And since it's basically impossible to know the selectivity of this kind
> of where condition, I doubt the planner would ever realistically want to
> choose that plan anyway because of its poor worst-case behavior.

What is a real life example where an intelligent and researched database application would issue a like or ilike query as its primary condition in a situation where it expected very high selectivity?

Avoiding a poor worst-case behaviour, for a worst case that won't happen in practice, doesn't seem practical. What real life examples, that could not be implemented a better way, would behave poorly if like/ilike looked at the index first to filter?

I don't understand... :-)

Cheers,
mark
Re: [PERFORM] LIKE search and performance
On Fri, May 25, 2007 at 09:13:25AM +0100, Richard Huxton wrote:
> [EMAIL PROTECTED] wrote:
> > > And since it's basically impossible to know the selectivity of this kind
> > > of where condition, I doubt the planner would ever realistically want to
> > > choose that plan anyway because of its poor worst-case behavior.
> > What is a real life example where an intelligent and researched
> > database application would issue a like or ilike query as their
> > primary condition in a situation where they expected very high
> > selectivity?
> > Avoiding a poor worst-case behaviour for a worst-case behaviour that
> > won't happen doesn't seem practical.
> But if you are also filtering on e.g. date, and that has an index with
> good selectivity, you're never going to use the text index anyway are
> you? If you've only got a dozen rows to check against, might as well
> just read them in.
> The only time it's worth considering the behaviour at all is *if* the
> worst-case is possible.

I notice you did not provide a real life example as requested. :-)

This seems like an ivory tower restriction. Not allowing best performance in a common situation versus not allowing worst performance in a not-so-common situation.

Cheers,
mark
Re: [PERFORM] LIKE search and performance
On Fri, May 25, 2007 at 04:35:22PM +0100, Richard Huxton wrote:
> > I notice you did not provide a real life example as requested. :-)
> OK - any application that allows user-built queries: foo
> Want another? Any application that has a "search by name" box - users
> can (and do) put one letter in and hit enter.
> Unfortunately you don't always have control over the selectivity of
> queries issued.

The database has 10 million records. The user enters "bar" and it translates to "%bar%". You are suggesting that we expect "bar" to match 1 million+ records? :-) I hope not. I would define this as bad process. I would also set "LIMIT" to something like "100".

> > This seems like an ivory tower restriction. Not allowing best performance
> > in a common situation vs not allowing worst performance in a not-so-common
> > situation.
> What best performance plan are you thinking of? I'm assuming we're
> talking about trailing-wildcard matches here, rather than "contains"
> style matches.

"Trailing-wildcard" already uses the B-Tree index, does it not?

I am speaking of "contains", as "contains" is the one that was said to require a seqscan. I am questioning why it requires a seqscan. The claim was made that with MVCC, the index is insufficient to check for visibility, and that the table would need to be accessed anyway, therefore a seqscan is required. I question whether a like '%bar%' should be considered a high selectivity query in the general case. I question whether a worst case should be assumed.

Perhaps I question too much? :-)

Cheers,
mark
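For reference, a sketch of the two cases with hypothetical table and column names. A plain B-Tree index handles the trailing-wildcard form in the C locale; in other locales the index needs the pattern operator class (text_pattern_ops) to be usable for prefix matches, and the "contains" form has no B-Tree range to scan at all:

    CREATE INDEX people_name_pattern_idx ON people (name text_pattern_ops);

    -- Prefix match: the planner can rewrite this into an index range scan.
    SELECT name FROM people WHERE name LIKE 'bar%' LIMIT 100;

    -- Contains match: no range of the B-Tree covers '%bar%', hence the
    -- seqscan under discussion (full-text or trigram indexing is the usual
    -- escape hatch for this shape of query).
    SELECT name FROM people WHERE name LIKE '%bar%' LIMIT 100;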
Re: [PERFORM] ECC RAM really needed?
On Sat, May 26, 2007 at 08:43:15AM -0400, Michael Stone wrote:
> On Fri, May 25, 2007 at 06:45:15PM -0700, Craig James wrote:
> > We're thinking of building some new servers. We bought some a while back
> > that have ECC (error correcting) RAM, which is absurdly expensive compared
> > to the same amount of non-ECC RAM. Does anyone have any real-life data
> > about the error rate of non-ECC RAM, and whether it matters or not? In my
> > long career, I've never once had a computer that corrupted memory, or at
> > least I never knew if it did.
> ...because ECC RAM will correct single bit errors. FWIW, I've seen *a
> lot* of single bit errors over the years. Some systems are much better
> about reporting than others, but any system will have occasional errors.
> Also, if a stick starts to go bad you'll generally be told about it with
> ECC memory, rather than having the system just start to flake out.

First: I would use ECC RAM for a server. The memory is not significantly more expensive.

Now that this is out of the way - I found this thread interesting because although it talked about RAM bit errors, I haven't seen reference to the significance of RAM bit errors. Quite a bit of memory is only rarely used (sent out to swap or flushed before it is accessed), or used in a read-only capacity in a limited form. For example, if searching table rows - as long as the row is not selected, and the bit error is in a field that isn't involved in the selection criteria, who cares if it is wrong?

So, the question then becomes: what percentage of memory is required to be correct all of the time? I believe the estimates for bit errors are high estimates with regard to actual effect. Stating that a bit may be wrong once every two weeks does not describe the effect. In my opinion, software defects have a similar estimate for the potential for damage to occur.

In the last 10 years, the only problems with memory I have ever successfully diagnosed were with cheap hardware running in a poor environment, where the problem became quickly obvious, to the point that the system would be unusable or the BIOS would refuse to boot with the broken memory stick. (This paragraph represents the primary state of many of my father's machines :-) ) Replacing the memory stick made the problems go away.

In any case - the word 'cheap' is significant in the above paragraph. Non-ECC RAM should be considered 'cheap' memory. It will work fine most of the time and most people will never notice a problem. Do you want to be the one person who does notice a problem? :-)

Cheers,
mark
Re: [PERFORM] setting up raid10 with more than 4 drives
On Wed, May 30, 2007 at 08:51:45AM -0700, Luke Lonergan wrote:
> This is standard stuff, very well proven: try googling 'self healing zfs'.
> The first hit on this search is a demo of ZFS detecting corruption of one of
> the mirror pair using checksums, very cool:
>
> http://www.opensolaris.org/os/community/zfs/demos/selfheal/;jsessionid=52508D464883F194061E341F58F4E7E1
>
> The bad drive is pointed out directly using the checksum and the data
> integrity is preserved.

One part is corruption. Another is ordering and consistency. ZFS represents both RAID-style storage *and* a journal-style file system. I imagine consistency and ordering is handled through journalling.

Cheers,
mark
Re: [PERFORM] setting up raid10 with more than 4 drives
On Thu, May 31, 2007 at 01:28:58AM +0530, Rajesh Kumar Mallah wrote:
> i am still not clear what is the best way of throwing in more
> disks into the system.
> does more stripes means more performance (mostly) ?
> also is there any thumb rule about best stripe size ? (8k,16k,32k...)

It isn't that simple. RAID 1 should theoretically give you the best read performance. If all you care about is reads, then "best performance" would be to add more mirrors to your array. For write performance, RAID 0 is the best. I think this is what you mean by "more stripes". This is where RAID 1+0/0+1 comes in.

To reconcile the above, your question seems to be: "I have a RAID 1+0/0+1 system. Should I add disks onto the 0 part of the array? Or the 1 part of the array?"

My conclusion for you would be: both, unless you are certain that your load is scaled heavily towards reads, in which case the 1, or if scaled heavily towards writes, then the 0.

Then come the other factors. Do you want redundancy? Then you want 1. Do you want capacity? Then you want 0. There is no single answer for most people.

For me, stripe size is the last decision to make, and may be heavily sensitive to load patterns. This suggests a trial and error / benchmarking requirement to determine the optimal stripe size for your use.

Cheers,
mark
Re: [PERFORM] Autodetect of software RAID1+0 fails
> On Fri, Jun 01, 2007 at 10:57:56AM -0700, Craig James wrote:
> > The Linux kernel doesn't properly detect my software RAID1+0 when I boot
> > up. It detects the two RAID1 arrays, the partitions of which are marked
> > properly. But it can't find the RAID0 on top of that, because there's no
> > corresponding device to auto-detect. The result is that it creates
> > /dev/md0 and /dev/md1 and assembles the RAID1 devices on bootup, but
> > /dev/md2 isn't created, so the RAID0 can't be assembled at boot time.

Hi Craig:

I had the same problem for a short time. There *is* a device to base the RAID0 off, however, it needs to be recursively detected. mdadm will do this for you; however, if the device order isn't optimal, it may need some help via /etc/mdadm.conf. For a while, I used something like:

    DEVICE partitions
    ...
    ARRAY /dev/md3 level=raid0 num-devices=2 UUID=10d58416:5cd52161:7703b48e:cd93a0e0
    ARRAY /dev/md5 level=raid1 num-devices=2 UUID=1515ac26:033ebf60:fa5930c5:1e1f0f12
    ARRAY /dev/md6 level=raid1 num-devices=2 UUID=72ddd3b6:b063445c:d7718865:bb79aad7

My symptoms were that it worked when started from user space, but failed during reboot without the above hints. I believe if I had defined md5 and md6 before md3, it may have worked automatically without hints.

On Fri, Jun 01, 2007 at 11:35:01PM +0200, Steinar H. Gunderson wrote:
> Either do your md discovery in userspace via mdadm (your distribution can
> probably help you with this), or simply use the raid10 module instead of
> building raid1+0 yourself.

I agree with using the mdadm RAID10 support. RAID1+0 has the flexibility of allowing you to fine-control the RAID1 vs RAID0 if you want to add disks later. RAID10 from mdadm has the flexibility that you don't need an even number of disks. As I don't intend to add disks to my array, the RAID10 as a single layer, with potentially better intelligence in terms of performance, appeals to me. They both worked for me - but I am sticking with the single layer now.

Cheers,
mark
Re: [PERFORM] Question about SQL performance
On Mon, Jun 04, 2007 at 11:18:30PM -0400, Jason Lustig wrote: > I have some questions about the performance of certain types of SQL > statements. > > What sort of speed increase is there usually with binding parameters > (and thus preparing statements) v. straight sql with interpolated > variables? Will Postgresql realize that the following queries are > effectively the same (and thus re-use the query plan) or will it > think they are different? > > SELECT * FROM mytable WHERE item = 5; > SELECT * FROM mytable WHERE item = 10; > > Obviously to me or you they could use the same plan. From what I > understand (correct me if I'm wrong), if you use parameter binding - > like "SELECT * FROM mytable WHERE item = ?" - Postgresql will know > that the queries can re-use the query plan, but I don't know if the > system will recognize this with above situation. Although they could use the same plan, it is possible that using the same plan is non-optimal. For example, if I know that 99% of the table contains item = 5, but only 1% of the table contains item = 10, then the 'best plan' may be a sequential scan for item = 5, but an index scan for item = 10. In the case of a prepared query, PostgreSQL will pick a plan that will be good for all values, which may not be best for specific queries. You save parsing time and planning time, but may risk increasing execution time. > Also, what's the difference between prepared statements (using > PREPARE and EXECUTE) and regular functions (CREATE FUNCTION)? How do > they impact performance? From what I understand there is no exact > parallel to stored procedures (as in MS SQL or oracle, that are > completely precompiled) in Postgresql. At the same time, the > documentation (and other sites as well, probably because they don't > know what they're talking about when it comes to databases) is vague > because PL/pgSQL is often said to be able to write stored procedures > but nowhere does it say that PL/pgSQL programs are precompiled. I think you can find all of these answers in the documentation, including my comments about prepared queries. Does it matter if the program is precompiled? I believe it is, but why would it matter? Are you addressing a real performance problem? Or are you trying to avoid issues that you are not sure if they exist or not? :-) Prepared queries are going to improve performance due to being able to execute multiple queries without communicating back to the client. Especially for short queries, network latency can be a significant factor for execution speed. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
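A minimal sketch of the trade-off described above, reusing the mytable/item example from the question (the skewed distribution of item values is assumed purely for illustration):

    -- Straight SQL: each statement is planned for its specific value, so the
    -- planner can choose a seq scan for the common value and an index scan
    -- for the rare one.
    EXPLAIN ANALYZE SELECT * FROM mytable WHERE item = 5;
    EXPLAIN ANALYZE SELECT * FROM mytable WHERE item = 10;

    -- Prepared statement: parsed and planned once, so a single generic plan
    -- is reused for every parameter value.
    PREPARE get_items(int) AS SELECT * FROM mytable WHERE item = $1;
    EXECUTE get_items(5);
    EXECUTE get_items(10);
    DEALLOCATE get_items;

Comparing EXPLAIN ANALYZE of the literal queries against EXPLAIN EXECUTE of the prepared form is one way to see whether the generic plan hurts any particular parameter value.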
Re: [PERFORM] LIKE search and performance
On Wed, Jun 06, 2007 at 11:23:13PM +0100, James Mansion wrote: > [EMAIL PROTECTED] wrote: > >What is a real life example where an intelligent and researched > >database application would issue a like or ilike query as their > >primary condition in a situation where they expected very high > >selectivity? > In my case the canonical example is to search against textual keys > where the search is performed automatically if the user hs typed > enough data and paused. In almost all cases the '%' trails, and I'm > looking for 'starts with' in effect. usually the search will have a > specified upper number of returned rows, if that's an available > facility. I realise in this case that matching against the index > does not allow the match count unless we check MVCC as we go, but I > don't see why another thread can't be doing that. I believe PostgreSQL already considers using the index for "starts with", so this wasn't part of the discussion for me. Sorry that this wasn't clear. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [OT] Re: [PERFORM] How much ram is too much
On Fri, Jun 08, 2007 at 08:54:39PM +0200, Zoltan Boszormenyi wrote: > Joshua D. Drake írta: > >Zoltan Boszormenyi wrote: > >>Dave Cramer írta: > >>>It's an IBM x3850 using linux redhat 4.0 > >>Isn't that a bit old? I have a RedHat 4.2 somewhere > >>that was bundled with Applixware 3. :-) > >He means redhat ES/AS 4 I assume. > I guessed that, hence the smiley. > But it's very unfortunate that version numbers > are reused - it can cause confusion. > There was a RH 4.0 already a long ago, > when the commercial and the community > version were the same. I think Microsoft > will avoid reusing its versions when year 2095 comes. :-) He should have written RHEL 4.0. RH 4.0 is long enough ago, though, that I think few would assume it meant the much older release. You'll find a similar thing with products like "CuteFTP 7.0" or "CuteFTP Pro 3.0". Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Held idle connections vs use of a Pooler
Hello, I am relatively new to postgres (just a few months), so apologies in advance and thanks for bearing with me. I am trying to get a rough idea of the amount of bang for the buck I might see if I put a connection pooling service into the environment vs our current methodology of using persistent open connections. We have a number of in-house applications that connect to a central Postgres instance (8.3.7). The box is admittedly underpowered, with only 8 cores, 8 GB of RAM, and not-great disk IO out of an MSA-70. The database is about 35GB on disk and does mainly (~95%) OLTP type queries. I am currently begging for more RAM. Most of the connections from the various apps sit idle until they need to execute a query, and once done they go back to holding an open idle connection. (There are ~600 open connections at any given time, and most of the time most are idle.) This is typically fine while the number of active queries is low, but some other application (one that doesn't use connection pooling or hold open connections when not in use) is hitting the db from time to time with 50-100 small queries (2ms queries from my testing) nearly all at once. When this happens the whole response time goes out the door. I think, from reading this list for a few weeks, that the answer is to move to a connection pooling package elsewhere to better manage incoming connections, with a lower number to the db. I am told this will require some re-working of app code, as I understand pgpool was tried a while back in our QA environment and several parts of various in-house apps/scripts/etc started to experience show-stopping problems. To help make my case to the devs and various managers, I was wondering if someone could expand on what extra work has to be done while queries run and there is a high number (500-600) of open yet idle connections to the db. Lots of the queries executed use sub-transactions, if that makes a difference. Basically: what am I paying extra for with that many persistent connections that I might save if I go to the effort of getting the in-house stuff to make use of a connection pooler? Thank you for your time. ..: mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
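One way to quantify the cost of those idle backends before and after a pooler experiment is to sample pg_stat_activity while the burst of small queries hits. A sketch for the 8.3-era catalogs, where an idle backend reports '<IDLE>' as its current query:

    -- count idle vs. active backends (pre-9.2 pg_stat_activity layout)
    SELECT (current_query = '<IDLE>') AS is_idle, count(*)
      FROM pg_stat_activity
     GROUP BY 1;

Run periodically, this gives a rough baseline of how many of the ~600 connections are actually doing anything at the moment response times fall over.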
[PERFORM] cleanup on pg_ system tables?
Hi All, (pg 8.3.7 on RHEL 2.6.18-92.el5 ) I ran the query below (copied from http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html ) on a production DB we have and I am looking at some pretty nasty looking numbers for tables in the pg_catalog schema. I have tried a reindex and vaccum but neither seem to be clearing these out, tried a cluster and it won't let me. I am viewing the problem wrong? is there anything I can do while the DB is online ? do I need to clean up other things first ? thanks, ..: Mark -[ RECORD 1 ]+ schemaname | pg_catalog tablename| pg_attribute reltuples| 5669 relpages | 113529 otta | 92 tbloat | 1234.0 wastedpages | 113437 wastedbytes | 929275904 wastedsize | 886 MB iname| pg_attribute_relid_attnam_index ituples | 5669 ipages | 68 iotta| 80 ibloat | 0.9 wastedipages | 0 wastedibytes | 0 wastedisize | 0 bytes SELECT schemaname, tablename, reltuples::bigint, relpages::bigint, otta, ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, relpages::bigint - otta AS wastedpages, bs*(sml.relpages-otta)::bigint AS wastedbytes, pg_size_pretty((bs*(relpages-otta))::bigint) AS wastedsize, iname, ituples::bigint, ipages::bigint, iotta, ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages, CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes, CASE WHEN ipages < iotta THEN pg_size_pretty(0) ELSE pg_size_pretty((bs*(ipages-iotta))::bigint) END AS wastedisize FROM ( SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols FROM ( SELECT ma,bs,schemaname,tablename, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname, tablename, hdr, ma, bs, SUM((1-null_frac)*avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename ) AS nullhdr FROM pg_stats s, ( SELECT (SELECT current_setting('block_size')::numeric) AS bs, CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants GROUP BY 1,2,3,4,5 ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml WHERE sml.relpages - otta > 0 OR ipages - iotta > 10 ORDER BY wastedbytes DESC, wastedibytes DESC -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
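Plain VACUUM generally will not return already-bloated catalog pages to the OS; on 8.3 the old-style VACUUM FULL can, at the price of an exclusive lock on the table while it runs. A sketch, not verified against this particular system:

    -- run as a superuser during a quiet period; both commands lock
    -- pg_catalog.pg_attribute for their duration
    VACUUM FULL VERBOSE pg_catalog.pg_attribute;
    REINDEX TABLE pg_catalog.pg_attribute;

Whether that is acceptable while the DB is online depends on how long the catalog takes to rewrite and what else needs pg_attribute in the meantime.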
Re: [PERFORM] Using Between
>>> The question is how can we make it faster. >>If there's just one region ID for any given postal code, you might try >>adding a column to vehicleused and storing the postal codes there. >>You could possibly populate that column using a trigger; probably it >>doesn't change unless the postalcode changes. Then you could index >>that column and query against it directly, rather than joining to >>PostalCodeRegionCountyCity. Short of that, I don't see any obvious >>way to avoid reading most of the vehicleused table. There may or may >>not be an index that can speed that up slightly and of course you can >>always throw hardware at the problem, but fundamentally reading half a >>million or more rows isn't going to be instantaneous. >>Incidentally, it would probably simplify things to store postal codes >>in the same case throughout the system. If you can avoid the need to >>write lower(x) = lower(y) and just write x = y you may get better >>plans. I'm not sure that's the case in this particular example but >>it's something to think about. Something else you might test is bumping the read-ahead value. Most Linux installs have this at 256; you might try bumping the value to ~8MB and tuning from there. This may help you slightly with seq scan performance. As always: YMMV. It's not going to magically fix low-performing I/O subsystems and it won't help many applications of PG, but there are a few outlying instances where this change can help a little bit. I am sure someone will step in and tell you it is a bad idea - AND they will probably have perfectly valid reasons for why it is, so you will need to consider the ramifications... if at all possible, test and tune to see. ..: Mark >>-- >>Robert Haas >>EnterpriseDB: http://www.enterprisedb.com >>The Enterprise Postgres Company -- >>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >>To make changes to your subscription: >>http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
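For reference, a rough sketch of the trigger-maintained column Robert describes in the quoted text; the column and key names here (region_id, regionid, postalcode) are assumptions, since the real schema wasn't posted, and existing rows would still need a one-time backfill:

    ALTER TABLE vehicleused ADD COLUMN region_id integer;

    CREATE OR REPLACE FUNCTION vehicleused_set_region() RETURNS trigger AS $$
    BEGIN
        -- look up the single region for this row's postal code
        SELECT r.regionid INTO NEW.region_id
          FROM PostalCodeRegionCountyCity r
         WHERE lower(r.postalcode) = lower(NEW.postalcode);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER vehicleused_region_trg
        BEFORE INSERT OR UPDATE ON vehicleused
        FOR EACH ROW EXECUTE PROCEDURE vehicleused_set_region();

    CREATE INDEX vehicleused_region_id_idx ON vehicleused (region_id);

With that in place, queries can filter on vehicleused.region_id directly instead of joining to PostalCodeRegionCountyCity through the postal code.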
Re: [PERFORM] Slow count(*) again...
Could this be an interesting test use of https://www.fossexperts.com/ ? 'Community' driven proposal - multiple people / orgs agree to pay various portions? Maybe with multiple funders a reasonable target fund amount could be reached. Just throwing around ideas here. Mark -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Robert Haas Sent: Wednesday, October 13, 2010 7:29 AM To: Neil Whelchel Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Slow count(*) again... On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel wrote: > I might go as far as to rattle the cage of the developers to see if it makes > any sense to add some column oriented storage capability to Postgres. That > would be the hot ticket to be able to specify an attribute on a column so that > the back end could shadow or store a column in a column oriented table so > aggregate functions could work on them with good efficiency, or is that an > INDEX? I'd love to work on that, but without funding it's tough to find the time. It's a big project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hardware recommendations
-Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Andy Sent: Wednesday, December 08, 2010 5:24 PM To: Marti Raudsepp Cc: pgsql-performance@postgresql.org; Benjamin Krajmalnik Subject: Re: [PERFORM] Hardware recommendations >> > If you are IO-bound, you might want to consider using >> SSD. >> > >> > A single SSD could easily give you more IOPS than 16 >> 15k SAS in RAID 10. >> >> Are there any that don't risk your data on power loss, AND >> are cheaper >> than SAS RAID 10? >> >Vertex 2 Pro has a built-in supercapacitor to save data on power loss. It's spec'd at 50K IOPS and a 200GB one costs around $1,000. Viking offers 6Gbps SAS physical connector SSD drives as well - with a super capacitor. I have not seen any official pricing yet, but I would suspect it would be in the same ballpark. I am currently begging to get some for eval. I will let everyone know if I swing that and can post numbers. -mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hardware recommendations
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Benjamin Krajmalnik > Sent: Monday, December 13, 2010 1:45 PM > To: Greg Smith > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Hardware recommendations > > > > > -Original Message- > > From: Greg Smith [mailto:g...@2ndquadrant.com] > > Sent: Saturday, December 11, 2010 2:18 AM > > To: Benjamin Krajmalnik > > Cc: pgsql-performance@postgresql.org > > Subject: Re: [PERFORM] Hardware recommendations > > > > Have you increased checkpoint parameters like checkpoint_segments? > You > > need to avoid having checkpoints too often if you're going to try and > > use 4GB of memory for shared_buffers. > > > > Yes, I have it configured at 1024 checkpoint_segments, 5min timeout, 0.9 > completion_target I would consider bumping that checkpoint timeout duration a bit longer and seeing if that helps, if you are still looking for knobs to fiddle with. YMMV. -Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
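If you are still deciding which checkpoint knob matters, pg_stat_bgwriter (available from 8.3 on) shows whether checkpoints are being forced by xlog volume or by the timer; a simple sketch:

    -- checkpoints_req counts checkpoints forced by checkpoint_segments,
    -- checkpoints_timed counts ones driven by checkpoint_timeout
    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

If checkpoints_req keeps climbing, lengthening the timeout alone won't change much; if nearly everything is timed, the timeout is the knob actually doing the work.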
Re: [PERFORM] Migrating to Postgresql and new hardware
Comments in line, take em for what you paid for em. > -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Lars > Sent: Tuesday, January 18, 2011 3:57 AM > To: pgsql-performance@postgresql.org > Subject: [PERFORM] Migrating to Postgresql and new hardware > > Hi, > > We are in the process of moving a web based application from a MySql to > Postgresql database. > Our main reason for moving to Postgresql is problems with MySql > (MyISAM) table locking. I would never try and talk someone out of switching but MyISAM? What version of MySQL and did you pick MyISAM for a good reason or just happened to end up there? > We will buy a new set of servers to run the Postgresql databases. > > The current setup is five Dell PowerEdge 2950 with 2 * XEON E5410, 4GB > RAM. PERC 5/I 256MB NV Cache, 4 * 10K Disks (3 in RAID 5 + 1 spare). > > One server is used for shared data. > Four servers are used for sharded data. A user in the system only has > data in one of the shards. > There is another server to which all data is replicated but I'll leave > that one out of this discussion. > These are dedicated database servers. There are more or less no stored > procedures. The shared database size is about 20GB and each shard > database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect > the size will grow 10%-15% this year. Server load might increase with > 15%-30% this year. This setup is disk I/O bound. The overwhelming > majority of sql statements are fast (typically single row selects, > updates, inserts and deletes on primary key) but there are some slow > long running (10min) queries. > > As new server we are thinking of PowerEdge R510, 1 * Xeon X5650, 24Gb > RAM, H700 512MB NV Cache. One would think you should notice a nice speed improvement, ceteris paribus, since the X5650 will have ->significantly<- more memory bandwidth than the 5410s you are used to, and you are going to have a heck of a lot more ram for things to cache in. I think the H700 is a step up in raid cards as well but with only 4 disks your probably not maxing out there. > Dell has offered two alternative SSDs: > Samsung model SS805 (100GB Solid State Disk SATA 2.5"). > (http://www.plianttechnology.com/lightning_lb.php) > Pliant model LB 150S (149GB Solid State Drive SAS 3Gbps 2.5"). > (http://www.samsung.com/global/business/semiconductor/products/SSD/Prod > ucts_Enterprise_SSD.html) The Samsung ones seems to indicate that they have protection in the event of a power failure, and the pliant does not mention it. Granted I haven't done or seen any pull the plug under max load tests on either family, so I got nothing beyond that it is the first thing I have looked at with every SSD that crosses my path. > > Both are SLC drives. The price of the Pliant is about 2,3 times the > price of the Samsung (does it have twice the performance?). > > One alternative is 5 servers (1 shared and 4 shards) with 5 Samsung > drives (4 in RAID 10 + 1 spare). > Another alternative would be 3 servers (1 shared and 2 shards) with 5 > Pliant drives (4 in RAID 10 + 1 spare). This would be slightly more > expensive than the first alternative but would be easier to upgrade > with two new shard servers when it's needed. As others have mentioned, how are you going to be doing your "shards"? > > Anyone have experience using the Samsung or the Pliant SSD? Any > information about degraded performance over time? > Any comments on the setups? 
How would an alternative with 15K disks (6 > RAID 10 + 1 spare, or even 10 RAID10 + 1 spare) compare? You still may find that breaking xlog out to its own logical drive (2 drives in raid 1) gives a speed improvement to the overall. YMMV - so tinker and find out before you go deploying. > How would these alternatives compare in I/O performance compared to the > old setup? > Anyone care to guess how the two alternatives would compare in > performance running Postgresql? > How would the hardware usage of Postgresql compare to MySqls? I won't hazard a guess on the performance difference between PG w/ Fsync ON and MySQL running with MyISAM. If you can get your OS and PG tuned you should be able to have a database that can have pretty decent throughput for an OLTP workload. Since that seems to be the majority of your intended workload. -Mark > > > > Regards > /Lars > > -- > Sent via pgsql-performance mailing list (pgsql- > performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Queries becoming slow under heavy load
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Ivan Voras > Sent: Wednesday, January 26, 2011 6:25 AM > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Queries becoming slow under heavy load > > On 25/01/2011 22:37, Anne Rosset wrote: > > Hi, > > > > We are running some performances tests. With a lot of concurrent > > access, queries get very slow. When there is no load, those queries > run > > fast. > > As others said, you need to stat how many concurrent clients are > working > on the database and also the number of logical CPUs (CPU cores, > hyperthreading) are present in the machine. So far, as a rule of thumb, > if you have more concurrent active connections (i.e. doing queries, not > idling), you will experience a sharp decline in performance if this > number exceeds the number of logical CPUs in the machine. > Depending on what version the OP is running - I didn't see where a version was given - if there is a large number of idle connections it can affect things as well. Tom indicated to me this should be "much better" in 8.4 and later. We cut our idle connections from 600+ to a bit over 300 and saw a good drop in box load and better query responsiveness. (We still get large user cpu load spikes when a few hundred idle connection processes are woken up, because they all appear to be sleeping on the same semaphore and one of them has work to do.) (Yeah, I know, get a pooler; too bad only bouncer seems to pool out idle connections with transaction pooling, but then I lose prepared statements... I am still working on that part and on getting off 8.3. Yes, our app tried to do its own quasi connection pooling. When we deployed the app on a few hundred boxes, the error of this choice, made years ago when this app lived on much fewer machines, became pretty evident.) > > -- > Sent via pgsql-performance mailing list (pgsql- > performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Xeon twice the performance of opteron
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Jeff > Sent: Thursday, March 17, 2011 9:14 AM > To: pgsql-performance@postgresql.org > Cc: Brian Ristuccia > Subject: [PERFORM] Xeon twice the performance of opteron > > hey folks, > > Running into some odd performance issues between a few of our db > boxes. While trying to speed up a query I ran it on another box and > it was twice as fast. The plans are identical and various portions of > the query run in the same amount of time - it all boils down to most > of the time being spent in a join filter. The plan is as good as it > is going to get but the thing that is concerning me, which hopefully > some folks here may have some insight on, is the very large difference > in runtime. > > three boxes: > A: Intel(R) Xeon(R) CPU E5345 @ 2.33GHz (Runs query > fastest) > 4MB cache > B: Quad-Core AMD Opteron(tm) Processor 2352 (2.1GHZ) (Main > production > box, currently, middle speed) > 512k cache > C: Quad-Core AMD Opteron(tm) Processor 2378 (2.4GHZ) > 512k cache > > A & B are running PG 8.4.2 (yes, I know it desperately need to be > upgraded). C was also on 8.4.2 and since it was not in production I > upgraded it to 8.4.7 and got the same performance as 8.4.2. Dataset > on A & B is the same C is mostly the same, but is missing a couple > weeks of data (but since this query runs over 3 years of data, it is > negligable - plus C runs the slowest!) > > All three running FC10 with kernel Linux db06 > 2.6.27.19-170.2.35.fc10.x86_64 #1 SMP Mon Feb 23 13:00:23 EST 2009 > x86_64 x86_64 x86_64 GNU/Linux > > Load is very low on each box. The query is running from shared_buffers > - no real IO is occuring. > > The average timing for the query in question is 90ms on A, 180ms on B > and 190ms on C. > > Now here's where some odd stuff starts piling up: explain analyze > overhead on said queries: > 20ms on A, 50ms on B and 85ms on C(!!) > > We had one thought about potential NUMA issues, but doing a series > (100) of connect, query, disconnect and looking at the timings reveals > them all to be solid... but even still we wouldn't expect it to be > that awful. The smaller cache of the opterons is also a valid > argument. > > I know we're running an old kernel, I'm tempted to upgrade to see what > will happen, but at the same time I'm afraid it'll upgrade to a kernel > with a broken [insert major subsystem here] which has happened before. > > Anybody have some insight into this or run into this before? > > btw, little more background on the query: > > -> Nested Loop (cost=5.87..2763.69 rows=9943 width=0) (actual > time=0.571..2 > 74.750 rows=766 loops=1) > Join Filter: (ce.eventdate >= (md.date - '6 days'::interval)) > -> Nested Loop (cost=5.87..1717.98 rows=27 width=8) > (actual time=0.53 > 3..8.301 rows=159 loops=1) > [stuff removed here] > -> Index Scan using xxx_date_idx on xx md > (cost=0.00..19.50 rows=1099 width=8) (actual time=0.023..0.729 > rows=951 loops=15 > 9) > Index Cond: (ce.eventdate <= md.date) > > > On all three boxes that inner nestloop completes in about the same > amount of time - it is that join filter that is causing the pain and > agony. (If you are noticing the timing differences, that is because > the numbers above are the actual numbers, not explain analyze). The > query is pulling up a rolling window of events that occured on a > specific date. This query pulls up al the data for a period of time. 
> ce.eventdate is indexed, and is used in the outer nestloop. Thinking > more about what is going on cache thrashing is certainly a possibility. > > the amazing explain analyze overhead is also very curious - we all > know it adds overhead, but 85ms? Yow. > > -- > Jeff Trout > http://www.stuarthamm.net/ > http://www.dellsmartexitin.com/ I am sure you might have already checked for this, but just in case... Did you verify that no power savings stuff is turned on in the BIOS or at the kernel? I have to set ours to something HP calls static high performance or something like that if I want boxes that are normally pretty idle to execute in a predictable fashion for sub-second queries. I assume you checked STREAM benchmark results on the AMD machines to make sure they are getting in the ballpark of where they are supposed to be? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Select in subselect vs select = any array
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Adam Tistler > Sent: Monday, March 21, 2011 12:17 AM > To: Pavel Stehule > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Select in subselect vs select = any array > > Pavel, thanks for the help. > > I increased work_mem from 16MB to 64MB, no difference. The queries are > really just a test case. My actual queries are actual just large > number of primary keys that I am selecting from the db: > > For example: >select * from nodes where node_id in ( 1, 2, 3 . ) > What does a "large" number of primary keys mean? I have seen some "odd" things happen when I carelessly passed tens of thousands of items to an IN list for a generated query, but I don't get the feeling that is the case here. ..: Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
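For what it's worth, when the key list gets very large the same lookup can be phrased a few different ways and the plans can differ; a sketch using the nodes table from the message above:

    -- the three forms below are logically equivalent
    SELECT * FROM nodes WHERE node_id IN (1, 2, 3);
    SELECT * FROM nodes WHERE node_id = ANY (ARRAY[1, 2, 3]);
    SELECT n.*
      FROM nodes n
      JOIN (VALUES (1), (2), (3)) AS v(node_id) USING (node_id);

With tens of thousands of keys the VALUES join sometimes plans and executes differently from a flat IN list, so it is worth an EXPLAIN ANALYZE of each form against the real data.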
Re: [PERFORM] Suspending SELECTs
On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote: > I understand most of these issues, and expected this kind of reply. Please, > allow me to insist that we reason on this problem and try to find a > solution. My reason for doing so is that the future software industry is > likely to see more and more web applications retrieving data from virtually > endless databases, and in such contexts, it is sensible to ask the final > client--the web client--to store the "cursor state", because web > interaction is intrinsically asynchronous, and you cannot count on users > logging out when they're done, releasing resources allocated to them. Think > of Google. What is wrong with LIMIT and OFFSET? I assume your results are ordered in some manner. Especially with web users, who become bored if the page doesn't flicker in a way that appeals to them, how could one have any expectation that the cursor would ever be useful at all? As a 'general' solution, I think optimizing the case where the same query is executed multiple times, with only the LIMIT and OFFSET parameters changing, would be a better bang for the buck. I'm thinking along the lines of materialized views, for queries executed more than a dozen times in a short length of time... :-) In the mean time, I successfully use LIMIT and OFFSET without such an optimization, and things have been fine for me. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
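A sketch of the LIMIT/OFFSET pattern being suggested, with hypothetical table and column names; the important part is a stable, fully-specified ORDER BY so that page n+1 starts where page n ended:

    -- page 3 of the result set, 20 rows per page
    SELECT *
      FROM search_results
     WHERE query_id = 42
     ORDER BY rank DESC, result_id
     LIMIT 20 OFFSET 40;

The server still has to generate and discard the first 40 rows, which is why the cost grows with the page number - usually acceptable when few users ever go past the first page or two.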
Re: [PERFORM] Suspending SELECTs
On Wed, Jan 18, 2006 at 09:57:50AM +0100, Alessandro Baretta wrote: > [EMAIL PROTECTED] wrote: > >On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote: > >>I understand most of these issues, and expected this kind of reply. > >>Please, allow me to insist that we reason on this problem and try to find > >>a solution. My reason for doing so is that the future software industry > >>is likely to see more and more web applications retrieving data from > >>virtually endless databases, and in such contexts, it is sensible to ask > >>the final client--the web client--to store the "cursor state", because > >>web interaction is intrinsically asynchronous, and you cannot count on > >>users logging out when they're done, releasing resources allocated to > >>them. Think of Google. > >What is wrong with LIMIT and OFFSET? I assume your results are ordered > >in some manner. > It looks like this is the only possible solution at present--and in the > future, too--but it has a tremendouse performance impact on queries > returning thousands of rows. In the case of one web user generating one query, I don't see how it would have a tremendous performance impact on large queries. You mentioned google. I don't know how you use google - but most of the people I know rarely ever search through the pages. Generally the answer we want is on the first page. If the ratio of users who search through multiple pages of results, and users who always stop on the first page, is anything significant (better than 2:1?), LIMIT and OFFSET are the desired approach. Why have complicated magic in an application, for a subset of the users? If there is to be a change to PostgreSQL to optimize for this case, I suggest it involve the caching of query plans, executor plans, query results (materialized views?), LIMIT and OFFSET. If we had all of this, you would have exactly what you want, while benefitting many more people than just you. No ugly 'persistent state cursors' or 'import/export cursor state' implementation. People would automatically benefit, without changing their applications. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Suspending SELECTs
On Wed, Jan 18, 2006 at 03:41:57PM +, Harry Jackson wrote: > There are various reason why google might want to limit the search > result returned ie to encourage people to narrow their search. Prevent > screen scrapers from hitting them really hard blah blah. Perhaps less > than 0.0001% of real users (not scrapers) actually dig down to the > 10th page so whats the point. I recall a day when google crashed, apparently due to a Windows virus that would use google to obtain email addresses. As an unsubstantiated theory - this may have involved many, many clients, all accessing search page results beyond the first page. I don't see google optimizing for the multiple page scenario. Most people (as I think you agree above), are happy with the first or second page, and they are gone. Keeping a cursor for these people as anything more than an offset into search criteria, would not be useful. Cheers, -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] LIKE query on indexes
On Tue, Feb 21, 2006 at 05:57:12PM +0200, Ibrahim Tekin wrote: > i have btree index on a text type field. i want see rows which starts with > certain characters on that field. so i write a query like this: > SELECT * FROM mytable WHERE myfield LIKE 'john%' > since this condition is from start of the field, query planner should use > index to find such elements but explain command shows me it will do a > sequential scan. > is this lack of a feature or i am wrong somewhere? Is the query fast enough? How big is your table? What does explain analyze select tell you? Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
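One detail worth checking, sketched below: if the database was initialized with a non-C locale, a plain btree index cannot be used for anchored LIKE, but an index built with the pattern ops operator class can (the index name here is arbitrary):

    -- lets 'starts with' LIKE use an index even in a non-C locale
    CREATE INDEX mytable_myfield_pattern_idx
        ON mytable (myfield text_pattern_ops);

    EXPLAIN ANALYZE
    SELECT * FROM mytable WHERE myfield LIKE 'john%';

If the locale is C, the ordinary index should already be considered, and the planner may still prefer a seq scan when the table is small or the prefix is not selective.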
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote: > Sad, cause the AMD is, on a price / performance scale, twice the > processor for the same money as the Intel. Maybe a year or two ago. Prices are all coming down. Intel more than AMD. AMD still seems better - but not X2, and it depends on the workload. X2 sounds like bigotry against Intel... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 01:42:31PM -0500, Scott Marlowe wrote: > On Tue, 2006-04-25 at 13:38, [EMAIL PROTECTED] wrote: > > On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote: > > > Sad, cause the AMD is, on a price / performance scale, twice the > > > processor for the same money as the Intel. > > Maybe a year or two ago. Prices are all coming down. Intel more > > than AMD. > > AMD still seems better - but not X2, and it depends on the workload. > > X2 sounds like biggotry against Intel... :-) > Actually, that was from an article from this last month that compared > the dual core intel to the amd. for every dollar spent on the intel, > you got about half the performance of the amd. Not bigotry. fact. > But don't believe me or the other people who've seen the difference. Go > buy the Intel box. No skin off my back. AMD Opteron vs Intel Xeon is different than AMD X2 vs Pentium D. For AMD X2 vs Pentium D - I have both - in similar price range, and similar speed. I choose to use the AMD X2 as my server, and Pentium D as my Windows desktop. They're both quite fast. I made the choice I describe based on a lot of research. I was going to go both Intel, until I noticed that the Intel prices were dropping fast. 30% price cut in 2 months. AMD didn't drop at all during the same time. There are plenty of reasons to choose one over the other. Generally the AMD comes out on top. It is *not* 2X though. Anybody who claims this is being highly selective about which benchmarks they consider. One article is nothing. There is a lot of hype these days. AMD is winning the elite market, which means that they are able to continue to sell high. Intel, losing this market, is cutting its prices to compete. And they do compete. Quite well. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 08:54:40PM -0400, [EMAIL PROTECTED] wrote: > I made the choice I describe based on a lot of research. I was going > to go both Intel, until I noticed that the Intel prices were dropping > fast. 30% price cut in 2 months. AMD didn't drop at all during the > same time. Errr.. big mistake. That was going to be - I was going to go both AMD. > There are plenty of reasons to choose one over the other. Generally > the AMD comes out on top. It is *not* 2X though. Anybody who claims > this is being highly selective about which benchmarks they consider. I have an Intel Pentium D 920, and an AMD X2 3800+. These are very close in performance. The retail price difference is: Intel Pentium D 920 is selling for $310 CDN AMD X2 3800+is selling for $347 CDN Another benefit of Pentium D over AMD X2, at least until AMD chooses to switch, is that Pentium D supports DDR2, whereas AMD only supports DDR. There are a lot of technical pros and cons to each - with claims from AMD that DDR2 can be slower than DDR - but one claim that isn't often made, but that helped me make my choice: 1) DDR2 supports higher transfer speeds. I'm using DDR2 5400 on the Intel. I think I'm at 3200 or so on the AMD X2. 2) DDR2 is cheaper. I purchased 1 Gbyte DDR2 5400 for $147 CDN. 1 Gbyte of DDR 3200 starts at around the same price, and stretches into $200 - $300 CDN. Now, granted, the Intel 920 requires more electricity to run. Running 24/7 for a year might make the difference in cost. It doesn't address point 1) though. I like my DDR2 5400. So, unfortunately, I won't be able to do a good test for you to prove that my Windows Pentium D box is not only cheaper to buy, but faster, because the specs aren't exactly equivalent. In the mean time, I'm quite enjoying my 3d games while doing other things at the same time. I imagine my desktop load approaches that of a CPU-bound database load. 3d games require significant I/O and CPU. Anybody who claims that Intel is 2X more expensive for the same performance, isn't considering all factors. No question at all - the Opteron is good, and the Xeon isn't - but the original poster didn't ask about Opeteron or Xeon, did he? For the desktop lines - X2 is not double Pentium D. Maybe 10%. Maybe not at all. Especially now that Intel is dropping it's prices due to overstock. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 11:07:17PM -0400, Ron Peacetree wrote: > THROUGHPUT is better with DDR2 if and only if there is enough data > to be fetched in a serial fashion from memory. > LATENCY however is dependent on the base clock rate of the RAM > involved. So PC3200, 200MHz x2, is going to actually perform better > than PC2-5400, 166MHz x4, for almost any memory access pattern > except those that are highly sequential. I had forgotten about this. Still, it's not quite as simple as you say. DDR2 has increased latency, however, it has a greater upper limit, and when run at the same clock speed (200 Mhz for 200 Mhz), it is not going to perform worse. Add in double the pre-fetching capability, and what you get is that most benchmarks show DDR2 5400 as being slightly faster than DDR 3200. AMD is switching to DDR2, and I believe that, even after making such a big deal about latency, and why they wouldn't switch to DDR2, they are now saying that their on-chip memory controller will be able to access DDR2 memory (when they support it soon) faster than Intel can, not having an on-chip memory controller. You said that DB accesses are random. I'm not so sure. In PostgreSQL, are not the individual pages often scanned sequentially, especially because all records are variable length? You don't think PostgreSQL will regularly read 32 bytes (8 bytes x 4) at a time, in sequence? Whether for table pages, or index pages - I'm not seeing why the accesses wouldn't be sequential. You believe PostgreSQL will access the table pages and index pages randomly on a per-byte basis? What is the minimum PostgreSQL record size again? Isn't it 32 bytes or over? :-) I wish my systems were running the same OS, and I'd run a test for you. Alas, I don't think comparing Windows to Linux would be valuable. > A minor point to be noted in addition here is that most DB servers > under load are limited by their physical IO subsystem, their HDs, > and not the speed of their RAM. It seems like a pretty major point to me. :-) It's why Opteron with RAID kicks ass over HyperTransport. > All of the above comments about the relative performance of > different RAM types become insignificant when performance is gated > by the HD subsystem. Yes. Luckily - we don't all have Terrabyte databases... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Wed, Apr 26, 2006 at 05:37:31PM -0500, Jim C. Nasby wrote: > On Wed, Apr 26, 2006 at 06:16:46PM -0400, Bruce Momjian wrote: > > AMD transfers the dirty cache line directly from cpu to cpu. I can > > imaging that helping our test-and-set shared memory usage quite a bit. > Wasn't the whole point of test-and-set that it's the recommended way to > do lightweight spinlocks according to AMD/Intel? You'd think they'd have > a way to make that performant on multiple CPUs (though if it's relying > on possibly modifying an underlying data page I can't really think of > how to do that without snaking through the cache...) It's expensive no matter what. One method might be less expensive than another. :-) AMD definitely seems to have things right for lowest absolute latency. 2X still sounds like an extreme case - but until I've actually tried a very large or thread-intensive PostgreSQL db on both, I probably shouldn't doubt the work of others too much. :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 64-bit vs 32-bit performance ... backwards?
I've been trying to track this stuff - in fact, I'll likely be switching from AMD32 to AMD64 in the next few weeks. I believe I have a handle on the + vs - of 64-bit. It makes sense that full 64-bit would be slower. At an extreme it halves the amount of available memory or doubles the required memory bandwidth, depending on the workload. Has anybody taken a look at PostgreSQL to ensure that it uses 32-bit integers instead of 64-bit integers where only 32-bit is necessary? 32-bit offsets instead of 64-bit pointers? This sort of thing? I haven't. I'm meaning to take a look. Within registers, 64-bit should be equal speed to 32-bit. Outside the registers, it would make sense to only deal with the lower 32 bits where 32 bits is all that is required. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote: > On Mon, Aug 14, 2006 at 01:09:04PM -0400, Michael Stone wrote: > > On Mon, Aug 14, 2006 at 12:05:46PM -0500, Jim C. Nasby wrote: > > >Wow, interesting. IIRC, XFS is lower performing than ext3, > > For xlog, maybe. For data, no. Both are definately slower than ext2 for > > xlog, which is another reason to have xlog on a small filesystem which > > doesn't need metadata journalling. > Are 'we' sure that such a setup can't lose any data? I'm worried about > files getting lost when they get written out before the metadata does. I've been worrying about this myself, and my current conclusion is that ext2 is bad because: a) fsck, and b) data can be lost or corrupted, which could lead to the need to trash the xlog. Even ext3 in writeback mode allows for the indirect blocks to be updated without the data underneath, allowing for blocks to point to random data, or worse, previous apparently sane data (especially if the data is from a drive only used for xlog - the chance is high that a block might look partially valid?). So, I'm sticking with ext3 in ordered mode. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 01:26:46PM -0400, Michael Stone wrote: > On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote: > >Are 'we' sure that such a setup can't lose any data? > Yes. If you check the archives, you can even find the last time this was > discussed... I looked last night (coincidence actually) and didn't find proof that you cannot lose data. How do you deal with the file system structure being updated before the data blocks are (re-)written? I don't think you can. > The bottom line is that the only reason you need a metadata journalling > filesystem is to save the fsck time when you come up. On a little > partition like xlog, that's not an issue. fsck isn't only about time to fix. fsck is needed, because the file system is broken. If the file system is broken, how can you guarantee data has not been corrupted? Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote: > On Tue, Aug 15, 2006 at 02:33:27PM -0400, [EMAIL PROTECTED] wrote: > >>>Are 'we' sure that such a setup can't lose any data? > >>Yes. If you check the archives, you can even find the last time this was > >>discussed... > >I looked last night (coincidence actually) and didn't find proof that > >you cannot lose data. > You aren't going to find proof, any more than you'll find proof that you > won't lose data if you do lose a journalling fs. (Because there isn't > any.) Unfortunately, many people misunderstand the what a metadata > journal does for you, and overstate its importance in this type of > application. Yes, many people do. :-) > >How do you deal with the file system structure being updated before the > >data blocks are (re-)written? > *That's what the postgres log is for.* If the latest xlog entries don't > make it to disk, they won't be replayed; if they didn't make it to > disk, the transaction would not have been reported as commited. An > application that understands filesystem semantics can guarantee data > integrity without metadata journaling. No. This is not true. Updating the file system structure (inodes, indirect blocks) touches a separate part of the disk than the actual data. If the file system structure is modified, say, to extend a file to allow it to contain more data, but the data itself is not written, then upon a restore, with a system such as ext2, or ext3 with writeback, or xfs, it is possible that the end of the file, even the postgres log file, will contain a random block of data from the disk. If this random block of data happens to look like a valid xlog block, it may be played back, and the database corrupted. If the file system is only used for xlog data, the chance that it looks like a valid block increases, would it not? > >>The bottom line is that the only reason you need a metadata journalling > >>filesystem is to save the fsck time when you come up. On a little > >>partition like xlog, that's not an issue. > >fsck isn't only about time to fix. fsck is needed, because the file system > >is broken. > fsck is needed to reconcile the metadata with the on-disk allocations. > To do that, it reads all the inodes and their corresponding directory > entries. The time to do that is proportional to the size of the > filesystem, hence the comment about time. fsck is not needed "because > the filesystem is broken", it's needed because the filesystem is marked > dirty. This is also wrong. fsck is needed because the file system is broken. It takes time, because it doesn't have a journal to help it, therefore it must look through the entire file system and guess what the problems are. There are classes of problems such as I describe above, for which fsck *cannot* guess how to solve the problem. There is not enough information available for it to deduce that anything is wrong at all. The probability is low, for sure - but then, the chance of a file system failure is already low. Betting on ext2 + postgresql xlog has not been confirmed to me as reliable. Telling me that journalling is misunderstood doesn't prove to me that you understand it. I don't mean to be offensive, but I won't accept what you say, as it does not make sense with my understanding of how file systems work. :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. 
|__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 02:15:05PM -0500, Jim C. Nasby wrote: > So what causes files to get 'lost' and get stuck in lost+found? > AFAIK that's because the file was written before the metadata. Now, if > fsync'ing a file also ensures that all the metadata is written, then > we're probably fine... if not, then we're at risk every time we create a > new file (every WAL segment if archiving is on, and every time a > relation passes a 1GB boundary). Only if fsync ensures that the data written to disk is ordered, which as far as I know, is not done for ext2. Dirty blocks are written in whatever order is fastest for them to be written, or sequential order, or some order that isn't based on examining the metadata. If my understanding is correct - and I've seen nothing yet to say that it isn't - ext2 is not safe, postgresql xlog or not, fsck or not. It is safer than no postgresql xlog - but there exists windows, however small, where the file system can be corrupted. The need for fsck is due to this problem. If fsck needs to do anything at all, other than replay a journal, the file system is broken. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 04:05:17PM -0400, Tom Lane wrote: > [EMAIL PROTECTED] writes: > > I've been worrying about this myself, and my current conclusion is that > > ext2 is bad because: a) fsck, and b) data can be lost or corrupted, which > > could lead to the need to trash the xlog. > > Even ext3 in writeback mode allows for the indirect blocks to be updated > > without the data underneath, allowing for blocks to point to random data, > > or worse, previous apparently sane data (especially if the data is from > > a drive only used for xlog - the chance is high that a block might look > > partially valid?). > At least for xlog, this worrying is misguided, because we zero and fsync > a WAL file before we ever put any valuable data into it. Unless the > filesystem is lying through its teeth about having done an fsync, there > should be no metadata changes happening for an active WAL file (other > than mtime of course). Hmmm... I may have missed a post about this in the archive. WAL file is never appended - only re-written? If so, then I'm wrong, and ext2 is fine. The requirement is that no file system structures change as a result of any writes that PostgreSQL does. If no file system structures change, then I take everything back as uninformed. Please confirm whichever. :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 04:58:59PM -0400, Michael Stone wrote: > On Tue, Aug 15, 2006 at 03:39:51PM -0400, [EMAIL PROTECTED] wrote: > >No. This is not true. Updating the file system structure (inodes, indirect > >blocks) touches a separate part of the disk than the actual data. If > >the file system structure is modified, say, to extend a file to allow > >it to contain more data, but the data itself is not written, then upon > >a restore, with a system such as ext2, or ext3 with writeback, or xfs, > >it is possible that the end of the file, even the postgres log file, > >will contain a random block of data from the disk. If this random block > >of data happens to look like a valid xlog block, it may be played back, > >and the database corrupted. > you're conflating a whole lot of different issues here. You're ignoring > the fact that postgres preallocates the xlog segment, you're ignoring > the fact that you can sync a directory entry, you're ignoring the fact > that syncing some metadata (such as atime) doesn't matter (only the > block allocation is important in this case, and the blocks are > pre-allocated). Yes, no, no, no. :-) I didn't know that the xlog segment only uses pre-allocated space. I ignore mtime/atime as they don't count as file system structure changes to me. It's updating a field in place. No change to the structure. With the pre-allocation knowledge, I agree with you. Not sure how I missed that in my reviewing of the archives... I did know it pre-allocated once upon a time... Hmm > >This is also wrong. fsck is needed because the file system is broken. > nope, the file system *may* be broken. the dirty flag simply indicates > that the filesystem needs to be checked to find out whether or not it is > broken. Ah, but if we knew it wasn't broken, then fsck wouldn't be needed, now would it? So we assume that it is broken. A little bit of a game, but it is important to me. If I assumed the file system was not broken, I wouldn't run fsck. I run fsck, because I assume it may be broken. If broken, it indicates potential corruption. The difference for me, is that if you are correct, that the xlog is safe, than for a disk that only uses xlog, fsck is not ever necessary, even after a system crash. If fsck is necessary, then there is potential for a problem. With the pre-allocation knowledge, I'm tempted to agree with you that fsck is not ever necessary for partitions that only hold a properly pre-allocated xlog. > >I don't mean to be offensive, but I won't accept what you say, as it does > >not make sense with my understanding of how file systems work. :-) > I'm not getting paid to convince you of anything. Just getting you to back up your claim a bit... As I said, no intent to offend. I learned from it. Thanks, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Q: Performance of join vs embedded query for simple queries?
942c.dd2911d5.ad56.08:00:09:fd:1b:da') and lower(name) = lower('markm-Q00855572'); QUERY PLAN --- Index Scan using sm_change_name_key on sm_change (cost=2.99..7.82 rows=1 width=80) (actual time=0.322..0.328 rows=1 loops=1) Index Cond: (lower((name)::text) = 'markm-q00855572'::text) Filter: (system_dbid = $0) InitPlan -> Seq Scan on sm_system (cost=0.00..2.99 rows=1 width=4) (actual time=0.052..0.106 rows=1 loops=1) Filter: ((uid)::text = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'::text) Total runtime: 0.419 ms (7 rows) Time: 16.494 ms neudb=> explain analyze select sm_change.uid, sm_change.name from sm_change join sm_system using (system_dbid) where sm_system.uid = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da' and lower(sm_change.name) = lower('markm-Q00855572'); QUERY PLAN - Nested Loop (cost=0.00..7.83 rows=1 width=80) (actual time=0.099..0.159 rows=1 loops=1) Join Filter: ("outer".system_dbid = "inner".system_dbid) -> Index Scan using sm_change_name_key on sm_change (cost=0.00..4.83 rows=1 width=84) (actual time=0.053..0.059 rows=1 loops=1) Index Cond: (lower((name)::text) = 'markm-q00855572'::text) -> Seq Scan on sm_system (cost=0.00..2.99 rows=1 width=4) (actual time=0.030..0.077 rows=1 loops=1) Filter: ((uid)::text = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'::text) Total runtime: 0.250 ms (7 rows) Time: 1.898 ms I'm still learning how PostgreSQL works internally. My understanding is that the above are essentially the same. The first finds the system row using a sequential scan, then looks for the change row using the index, filtering by the system value. The second finds the change rows using the same index, expecting to find one row, and finding only one row, and matches it up against the system row using a sequential scan. So why does one reliably run faster than the other? neudb=> prepare plan1 (varchar(80), varchar(80)) as select uid, name from sm_change where system_dbid = (select system_dbid from sm_system where uid = $1) and lower(name) = lower($2); neudb=> prepare plan2 (varchar(80), varchar(80)) as select sm_change.uid, sm_change.name from sm_change join sm_system using (system_dbid) where sm_system.uid = $1 and lower(sm_change.name) = lower($2); Now: neudb=> execute plan1 ('2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da', 'markm-q00855572'); uid| name --+- ff733174.6c7411d8.900c.00:06:5b:b3:db:28 | markm-Q00855572 (1 row) Time: 0.794 ms neudb=> execute plan2 ('2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da', 'markm-q00855572'); uid| name --+- ff733174.6c7411d8.900c.00:06:5b:b3:db:28 | markm-Q00855572 (1 row) Time: 0.715 ms The numbers above don't mean anything. I ran both a few dozen times, and my conclusion is that after the plan is prepared (I did explain analyze to ensure that the prepared plans were the same as the dynamically generated plans), the times are the same. Both ranged from 0.690 ms -> 0.850 ms. Timings at these resolutions are not so reliable. :-) I think this means that the planner takes longer to figure out what to do about the join, and that my writing the select out as an embedded select reduces the effort required by the planner. This makes sense to me, except that I thought PostgreSQL would convert back and forth between the two forms automatically. They are the same query, are they not? Why wouldn't they both take longer, or both take shorter? What if I invented a scenario where the difference in plans made a major difference, such as making the system table much larger, still without an index? 
Should they not both come up with the same plan - the better estimated plan? Am I expecting too much? :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
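To experiment with the scenario raised at the end (a much larger system table, still without an index on its uid column), one rough way to set it up is sketched below. The table and column names here are made up for illustration; this is not from the original thread.

-- Hypothetical test data: a large "system"-like table with no index on uid,
-- and a larger "change"-like table referencing it.
CREATE TABLE test_system AS
SELECT g AS system_dbid,
       md5(g::text) AS uid
FROM generate_series(1, 1000000) g;

CREATE TABLE test_change AS
SELECT g AS change_dbid,
       ((g - 1) % 1000000) + 1 AS system_dbid,
       'change-' || g AS name
FROM generate_series(1, 2000000) g;

ANALYZE test_system;
ANALYZE test_change;

-- "= subselect" form:
EXPLAIN ANALYZE
SELECT name FROM test_change
WHERE system_dbid = (SELECT system_dbid FROM test_system WHERE uid = md5('42'))
  AND name = 'change-42';

-- join form:
EXPLAIN ANALYZE
SELECT c.name
FROM test_change c
JOIN test_system s USING (system_dbid)
WHERE s.uid = md5('42')
  AND c.name = 'change-42';

With no index on test_system.uid, the planner's choices for the two forms (and how far the estimates drift from reality) should diverge much more visibly than in the small-table case above.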
Re: [PERFORM] Q: Performance of join vs embedded query for simple queries?
On Thu, Aug 17, 2006 at 09:21:33PM -0400, Tom Lane wrote: > [EMAIL PROTECTED] writes: > > I have two simple queries that do what I believe to be the exact same > > thing. > These are actually not equivalent per spec. > ... > This still isn't equivalent to the join: it'll return at most one copy > of any sm_change row, whereas you can get multiple copies of the same > sm_change row from the join, if there were multiple matching sm_system > rows. (Hm, given the unique index on (system_dbid, uid), I guess that > couldn't actually happen --- but you have to reason about it knowing > that that index is there, it's not obvious from the form of the query.) > Anyway: given the way that the planner works, the IN form and the join > form will probably take comparable amounts of time to plan. The "= > subselect" form is much more constrained in terms of the number of > alternative implementations we have, so it doesn't surprise me that it > takes less time to plan. That makes sense. Would it be reasonable for the planner to eliminate plan considerations based on the existence of unique indexes, or is this a fundamentally difficult thing to get right in the general case? I did the elimination in my head, which is why I considered the plans to be the same. Can the planner do it? Sub-millisecond planning/execution for simple queries on moderate hardware seems sexy... :-) Thanks, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Q: Performance of join vs embedded query for simple queries?
On Thu, Aug 17, 2006 at 09:21:33PM -0400, Tom Lane wrote: > Another related form is > > neudb=> select uid, name from sm_change where system_dbid IN (select > system_dbid from sm_system where uid = > '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da') and lower(name) = > lower('markm-Q00855572'); > ... > Anyway: given the way that the planner works, the IN form and the join > form will probably take comparable amounts of time to plan. The "= > subselect" form is much more constrained in terms of the number of > alternative implementations we have, so it doesn't surprise me that it > takes less time to plan. FYI: You are correct. The IN takes about as long as the join to plan, and does generate the same plan as the join. This restores confidence for me that PostgreSQL is able to understand the two as equivalent. With regard to that unique constraint planning - I gave you the wrong query from my log. I had already thought that through, and realized that my original query missed the type. The timings and plans are the functionally the same for all the three queries, with or without the type qualifier. This is the table: Table "public.sm_system" Column| Type |Modifiers -++- system_dbid | integer| not null default nextval('sm_system_system_dbid_seq'::regclass) type| character varying(10) | not null uid | character varying(200) | not null name| character varying(200) | not null owner | character varying(80) | not null Indexes: "sm_system_pkey" PRIMARY KEY, btree (system_dbid) CLUSTER "sm_system_type_key" UNIQUE, btree ("type", uid) Check constraints: "sm_system_type_check" CHECK ("type"::text = 'NEU'::text OR "type"::text = 'PLS'::text) And this is what the query should have been: neudb=> explain analyze select uid, name from sm_change where system_dbid IN (select system_dbid from sm_system where type = 'NEU' and uid = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da') and lower(name) = lower('markm-Q00855572'); QUERY PLAN - Nested Loop IN Join (cost=0.00..7.86 rows=1 width=80) (actual time=19.438..19.453 rows=1 loops=1) -> Index Scan using sm_change_name_key on sm_change (cost=0.00..4.83 rows=1 width=84) (actual time=0.064..0.073 rows=1 loops=1) Index Cond: (lower((name)::text) = 'markm-q00855572'::text) -> Index Scan using sm_system_pkey on sm_system (cost=0.00..3.02 rows=1 width=4) (actual time=19.358..19.358 rows=1 loops=1) Index Cond: ("outer".system_dbid = sm_system.system_dbid) Filter: ((("type")::text = 'NEU'::text) AND ((uid)::text = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'::text)) Total runtime: 19.568 ms (7 rows) Time: 21.449 ms I guess the case isn't as simple as I thought. It would need to recognize that the specification of both the 'type' and the 'uid' are static, and unique, therefore the argument to the IN, or the table that it is joining with will be either 0 rows or 1 row. Too complicated to be worth it, eh? :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)
On Mon, Sep 18, 2006 at 06:10:13PM -0700, Luke Lonergan wrote: > Also be sure to set the random_page_cost parameter in > postgresql.conf to 100 or even higher when you use indexes, as the actual > seek rate for random access ranges between 50 and 300 for modern disk > hardware. If this parameter is left at the default of 4, indexes will often > be used inappropriately. Does a tool exist yet to time this for a particular configuration? Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
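No packaged tool comes up in this thread; one crude way to experiment is simply to override the setting per session and compare the resulting plans and timings. A small sketch, with placeholder table and column names:

SHOW random_page_cost;        -- stock default is 4

SET random_page_cost = 100;   -- assume random I/O is very expensive
EXPLAIN ANALYZE SELECT * FROM some_big_table WHERE indexed_col = 42;

SET random_page_cost = 2;     -- assume data is mostly cached
EXPLAIN ANALYZE SELECT * FROM some_big_table WHERE indexed_col = 42;

RESET random_page_cost;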
Re: [PERFORM] Large tables (was: RAID 0 not as fast as
On Fri, Sep 22, 2006 at 02:52:09PM +1200, Guy Thornley wrote: > > >> I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly > > >> meant for this purpose? > > > This is a good idea - I wasn't aware that this was possible. > > This possibility was the reason for me to propose it. :-) > posix_fadvise() features in the TODO list already; I'm not sure if any work > on it has been done for pg8.2. > > Anyway, I understand that POSIX_FADV_DONTNEED on a linux 2.6 kernel allows > pages to be discarded from memory earlier than usual. This is useful, since > it means you can prevent your seqscan from nuking the OS cache. > > Of course you could argue the OS should be able to detect this, and prevent > it occuring anyway. I don't know anything about linux's behaviour in this > area. I recall either monitoring or participating in the discussion when this call was added to Linux. I don't believe the kernel can auto-detect that you do not need a page any longer. It can only prioritize pages to keep when memory is fully in use and a new page must be loaded. This is often some sort of LRU scheme. If the page is truly useless, only the application can know. I'm not convinced that PostgreSQL can know this. The case where it is useful is if a single process is sequentially scanning a large file (much larger than memory). As soon as it is more than one process, or if it is not a sequential scan, or if it is not a large file, this call hurts more than it gains. Just because I'm done with the page does not mean that *you* are done with the page. I'd advise against using this call unless it can be shown that the page will not be used in the future, or at least, that the page is less useful than all other pages currently in memory. This is what the call really means. It means, "There is no value to keeping this page in memory". Perhaps certain PostgreSQL loads fit this pattern. None of my uses fit this pattern, and I have trouble believing that a majority of PostgreSQL loads fits this pattern. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Opteron vs. Xeon "benchmark"
On Fri, Sep 22, 2006 at 11:50:47PM +0200, Arjen van der Meijden wrote: > If you're an AMD-fan, by all means, buy their products, those processors > are indeed fast and you can build decent servers with them. But don't > rule out Intel, just because with previous processors they were the > slower player ;) Yep. From what I understand, Intel is 8 to 10 times the size of AMD. It's somewhat amazing that AMD even competes, and excellent for us, the consumer, that they compete well, ensuring that we get very fast computers, for amazingly low prices. But Intel isn't crashing down any time soon. Perhaps they became a little lazy, and made a few mistakes. AMD is forcing them to clean up. May the competition continue... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hints proposal
On Sun, Oct 15, 2006 at 05:25:31PM -0700, Craig A. James wrote: > So my question is: Is there any argument that can be made to persuade those > of you who are volunteering your time on the optimizer to even consider a > HINTS proposal? Has all this discussion changed your perspective on 2(c), > and why it really matters to some of us? Are we just wasting our time, or > is this a fruitful discussion? They're waiting for an idea that captures their imagination. So far, it seems like a re-hashing of old ideas that have been previously shot down, none of which seem overly imaginative, or can be shown to provide significant improvement short term or long term... :-) Haha. That's my take on it. Sorry if it is harsh. To get very competent people to volunteer their time, you need to make them believe. They need to dream about it, and wake up the next morning filled with a desire to try out some of their ideas. You need to brain wash them... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Optimization of this SQL sentence
On Wed, Oct 18, 2006 at 11:31:44AM +0200, Mario Weilguni wrote:
> It's not a bad idea. Usually I use postal codes with 25 chars, and never had
> any problem. With text, the limit would be ~1 GB. No matter how much testing
> in the application happens, the varchar(25) as last resort is a good idea.
> And in most cases, the application itself limits the length, and thus it's
> good to reflect this in the database design.
> Feel free to use text anywhere for your application, and feel free to use
> numeric(1000) instead of numeric(4) if you want to be prepared for really
> long numbers, but don't tell other people it's bad database design - it
> isn't.

It's unnecessary design. Suggestions in this regard lead towards the user seeing a database error, instead of a nice specific message provided by the application.

I used to use varchar instead of text, but have since softened, as the number of times it has ever actually saved me is zero, and the number of times it has screwed me up (picking too small a limit too early) has been a few. It's kind of like pre-optimization before there is a problem. Sometimes it works for you, sometimes it works against.

Cheers, mark

-- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
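As a small illustration of the trade-off being argued here (a sketch with made-up names, not from the thread): the length limit can live in the column type, or in a constraint that is easier to change later.

-- limit baked into the type:
CREATE TABLE customer_a (
    postal_code varchar(25)
);

-- limit expressed as a named constraint on a text column:
CREATE TABLE customer_b (
    postal_code text,
    CONSTRAINT postal_code_len CHECK (char_length(postal_code) <= 25)
);

-- with the second form the limit can be replaced without touching the column type:
-- ALTER TABLE customer_b DROP CONSTRAINT postal_code_len;
-- ALTER TABLE customer_b ADD CONSTRAINT postal_code_len CHECK (char_length(postal_code) <= 50);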
OT: TCL vs Perl Re: [PERFORM] commit so slow program looks frozen
Perl started out fast - TCL started out slow. Perl used syntax that, although it would drive some people crazy, followed a linguistic curve that Larry Wall claimed was healthy. The English language is crazy, and yet, it has become standard worldwide as well. Designed, regular languages like Esperanto have not received much support either. Perl is designed to be practical. TCL was designed to be minimalistic.

Perl uses common idioms for UNIX programmers: // for regular expressions, $VAR for variables. Many of the statements are familiar to C programmers. ++ for increment (compare against 'incr abc' for TCL). $a=5 for assignment, compare against 'set abc 5' in TCL. TCL tries to have a reduced syntax, where 'everything is a string', which requires weirdness for people. For example, newline is end-of-line, so { must be positioned correctly. Code is a string, so in some cases you need to escape code, otherwise not.

Perl has object oriented support built-in. It's ugly, but it works. TCL has a questionable '[incr tcl]' package. Perl has a wealth of modules on CPAN to do almost anything you need to. TCL has the beginning of one (not as rich), but comes built-in with things like event loops, and graphics (Tk).

I could go on and on - but I won't, because this is the PostgreSQL mailing list. People either get Perl, or TCL, or they don't. More people 'get' Perl, because it was marketed better, its syntax is deceptively similar to other well known languages, and for the longest time, it was much faster than TCL to write (especially when using regular expressions) and faster to run. Did TCL get treated unfairly as a result? It's a language. Who cares! :-)

Cheers, mark

-- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] MVCC & indexes?
On Tue, Oct 31, 2006 at 10:55:40PM +0100, Ivan Voras wrote: > Ok, so MVCC is the best thing since a guy put a round stone on a stick > and called it "the wheel", but I've seen several references on this list > about "indexes not being under MVCC" - at least that's how I read it, > the original posts were explaining why indexes can't be used for solving > MIN()/MAX()/COUNT() aggregates. Is this correct? > In particular, I'm trying to find out is there (b)locking involved when > concurrently updating and/or inserting records in an indexed table. My > guess is that, since PG does copy+delete on updating, even updating a > non-indexed field will require fixups in the index tree (to point to the > new record) and thus (b)locking. Short bits of blocking. The PostgreSQL index 'problem', is that indexes are conservative. They only guarantee to return at least as much data as you should see. They cannot be used to limit what you see to only as much as you should see. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Linux: more cores = less concurrency.
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Scott Marlowe > Sent: Monday, April 11, 2011 1:29 PM > To: Glyn Astill > Cc: Kevin Grittner; Joshua D. Drake; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Linux: more cores = less concurrency. > > On Mon, Apr 11, 2011 at 12:23 PM, Glyn Astill > wrote: > > > > > > --- On Mon, 11/4/11, Joshua D. Drake wrote: > > > >> From: Joshua D. Drake > >> Subject: Re: [PERFORM] Linux: more cores = less concurrency. > >> To: "Kevin Grittner" > >> Cc: pgsql-performance@postgresql.org, "Glyn Astill" > > >> Date: Monday, 11 April, 2011, 19:12 > >> On Mon, 11 Apr 2011 13:09:15 -0500, > >> "Kevin Grittner" > >> > >> wrote: > >> > Glyn Astill > >> wrote: > >> > > >> >> The new server uses 4 x 8 core Xeon X7550 CPUs at > >> 2Ghz > >> > > >> > Which has hyperthreading. > >> > > >> >> our current servers are 2 x 4 core Xeon E5320 CPUs > >> at 2Ghz. > >> > > >> > Which doesn't have hyperthreading. > >> > > > > > Yep, off. If you look at the benchmarks I took, HT absoloutely killed > it. > > > >> > PostgreSQL often performs worse with hyperthreading > >> than without. > >> > Have you turned HT off on your new machine? If > >> not, I would start > >> > there. > >> > >> And then make sure you aren't running CFQ. > >> > >> JD > >> > > > > Not running CFQ, running the no-op i/o scheduler. > > Just FYI, in synthetic pgbench type benchmarks, a 48 core AMD Magny > Cours with LSI HW RAID and 34 15k6 Hard drives scales almost linearly > up to 48 or so threads, getting into the 7000+ tps range. With SW > RAID it gets into the 5500 tps range. Just wondering, which LSI card ? Was this 32 drives in Raid 1+0 with a two drive raid 1 for logs or some other config? -M > > -- > Sent via pgsql-performance mailing list (pgsql- > performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Linux: more cores = less concurrency.
> -Original Message-
> From: Scott Marlowe [mailto:scott.marl...@gmail.com]
> Sent: Monday, April 11, 2011 6:18 PM
> To: mark
> Cc: Glyn Astill; Kevin Grittner; Joshua D. Drake; pgsql-performa...@postgresql.org
> Subject: Re: [PERFORM] Linux: more cores = less concurrency.
>
> On Mon, Apr 11, 2011 at 6:05 PM, mark wrote:
> > Just wondering, which LSI card ?
> > Was this 32 drives in Raid 1+0 with a two drive raid 1 for logs or some
> > other config?
>
> We were using the LSI but I'll be switching back to Areca when we
> go back to HW RAID. The LSI only performed well if we setup 15
> RAID-1 pairs in HW and use linux SW RAID 0 on top. RAID1+0 in the
> LSI was a pretty mediocre performer. Areca 1680 OTOH, beats it in
> every test, with HW RAID10 only. Much simpler to admin.

Interesting, thanks for sharing. I guess I have never gotten to the point where I felt I needed more than 2 drives for my xlogs. Maybe I have been dismissing that as a possibility or something. (my biggest array is only 24 SFF drives tho)

I am trying to get my hands on a dual core lsi card for testing at work (either a 9265-8i or 9285-8e). I don't see any dual core 6Gbps SAS Areca cards yet... still rocking an Areca 1130 at home tho.

-M

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] rant ? check the BBWC
So sometime along the yellow brick firmware road HP changed (and maybe your vendor did too) the output of what happens when the write cache is off due to failed batteries attached to the card/cache. (and no, they don't always beep with a self test in case someone happens to be walking near your cage, and yes, you will be wondering why batteries attached to these things are not hot swappable.)

So yeah -- even if you were pretty sure some code had properly checked them with, say, nagios checks (or whatever) and you were monitoring them... if someone hasn't replaced a dead battery in a while you should probably be wondering why.

Now go forth and test your HA/DR plans, check your raid and UPS batteries, test your monitoring (again).

-M

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Query improvement
Hi, I have 3 tables: page - revision - pagecontent

CREATE TABLE mediawiki.page
(
  page_id serial NOT NULL,
  page_namespace smallint NOT NULL,
  page_title text NOT NULL,
  page_restrictions text,
  page_counter bigint NOT NULL DEFAULT 0,
  page_is_redirect smallint NOT NULL DEFAULT 0,
  page_is_new smallint NOT NULL DEFAULT 0,
  page_random numeric(15,14) NOT NULL DEFAULT random(),
  page_touched timestamp with time zone,
  page_latest integer NOT NULL,
  page_len integer NOT NULL,
  titlevector tsvector,
  page_type integer NOT NULL DEFAULT 0,
  CONSTRAINT page_pkey PRIMARY KEY (page_id)
);

CREATE TABLE mediawiki.revision
(
  rev_id serial NOT NULL,
  rev_page integer,
  rev_text_id integer,
  rev_comment text,
  rev_user integer NOT NULL,
  rev_user_text text NOT NULL,
  rev_timestamp timestamp with time zone NOT NULL,
  rev_minor_edit smallint NOT NULL DEFAULT 0,
  rev_deleted smallint NOT NULL DEFAULT 0,
  rev_len integer,
  rev_parent_id integer,
  CONSTRAINT revision_rev_page_fkey FOREIGN KEY (rev_page)
    REFERENCES mediawiki.page (page_id) MATCH SIMPLE
    ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT revision_rev_id_key UNIQUE (rev_id)
)

CREATE TABLE mediawiki.pagecontent
(
  old_id integer NOT NULL DEFAULT nextval('mediawiki.text_old_id_seq'::regclass),
  old_text text,
  old_flags text,
  textvector tsvector,
  CONSTRAINT pagecontent_pkey PRIMARY KEY (old_id)
)

where I have this query:

SELECT pa.page_id, pa.page_title,
  ts_rank(pc.textvector, (to_tsquery('fotbal'))) + ts_rank(pa.titlevector, (to_tsquery('fotbal')))*10 as totalrank
from mediawiki.page pa, mediawiki.revision re, mediawiki.pagecontent pc
WHERE pa.page_id in
  (SELECT page_id FROM mediawiki.page
   WHERE page_id IN (SELECT page_id FROM mediawiki.page
                     WHERE (titlevector @@ (to_tsquery('fotbal'))))
      OR page_id IN (SELECT p.page_id
                     from mediawiki.page p, mediawiki.revision r,
                          (SELECT old_id FROM mediawiki.pagecontent
                           WHERE (textvector @@ (to_tsquery('fotbal')))) ss
                     WHERE (p.page_id=r.rev_page AND r.rev_id=ss.old_id)))
AND (pa.page_id=re.rev_page AND re.rev_id=pc.old_id)
ORDER BY totalrank
LIMIT 100;

This query finds page titles in page and content in pagecontent by full text search (@@); afterwards I compute the relevance of the resulting ids with ts_rank.

Now the problem. When I try ANALYZE it shows:

"Limit (cost=136568.00..136568.25 rows=100 width=185)"
" -> Sort (cost=136568.00..137152.26 rows=233703 width=185)"
"Sort Key: ((ts_rank(pc.textvector, to_tsquery('fotbal'::text)) + (ts_rank(pa.titlevector, to_tsquery('fotbal'::text)) * 10::double precision)))"
"-> Hash Join (cost=61707.99..127636.04 rows=233703 width=185)"
" Hash Cond: (re.rev_id = pc.old_id)"
" -> Merge Join (cost=24098.90..71107.48 rows=233703 width=66)"
"Merge Cond: (pa.page_id = re.rev_page)"
"-> Merge Semi Join (cost=24096.98..55665.69 rows=233703 width=66)"
" Merge Cond: (pa.page_id = mediawiki.page.page_id)"
" -> Index Scan using page_btree_id on page pa (cost=0.00..13155.20 rows=311604 width=62)"
" -> Index Scan using page_btree_id on page (cost=24096.98..38810.19 rows=233703 width=4)"
"Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))"
"SubPlan 1"
" -> Bitmap Heap Scan on page (cost=10.41..900.33 rows=270 width=4)"
"Recheck Cond: (titlevector @@ to_tsquery('fotbal'::text))"
"-> Bitmap Index Scan on gin_index (cost=0.00..10.34 rows=270 width=0)"
" Index Cond: (titlevector @@ to_tsquery('fotbal'::text))"
"SubPlan 2"
" -> Nested Loop (cost=1499.29..23192.08 rows=1558 width=4)"
"-> Nested Loop (cost=1499.29..15967.11 rows=1558 width=4)"
" -> Bitmap Heap Scan on pagecontent (cost=1499.29..6448.12 rows=1558 width=4)"
"Recheck Cond: (textvector @@ to_tsquery('fotbal'::text))"
"-> Bitmap Index Scan on gin_index2 (cost=0.00..1498.90 rows=1558 width=0)"
" Index Cond: (textvector @@ to_tsquery('fotbal'::text))"
" -> Index Scan using page_btree_rev_content_id on revision r (cost=0.00..6.10 rows=1 width=8)"
"Index Cond: (r.rev_id = pagecontent.old_id)"
"->
Re: [PERFORM] Query improvement
Here is EXPLAIN ANALYZE:

"Limit (cost=136568.00..136568.25 rows=100 width=185) (actual time=1952.174..1952.215 rows=100 loops=1)"
" -> Sort (cost=136568.00..137152.26 rows=233703 width=185) (actual time=1952.172..1952.188 rows=100 loops=1)"
"Sort Key: ((ts_rank(pc.textvector, to_tsquery('fotbal'::text)) + (ts_rank(pa.titlevector, to_tsquery('fotbal'::text)) * 10::double precision)))"
"Sort Method: top-N heapsort Memory: 23kB"
"-> Hash Join (cost=61707.99..127636.04 rows=233703 width=185) (actual time=1046.838..1947.815 rows=3278 loops=1)"
" Hash Cond: (re.rev_id = pc.old_id)"
" -> Merge Join (cost=24098.90..71107.48 rows=233703 width=66) (actual time=200.884..859.453 rows=3278 loops=1)"
"Merge Cond: (pa.page_id = re.rev_page)"
"-> Merge Semi Join (cost=24096.98..55665.69 rows=233703 width=66) (actual time=200.843..629.821 rows=3278 loops=1)"
" Merge Cond: (pa.page_id = mediawiki.page.page_id)"
" -> Index Scan using page_btree_id on page pa (cost=0.00..13155.20 rows=311604 width=62) (actual time=0.027..145.989 rows=311175 loops=1)"
" -> Index Scan using page_btree_id on page (cost=24096.98..38810.19 rows=233703 width=4) (actual time=200.779..429.219 rows=3278 loops=1)"
"Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))"
"SubPlan 1"
" -> Bitmap Heap Scan on page (cost=10.41..900.33 rows=270 width=4) (actual time=0.748..9.845 rows=280 loops=1)"
"Recheck Cond: (titlevector @@ to_tsquery('fotbal'::text))"
"-> Bitmap Index Scan on gin_index (cost=0.00..10.34 rows=270 width=0) (actual time=0.586..0.586 rows=280 loops=1)"
" Index Cond: (titlevector @@ to_tsquery('fotbal'::text))"
"SubPlan 2"
" -> Nested Loop (cost=1499.29..23192.08 rows=1558 width=4) (actual time=2.032..185.743 rows=3250 loops=1)"
"-> Nested Loop (cost=1499.29..15967.11 rows=1558 width=4) (actual time=1.980..109.491 rows=3250 loops=1)"
" -> Bitmap Heap Scan on pagecontent (cost=1499.29..6448.12 rows=1558 width=4) (actual time=1.901..36.583 rows=3250 loops=1)"
"Recheck Cond: (textvector @@ to_tsquery('fotbal'::text))"
"-> Bitmap Index Scan on gin_index2 (cost=0.00..1498.90 rows=1558 width=0) (actual time=1.405..1.405 rows=3250 loops=1)"
" Index Cond: (textvector @@ to_tsquery('fotbal'::text))"
" -> Index Scan using page_btree_rev_content_id on revision r (cost=0.00..6.10 rows=1 width=8) (actual time=0.020..0.021 rows=1 loops=3250)"
"Index Cond: (r.rev_id = pagecontent.old_id)"
"-> Index Scan using page_btree_id on page p (cost=0.00..4.62 rows=1 width=4) (actual time=0.022..0.022 rows=1 loops=3250)"
" Index Cond: (p.page_id = r.rev_page)"
"-> Index Scan using page_btree_rev_page_id on revision re (cost=0.00..11850.52 rows=311604 width=8) (actual time=0.012..166.042 rows=311175 loops=1)"
" -> Hash (cost=27932.04..27932.04 rows=311604 width=127) (actual time=801.000..801.000 rows=311604 loops=1)"
"Buckets: 1024 Batches: 64 Memory Usage: 744kB"
"-> Seq Scan on pagecontent pc (cost=0.00..27932.04 rows=311604 width=127) (actual time=0.018..465.686 rows=311604 loops=1)"
"Total runtime: 1952.962 ms"

I have tried set enable_hashjoin = false; set enable_hashjoin = true; but the result has been worse than before. By the way, is there a possibility to create a better query with the same effect? I have tried more queries, but this one has the best performance yet.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4365717.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
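The later replies report that a UNION-based rewrite brought this down to about a second. That rewrite isn't shown in the archive, so the following is only a rough sketch of what such a rewrite could look like, reusing the schema and ranking idea from the first message; the exact ranking arithmetic, sort direction, and duplicate handling here are assumptions.

-- Sketch only: rank title matches and content matches separately, then combine.
SELECT page_id, page_title, max(score) AS totalrank
FROM (
    SELECT pa.page_id, pa.page_title,
           ts_rank(pa.titlevector, to_tsquery('fotbal')) * 10 AS score
    FROM mediawiki.page pa
    WHERE pa.titlevector @@ to_tsquery('fotbal')

    UNION ALL

    SELECT pa.page_id, pa.page_title,
           ts_rank(pc.textvector, to_tsquery('fotbal')) AS score
    FROM mediawiki.page pa
    JOIN mediawiki.revision re ON re.rev_page = pa.page_id
    JOIN mediawiki.pagecontent pc ON pc.old_id = re.rev_id
    WHERE pc.textvector @@ to_tsquery('fotbal')
) hits
GROUP BY page_id, page_title
ORDER BY totalrank DESC
LIMIT 100;

Each branch can use its own GIN index directly, instead of feeding one large hashed filter as in the plans above.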
Re: [PERFORM] Query improvement
Thanks for replies. Finally I have used UNION and JOINS, which helped. Mainly the UNION helped a lot. Now the query takes 1sec max. Thanks a lot. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378163.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query improvement
Thanks for reply both UNION and JOINS helped. Mainly the UNION helped a lot. Now the query takes 1sec max. Thanks a lot. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378157.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query improvement
Thanks a lot for reply. Finally I have used UNION, but thanks for your help. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378160.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] CLUSTER versus a dedicated table
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Robert James
> Sent: Wednesday, June 01, 2011 5:55 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] CLUSTER versus a dedicated table
>
> Hi. I'm interested in understanding the differences between
> CLUSTERing a table and making a dedicated one.
>
> We have a table with about 1 million records. On a given day, only
> about 1% of them are of interest. That 1% changes every day (it's
> WHERE active_date = today), and so we index and cluster on it.
>
> Even so, the planner shows a very large cost for the Index Scan: about
> 3500. If I instead do a SELECT INTO temp_table FROM big_table WHERE
> active_date = today, and then do SELECT * FROM temp_table, I get a
> planned cost of 65. Yet, the actual time for both queries is almost
> identical.
>
> Questions:
> 1. Why is there such a discrepancy between the planner's estimate and
> the actual cost?
>
> 2. In a case like this, will I in general see a performance gain by
> doing a daily SELECT INTO and then querying from that table? My ad hoc
> test doesn't indicate I would (despite the planner's prediction), and
> I'd rather avoid this if it won't help.
>
> 3. In general, does CLUSTER provide all the performance benefits of a
> dedicated table? If it doesn't, what does it lack?
>
> Thank you.

Start here: http://wiki.postgresql.org/wiki/SlowQueryQuestions

1: there could be many reasons for the planner to come up with a grossly inaccurate estimate for some values. When the table was last analyzed is usually where people start.

2: look at table partitioning; it's pretty straightforward and sounds like it might be a good fit for you. It will however involve some triggers or rules and check constraints (a rough sketch follows below). Table partitioning has some downsides though, you should be aware of what they are before you commit to it.

3: clustering, from a high level, just reorders the data on disk by a given index. Depending on your setup, keeping it close to that clustered ordering might be trivial or it might not be.

"Big table" is relative; 1M rows might be a big table or it might not be. Since you didn't post the size of the table and indexes, we can only guess. Table partitioning helps most with table maintenance, IMO, but can be very useful if constraint exclusion can eliminate a large number of child tables right off, so it doesn't have to traverse large indexes or do lots of random IO.

You will need to post a lot more specific info if you want more specific help. Read the guide to reporting slow queries (or the guide to reporting problems), start gathering specific information, and then post back to the list.

-Mark

> -- > Sent via pgsql-performance mailing list (pgsql-performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
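As a rough idea of what the partitioning mentioned in point 2 involves on this era of PostgreSQL (inheritance plus check constraints plus a routing trigger), a minimal sketch with made-up names:

-- parent table
CREATE TABLE big_table (
    id          serial,
    active_date date NOT NULL,
    payload     text
);

-- one child per day, each with a CHECK constraint the planner can use
CREATE TABLE big_table_2011_06_02 (
    CHECK (active_date = DATE '2011-06-02')
) INHERITS (big_table);

-- route inserts to the right child (a real setup would add a child and a
-- branch here for each new day, usually from a scheduled job)
CREATE OR REPLACE FUNCTION big_table_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.active_date = DATE '2011-06-02' THEN
        INSERT INTO big_table_2011_06_02 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.active_date;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_insert_trg
    BEFORE INSERT ON big_table
    FOR EACH ROW EXECUTE PROCEDURE big_table_insert();

-- with constraint_exclusion enabled, a query like
--   SELECT * FROM big_table WHERE active_date = DATE '2011-06-02';
-- only has to touch the one matching child table.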
[PERFORM] not exits slow compared to not in. (nested loops killing me)
Hi all, I am trying to speed up a query on a DB I inherited and I am falling flat on my face . I changed a query from NOT IN to use NOT EXISTS and my query time went from 19000ms to several hours (~5000 ms). this shocked me so much I pretty much had to post. This seems like a corner case of the planner not knowing that the nested-loops are going to turn out badly in this case. The planner choosing a 13hr nested loop here is basically the reason I am posting. I have played around with rewriting this query using some CTEs and a left join but thus far my results are not encouraging. Given what little I know , it seems like a LEFT JOIN where right_table.col is null gets the same performance and estimates as a NOT EXISTS. (and still picks a nested loop in this case) I can see where it all goes to hell time wise, turning off nested loops seems to keep it from running for hours for this query, but not something I am looking to do globally. The time is not really that much better than just leaving it alone with a NOT IN. two queries are at http://pgsql.privatepaste.com/a0b672bab0# the "pretty" explain versions : NOT IN (with large work mem - 1GB) http://explain.depesz.com/s/ukj NOT IN (with only 64MB for work_mem) http://explain.depesz.com/s/wT0 NOT EXISTS (with 64MB of work_mem) http://explain.depesz.com/s/EuX NOT EXISTS (with nested loop off. and 64MB of work_mem) http://explain.depesz.com/s/UXG LEFT JOIN/CTE (with nested loop off and 1GB of work_mem) http://explain.depesz.com/s/Hwm table defs, with estimated row counts (which all 100% match exact row count) http://pgsql.privatepaste.com/c2ff39b653 tried running an analyze across the whole database, no affect. I haven't gotten creative with explicit join orders yet . postgresql 9.0.2. willing to try stuff for people as I can run things on a VM for days and it is no big deal. I can't do that on production machines. thoughts ? ideas ? -Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
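The actual queries are only in the paste links above; as a generic illustration of the three shapes being compared (hypothetical tables a and b, not the real schema):

-- NOT IN: semantics change if b.a_id can be NULL, and it is often planned
-- very differently from the other two.
SELECT a.*
FROM   a
WHERE  a.id NOT IN (SELECT b.a_id FROM b);

-- NOT EXISTS: normally planned as an anti-join.
SELECT a.*
FROM   a
WHERE  NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);

-- LEFT JOIN ... IS NULL: the same anti-join idea written as an outer join.
SELECT a.*
FROM   a
LEFT JOIN b ON b.a_id = a.id
WHERE  b.a_id IS NULL;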
Re: [PERFORM] not exits slow compared to not in. (nested loops killing me)
> -Original Message- > From: Craig Ringer [mailto:cr...@postnewspapers.com.au] > Sent: Monday, June 06, 2011 5:08 PM > To: mark > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] not exits slow compared to not in. (nested loops > killing me) > > On 06/07/2011 04:38 AM, mark wrote: > > > NOT EXISTS (with 64MB of work_mem) > > http://explain.depesz.com/s/EuX > > Hash Anti Join (cost=443572.19..790776.84 rows=1 width=1560) > (actual time=16337.711..50358.487 rows=2196299 loops=1) > > Note the estimated vs actual rows. Either your stats are completely > ridiculous, or the planner is confused. I am starting to think the planner might be confused in 9.0.2. I got a reasonable query time, given resource constraints, on a very small VM on my laptop running 9.0.4. I am going to work on getting the vm I was using to test this with up to 9.0.4 and test again. There is a note in the 9.0.4 release notes " Improve planner's handling of semi-join and anti-join cases (Tom Lane)" Not sure that is the reason I got a much better outcome with a much smaller vm. But once I do some more testing I will report back. > > What are your stats target levels? Have you tried increasing the stats > levels on the table(s) or at least column(s) affected? Or tweaking > default_statistics_target if you want to use a bigger hammer? Will try that as well. Currently the default stat target is 100. Will try at 250, and 500 and report back. > > Is autovacuum being allowed to do its work and regularly ANALYZE the > database? Does an explicit 'ANALYZE' help? Auto vac is running, I have explicitly vacuum & analyzed the whole db. That didn't change anything. > > -- > Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
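For reference, the statistics knobs discussed above can be tried per column or per session before changing the cluster-wide default; a small sketch with placeholder names:

-- raise the per-column statistics target and re-analyze:
ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 500;
ANALYZE some_table;

-- or raise the default for this session and re-analyze:
SET default_statistics_target = 250;
ANALYZE;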
[PERFORM] benchmark woes and XFS options
Hello PG perf junkies, Sorry this may get a little long winded. Apologies if the formatting gets trashed. Also apologies if this double posts. (I originally set it yesterday with the wrong account and the message is stalled - so my bad there) if someone is a mod and it's still in the wait queue feel free to remove them. Short version: my zcav and dd tests look to get ->CPU bound<-. Yes CPU bound, with junk numbers. My numbers in ZCAV are flat like and SSD which is odd for 15K rpm disks. I am not sure what the point of moving further would be given these unexpected poor numbers. Well I knew 12 disks wasn't going to be something that impressed me, I am used to 24, but I was expecting about 40-50% better than what I am getting. Background: I have been setting up some new servers for PG and I am getting some odd numbers with zcav, I am hoping a second set of eyes here can point me in the right direction. (other tests like bonniee++ (1.03e) and dd also give me odd (flat and low) numbers) I will preface this with, yes I bought greg's book. Yes I read it, and it has helped me in the past, but seem to have hit an oddity. (hardware,os, and config stuff listed at the end) Long version: In the past when dealing with storage I typically see a large gain with moving from ext3 to XFS, provided I set readahead to 16384 on either filesystem. I also see typical down ward trends in the MB/s (expected) and upward trends in access times (expected) with either file system. These blades + storage-blades are giving me atypical results . I am not seeing a dramatic down turn in MB/s in zcav nor am I seeing access time really increase. (something I have only seen before when I forget to have readahead set high enough) things are just flat at about 420MB/s in zcav @ .6ms for access time with XFS and ~470MB/s @.56ms for ext3. FWIW I get worthless results with zcav and bonnie++ using 1.03 or 1.96 sometimes, which isn't something I have had happen before even though greg does mention it. Also when running zcav I will see kswapdX (0 and 1 in my two socket case) start to eat significant cpu time (~40-50% each), with dd - kswapd and pdflush become very active as well. This only happens once free mem gets low. As well zcav or dd looks to get CPU bound at 100% while i/o wait stays almost at 0.0 most of the time. (iostat -x -d shows util % at 98% though). I see this with either XFS or ext3. Also when I cat /proc/zoneinfo it looks like I am getting heavy contention for a single page in DMA while the tests are running. (see end of email for zoneinfo) Bonnie is giving me 99% cpu usage reported. Watching it while running it bounces between 100 and 99. Kswap goes nuts here as well. I am lead to believe that I may need a 2.6.32 (rhel 6.1) or higher kernel to see some of the kswapd issues go away. (testing that hopefully later this week). Maybe that will take care of everything. I don't know yet. Side note: Setting vm.swappiness to 10 (or 0) doesn't help, although others on the RHEL support site indicated it did fix kswap issues for them. Running zcav on my home system (4 disk raid 1+0 3ware controller +BBWC using ext4 ubunut 2.6.38-8 I don't see zcav near 100% and I see lots of i/o wait as expected, and my zoneinfo for DMA doesn't sit at 1) Not going to focus too much on ext3 since I am pretty sure I should be able to get better numbers from XFS. With mkfs.xfs I have done some reading and it appears that it can't automatically read the stripsize (aka stripe size to anyone other than HP) or the number of disks. 
So I have been using the following:

mkfs.xfs -b size=4k -d su=256k,sw=6,agcount=256

(256K is the default hp stripsize for raid1+0, I have 12 disks in raid 10 so I used sw=6, agcount of 256 because that is a (random) number I got from google that seemed in the ball park.)

which gives me:

meta-data=/dev/cciss/c0d0        isize=256    agcount=256, agsize=839936 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=215012774, imaxpct=25
         =                       sunit=64     swidth=384 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

(if I don't specify the agcount or su,sw stuff I get

meta-data=/dev/cciss/c0d0        isize=256    agcount=4, agsize=53753194 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=215012774, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=
Re: [PERFORM] benchmark woes and XFS options
> -Original Message- > From: Greg Smith [mailto:g...@2ndquadrant.com] > Sent: Monday, August 08, 2011 9:42 PM > To: mark > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] benchmark woes and XFS options > > I think your notion that you have an HP CCISS driver in this older > kernel that just doesn't drive your card very fast is worth exploring. > What I sometimes do in the situation you're in is boot a Linux > distribution that comes with a decent live CD, such as Debian or > Ubuntu. Just mount the suspect drive, punch up read-ahead, and re-test > performance. That should work well enough to do a simple dd test, and > probably well enough to compile and run bonnie++ too. If that gives > good performance numbers, it should narrow the list of possible causes > considerably. You really need to separate out "bad driver" from the > other possibilities here given what you've described, and that's a low > impact way to do it. > > -- > Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD > PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us Thanks Greg. I will try and give HPSA a whirl and report back. Both with single disk and the whole raid set. I am out of the office this week so I might have some delay before I can do some more detective work. I don't think I have any gear that won't require either the CCISS or HPSA driver and in SFF drives. But will try and look around. -Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] XFS options and benchmark woes
Hello PG perf junkies, Sorry this may get a little long winded. Apologies if the formatting gets trashed. Background: I have been setting up some new servers for PG and I am getting some odd numbers with zcav, I am hoping a second set of eyes here can point me in the right direction. (other tests like bonniee++ (1.03e) and dd also give me odd (flat and low) numbers) I will preface this with, yes I bought greg's book. Yes I read it, and it has helped me in the past, but seem to have hit an oddity. (hardware,os, and config stuff listed at the end) Short version: my zcav and dd tests look to get I/O bound. My numbers in ZCAV are flat like and SSD which is odd for 15K rpm disks. Long version: In the past when dealing with storage I typically see a large gain with moving from ext3 to XFS, provided I set readahead to 16384 on either filesystem. I also see typical down ward trends in the MB/s (expected) and upward trends in access times (expected) with either file system. These blades + storage-blades are giving me atypical results . I am not seeing a dramatic down turn in MB/s in zcav nor am I seeing access time really increase. (something I have only seen before when I forget to have readahead set high enough) things are just flat at about 420MB/s in zcav @ .6ms for access time with XFS and ~470MB/s @.56ms for ext3. FWIW I get worthless results with zcav and bonnie++ using 1.03 or 1.96 sometimes, which isn't something I have had happen before even though greg does mention it. Also when running zcav I will see kswapdX (0 and 1 in my two socket case) start to eat significant cpu time (~40-50% each), with dd - kswapd and pdflush become very active as well. This only happens once free mem gets low. As well zcav or dd looks to get CPU bound at 100% while i/o wait stays almost at 0.0 most of the time. (iostat -x -d shows util % at 98% though). I see this with either XFS or ext3. Also when I cat /proc/zoneinfo it looks like I am getting heavy contention for a single page in DMA while the tests are running. (see end of email for zoneinfo) Bonnie is giving me 99% cpu usage reported. Watching it while running it bounces between 100 and 99. Kswap goes nuts here as well. I am lead to believe that I may need a 2.6.32 (rhel 6.1) or higher kernel to see some of the kswapd issues go away. (testing that hopefully later this week). Maybe that will take care of everything. I don't know yet. Side note: Setting vm.swappiness to 10 (or 0) doesn't help, although others on the RHEL support site indicated it did fix kswap issues for them. Running zcav on my home system (4 disk raid 1+0 3ware controller +BBWC using ext4 ubunut 2.6.38-8 I don't see zcav near 100% and I see lots of i/o wait as expected, and my zoneinfo for DMA doesn't sit at 1) Not going to focus too much on ext3 since I am pretty sure I should be able to get better numbers from XFS. With mkfs.xfs I have done some reading and it appears that it can't automatically read the stripsize (aka stripe size to anyone other than HP) or the number of disks. So I have been using the following: mkfs.xfs -b size=4k -d su=256k,sw=6,agcount=256 (256K is the default hp stripsize for raid1+0, I have 12 disks in raid 10 so I used sw=6, agcount of 256 because that is a (random) number I got from google that seemed in the ball park.) 
which gives me: meta-data=/dev/cciss/c0d0isize=256agcount=256, agsize=839936 blks = sectsz=512 attr=2 data = bsize=4096 blocks=215012774, imaxpct=25 = sunit=64 swidth=384 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=32768, version=2 = sectsz=512 sunit=64 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 (if I don't specif the agcount or su,sw stuff I get meta-data=/dev/cciss/c0d0isize=256agcount=4, agsize=53753194 blks = sectsz=512 attr=2 data = bsize=4096 blocks=215012774, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=32768, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0) ) So it seems like I should be giving it the extra parameters at mkfs.xfs time... could someone confirm ? In the past I have never specified the su or sw or ag groups I have taken the defaults. But since I am getting odd numbers here I started playing with em. Getting little or no change. for mounting: logbufs=8,noatime,nodiratime,nobarrier,inode64,allocsize=16m (I know that noatime also means nodiratime according xfs.org, but in the past I seem to get better numbers when
Re: [PERFORM] XFS options and benchmark woes
> -Original Message- > From: mark [mailto:m...@sm-a.net] > Sent: Monday, August 08, 2011 12:15 AM > To: 'pgsql-performance@postgresql.org' > Subject: XFS options and benchmark woes > > Hello PG perf junkies, > > > Sorry this may get a little long winded. Apologies if the formatting > gets trashed. > > > > Background: > > I have been setting up some new servers for PG and I am getting some > odd numbers with zcav, I am hoping a second set of eyes here can point > me in the right direction. (other tests like bonniee++ (1.03e) and dd > also give me odd (flat and low) numbers) > > I will preface this with, yes I bought greg's book. Yes I read it, and > it has helped me in the past, but seem to have hit an oddity. > > (hardware,os, and config stuff listed at the end) > > > > > > Short version: my zcav and dd tests look to get I/O bound. My numbers > in ZCAV are flat like and SSD which is odd for 15K rpm disks. uggg, ZCAV numbers appear to be CPU bound. Not i/o . > > > > > Long version: > > > In the past when dealing with storage I typically see a large gain with > moving from ext3 to XFS, provided I set readahead to 16384 on either > filesystem. > > I also see typical down ward trends in the MB/s (expected) and upward > trends in access times (expected) with either file system. > > > These blades + storage-blades are giving me atypical results . > > > I am not seeing a dramatic down turn in MB/s in zcav nor am I seeing > access time really increase. (something I have only seen before when I > forget to have readahead set high enough) things are just flat at about > 420MB/s in zcav @ .6ms for access time with XFS and ~470MB/s @.56ms for > ext3. > > FWIW I get worthless results with zcav and bonnie++ using 1.03 or 1.96 > sometimes, which isn't something I have had happen before even though > greg does mention it. > > > Also when running zcav I will see kswapdX (0 and 1 in my two socket > case) start to eat significant cpu time (~40-50% each), with dd - > kswapd and pdflush become very active as well. This only happens once > free mem gets low. As well zcav or dd looks to get CPU bound at 100% > while i/o wait stays almost at 0.0 most of the time. (iostat -x -d > shows util % at 98% though). I see this with either XFS or ext3. Also > when I cat /proc/zoneinfo it looks like I am getting heavy contention > for a single page in DMA while the tests are running. (see end of email > for zoneinfo) > > Bonnie is giving me 99% cpu usage reported. Watching it while running > it bounces between 100 and 99. Kswap goes nuts here as well. > > > I am lead to believe that I may need a 2.6.32 (rhel 6.1) or higher > kernel to see some of the kswapd issues go away. (testing that > hopefully later this week). Maybe that will take care of everything. I > don't know yet. > > Side note: Setting vm.swappiness to 10 (or 0) doesn't help, although > others on the RHEL support site indicated it did fix kswap issues for > them. > > > > Running zcav on my home system (4 disk raid 1+0 3ware controller +BBWC > using ext4 ubunut 2.6.38-8 I don't see zcav near 100% and I see lots of > i/o wait as expected, and my zoneinfo for DMA doesn't sit at 1) > > Not going to focus too much on ext3 since I am pretty sure I should be > able to get better numbers from XFS. > > > > With mkfs.xfs I have done some reading and it appears that it can't > automatically read the stripsize (aka stripe size to anyone other than > HP) or the number of disks. 
So I have been using the following: > > mkfs.xfs -b size=4k -d su=256k,sw=6,agcount=256 > > (256K is the default hp stripsize for raid1+0, I have 12 disks in raid > 10 so I used sw=6, agcount of 256 because that is a (random) number I > got from google that seemed in the ball park.) > > > > > > > which gives me: > meta-data=/dev/cciss/c0d0isize=256agcount=256, > agsize=839936 blks > = sectsz=512 attr=2 > data = bsize=4096 blocks=215012774, > imaxpct=25 > = sunit=64 swidth=384 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal log bsize=4096 blocks=32768, version=2 > = sectsz=512 sunit=64 blks, lazy- > count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > > > (if I don't specif the agcount or su,sw stuff I get > meta-data=/dev/cciss/c0d0isize=256agcount=4, > agsize=53753194 blks > =
Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Greg Smith > Sent: Wednesday, August 17, 2011 3:18 PM > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++ > > On 08/17/2011 02:26 PM, Ogden wrote: > > I am using bonnie++ to benchmark our current Postgres system (on RAID > > 5) with the new one we have, which I have configured with RAID 10. > The > > drives are the same (SAS 15K). I tried the new system with ext3 and > > then XFS but the results seem really outrageous as compared to the > > current system, or am I reading things wrong? > > > > The benchmark results are here: > > http://malekkoheavyindustry.com/benchmark.html > > Congratulations--you're now qualified to be a member of the "RAID5 > sucks" club. You can find other members at > http://www.miracleas.com/BAARF/BAARF2.html Reasonable read speeds and > just terrible write ones are expected if that's on your old hardware. > Your new results are what I would expect from the hardware you've > described. > > The only thing that looks weird are your ext4 "Sequential Output - > Block" results. They should be between the ext3 and the XFS results, > not far lower than either. Normally this only comes from using a bad > set of mount options. With a battery-backed write cache, you'd want to > use "nobarrier" for example; if you didn't do that, that can crush > output rates. > To clarify maybe for those new at using non-default mount options. With XFS the mount option is nobarrier. With ext4 I think it is barrier=0 Someone please correct me if I am misleading people or otherwise mistaken. -mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
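For anyone trying this at home, a rough sketch of where those mount options live; the device and mount point below are invented for illustration, and disabling barriers is only sane when the controller has a working battery- or flash-backed write cache:

# /etc/fstab sketch (hypothetical device and mount point; the two lines are alternatives)
# XFS:
/dev/cciss/c0d1  /pgdata  xfs   noatime,nobarrier   0 0
# ext4 spells the same thing barrier=0 (many kernels also accept nobarrier):
/dev/cciss/c0d1  /pgdata  ext4  noatime,barrier=0   0 0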
Re: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
>From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Anthony Presley
>Sent: Sunday, September 11, 2011 4:45 PM
>To: pgsql-performance@postgresql.org
>Subject: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
>
>We've currently got PG 8.4.4 running on a whitebox hardware set up, with (2) 5410 Xeon's, and 16GB of RAM. It's also got (4) 7200RPM SATA drives, using the onboard IDE controller and ext3.
>
>A few weeks back, we purchased two refurb'd HP DL360 G5's, and were hoping to set them up with PG 9.0.2, running replicated. These machines have (2) 5410 Xeon's, 36GB of RAM, (6) 10k SAS drives, and are using the HP SA P400i with 512MB of BBWC. PG is running on an ext4 (noatime) partition, and the drives are configured as RAID 1+0 (seems with this controller, I cannot do JBOD). I've spent a few hours going back and forth benchmarking the new systems, and have set up the DWC, and the accelerator cache using hpacucli. I've tried accelerator caches of 25/75, 50/50, and 75/25.
>

I would start off by recommending a more current version of 9.0... like 9.0.4, since you are building a new box. The rumor mill says 9.0.5 and 9.1.0 might be out soon (days?), but that is just the rumor mill. Don't bank on it.

What kernel are you on?

Long-time HP user here, for better and worse... so here are a few other little things I recommend.

Check the BIOS power management. Make sure it is set where you want it. (IIRC the G5s have this; I know G6s and G7s do.) This can help with nasty latency problems if the box has been idle for a while and then needs to start doing work.

The P400i is not a great card compared to more modern ones, but you should be able to beat the old setup with what you have: faster-clocked CPUs, more spindles, faster-RPM spindles. Assuming the battery is working, with XFS or ext4 you can use the nobarrier mount option and you should see some improvement.

Make sure the RAID card's firmware is current. I can't stress this enough. HP fixed a nasty bug with RAID 1+0 a few months ago where you could eat your data... They also seem to be fixing a lot of other bugs along the way as well. So do yourself a big favor and make sure that firmware is current. It might just head off a headache down the road.

Also make sure you have an 8.10.? (IIRC the version number right) or better version of hpacucli... there have been some fixes to that utility as well. IIRC most of the fixes in this have been around recognizing newer cards (812s and 410s), but some interface bugs have been fixed as well. You may need new packages for HP health (I don't recall the official name, but new versions of hpacucli might not play well with old versions of HP health). It's HP, so they have a new version about every month for firmware and their CLI utility... that's HP for us.

Anyways, that is my fast input.

Best of luck,

-Mark

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
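As a rough illustration of the firmware/cache checks described above (the slot number is a guess; adjust for the box in question), something along these lines with hpacucli:

hpacucli ctrl all show config    # arrays, logical drives, and controller state
hpacucli ctrl slot=0 show        # controller detail, including cache ratio and battery status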
Re: [PERFORM] pg9 replication over WAN ?
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ben Ciceron
> Sent: Wednesday, October 05, 2011 3:53 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] pg9 replication over WAN ?
>
> Hello,
>
> Has any performance or evaluation done for pg9.x streaming replication
> over WAN ?
> How adequate is the protocol to push WALs over long distance ?
> Any best practice tuning wal_* for WAN ?
>
> Cheers,
> Ben-

Works for us between the middle of the US and the UK. This is on some databases that do a few gig of log files per day.

I keep a large number of wal_keep_segments because I have the disk capacity for it and the initial rsync takes a while at ~2-4MB/s.

You may find you need to tune the TCP keepalive settings, but I am not sure if that was really a change in the setting or a fix to our load balancers that fixed an issue I was seeing.

Overall I am extremely pleased with streaming replication + read-only hot standby.

-Mark

> -- > Sent via pgsql-performance mailing list (pgsql-performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
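A minimal sketch of the knobs mentioned above, with made-up values (wal_keep_segments is the 9.0 master-side setting; the tcp_keepalives_* parameters are the server-side keepalive controls, which can also be tuned at the OS level):

# postgresql.conf (illustrative values only)
wal_keep_segments = 1024        # keep extra WAL around for the slow initial sync / WAN hiccups
tcp_keepalives_idle = 60        # 0 means "use the OS default"
tcp_keepalives_interval = 10
tcp_keepalives_count = 6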
[PERFORM] Deferred constraints performance impact ?
Hi all, We have put some deferred constraints (some initially immediate, some initially deferred) into our database for testing with our applications. I haven't seen any noticeable loss in performance, but I am not sure I can properly simulate our full production environment load levels in my tests. I was wondering if I am turning on something that has known significant negative impacts to performance, just from having them there. I understand a lot more may have to be tracked through a transaction and there could be some impact from that. Similar to an after update trigger? Or are the two not comparable in terms of impact from what is tracked and then checked. Anyways, just looking for feedback if anyone has any. -Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
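For readers who haven't used the feature, a small sketch of what is being tested; table and constraint names here are invented:

-- a foreign key that can be deferred
ALTER TABLE order_line
  ADD CONSTRAINT order_line_order_fk
  FOREIGN KEY (order_id) REFERENCES orders (id)
  DEFERRABLE INITIALLY IMMEDIATE;

-- defer the check until commit inside one transaction
BEGIN;
SET CONSTRAINTS order_line_order_fk DEFERRED;
-- ... statements that are temporarily inconsistent ...
COMMIT;  -- the constraint is checked here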
Re: [PERFORM] Delete query takes exorbitant amount of time
Tom Lane Wrote: > Hmm ... I wonder how hard it would be to teach EXPLAIN ANALYZE to show > the runtime expended in each trigger when the statement is of a kind > that has triggers. We couldn't break down the time *within* the > triggers, but even this info would help a lot in terms of finger > pointing ... > > Seq Scan on ... (nn.nnn ms) > Trigger foo: nn.mmm ms > Trigger bar: nn.mmm ms > Total time: nn.mmm ms So I got the latest from CVS on Friday night to see how hard it would be to implement this, but it turns out that Tom has already committed the improvement, so I'm in Tom's fan club today. I imported my test dataset and was almost immediately able to track down the cause of my performance problem. Thanks! Mark Lewis ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Correcting Hash Join Estimates
Hello!

I posted a similar question about a month ago, but for some reason it never seemed to be broadcast even though it ended up in the archives. So, since I'm still struggling with this, I thought I'd repost...

I'm trying to optimize a query, and the EXPLAIN ANALYZE (see link below) shows that some hash join row estimates are wrong by a factor of 2-3, and upwards of 7-8. There is a corresponding mis-estimation of the amount of time taken for these steps. The database is vacuum analyzed nightly by a cron job. How would I go about tightening up these errors? I suspect that I need to SET STATISTICS on some columns, but how can I tell which columns? Any help would be appreciated.

WinXP (dual Xeon, 1.2GB RAM) PgSQL 8.0.1
Explain Analyze: <http://www.indeq.com/EA.txt>
View Definition: <http://www.indeq.com/VGAUA.txt>

The largest table contains about 10,000 rows. All tables have indexes on their foreign keys.

Thanks!

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
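For anyone hitting the same thing, the per-column knob being referred to looks roughly like this (names are placeholders; the join columns with the worst estimates are the usual candidates):

ALTER TABLE awards ALTER COLUMN person_id SET STATISTICS 500;
ANALYZE awards;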
Re: [PERFORM] Correcting Hash Join Estimates
On Apr 4, 2005, at 12:54 AM, Tom Lane wrote: [EMAIL PROTECTED] writes: I'm trying to optimize a query and the EXPLAIN ANALYZE (see link below) shows that some hash join row estimates are wrong by a factor of 2-3, and upwards of 7-8. I doubt that improving those estimates would lead to markedly better results. You need to think about improving the view design instead. What context is this view used in --- do you just do "select * from view_get_all_user_award2", or are there conditions added to it, or perhaps it gets joined with other things? Yes. I forgot to show how the query is executed... select * from view_get_all_user_award2 where person_id = 1; Do you really need the DISTINCT constraint? Yes. Do you really need the ORDER BY? The customer wants an initial ordering in the displayed data. Can you simplify the WHERE clause at all? I originally had a bunch of LEFT JOINs. After reading Tow's "SQL Tuning", I was hoping to steer the planner into a more "optimal" plan by using a large where clause instead and doing the joins there (I think they're called implicit joins). I was able to shave a couple of hundred milliseconds off the execution time by doing this. Half a second sounds pretty decent to me for a ten-way join with a WHERE clause as unstructured as that. If you really need it to execute in way less time, you're probably going to have to rethink your data representation to make the query simpler. Unfortunately, I'm not sure I can restructure the data. I did consider materialized views. However, they couldn't be lazy and that seemed like a lot of extra work for the backend for very little improvement. If this sounds like decent performance to you... I guess I can just tell the complainers that it's as good as it's going to get (barring a major hardware upgrade...). Thanks! Mark ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] [NOVICE] Many connections lingering
If there are potentially hundreds of clients at a time, then you may be running into the maximum connection limit. In postgresql.conf, there is a max_connections setting which IIRC defaults to 100. If you try to open more concurrent connections to the backend than that, you will get a connection refused. If your DB is fairly gnarly and your performance needs are minimal it should be safe to increase max_connections. An alternative approach would be to add some kind of database broker program. Instead of each agent connecting directly to the database, they could pass their data to a broker, which could then implement connection pooling. -- Mark Lewis On Tue, 2005-04-12 at 22:09, Slavisa Garic wrote: > This is a serious problem for me as there are multiple users using our > software on our server and I would want to avoid having connections > open for a long time. In the scenario mentioned below I haven't > explained the magnitute of the communications happening between Agents > and DBServer. There could possibly be 100 or more Agents per > experiment, per user running on remote machines at the same time, > hence we need short transactions/pgsql connections. Agents need a > reliable connection because failure to connect could mean a loss of > computation results that were gathered over long periods of time. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
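For reference, the setting lives in postgresql.conf and takes effect on a server restart; an illustrative value:

# postgresql.conf
max_connections = 300   # default is typically 100; each slot costs some shared memory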
[PERFORM] PLM pulling from CVS nightly for testing in STP
Hi all,

Just wanted everyone to know that we're pulling CVS HEAD nightly so it can be tested in STP now. Let me know if you have any questions. Tests are not automatically run yet, but I hope to remedy that shortly.

For those not familiar with STP and PLM, here are a couple of links:

STP http://www.osdl.org/stp/
PLM http://www.osdl.org/plm-cgi/plm

Mark

---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] PLM pulling from CVS nightly for testing in STP
On Wed, Apr 13, 2005 at 11:35:36AM -0700, Josh Berkus wrote:
> Mark,
>
> > Just wanted everyone to know that we're pulling CVS HEAD nightly so it
> > can be tested in STP now. Let me know if you have any questions.
>
> Way cool. How do I find the PLM number? How are you naming these?

The naming convention I'm using is postgresql-YYYYMMDD, for example postgresql-20050413 for the anonymous cvs export from today (April 13). I have a cronjob that'll do the export at 1AM PST8PDT.

The search page for the PLM numbers is here: https://www.osdl.org/plm-cgi/plm?module=search

or you can use the stpbot on linuxnet.mit.edu#osdl.

Mark

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [HACKERS] PLM pulling from CVS nightly for testing in STP
I have dbt-2 tests automatically running against each pull from CVS and have started to automatically compile results here: http://developer.osdl.org/markw/postgrescvs/ I did start with a bit of a minimalistic approach, so I'm open for any comments, feedback, etc. Mark ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Kernel Resources and max_connections
Chris Hebrard wrote:

kern.ipc.shmmax and kern.ipc.shmmin will not stay to what I set them to. What am I doing wrong or not doing at all?

These need to go in /etc/sysctl.conf. You might need to set shmall as well.

(This not-very-clear distinction between what is sysctl'able and what is a kernel tunable is a bit of a downer.)

cheers

Mark

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Kernel Resources and max_connections
Chris Hebrard wrote:

I set the values in /etc/sysctl.conf:

# $FreeBSD: src/etc/sysctl.conf,v 1.1.2.3 2002/04/15 00:44:13 dougb Exp $
#
# This file is read when going to multi-user and its contents piped thru
# ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
#
# Added by IMP 2005-05-04
net.inet.tcp.rfc1323=1
kern.ipc.somaxconn=1024
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968
kern.ipc.semmns="272"
kern.ipc.semmni="256"
kern.ipc.shmmax="66099200"
kern.ipc.shmmin="256"

After a restart both shmmax and shmmin are now 0 and postgres failed to start.

Hmmm - puzzling. One point to check: did you take them out of /boot/loader.conf? Assuming so, maybe don't quote 'em (see below). Finally, you need to set shmall, otherwise it will over(under)ride the shmmax setting. So try:

net.inet.tcp.rfc1323=1
kern.ipc.somaxconn=1024
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968
kern.ipc.semmns=272
kern.ipc.semmni=256
kern.ipc.shmmax=66099200
kern.ipc.shmmin=256
kern.ipc.shmall=32768

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL
If the original paper was published in 1984, then it's been more than 20 years. Any potential patents would already have expired, no? -- Mark Lewis On Tue, 2005-05-10 at 14:35, Mischa Sandberg wrote: > Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > > > Well, in a hash-join right now you normally end up feeding at least > > one > > side of the join with a seqscan. Wouldn't it speed things up > > considerably if you could look up hashes in the hash index instead? > > You might want to google on "grace hash" and "hybrid hash". > > The PG hash join is the simplest possible: build a hash table in memory, > and match an input stream against it. > > *Hybrid hash* is where you spill the hash to disk in a well-designed > way. Instead of thinking of it as building a hash table in memory, think > of it as partitioning one input; if some or all of it fits in memory, > all the better. The boundary condition is the same. > > The real wizard of hybrid hash has to be Goetz Graefe, who sadly has now > joined the MS Borg. He demonstrated that for entire-table joins, hybrid > hash completely dominates sort-merge. MSSQL now uses what he developed > as an academic, but I don't know what the patent state is. > > "Grace hash" is the original implementation of hybrid hash: > Kitsuregawa, M., Tanaka, H., and Moto-oka, T. (1984). > Architecture and Performance of Relational Algebra Machine Grace. > > > > ---(end of broadcast)--- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PERFORM] Select performance vs. mssql
Hi, I have some experience with MSSQL and am examining PostgreSQL. I'm running under Windows. I like what I see so far, but I'm hoping for some performance advice: 1. My test database has 7 million records. 2. There are two columns - an integer and a char column called Day which has a random value of Mon or Tues, etc. in it. 3. I made an index on Day. My query is: select count(*) from mtable where day='Mon' Results: 1. P3 600 512MB RAM MSSQL. It takes about 4-5 secs to run. If I run a few queries and everything is cached, it is sometimes just 1 second. 2. Athlon 1.3 Ghz 1GB RAM. PostgreSQL takes 7 seconds. I have played with the buffers setting and currently have it at 7500. At 2 it took over 20 seconds to run. 5 seconds vs 7 isn't that big of a deal, but 1 second vs 7 seconds is. Also, the slower performance is with much lesser hardware. Any ideas to try? Thanks much, Mark __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
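A rough reconstruction of the test described above, with guessed names and types (the original DDL was not posted):

CREATE TABLE mtable (
    id   integer,
    day  char(5)        -- 'Mon', 'Tues', etc.
);
CREATE INDEX mtable_day_idx ON mtable (day);

-- the timed query
SELECT count(*) FROM mtable WHERE day = 'Mon';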
Re: [PERFORM] Select performance vs. mssql
> Post the result of this for us:
>
> explain analyze select count(*) from mtable where day='Mon';
>
> On both machines.

Hi Chris --

PostgreSQL Machine:

"Aggregate (cost=140122.56..140122.56 rows=1 width=0) (actual time=24516.000..24516.000 rows=1 loops=1)"
"  -> Index Scan using "day" on mtable (cost=0.00..140035.06 rows=35000 width=0) (actual time=47.000..21841.000 rows=1166025 loops=1)"
"        Index Cond: ("day" = 'Mon'::bpchar)"
"Total runtime: 24516.000 ms"

(Note this took 24 seconds after fresh reboot, next execution was 11, and execution without explain analyze was 6.7 seconds)

MSSQL Machine:

That "Explain Analyze" command doesn't work for MSSQL, but I did view the Query plan. 97% of it was "Scanning a particular range of rows from a nonclustered index"

Thanks for your help --Mark

__ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Select performance vs. mssql
First, thanks for all the helpful replies. I've listened to the suggestions and done some more digging and have results: I did show_plan_all in MSSQL and found that it was doing an Index Scan. I've read someplace that if the data you need is all in the index, then MSSQL has a feature/hack where it does not have to go to the table, it can do my COUNT using the index alone. I think this explains the 1 second query performance. I changed the query to also include the other column which is not indexed. The results were MSSQL now used a TableScan and was MUCH slower than PostgreSQL. I clustered the index on MSSQL and PostgreSQL and increased buffers to 15000 on PGSQL. I saw a noticeable performance increase on both. On the more complicated query, PostgreSQL is now 3.5 seconds. MSSQL is faster again doing an index scan and is at 2 seconds. Remember the MSSQL machine has a slower CPU as well. My interpretations: --Given having to do a table scan, PostgreSQL seems to be faster. The hardware on my PostrgreSQL machine is nicer than the MSSQL one, so perhaps they are just about the same speed with speed determined by the disk. --Tuning helps. Clustered index cut my query time down. More buffers helped. --As Chris pointed out, how real-world is this test? His point is valid. The database we're planning will have a lot of rows and require a lot of summarization (hence my attempt at a "test"), but we shouldn't be pulling a million rows at a time. --MSSQL's ability to hit the index only and not having to go to the table itself results in a _big_ performance/efficiency gain. If someone who's in development wants to pass this along, it would be a nice addition to PostgreSQL sometime in the future. I'd suspect that as well as making one query faster, it would make everything else faster/more scalable as the server load is so much less. Thanks again, Mark __ Do you Yahoo!? Yahoo! Small Business - Try our new Resources site http://smallbusiness.yahoo.com/resources/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Select performance vs. mssql
I'm far from an expert, so this may be off-base... but perhaps a suggestion would be to allow a hint to be sent to the optimizer if the user doesn't care that the result is "approximate" maybe then this wouldn't require adding more overhead to the indexes. MSSQL has something like this with (nolock) i.e. select count(*) from customers (nolock) where name like 'Mark%' Regardless, I'm very impressed with PostgreSQL and I think we're moving ahead with it. Mark --- Bruno Wolff III <[EMAIL PROTECTED]> wrote: > On Tue, May 24, 2005 at 08:36:36 -0700, > mark durrant <[EMAIL PROTECTED]> wrote: > > > > --MSSQL's ability to hit the index only and not > having > > to go to the table itself results in a _big_ > > performance/efficiency gain. If someone who's in > > development wants to pass this along, it would be > a > > nice addition to PostgreSQL sometime in the > future. > > I'd suspect that as well as making one query > faster, > > it would make everything else faster/more scalable > as > > the server load is so much less. > > This gets brought up a lot. The problem is that the > index doesn't include > information about whether the current transaction > can see the referenced > row. Putting this information in the index will add > significant overhead > to every update and the opinion of the developers is > that this would be > a net loss overall. __ Do you Yahoo!? Yahoo! Small Business - Try our new Resources site http://smallbusiness.yahoo.com/resources/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] tuning
list wrote:

hi- i would like to see if someone could recommend something to make my query run faster.

Values in postgresql.conf:
shared_buffers = 1000
sort_mem is commented out
effective_cache_size is commented out
random_page_cost is commented out

I would increase shared_buffers (say 5000 - 1), and also effective_cache_size (say around 2 - 5 - but work out how much memory this box has free or cached and adjust accordingly). From your explain output, it looks like sorting is not too much of a problem - so you can leave it unchanged (for this query anyway).

Here is the query in question:

select * from productvendorview where (productlistid=3 or productlistid=5 or productlistid=4) and (lower(item) like '9229%' or lower(descrip) like 'toner%') order by vendorname,item limit 100;

You might want to break this into 2 queries and union them, so you can (potentially) use the indexes on productlistid,lower(item) and productlistid,lower(descrip) separately.

This query scans 412,457 records. Here is the EXPLAIN ANALYZE for the query:

Limit (cost=45718.83..45719.08 rows=100 width=108) (actual time=39093.636..39093.708 rows=100 loops=1)
  -> Sort (cost=45718.83..45727.48 rows=3458 width=108) (actual time=39093.629..39093.655 rows=100 loops=1)
        Sort Key: v.vendorname, p.item
        -> Hash Join (cost=22.50..45515.57 rows=3458 width=108) (actual time=95.490..39062.927 rows=2440 loops=1)
              Hash Cond: ("outer".vendorid = "inner".id)
              -> Seq Scan on test p (cost=0.00..45432.57 rows=3457 width=62) (actual time=89.066..39041.654 rows=2444 loops=1)
                    Filter: (((productlistid = 3) OR (productlistid = 5) OR (productlistid = 4)) AND ((lower((item)::text) ~~ '9229%'::text) OR (lower((descrip)::text) ~~ 'toner%'::text)))
              -> Hash (cost=20.00..20.00 rows=1000 width=54) (actual time=6.289..6.289 rows=0 loops=1)
                    -> Seq Scan on vendor v (cost=0.00..20.00 rows=1000 width=54) (actual time=0.060..3.653 rows=2797 loops=1)
Total runtime: 39094.713 ms
(10 rows)

I guess the relation 'test' is a copy of product (?)

Cheers

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
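A sketch of the suggested two-query rewrite (untested; it assumes indexes on (productlistid, lower(item)) and (productlistid, lower(descrip)) exist or get added):

select * from productvendorview
 where (productlistid=3 or productlistid=5 or productlistid=4)
   and lower(item) like '9229%'
union
select * from productvendorview
 where (productlistid=3 or productlistid=5 or productlistid=4)
   and lower(descrip) like 'toner%'
order by vendorname, item
limit 100;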
Re: [PERFORM] postgresql-8.0.1 performance tuning
Cosimo Streppone wrote:

# Config /etc/sysctl.conf:
kernel.shmall = 786432000
kernel.shmmax = 786432000

I think you have a problem here. kernel.shmmax should *not* be set to an amount of RAM, but to maximum number of shared memory pages, which on a typical linux system is 4kb. Google around:

This is somewhat confusing:
kernel.shmmax is in bytes (max single segment size)
kernel.shmall is in (4k) pages (max system wide allocated segment pages)

cheers

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
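To make the units concrete, one consistent way to write the figure from that config (assuming 4 kB pages) is:

kernel.shmmax = 786432000   # bytes  (a 750 MB single-segment ceiling)
kernel.shmall = 192000      # pages  (786432000 / 4096)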