Re: [PERFORM] Caching by Postgres
On Tue, Aug 23, 2005 at 02:41:39PM -0400, Donald Courtney wrote:
> I mean well with this comment -
> This whole issue of data caching is a troubling issue with postgreSQL
> in that even if you ran postgreSQL on a 64 bit address space
> with larger number of CPUs you won't see much of a scale up
> and possibly even a drop. I am not alone in having the *expectation*
> that a database should have some cache size parameter and
> the option to skip the file system. If I use oracle, sybase, mysql
> and maxdb they all have the ability to size a data cache and move
> to 64 bits.
> Is this a crazy idea - that a project be started to get this adopted?
> Is it too big and structural to contemplate?
> From one who likes postgreSQL

Hey Donald. :-)

This is an operating system issue, not a PostgreSQL issue. If you have more physical memory than fits in 32-bit addresses, and your operating system isn't using this extra memory to cache files (or anything else), then your OS is what I would consider to be broken (or at the very least, not designed for a 64-bit host).

The only questions that can be asked here are: 1) can PostgreSQL do a better job than the OS at making best use of system RAM, and 2) if so, is the net gain worth the added complexity to PostgreSQL?

I happen to think that yes, PostgreSQL can do a better job than most OS's, as it has better information for deciding which pages are worth keeping and which are not, but no, it isn't worth the effort until PostgreSQL developers start running out of things to do.

Buy your 64-bit platforms - but if page caching is your concern, 1) ensure that you really have more physical memory than can fit in 32 bits, and 2) ensure that your operating system is comfortable caching data pages from files above the 32-bit mark.

Cheers,
mark
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 09:21:12AM -0400, Donald Courtney wrote:
> I built postgreSQL 8.1 64 bit on solaris 10 a few months ago
> and side by side with the 32 bit postgreSQL build saw no improvement.
> In fact the 64 bit result was slightly lower.

I've had this sort of argument with a friend of mine who works at a retail computer sales company who always tries to pitch 64-bit platforms to me (I don't have one yet). There are a few issues in here that are hard to properly untangle to allow for a fair comparison.

The first thing to always remember is that the move from 32 bits to 64 bits doesn't come for free. In a real 64-bit system with a 64-bit operating system and 64-bit applications, pointers are now double their 32-bit size. This means more bytes to copy around memory, and in an extreme case has the potential to approach halving both the effective memory bandwidth when accessing many such pointers from RAM, and the effective amount of RAM. In real world cases, not everything is a pointer, so this sort of performance degradation is doubtful - but it is something to keep in mind.

In response to this, it appears that, at least on the Intel/AMD side of things, they've increased the bandwidth on the motherboard, and allowed for faster memory to be connected to the motherboard. They've increased the complexity of the chip, to allow 64-bit register operations to be equivalent in speed to 32-bit register operations. I have no idea what else they've done... :-)

So, it may be difficult to properly compare a 32-bit system to a 64-bit system. Even if the GHz on the chip appears equal, it isn't the same chip, and unless it is the exact same make, product and version of the motherboard, it may not be a fair comparison. Turning support for 32-bit on or off, and using a kernel that is only 32-bit, may give good comparisons - but with the above explanation, I would expect the 32-bit application + kernel to out-perform the 64-bit application.

So then we move on to what 64-bit is really useful for. Obviously, there is the arithmetic. If you were previously doing 64-bit arithmetic through software, you will notice an immediate speed improvement when doing it through hardware instead. If you have a program that is scanning memory in any way, it may benefit from 64-bit instructions (for example - copying data 64-bit words at a time instead of 32-bit words at a time). PostgreSQL might benefit slightly from either of these, slightly balancing the performance degradation of using more memory to store the pointers, and more memory bandwidth to access the pointers.

The real benefit of 64-bit is address space. From the kernel perspective, it means that more programs, or bigger programs, can run at once. From the application perspective, it means your application can use more than 32 bits of address space. For programs that make extensive use of mmap(), this can be a necessity. They are mapping very large files into their own address space. This isn't a performance boost so much as a 'you can't do it' limit, if the files mmap()'ed at the same time will not fit within 32 bits of address space. This also becomes, potentially, a performance degradation, as the system is now having to manage applications that have very large page tables. Page faults may become expensive.

PostgreSQL uses read(), instead of mmap(), and uses <2 Gbyte files. PostgreSQL doesn't require the additional address space for normal operation.
If, however, you happen to have a very large amount of physical memory - more memory than is supported by a 32-bit system, but supported by your 64-bit system - then the operating system should be able to use this additional physical memory to cache file system data pages. This will benefit PostgreSQL if used with tables that are larger than the memory supported by your 32-bit system, and which have queries that require more pages than the memory supported by your 32-bit system to be frequently accessed. If you have a huge database, with many clients accessing the data, this would be a definite yes. With anything less, it is a maybe, or a probably not.

I've been looking at switching to 64-bit, mostly to benefit from the better motherboard bandwidth, and just to play around. I'm not expecting to require the 64-bit instructions.

Hope this helps,
mark
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 02:47:09PM -0400, Alan Stange wrote:
> At least on Sparc processors, v8 and newer, any double precision math
> (including longs) is performed with a single instruction, just like for
> a 32 bit datum. Loads and stores of 8 byte datums are also handled via
> a single instruction. The urban myth that 64bit math is
> different/better on a 64 bit processor is just that; yes, some lower
> end processors would emulate/trap those instructions but that's an
> implementation detail, not architecture.

It isn't an urban myth that 64-bit math on a 64-bit processor is faster, at least if done using registers. It definitely is faster. It may be an urban myth, though, that most applications perform a sufficient amount of 64-bit arithmetic to warrant the upgrade. The benefit may be lost in the noise for an application such as PostgreSQL. It takes, effectively, infinitely longer to access a disk page than to increment a 64-bit integer in software.

For the lower end processors that emulate/trap these instructions, they are being performed in software, along with the overhead of a trap, and are therefore not a single instruction any more.

We are coming at this from different sides (which is good - perspective is always good :-) ). From the Intel/AMD side of things, ALL non-64-bit platforms are 'lower end processors', and don't emulate/trap the instructions as they didn't exist (at least not yet - who knows what clever and sufficiently motivated people will do :-) ).

> > If, however, you happen to have a very large amount of physical memory
> > - more memory than is supported by a 32-bit system, but is supported
> > by your 64-bit system, then the operating system should be able to use
> > this additional physical memory to cache file system data pages, which
> > will benefit PostgreSQL if used with tables that are larger than the
> > memory supported by your 32-bit system, and which have queries which
> > require more pages than the memory supported by your 32-bit system to
> > be frequently accessed. If you have a huge database, with many clients
> > accessing the data, this would be a definite yes. With anything less,
> > it is a maybe, or a probably not.
> Solaris, at least, provided support for far more than 4GB of physical
> memory on 32 bit kernels. A newer 64 bit kernel might be more
> efficient, but that's just because the time was taken to support large
> page sizes and more efficient data structures. It's nothing intrinsic
> to a 32 vs 64 bit kernel.

Hehe. That's why I was so careful to qualify my statements. :-)

But yeah, I agree. It's a lot of hype, for not much gain (and some loss, depending on what it is being used for). I only want one because they're built better, and because I want to play around.

Cheers,
mark
Re: [PERFORM] Caching by Postgres
On Wed, Aug 24, 2005 at 05:09:04PM -0400, Alan Stange wrote:
> The older 32bit RISC processors do have 64 bit registers, ALUs and
> datapaths, and they are marketed toward high end scientific computing,
> and you're claiming that such a processor is slower than one which has
> the addition of 64 bit pointers added to it?

No. I'm claiming that you are talking about a hybrid 64/32 processor, and that it isn't fair to use it to declare that 64-bit arithmetic units don't provide benefit for 64-bit math. :-)

> As an example, an UltraSparc running a 32 bit kernel+application will
> have the same double precision floating point performance as one
> running a 64bit kernel+application (except for the additional FP
> registers in the 64bit API). For a function like daxpy, it's the exact
> same hardware running the exact same instructions! So why do you think
> the performance would be different?

Double precision floating point isn't 64-bit integer arithmetic.

I think this is all a little beside the point. If your point is that the SPARC was designed well - I agree with you. I won't agree that a SPARC with 64-bit registers should be considered a 32-bit machine. The AMD 64-bit machines come in two forms as well - the ones that allow you to use the 64-bit integer registers (not floating point! those are already 80-bit!), and the ones that allow you to address more memory. I wouldn't consider either to be a 32-bit CPU, although they will allow 32-bit applications to run fine.

> I believe the IBM Power processors also upped everything to double
> precision internally because of some details of the "multiply-add fused"
> instructions. It's been a few years since I taught H&P to CS
> undergrads, but I'm fairly sure the details are all the same for MIPS
> processors as well.

Smart design, that obscures the difference - but doesn't make the difference a myth. If it's already there, then it's already there, and we can't talk as if it isn't.

Cheers,
mark
Re: [PERFORM] When to do a vacuum for highly active table
On Tue, Aug 30, 2005 at 05:29:17PM -0400, Tom Lane wrote:
> Markus Benne <[EMAIL PROTECTED]> writes:
> > We have a highly active table that has virtually all
> > entries updated every 5 minutes. Typical size of the
> > table is 50,000 entries, and entries have grown fat.
> ...
> > We are thinking of splitting the table in two: the
> > part that updates often, and the part that updates
> > infrequently, as we suspect that record size impacts
> > vacuum.
> You just said that virtually all rows update constantly --- where's
> the "infrequent" part?

I think he means splitting it vertically, instead of horizontally, and it sounds like an excellent idea, if a large enough portion of each record is in fact mostly fixed. Otherwise, PostgreSQL is copying data multiple times, only to have the data expire as part of a dead row.

I've already started to notice such issues with PostgreSQL - but more because I'm using low-end hardware, and I'm projecting the effect for when our database becomes much larger with much higher demand on the database.

This is the sort of scenario where a database without transactional integrity would significantly out-perform one designed around it. If records are fixed sized, and updated in place, these problems would occur far less often. Is it heresy to suggest MySQL in here? :-) I switched from MySQL to PostgreSQL several months ago, and haven't looked back - but they do work differently, and for certain uses, one can destroy the other. Using a MyISAM table would be the way I would go with this sort of problem.

Cheers,
mark
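As a rough illustration of the vertical split being suggested - a minimal sketch only, with hypothetical table and column names (item_static/item_hot, status, last_checked are not from the original thread):

    -- The wide, rarely changing columns stay in one table...
    CREATE TABLE item_static (
        item_id     integer PRIMARY KEY,
        description text,
        created_at  timestamp
    );

    -- ...and the columns rewritten every 5 minutes move to a narrow table.
    CREATE TABLE item_hot (
        item_id      integer PRIMARY KEY REFERENCES item_static (item_id),
        status       text,
        last_checked timestamp
    );

    -- The frequent update now produces small dead rows instead of fat ones,
    -- so each update (and each vacuum pass) churns through far fewer bytes.
    UPDATE item_hot SET status = 'ok', last_checked = now() WHERE item_id = 42;

    -- Queries that need everything join the two halves back together.
    SELECT s.description, h.status, h.last_checked
      FROM item_static s JOIN item_hot h USING (item_id);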
Re: [PERFORM] 'Real' auto vacuum?
On Wed, Aug 31, 2005 at 10:21:20AM +1200, Ralph Mason wrote:
> This is a wild and crazy thought which I am sure is invalid for some
> good reason.
>
> But why can't postgres just vacuum itself as it goes along?
>
> When a row is orphaned it's added to a list of possibly available rows.
> When a new row is needed the list of possible rows is examined and the
> first one with a transaction id less than the lowest running transaction
> id is chosen to be the new row? These rows can be in a heap so it's
> really fast to find one.
>
> Like magic - no more vacuuming. No more holes for people to fall into.

Yes please. :-)

> Is this an oversimplification of the problem?

But, yeah. It's probably not that easy, especially with really big databases. Where is this free list stored? How efficient is it to keep track of the lowest running transaction at all times? How does one synchronize access to this free list, to ensure that processes don't block up waiting for access to the free list? Is the free list journalled to prevent corruption, and the accidental re-use of a still in-use row? And, there would be a cost to scanning this list on every insert or update.

As an outsider (like you?) I see the current model as a design flaw as well. A neat and tidy model on paper. Not so nice in real life. The need to vacuum in batch mode, to keep the database from dying, seems intuitively bad.

I think there must be answers to this problem. Even simple optimizations, such as defining a table such that any delete or update within it, upon commit, will attempt to vacuum just the rows that should not be considered free for any new transactions. If a row is in use by an active transaction, oh well - it can be picked up by a batch run of vacuum. If it's free, though - let's do it now.

I think any optimizations we come up with will be more happily accepted with a working patch that causes no breakage... :-)

Cheers,
mark
Re: [PERFORM] Is There Any Way ....
Memory management, or disk management, is "good enough" as provided by decent operating systems, and the itch just isn't bad enough to scratch yet. They remain unconvinced that the gain in performance would be worth the cost of maintaining this extra complexity in the code base. If you believe the case can be made, it is up to you to make it.

Cheers!
mark
Re: [HACKERS] [PERFORM] A Better External Sort?
On Fri, Oct 07, 2005 at 04:55:28PM -0700, Luke Lonergan wrote:
> On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:
> > What? strlen is definitely not in the kernel, and thus won't count as
> > system time.
> System time on Linux includes time spent in glibc routines.

Do you have a reference for this?

I believe this statement to be 100% false.

Cheers,
mark
Re: [HACKERS] [PERFORM] A Better External Sort?
On Fri, Oct 07, 2005 at 09:20:59PM -0700, Luke Lonergan wrote:
> On 10/7/05 5:17 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> > On Fri, Oct 07, 2005 at 04:55:28PM -0700, Luke Lonergan wrote:
> > > On 10/5/05 5:12 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:
> > > > What? strlen is definitely not in the kernel, and thus won't count as
> > > > system time.
> > > System time on Linux includes time spent in glibc routines.
> > Do you have a reference for this?
> > I believe this statement to be 100% false.
> How about 99%? OK, you're right, I had this confused with the profiling
> problem where glibc routines aren't included in dynamic linked profiles.

Sorry to emphasize the 100%. It wasn't meant to judge you. It was meant to indicate that I believe 100% of system time is accounted for while the system call is actually active, which is not possible while glibc is active.

I believe the way it works is that a periodic timer interrupt increments a specific integer every time it wakes up. If it finds itself within the kernel, it increments the system time for the active process; if it finds itself outside the kernel, it increments the user time for the active process.

> Back to the statements earlier - the output of time had much of time for a
> dd spent in system, which means kernel, so where in the kernel would that be
> exactly?

Not really an expert here. I only play around. At a minimum, there is a cost to switching from user context to system context and back, and then filling in the zero bits. There may be other inefficiencies, however. Perhaps /dev/zero always fills in a whole block (8192 usually), before allowing the standard file system code to read only one byte. I dunno. But, I see this oddity too:

    $ time dd if=/dev/zero of=/dev/zero bs=1 count=1000
    1000+0 records in
    1000+0 records out
    dd if=/dev/zero of=/dev/zero bs=1 count=1000  4.05s user 11.13s system 94% cpu 16.061 total

    $ time dd if=/dev/zero of=/dev/zero bs=10 count=100
    100+0 records in
    100+0 records out
    dd if=/dev/zero of=/dev/zero bs=10 count=100  0.37s user 1.37s system 100% cpu 1.738 total

From my numbers, it looks like 1-byte reads are hard in both the user context and the system context. It scales almost linearly, even:

    $ time dd if=/dev/zero of=/dev/zero bs=100 count=10
    10+0 records in
    10+0 records out
    dd if=/dev/zero of=/dev/zero bs=100 count=10  0.04s user 0.15s system 95% cpu 0.199 total

    $ time dd if=/dev/zero of=/dev/zero bs=1000 count=1
    1+0 records in
    1+0 records out
    dd if=/dev/zero of=/dev/zero bs=1000 count=1  0.01s user 0.02s system 140% cpu 0.021 total

At least some of this gets into the very in-depth discussions as to whether kernel threads or user threads are more efficient. Depending on the application, user threads can switch many times faster than kernel threads. Other parts of this may just mean that /dev/zero isn't implemented optimally.

Cheers,
mark
Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment
On Fri, Oct 07, 2005 at 12:48:16PM +0200, Steinar H. Gunderson wrote:
> On Fri, Oct 07, 2005 at 11:24:05AM +0200, Cestmir Hybl wrote:
> > Isn't it possible (and reasonable) for these environments to keep track of
> > whether there is a transaction in progress with update to given table and
> > if not, use an index scan (count(*) where) or cached value (count(*)) to
> > perform this kind of query?
> Even if there is no running update, there might still be dead rows in the
> table. In any case, of course, a new update could always be occurring while
> your counting query was still running.

I don't see this being different from count(*) as it is today.

Updating a count column is certainly clever. If using a trigger, perhaps it would allow the equivalent of:

    select count(*) from table for update;

:-)

Cheers,
mark

(not that this is necessarily a good thing!)
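A minimal sketch of the trigger-maintained count being hinted at above - the table name t and the single-row counter table are hypothetical, not from the original thread, and note that the counter row becomes a serialization point for concurrent writers:

    CREATE TABLE t_rowcount (n bigint NOT NULL);
    INSERT INTO t_rowcount SELECT count(*) FROM t;

    CREATE OR REPLACE FUNCTION t_maintain_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE t_rowcount SET n = n + 1;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE t_rowcount SET n = n - 1;
        END IF;
        RETURN NULL;  -- AFTER trigger: the return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER t_count AFTER INSERT OR DELETE ON t
        FOR EACH ROW EXECUTE PROCEDURE t_maintain_count();

    -- Readers now pay for a single-row lookup instead of a full scan.
    SELECT n FROM t_rowcount;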
Re: [PERFORM] Best way to get all different values in a column
On Fri, Oct 14, 2005 at 06:02:56PM +0200, [EMAIL PROTECTED] wrote:
> Does anyone here know what is the most efficient way to list all
> different values of a given column with low cardinality? For instance
> I have a table with columns DAY, NAME, ID, etc. The table is updated
> about each week with thousands of records with the same (current) date.
> Now I would like to list all values for DAY, only if possible without
> scanning all the table each time I submit the request.
> I can think of:
> ...
> Solution 6: Store the values in a separate table, recreated each time
> TABLE is updated.

I've found a variant on 6 to work well for this problem domain. Why not insert into the separate table when you insert into the main table? Either as a trigger, or in your application.

Cheers,
mark
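A minimal sketch of that variant, using the DAY column from the question and a hypothetical side table (big_table_days is illustrative); concurrent inserts of a brand-new day would additionally need duplicate-key handling:

    CREATE TABLE big_table_days (day date PRIMARY KEY);

    CREATE OR REPLACE FUNCTION remember_day() RETURNS trigger AS $$
    BEGIN
        -- Record the day the first time it is seen; do nothing otherwise.
        PERFORM 1 FROM big_table_days WHERE day = NEW.day;
        IF NOT FOUND THEN
            INSERT INTO big_table_days (day) VALUES (NEW.day);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER big_table_day_trg BEFORE INSERT ON big_table
        FOR EACH ROW EXECUTE PROCEDURE remember_day();

    -- Listing the distinct days is now a scan of a handful of rows.
    SELECT day FROM big_table_days ORDER BY day;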
[PERFORM] prepared transactions that persist across sessions?
Hey all. Please point me to a place I should be looking if this is a common question that has been debated periodically and at great length already. :-)

I have a complex query. It's a few Kbytes large, and yes, I've already worked on reducing it to be efficient in terms of database design, and minimizing the expressions used to join the tables. Running some timing tests, I'm finding that the query itself, when issued in full, takes around 60 milliseconds to complete on modest hardware. If prepared, and then executed, however, it appears to take around 60 milliseconds to prepare, and 20 milliseconds to execute. I'm not surprised. PostgreSQL is very likely calculating the costs of many, many query plans.

This is telling me that the quickest way for me to accelerate these queries is to have them pre-select a query plan, and to use it. Unfortunately, I'll only be executing this query once per session, so "PREPARE" seems to be out of the question.

I am using PHP's PDO PGSQL interface - I haven't read up enough on it to determine whether a persistent connection can re-use server-side prepared queries as an option. Anybody know?

My read of the PL/pgSQL documentation seems to suggest that it will do some sort of query plan caching. Is there better documentation on this that would explain exactly how it works? What is the best way to define a PL/pgSQL function that will return a set of records? Is RETURNS SETOF the only option in this regard? It seems inefficient to me. Am I doing it wrong? Not understanding it? For very simple queries, it seems that using PL/pgSQL with SELECT INTO, RETURN, and then "SELECT * FROM f(arg)" actually slows down the query slightly. It wasn't giving me much faith, and I wanted to pick up some people's opinions before going further.

What is the reason that SQL and/or PostgreSQL have not added server-defined prepared statements? As in, one defines a server-defined prepared statement, and all sessions that have permission can execute the prepared statement. Is this just an issue of nobody implementing it? Or was there some deeper explanation as to why this would be a bad thing?

My reading of views is that views would not accelerate the queries. Perhaps the bytes sent to the server would reduce, however, the cost to prepare and execute the statement would be similar, or possibly even longer?

I'm thinking I need some way of defining a server-side query, that takes arguments, that will infrequently prepare the query, such that the majority of the time that it is executed, it will not have to choose a query plan.

Am I missing something obvious? :-)

Thanks,
mark
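For reference, a minimal sketch of a PL/pgSQL SETOF function of the kind asked about - the type, table and column names below are hypothetical, not the actual query. PL/pgSQL prepares and caches the plan for the embedded query the first time it runs in a backend, which is the plan-caching behaviour in question:

    CREATE TYPE report_row AS (id integer, name text, total numeric);

    CREATE OR REPLACE FUNCTION report(p_customer integer)
    RETURNS SETOF report_row AS $$
    DECLARE
        r report_row;
    BEGIN
        FOR r IN
            SELECT o.id, c.name, sum(o.amount)
              FROM orders o JOIN customers c ON c.id = o.customer_id
             WHERE o.customer_id = p_customer
             GROUP BY o.id, c.name
        LOOP
            RETURN NEXT r;  -- rows are accumulated and sent when the function ends
        END LOOP;
        RETURN;
    END;
    $$ LANGUAGE plpgsql STABLE;

    SELECT * FROM report(42);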
Re: [PERFORM] 8.x index insert performance
On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> if that index is causing the problem, you may want to consider setting
> up partial index to exclude null values.

Hey all. Pardon my ignorance. :-)

I've been trying to figure out whether null values are indexed or not from the documentation. I was under the impression that null values are not stored in the index. Occasionally, though, I then see a suggestion such as the above, which seems to indicate to me that null values *are* stored in the index, allowing the 'exclude null values' to have an effect?

Which is it? :-)

Thanks,
mark
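For illustration, the kind of partial index being suggested, with hypothetical table and column names: rows where the column IS NULL are simply left out of the index, keeping it smaller and cheaper to maintain, and the query's WHERE clause has to imply the index predicate for the planner to consider it.

    CREATE INDEX orders_shipped_idx
        ON orders (shipped_date)
        WHERE shipped_date IS NOT NULL;

    -- A candidate for the partial index: the condition implies the predicate.
    SELECT * FROM orders
     WHERE shipped_date IS NOT NULL
       AND shipped_date >= '2005-10-01';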
Re: [PERFORM] 8.x index insert performance
On Mon, Oct 31, 2005 at 03:27:31PM -0500, Merlin Moncure wrote:
> > On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> > > if that index is causing the problem, you may want to consider setting
> > > up partial index to exclude null values.
> > Hey all.
> > Pardon my ignorance. :-)
> > I've been trying to figure out whether null values are indexed or not from
> > the documentation. I was under the impression that null values are not
> > stored in the index. Occasionally, though, I then see a suggestion such
> > as the above, that seems to indicate to me that null values *are* stored
> > in the index, allowing for the 'exclude null values' to have effect?
> > Which is it? :-)
> I think I'm the ignorant one...do explain on any lookup on an indexed
> field where the field value is null and you get a seqscan.

Nahhh... I think the documentation could use a more explicit or obvious explanation. Or, I could have checked the source code to see. In any case, I expect we aren't the only ones that lacked confidence.

Tom was kind enough to point out that null values are stored in the index. I expect that the seqscan is used if the null values are not selective enough, the same as for any other value that isn't selective enough.

Now we can both have a little more confidence! :-)

Cheers,
mark
Re: [PERFORM] WAL sync behaviour
On Thu, Nov 10, 2005 at 11:39:34AM -0500, Tom Lane wrote:
> No, Mike is right: for WAL you shouldn't need any journaling. This is
> because we zero out *and fsync* an entire WAL file before we ever
> consider putting live WAL data in it. During live use of a WAL file,
> its metadata is not changing. As long as the filesystem follows
> the minimal rule of syncing metadata about a file when it fsyncs the
> file, all the live WAL files should survive crashes OK.

Yes, with emphasis on the zero out... :-)

> You do need metadata journaling for all non-WAL PG files, since we don't
> fsync them every time we extend them; which means the filesystem could
> lose track of which disk blocks belong to such a file, if it's not
> journaled.

I think there may be theoretical problems with regard to the ordering of the fsync operation, for files that are not pre-allocated. For example, if a new block is allocated, there are two blocks that need to be updated: the indirect reference block (or inode block, if the block references fit into the inode entry), and the block itself. If the indirect reference block is written first, before the data block, the state of the disk is inconsistent. This would be a crash during the fsync() operation. Metadata journalling can ensure that the data block is allocated first, and then all the necessary references updated, allowing the operation to be incomplete and rolled back, or committed in full.

Or, that is my understanding, anyways, and this is why I would not use ext2 for the database, even if it was claimed that fsync() was used. For WAL, with pre-allocated zero blocks? Sure. Ext2... :-)

mark
Re: [PERFORM] Opinions on Raid
Hope you don't mind, Ron. This might be splitting hairs.

On Tue, Feb 27, 2007 at 11:05:39AM -0500, Ron wrote:
> The real CPU overhead when using SW RAID is when using any form of SW
> RAID that does XOR operations as part of writes (RAID 5, 6, 50, ...,
> etc). At that point, you are essentially hammering on the CPU just
> as hard as you would on a dedicated RAID controller... ...and the
> dedicated RAID controller probably has custom HW helping it do this
> sort of thing more efficiently.
> That being said, SW RAID 5 in this sort of scenario can be reasonable
> if you =dedicate= a CPU core to it. So in such a system, your "n"
> core box is essentially a "n-1" core box because you have to lock a
> core to doing nothing but RAID management.

I have an issue with the above explanation. XOR is cheap. It's one of the cheapest CPU instructions available. Even with high bandwidth, the CPU should always be able to XOR very fast.

This leads me to the belief that the RAID 5 problem has to do with getting the data ready to XOR. With RAID 5, the L1/L2 cache is never large enough to hold multiple stripes of data under regular load, and the system may not have the blocks in RAM. Reading from RAM to find the missing blocks shows up as CPU load. Reading from disk to find the missing blocks shows up as system load. Dedicating a core to RAID 5 focuses on the CPU - which I believe to be mostly idle, waiting for a memory read. Dedicating a core reduces the impact, but can't eliminate it, and the cost of a whole core sitting mostly idle waiting for memory reads is high. Also, any reads scheduled by this core will affect the bandwidth/latency for other cores.

Hardware RAID 5 solves this by using its own memory modules - like a video card using its own memory modules. The hardware RAID can read from its own memory or disk all day and not affect system performance. Hopefully it has plenty of memory dedicated to holding the most frequently required blocks.

> SW RAID 5 etc in usage scenarios involving far more reads than writes
> and light write loads can work quite well even if you don't dedicate
> a core to RAID management, but you must be careful about workloads
> that are, or that contain parts that are, examples of the first
> scenario I gave. If you have any doubts about whether you are doing
> too many writes, dedicate a core to RAID stuff as in the first scenario.

I found software RAID 5 to suck such that I only use it for backups now. It seemed that Linux didn't care to read-ahead or hold blocks in memory for long, and preferred to read and then write. It was awful. RAID 5 doesn't seem like a good option even with hardware RAID. The issues are masked behind a black box (dedicated hardware), but they still exist.

Most of my system is RAID 1+0 now. I have it broken up: rarely read or written files (long term storage) on RAID 5, the main system data on RAID 1+0, the main system on RAID 1, and a larger build partition on RAID 0. For a crappy server in my basement, I'm very happy with my software RAID performance now. :-)

Cheers,
mark
Re: [PERFORM] Performance of count(*)
On Thu, Mar 22, 2007 at 10:18:10AM -0400, Michael Stone wrote:
> IIRC, that's basically what you get with the mysql count anyway, since
> there are corner cases for results in a transaction. Avoiding those
> cases is why the postgres count takes so long; sometimes that's what's
> desired and sometimes it is not.

Adding to this point: in any production system, the count presented to the user is usually wrong very shortly after it is displayed anyway. Transactions in the background or from other users are adding or removing items, perhaps even before the count reaches the user's display. The idea of transaction-safety for counts doesn't apply in this case. Both the transaction and the number are complete before the value is displayed.

In my own systems, I rarely use count(*) for anything except user-visible results. For the PostgreSQL system I use, I keep a table of counts, and lock the row for update when adding or removing items. This turns out to be best in this system anyway, as I need my new rows to be ordered, and locking the 'count' row lets me assign a new sequence number for the row. (I don't want to use SEQUENCE objects, as the rows are [key, sequence, data] with thousands or more keys, which would mean thousands of sequences.)

Cheers,
mark
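A minimal sketch of the locked counter row described above - the table and key names are hypothetical, and the literal 42 stands in for "the value the SELECT returned, plus one":

    BEGIN;

    -- Take the per-key counter row lock; concurrent writers on the same key queue here.
    SELECT n FROM item_counts WHERE key = 'inbox' FOR UPDATE;
    -- Suppose it returned n = 41: the new row gets sequence number 42.

    INSERT INTO items (key, sequence, data) VALUES ('inbox', 42, '...');
    UPDATE item_counts SET n = 42 WHERE key = 'inbox';

    COMMIT;

    -- Showing the count to the user is now a cheap single-row read.
    SELECT n FROM item_counts WHERE key = 'inbox';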
Re: [PERFORM] SCSI vs SATA
On Wed, Apr 04, 2007 at 08:50:44AM -0700, Joshua D. Drake wrote:
> > difference. OTOH, the SCSI discs were way less reliable than the SATA
> > discs, that might have been bad luck.
> Probably bad luck. I find that SCSI is very reliable, but I don't find
> it any more reliable than SATA. That is assuming correct ventilation etc...

Perhaps a basic question - but why does the interface matter? :-)

I find the subject interesting to read about - but I am having trouble understanding why SATA II is technically superior or inferior to SCSI as an interface, in any place that counts.

Is the opinion being expressed that manufacturers who have decided to move to SATA II are not designing for the enterprise market, yes? I find myself doubting this...

Cheers,
mark
Re: [PERFORM] SCSI vs SATA
On Sat, Apr 07, 2007 at 08:46:33PM -0400, Ron wrote:
> The Google and CMU studies are =not= based on data drawn from
> businesses where the lesser consequences of an outage are losing
> $10Ks or $100K per minute... ...and where the greater consequences
> include the chance of loss of human life.
> Nor are they based on businesses that must rely exclusively on highly
> skilled and therefore expensive labor.

Google up time seems to be quite good. Reliability can be had from trusting more reputable (and usually more expensive) manufacturers and product lines, or it can be had through redundancy. The "I" in RAID.

I recall reading the Google study before, and believe I recall it lacking in terms of how much it costs to pay the employees to maintain the system. It would be interesting to know whether the inexpensive drives require more staff time to be spent on them. Staff time can easily become more expensive than the drives themselves.

I believe there are factors that exist that are not easy to calculate. Somebody else mentioned how Intel was not the cleanest architecture, and yet how Intel architecture makes up the world's fastest machines, and the cheapest machines per unit of work completed. There is a game of numbers being played. A manufacturer that sells 10+ million units has the resources, the profit margin, and the motivation to ensure that their drives are better than those of a manufacturer that sells 100 thousand units. Even if the manufacturer of the 100K units spends double in development per unit, they would only be spending 1/50 as much as the manufacturer who makes 10+ million units.

As for your experience - no disrespect - but if your experience is over the last 25 years, then you should agree that most of those years are no longer relevant in terms of experience. SATA has only existed for 5 years or less, and is only now stabilizing in terms of having the different layers of a solution supporting features like command queuing. The drives of today have already broken all sorts of rules that people assumed were not possible to break 5 years ago, 10 years ago, or 20 years ago. The playing field is changing. Even if your experience is correct or valid today - it may not be true tomorrow.

The drives of today I consider to be incredible in terms of quality, reliability, speed, and density. All of the major brands, for desktops or servers, IDE, SATA, or SCSI, are amazing compared to only 10 years ago. To say that they don't meet a standard - which standard?

Everything has a cost. Having a drive never break would have a very large cost. It will cost more to turn 99.9% into 99.99%. Given that the products will never be perfect, perhaps it is valid to invest in a low-cost, fast-to-implement recovery solution that assumes some number of drives will fail in 6 months, 1 year, 2 years, and 5 years. Assume they will fail, because regardless of what you buy - there is a good chance that they *will* fail. Paying double price for hardware, with a hope that they will not fail, may not be a good strategy.

I don't have a conclusion here - only things to consider.

Cheers,
mark
Re: [PERFORM] LIKE search and performance
On Thu, May 24, 2007 at 02:02:40PM -0700, Mark Lewis wrote:
> PG could scan the index looking for matches first and only load the
> actual rows if it found a match, but that could only be a possible win
> if there were very few matches, because the difference in cost between a
> full index scan and a sequential scan would need to be greater than the
> cost of randomly fetching all of the matching data rows from the table
> to look up the visibility information.
> So yes it would be possible, but the odds of it being faster than a
> sequential scan are small enough to make it not very useful.
> And since it's basically impossible to know the selectivity of this kind
> of where condition, I doubt the planner would ever realistically want to
> choose that plan anyway because of its poor worst-case behavior.

What is a real life example where an intelligent and researched database application would issue a like or ilike query as its primary condition in a situation where it expected very high selectivity?

Avoiding a poor worst-case behaviour, for a worst case that won't happen in practice, doesn't seem practical. What real life examples, that could not be implemented a better way, would behave poorly if like/ilike looked at the index first to filter?

I don't understand... :-)

Cheers,
mark
Re: [PERFORM] LIKE search and performance
On Fri, May 25, 2007 at 09:13:25AM +0100, Richard Huxton wrote:
> [EMAIL PROTECTED] wrote:
> > > And since it's basically impossible to know the selectivity of this kind
> > > of where condition, I doubt the planner would ever realistically want to
> > > choose that plan anyway because of its poor worst-case behavior.
> > What is a real life example where an intelligent and researched
> > database application would issue a like or ilike query as their
> > primary condition in a situation where they expected very high
> > selectivity?
> > Avoiding a poor worst-case behaviour for a worst-case behaviour that
> > won't happen doesn't seem practical.
> But if you are also filtering on e.g. date, and that has an index with
> good selectivity, you're never going to use the text index anyway are
> you? If you've only got a dozen rows to check against, might as well
> just read them in.
> The only time it's worth considering the behaviour at all is *if* the
> worst-case is possible.

I notice you did not provide a real life example as requested. :-)

This seems like an ivory tower restriction. Not allowing best performance in a common situation versus not allowing worst performance in a not-so-common situation.

Cheers,
mark
Re: [PERFORM] LIKE search and performance
On Fri, May 25, 2007 at 04:35:22PM +0100, Richard Huxton wrote:
> > I notice you did not provide a real life example as requested. :-)
> OK - any application that allows user-built queries: foo
> Want another? Any application that has a "search by name" box - users
> can (and do) put one letter in and hit enter.
> Unfortunately you don't always have control over the selectivity of
> queries issued.

The database has 10 million records. The user enters "bar" and it translates to "%bar%". You are suggesting that we expect "bar" to match 1 million+ records? :-) I hope not. I would define this as bad process. I would also set "LIMIT" to something like "100".

> > This seems like an ivory tower restriction. Not allowing best performance
> > in a common situation vs not allowing worst performance in a not-so-common
> > situation.
> What best performance plan are you thinking of? I'm assuming we're
> talking about trailing-wildcard matches here, rather than "contains"
> style matches.

"Trailing-wildcard" already uses the B-Tree index, does it not?

I am speaking of "contains", as "contains" is the one that was said to require a seqscan. I am questioning why it requires a seqscan. The claim was made that with MVCC, the index is insufficient to check for visibility, and that the table would need to be accessed anyway, therefore a seqscan is required. I question whether a like '%bar%' should be considered a high selectivity query in the general case. I question whether a worst case should be assumed.

Perhaps I question too much? :-)

Cheers,
mark
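For reference, a sketch of the two cases with hypothetical table and column names. A plain B-Tree index handles the trailing-wildcard form in the C locale; in other locales the index needs the pattern operator class (text_pattern_ops) to be usable for prefix matches, and the "contains" form has no B-Tree range to scan at all:

    CREATE INDEX people_name_pattern_idx ON people (name text_pattern_ops);

    -- Prefix match: the planner can rewrite this into an index range scan.
    SELECT name FROM people WHERE name LIKE 'bar%' LIMIT 100;

    -- Contains match: no range of the B-Tree covers '%bar%', hence the
    -- seqscan under discussion (full-text or trigram indexing is the usual
    -- escape hatch for this shape of query).
    SELECT name FROM people WHERE name LIKE '%bar%' LIMIT 100;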
Re: [PERFORM] ECC RAM really needed?
On Sat, May 26, 2007 at 08:43:15AM -0400, Michael Stone wrote:
> On Fri, May 25, 2007 at 06:45:15PM -0700, Craig James wrote:
> > We're thinking of building some new servers. We bought some a while back
> > that have ECC (error correcting) RAM, which is absurdly expensive compared
> > to the same amount of non-ECC RAM. Does anyone have any real-life data
> > about the error rate of non-ECC RAM, and whether it matters or not? In my
> > long career, I've never once had a computer that corrupted memory, or at
> > least I never knew if it did.
> ...because ECC RAM will correct single bit errors. FWIW, I've seen *a
> lot* of single bit errors over the years. Some systems are much better
> about reporting than others, but any system will have occasional errors.
> Also, if a stick starts to go bad you'll generally be told about it with
> ECC memory, rather than having the system just start to flake out.

First: I would use ECC RAM for a server. The memory is not significantly more expensive.

Now that this is out of the way - I found this thread interesting because although it talked about RAM bit errors, I haven't seen reference to the significance of RAM bit errors. Quite a bit of memory is only rarely used (sent out to swap or flushed before it is accessed), or used in a read-only capacity in a limited form. For example, if searching table rows - as long as the row is not selected, and the bit error is in a field that isn't involved in the selection criteria, who cares if it is wrong?

So, the question then becomes: what percentage of memory is required to be correct all of the time? I believe the estimates for bit errors are high estimates with regard to actual effect. Stating that a bit may be wrong once every two weeks does not describe the effect. In my opinion, software defects have a similar estimate for the potential for damage to occur.

In the last 10 years, the only problems with memory I have ever successfully diagnosed were with cheap hardware running in a poor environment, where the problem became quickly obvious, to the point that the system would be unusable or the BIOS would refuse to boot with the broken memory stick. (This paragraph represents the primary state of many of my father's machines :-) ) Replacing the memory stick made the problems go away.

In any case - the word 'cheap' is significant in the above paragraph. Non-ECC RAM should be considered 'cheap' memory. It will work fine most of the time and most people will never notice a problem. Do you want to be the one person who does notice a problem? :-)

Cheers,
mark
Re: [PERFORM] setting up raid10 with more than 4 drives
On Wed, May 30, 2007 at 08:51:45AM -0700, Luke Lonergan wrote:
> This is standard stuff, very well proven: try googling 'self healing zfs'.
> The first hit on this search is a demo of ZFS detecting corruption of one of
> the mirror pair using checksums, very cool:
>
> http://www.opensolaris.org/os/community/zfs/demos/selfheal/;jsessionid=52508D464883F194061E341F58F4E7E1
>
> The bad drive is pointed out directly using the checksum and the data
> integrity is preserved.

One part is corruption. Another is ordering and consistency. ZFS represents both RAID-style storage *and* a journal-style file system. I imagine consistency and ordering is handled through journalling.

Cheers,
mark
Re: [PERFORM] setting up raid10 with more than 4 drives
On Thu, May 31, 2007 at 01:28:58AM +0530, Rajesh Kumar Mallah wrote:
> i am still not clear what is the best way of throwing in more
> disks into the system.
> does more stripes means more performance (mostly) ?
> also is there any thumb rule about best stripe size ? (8k,16k,32k...)

It isn't that simple. RAID 1 should theoretically give you the best read performance. If all you care about is reads, then "best performance" would be to add more mirrors to your array. For write performance, RAID 0 is the best. I think this is what you mean by "more stripes". This is where RAID 1+0/0+1 comes in.

To reconcile the above, your question seems to be: "I have a RAID 1+0/0+1 system. Should I add disks onto the 0 part of the array? Or the 1 part of the array?"

My conclusion for you would be: both, unless you are certain that your load is scaled heavily towards reads, in which case the 1, or if scaled heavily towards writes, then the 0.

Then come the other factors. Do you want redundancy? Then you want 1. Do you want capacity? Then you want 0. There is no single answer for most people.

For me, stripe size is the last decision to make, and may be heavily sensitive to load patterns. This suggests a trial and error / benchmarking requirement to determine the optimal stripe size for your use.

Cheers,
mark
Re: [PERFORM] Autodetect of software RAID1+0 fails
> On Fri, Jun 01, 2007 at 10:57:56AM -0700, Craig James wrote:
> > The Linux kernel doesn't properly detect my software RAID1+0 when I boot
> > up. It detects the two RAID1 arrays, the partitions of which are marked
> > properly. But it can't find the RAID0 on top of that, because there's no
> > corresponding device to auto-detect. The result is that it creates
> > /dev/md0 and /dev/md1 and assembles the RAID1 devices on bootup, but
> > /dev/md2 isn't created, so the RAID0 can't be assembled at boot time.

Hi Craig:

I had the same problem for a short time. There *is* a device to base the RAID0 off, however, it needs to be recursively detected. mdadm will do this for you; however, if the device order isn't optimal, it may need some help via /etc/mdadm.conf. For a while, I used something like:

    DEVICE partitions
    ...
    ARRAY /dev/md3 level=raid0 num-devices=2 UUID=10d58416:5cd52161:7703b48e:cd93a0e0
    ARRAY /dev/md5 level=raid1 num-devices=2 UUID=1515ac26:033ebf60:fa5930c5:1e1f0f12
    ARRAY /dev/md6 level=raid1 num-devices=2 UUID=72ddd3b6:b063445c:d7718865:bb79aad7

My symptoms were that it worked when started from user space, but failed during reboot without the above hints. I believe if I had defined md5 and md6 before md3, it may have worked automatically without hints.

On Fri, Jun 01, 2007 at 11:35:01PM +0200, Steinar H. Gunderson wrote:
> Either do your md discovery in userspace via mdadm (your distribution can
> probably help you with this), or simply use the raid10 module instead of
> building raid1+0 yourself.

I agree with using the mdadm RAID10 support. RAID1+0 has the flexibility of allowing you to fine-control the RAID1 vs RAID0 if you want to add disks later. RAID10 from mdadm has the flexibility that you don't need an even number of disks. As I don't intend to add disks to my array, the RAID10 as a single layer, with potentially better intelligence in terms of performance, appeals to me. They both worked for me - but I am sticking with the single layer now.

Cheers,
mark
Re: [PERFORM] Question about SQL performance
On Mon, Jun 04, 2007 at 11:18:30PM -0400, Jason Lustig wrote: > I have some questions about the performance of certain types of SQL > statements. > > What sort of speed increase is there usually with binding parameters > (and thus preparing statements) v. straight sql with interpolated > variables? Will Postgresql realize that the following queries are > effectively the same (and thus re-use the query plan) or will it > think they are different? > > SELECT * FROM mytable WHERE item = 5; > SELECT * FROM mytable WHERE item = 10; > > Obviously to me or you they could use the same plan. From what I > understand (correct me if I'm wrong), if you use parameter binding - > like "SELECT * FROM mytable WHERE item = ?" - Postgresql will know > that the queries can re-use the query plan, but I don't know if the > system will recognize this with above situation. Although they could use the same plan, it is possible that using the same plan is non-optimal. For example, if I know that 99% of the table contains item = 5, but only 1% of the table contains item = 10, then the 'best plan' may be a sequential scan for item = 5, but an index scan for item = 10. In the case of a prepared query, PostgreSQL will pick a plan that will be good for all values, which may not be best for specific queries. You save parsing time and planning time, but may risk increasing execution time. > Also, what's the difference between prepared statements (using > PREPARE and EXECUTE) and regular functions (CREATE FUNCTION)? How do > they impact performance? From what I understand there is no exact > parallel to stored procedures (as in MS SQL or oracle, that are > completely precompiled) in Postgresql. At the same time, the > documentation (and other sites as well, probably because they don't > know what they're talking about when it comes to databases) is vague > because PL/pgSQL is often said to be able to write stored procedures > but nowhere does it say that PL/pgSQL programs are precompiled. I think you can find all of these answers in the documentation, including my comments about prepared queries. Does it matter if the program is precompiled? I believe it is, but why would it matter? Are you addressing a real performance problem? Or are you trying to avoid issues that you are not sure if they exist or not? :-) Prepared queries are going to improve performance due to being able to execute multiple queries without communicating back to the client. Especially for short queries, network latency can be a significant factor for execution speed. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
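A minimal sketch of the trade-off described above, reusing the mytable/item example from the question (the skewed distribution of item values is assumed purely for illustration):

    -- Straight SQL: each statement is planned for its specific value, so the
    -- planner can choose a seq scan for the common value and an index scan
    -- for the rare one.
    EXPLAIN ANALYZE SELECT * FROM mytable WHERE item = 5;
    EXPLAIN ANALYZE SELECT * FROM mytable WHERE item = 10;

    -- Prepared statement: parsed and planned once, so a single generic plan
    -- is reused for every parameter value.
    PREPARE get_items(int) AS SELECT * FROM mytable WHERE item = $1;
    EXECUTE get_items(5);
    EXECUTE get_items(10);
    DEALLOCATE get_items;

Comparing EXPLAIN ANALYZE of the literal queries against EXPLAIN EXECUTE of the prepared form is one way to see whether the generic plan hurts any particular parameter value.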
Re: [PERFORM] LIKE search and performance
On Wed, Jun 06, 2007 at 11:23:13PM +0100, James Mansion wrote: > [EMAIL PROTECTED] wrote: > >What is a real life example where an intelligent and researched > >database application would issue a like or ilike query as their > >primary condition in a situation where they expected very high > >selectivity? > In my case the canonical example is to search against textual keys > where the search is performed automatically if the user hs typed > enough data and paused. In almost all cases the '%' trails, and I'm > looking for 'starts with' in effect. usually the search will have a > specified upper number of returned rows, if that's an available > facility. I realise in this case that matching against the index > does not allow the match count unless we check MVCC as we go, but I > don't see why another thread can't be doing that. I believe PostgreSQL already considers using the index for "starts with", so this wasn't part of the discussion for me. Sorry that this wasn't clear. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [OT] Re: [PERFORM] How much ram is too much
On Fri, Jun 08, 2007 at 08:54:39PM +0200, Zoltan Boszormenyi wrote: > Joshua D. Drake írta: > >Zoltan Boszormenyi wrote: > >>Dave Cramer írta: > >>>It's an IBM x3850 using linux redhat 4.0 > >>Isn't that a bit old? I have a RedHat 4.2 somewhere > >>that was bundled with Applixware 3. :-) > >He means redhat ES/AS 4 I assume. > I guessed that, hence the smiley. > But it's very unfortunate that version numbers > are reused - it can cause confusion. > There was a RH 4.0 already a long ago, > when the commercial and the community > version were the same. I think Microsoft > will avoid reusing its versions when year 2095 comes. :-) He should have written RHEL 4.0. RH 4.0 is long enough ago, though, that I think few would assume it meant the much older release. You'll find a similar thing with products like "CuteFTP 7.0" or "CuteFTP Pro 3.0". Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Held idle connections vs use of a Pooler
Hello, I am relatively new to postgres (just a few months), so apologies in advance and thanks for bearing with me. I am trying to get a rough idea of the amount of bang for the buck I might see if I put a connection pooling service into the environment vs our current methodology of using persistent open connections. We have a number of in-house applications that connect to a central Postgres instance (8.3.7). The box is admittedly underpowered, with only 8 cores, 8 GB of RAM, and not-great disk IO out of an MSA-70. The database is about 35GB on disk and does mainly (~95%) OLTP type queries. I am currently begging for more RAM. Most of the connections from the various apps sit idle until they need to execute a query, and once done they go back to holding an open idle connection. (There are ~600 open connections at any given time, and most of the time most are idle.) This is typically fine while the number of active queries is low, but some other application (one that doesn't use connection pooling or hold open connections when not in use) is hitting the db from time to time with 50-100 small queries (2ms queries from my testing) nearly all at once. When this happens the whole response time goes out the door. I think, from reading this list for a few weeks, that the answer is to move to a connection pooling package elsewhere to better manage incoming connections, with a lower number to the db. I am told this will require some re-working of app code, as I understand pgpool was tried a while back in our QA environment and several parts of various in-house apps/scripts/etc started to experience show-stopping problems. To help make my case to the devs and various managers, I was wondering if someone could expand on what extra work has to be done while queries run and there is a high number (500-600) of open yet idle connections to the db. Lots of the queries executed use sub-transactions, if that makes a difference. Basically: what am I paying extra for with that many persistent connections that I might save if I go to the effort of getting the in-house stuff to make use of a connection pooler? Thank you for your time. ..: mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
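One way to quantify the cost of those idle backends before and after a pooler experiment is to sample pg_stat_activity while the burst of small queries hits. A sketch for the 8.3-era catalogs, where an idle backend reports '<IDLE>' as its current query:

    -- count idle vs. active backends (pre-9.2 pg_stat_activity layout)
    SELECT (current_query = '<IDLE>') AS is_idle, count(*)
      FROM pg_stat_activity
     GROUP BY 1;

Run periodically, this gives a rough baseline of how many of the ~600 connections are actually doing anything at the moment response times fall over.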
[PERFORM] cleanup on pg_ system tables?
Hi All, (pg 8.3.7 on RHEL 2.6.18-92.el5 ) I ran the query below (copied from http://pgsql.tapoueh.org/site/html/news/20080131.bloat.html ) on a production DB we have and I am looking at some pretty nasty looking numbers for tables in the pg_catalog schema. I have tried a reindex and vaccum but neither seem to be clearing these out, tried a cluster and it won't let me. I am viewing the problem wrong? is there anything I can do while the DB is online ? do I need to clean up other things first ? thanks, ..: Mark -[ RECORD 1 ]+ schemaname | pg_catalog tablename| pg_attribute reltuples| 5669 relpages | 113529 otta | 92 tbloat | 1234.0 wastedpages | 113437 wastedbytes | 929275904 wastedsize | 886 MB iname| pg_attribute_relid_attnam_index ituples | 5669 ipages | 68 iotta| 80 ibloat | 0.9 wastedipages | 0 wastedibytes | 0 wastedisize | 0 bytes SELECT schemaname, tablename, reltuples::bigint, relpages::bigint, otta, ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, relpages::bigint - otta AS wastedpages, bs*(sml.relpages-otta)::bigint AS wastedbytes, pg_size_pretty((bs*(relpages-otta))::bigint) AS wastedsize, iname, ituples::bigint, ipages::bigint, iotta, ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages, CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes, CASE WHEN ipages < iotta THEN pg_size_pretty(0) ELSE pg_size_pretty((bs*(ipages-iotta))::bigint) END AS wastedisize FROM ( SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols FROM ( SELECT ma,bs,schemaname,tablename, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname, tablename, hdr, ma, bs, SUM((1-null_frac)*avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename ) AS nullhdr FROM pg_stats s, ( SELECT (SELECT current_setting('block_size')::numeric) AS bs, CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants GROUP BY 1,2,3,4,5 ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml WHERE sml.relpages - otta > 0 OR ipages - iotta > 10 ORDER BY wastedbytes DESC, wastedibytes DESC -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
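Plain VACUUM generally will not return already-bloated catalog pages to the OS; on 8.3 the old-style VACUUM FULL can, at the price of an exclusive lock on the table while it runs. A sketch, not verified against this particular system:

    -- run as a superuser during a quiet period; both commands lock
    -- pg_catalog.pg_attribute for their duration
    VACUUM FULL VERBOSE pg_catalog.pg_attribute;
    REINDEX TABLE pg_catalog.pg_attribute;

Whether that is acceptable while the DB is online depends on how long the catalog takes to rewrite and what else needs pg_attribute in the meantime.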
Re: [PERFORM] Using Between
>>> The question is how can we make it faster. >>If there's just one region ID for any given postal code, you might try >>adding a column to vehicleused and storing the postal codes there. >>You could possibly populate that column using a trigger; probably it >>doesn't change unless the postalcode changes. Then you could index >>that column and query against it directly, rather than joining to >>PostalCodeRegionCountyCity. Short of that, I don't see any obvious >>way to avoid reading most of the vehicleused table. There may or may >>not be an index that can speed that up slightly and of course you can >>always throw hardware at the problem, but fundamentally reading half a >>million or more rows isn't going to be instantaneous. >>Incidentally, it would probably simplify things to store postal codes >>in the same case throughout the system. If you can avoid the need to >>write lower(x) = lower(y) and just write x = y you may get better >>plans. I'm not sure that's the case in this particular example but >>it's something to think about. Something else you might test is bumping the read-ahead value. Most Linux installs have this at 256; you might try bumping the value to ~8MB and tuning from there. This may help you slightly with seq scan performance. As always: YMMV. It's not going to magically fix low-performing I/O subsystems and it won't help many applications of PG, but there are a few outlying instances where this change can help a little bit. I am sure someone will step in and tell you it is a bad idea - AND they will probably have perfectly valid reasons for why it is, so you will need to consider the ramifications... if at all possible, test and tune to see. ..: Mark >>-- >>Robert Haas >>EnterpriseDB: http://www.enterprisedb.com >>The Enterprise Postgres Company -- >>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) >>To make changes to your subscription: >>http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
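For reference, a rough sketch of the trigger-maintained column Robert describes in the quoted text; the column and key names here (region_id, regionid, postalcode) are assumptions, since the real schema wasn't posted, and existing rows would still need a one-time backfill:

    ALTER TABLE vehicleused ADD COLUMN region_id integer;

    CREATE OR REPLACE FUNCTION vehicleused_set_region() RETURNS trigger AS $$
    BEGIN
        -- look up the single region for this row's postal code
        SELECT r.regionid INTO NEW.region_id
          FROM PostalCodeRegionCountyCity r
         WHERE lower(r.postalcode) = lower(NEW.postalcode);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER vehicleused_region_trg
        BEFORE INSERT OR UPDATE ON vehicleused
        FOR EACH ROW EXECUTE PROCEDURE vehicleused_set_region();

    CREATE INDEX vehicleused_region_id_idx ON vehicleused (region_id);

With that in place, queries can filter on vehicleused.region_id directly instead of joining to PostalCodeRegionCountyCity through the postal code.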
Re: [PERFORM] Slow count(*) again...
Could this be an interesting test use of https://www.fossexperts.com/ ? 'Community' driven proposal - multiple people / orgs agree to pay various portions? Maybe with multiple funders a reasonable target fund amount could be reached. Just throwing around ideas here. Mark -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Robert Haas Sent: Wednesday, October 13, 2010 7:29 AM To: Neil Whelchel Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Slow count(*) again... On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel wrote: > I might go as far as to rattle the cage of the developers to see if it makes > any sense to add some column oriented storage capability to Postgres. That > would be the hot ticket to be able to specify an attribute on a column so that > the back end could shadow or store a column in a column oriented table so > aggregate functions could work on them with good efficiency, or is that an > INDEX? I'd love to work on that, but without funding it's tough to find the time. It's a big project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hardware recommendations
-Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Andy Sent: Wednesday, December 08, 2010 5:24 PM To: Marti Raudsepp Cc: pgsql-performance@postgresql.org; Benjamin Krajmalnik Subject: Re: [PERFORM] Hardware recommendations >> > If you are IO-bound, you might want to consider using >> SSD. >> > >> > A single SSD could easily give you more IOPS than 16 >> 15k SAS in RAID 10. >> >> Are there any that don't risk your data on power loss, AND >> are cheaper >> than SAS RAID 10? >> >Vertex 2 Pro has a built-in supercapacitor to save data on power loss. It's spec'd at 50K IOPS and a 200GB one costs around $1,000. Viking offers 6Gbps SAS physical connector SSD drives as well - with a super capacitor. I have not seen any official pricing yet, but I would suspect it would be in the same ballpark. I am currently begging to get some for eval. I will let everyone know if I swing that and can post numbers. -mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hardware recommendations
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Benjamin Krajmalnik > Sent: Monday, December 13, 2010 1:45 PM > To: Greg Smith > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Hardware recommendations > > > > > -Original Message- > > From: Greg Smith [mailto:g...@2ndquadrant.com] > > Sent: Saturday, December 11, 2010 2:18 AM > > To: Benjamin Krajmalnik > > Cc: pgsql-performance@postgresql.org > > Subject: Re: [PERFORM] Hardware recommendations > > > > Have you increased checkpoint parameters like checkpoint_segments? > You > > need to avoid having checkpoints too often if you're going to try and > > use 4GB of memory for shared_buffers. > > > > Yes, I have it configured at 1024 checkpoint_segments, 5min timeout, 0.9 > completion_target I would consider bumping that checkpoint timeout duration a bit longer and seeing if that helps, if you are still looking for knobs to fiddle with. YMMV. -Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
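If you are still deciding which checkpoint knob matters, pg_stat_bgwriter (available from 8.3 on) shows whether checkpoints are being forced by xlog volume or by the timer; a simple sketch:

    -- checkpoints_req counts checkpoints forced by checkpoint_segments,
    -- checkpoints_timed counts ones driven by checkpoint_timeout
    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

If checkpoints_req keeps climbing, lengthening the timeout alone won't change much; if nearly everything is timed, the timeout is the knob actually doing the work.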
Re: [PERFORM] Migrating to Postgresql and new hardware
Comments in line, take em for what you paid for em. > -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Lars > Sent: Tuesday, January 18, 2011 3:57 AM > To: pgsql-performance@postgresql.org > Subject: [PERFORM] Migrating to Postgresql and new hardware > > Hi, > > We are in the process of moving a web based application from a MySql to > Postgresql database. > Our main reason for moving to Postgresql is problems with MySql > (MyISAM) table locking. I would never try and talk someone out of switching but MyISAM? What version of MySQL and did you pick MyISAM for a good reason or just happened to end up there? > We will buy a new set of servers to run the Postgresql databases. > > The current setup is five Dell PowerEdge 2950 with 2 * XEON E5410, 4GB > RAM. PERC 5/I 256MB NV Cache, 4 * 10K Disks (3 in RAID 5 + 1 spare). > > One server is used for shared data. > Four servers are used for sharded data. A user in the system only has > data in one of the shards. > There is another server to which all data is replicated but I'll leave > that one out of this discussion. > These are dedicated database servers. There are more or less no stored > procedures. The shared database size is about 20GB and each shard > database is about 40GB (total of 20 + 40 * 4 = 180GB). I would expect > the size will grow 10%-15% this year. Server load might increase with > 15%-30% this year. This setup is disk I/O bound. The overwhelming > majority of sql statements are fast (typically single row selects, > updates, inserts and deletes on primary key) but there are some slow > long running (10min) queries. > > As new server we are thinking of PowerEdge R510, 1 * Xeon X5650, 24Gb > RAM, H700 512MB NV Cache. One would think you should notice a nice speed improvement, ceteris paribus, since the X5650 will have ->significantly<- more memory bandwidth than the 5410s you are used to, and you are going to have a heck of a lot more ram for things to cache in. I think the H700 is a step up in raid cards as well but with only 4 disks your probably not maxing out there. > Dell has offered two alternative SSDs: > Samsung model SS805 (100GB Solid State Disk SATA 2.5"). > (http://www.plianttechnology.com/lightning_lb.php) > Pliant model LB 150S (149GB Solid State Drive SAS 3Gbps 2.5"). > (http://www.samsung.com/global/business/semiconductor/products/SSD/Prod > ucts_Enterprise_SSD.html) The Samsung ones seems to indicate that they have protection in the event of a power failure, and the pliant does not mention it. Granted I haven't done or seen any pull the plug under max load tests on either family, so I got nothing beyond that it is the first thing I have looked at with every SSD that crosses my path. > > Both are SLC drives. The price of the Pliant is about 2,3 times the > price of the Samsung (does it have twice the performance?). > > One alternative is 5 servers (1 shared and 4 shards) with 5 Samsung > drives (4 in RAID 10 + 1 spare). > Another alternative would be 3 servers (1 shared and 2 shards) with 5 > Pliant drives (4 in RAID 10 + 1 spare). This would be slightly more > expensive than the first alternative but would be easier to upgrade > with two new shard servers when it's needed. As others have mentioned, how are you going to be doing your "shards"? > > Anyone have experience using the Samsung or the Pliant SSD? Any > information about degraded performance over time? > Any comments on the setups? 
How would an alternative with 15K disks (6 > RAID 10 + 1 spare, or even 10 RAID10 + 1 spare) compare? You still may find that breaking xlog out to its own logical drive (2 drives in raid 1) gives a speed improvement to the overall. YMMV - so tinker and find out before you go deploying. > How would these alternatives compare in I/O performance compared to the > old setup? > Anyone care to guess how the two alternatives would compare in > performance running Postgresql? > How would the hardware usage of Postgresql compare to MySqls? I won't hazard a guess on the performance difference between PG w/ Fsync ON and MySQL running with MyISAM. If you can get your OS and PG tuned you should be able to have a database that can have pretty decent throughput for an OLTP workload. Since that seems to be the majority of your intended workload. -Mark > > > > Regards > /Lars > > -- > Sent via pgsql-performance mailing list (pgsql- > performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Queries becoming slow under heavy load
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Ivan Voras > Sent: Wednesday, January 26, 2011 6:25 AM > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Queries becoming slow under heavy load > > On 25/01/2011 22:37, Anne Rosset wrote: > > Hi, > > > > We are running some performances tests. With a lot of concurrent > > access, queries get very slow. When there is no load, those queries > run > > fast. > > As others said, you need to stat how many concurrent clients are > working > on the database and also the number of logical CPUs (CPU cores, > hyperthreading) are present in the machine. So far, as a rule of thumb, > if you have more concurrent active connections (i.e. doing queries, not > idling), you will experience a sharp decline in performance if this > number exceeds the number of logical CPUs in the machine. > Depending on what version the OP is running - I didn't see where a version was given - if there is a large number of idle connections it can affect things as well. Tom indicated to me this should be "much better" in 8.4 and later. We cut our idle connections from 600+ to a bit over 300 and saw a good drop in box load and better query responsiveness. (We still get large user cpu load spikes when a few hundred idle connection processes are woken up, because they all appear to be sleeping on the same semaphore and one of them has work to do.) (Yeah, I know, get a pooler; too bad only bouncer seems to pool out idle connections with transaction pooling, but then I lose prepared statements... I am still working on that part and on getting off 8.3. Yes, our app tried to do its own quasi connection pooling. When we deployed the app on a few hundred boxes, the error of this choice, made years ago when this app lived on much fewer machines, became pretty evident.) > > -- > Sent via pgsql-performance mailing list (pgsql- > performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Xeon twice the performance of opteron
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Jeff > Sent: Thursday, March 17, 2011 9:14 AM > To: pgsql-performance@postgresql.org > Cc: Brian Ristuccia > Subject: [PERFORM] Xeon twice the performance of opteron > > hey folks, > > Running into some odd performance issues between a few of our db > boxes. While trying to speed up a query I ran it on another box and > it was twice as fast. The plans are identical and various portions of > the query run in the same amount of time - it all boils down to most > of the time being spent in a join filter. The plan is as good as it > is going to get but the thing that is concerning me, which hopefully > some folks here may have some insight on, is the very large difference > in runtime. > > three boxes: > A: Intel(R) Xeon(R) CPU E5345 @ 2.33GHz (Runs query > fastest) > 4MB cache > B: Quad-Core AMD Opteron(tm) Processor 2352 (2.1GHZ) (Main > production > box, currently, middle speed) > 512k cache > C: Quad-Core AMD Opteron(tm) Processor 2378 (2.4GHZ) > 512k cache > > A & B are running PG 8.4.2 (yes, I know it desperately need to be > upgraded). C was also on 8.4.2 and since it was not in production I > upgraded it to 8.4.7 and got the same performance as 8.4.2. Dataset > on A & B is the same C is mostly the same, but is missing a couple > weeks of data (but since this query runs over 3 years of data, it is > negligable - plus C runs the slowest!) > > All three running FC10 with kernel Linux db06 > 2.6.27.19-170.2.35.fc10.x86_64 #1 SMP Mon Feb 23 13:00:23 EST 2009 > x86_64 x86_64 x86_64 GNU/Linux > > Load is very low on each box. The query is running from shared_buffers > - no real IO is occuring. > > The average timing for the query in question is 90ms on A, 180ms on B > and 190ms on C. > > Now here's where some odd stuff starts piling up: explain analyze > overhead on said queries: > 20ms on A, 50ms on B and 85ms on C(!!) > > We had one thought about potential NUMA issues, but doing a series > (100) of connect, query, disconnect and looking at the timings reveals > them all to be solid... but even still we wouldn't expect it to be > that awful. The smaller cache of the opterons is also a valid > argument. > > I know we're running an old kernel, I'm tempted to upgrade to see what > will happen, but at the same time I'm afraid it'll upgrade to a kernel > with a broken [insert major subsystem here] which has happened before. > > Anybody have some insight into this or run into this before? > > btw, little more background on the query: > > -> Nested Loop (cost=5.87..2763.69 rows=9943 width=0) (actual > time=0.571..2 > 74.750 rows=766 loops=1) > Join Filter: (ce.eventdate >= (md.date - '6 days'::interval)) > -> Nested Loop (cost=5.87..1717.98 rows=27 width=8) > (actual time=0.53 > 3..8.301 rows=159 loops=1) > [stuff removed here] > -> Index Scan using xxx_date_idx on xx md > (cost=0.00..19.50 rows=1099 width=8) (actual time=0.023..0.729 > rows=951 loops=15 > 9) > Index Cond: (ce.eventdate <= md.date) > > > On all three boxes that inner nestloop completes in about the same > amount of time - it is that join filter that is causing the pain and > agony. (If you are noticing the timing differences, that is because > the numbers above are the actual numbers, not explain analyze). The > query is pulling up a rolling window of events that occured on a > specific date. This query pulls up al the data for a period of time. 
> ce.eventdate is indexed, and is used in the outer nestloop. Thinking > more about what is going on cache thrashing is certainly a possibility. > > the amazing explain analyze overhead is also very curious - we all > know it adds overhead, but 85ms? Yow. > > -- > Jeff Trout > http://www.stuarthamm.net/ > http://www.dellsmartexitin.com/ I am sure you might have already checked for this, but just in case... Did you verify that no power savings stuff is turned on in the BIOS or at the kernel? I have to set ours to something HP calls static high performance or something like that if I want boxes that are normally pretty idle to execute in a predictable fashion for sub-second queries. I assume you checked STREAM benchmark results on the AMD machines to make sure they are getting in the ballpark of where they are supposed to be? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Select in subselect vs select = any array
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Adam Tistler > Sent: Monday, March 21, 2011 12:17 AM > To: Pavel Stehule > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Select in subselect vs select = any array > > Pavel, thanks for the help. > > I increased work_mem from 16MB to 64MB, no difference. The queries are > really just a test case. My actual queries are actual just large > number of primary keys that I am selecting from the db: > > For example: >select * from nodes where node_id in ( 1, 2, 3 . ) > What does a "large" number of primary keys mean? I have seen some "odd" things happen when I carelessly passed tens of thousands of items to an IN list for a generated query, but I don't get the feeling that is the case here. ..: Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
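For what it's worth, when the key list gets very large the same lookup can be phrased a few different ways and the plans can differ; a sketch using the nodes table from the message above:

    -- the three forms below are logically equivalent
    SELECT * FROM nodes WHERE node_id IN (1, 2, 3);
    SELECT * FROM nodes WHERE node_id = ANY (ARRAY[1, 2, 3]);
    SELECT n.*
      FROM nodes n
      JOIN (VALUES (1), (2), (3)) AS v(node_id) USING (node_id);

With tens of thousands of keys the VALUES join sometimes plans and executes differently from a flat IN list, so it is worth an EXPLAIN ANALYZE of each form against the real data.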
Re: [PERFORM] Suspending SELECTs
On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote: > I understand most of these issues, and expected this kind of reply. Please, > allow me to insist that we reason on this problem and try to find a > solution. My reason for doing so is that the future software industry is > likely to see more and more web applications retrieving data from virtually > endless databases, and in such contexts, it is sensible to ask the final > client--the web client--to store the "cursor state", because web > interaction is intrinsically asynchronous, and you cannot count on users > logging out when they're done, releasing resources allocated to them. Think > of Google. What is wrong with LIMIT and OFFSET? I assume your results are ordered in some manner. Especially with web users, who become bored if the page doesn't flicker in a way that appeals to them, how could one have any expectation that the cursor would ever be useful at all? As a 'general' solution, I think optimizing the case where the same query is executed multiple times, with only the LIMIT and OFFSET parameters changing, would be a better bang for the buck. I'm thinking along the lines of materialized views, for queries executed more than a dozen times in a short length of time... :-) In the mean time, I successfully use LIMIT and OFFSET without such an optimization, and things have been fine for me. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
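A sketch of the LIMIT/OFFSET pattern being suggested, with hypothetical table and column names; the important part is a stable, fully-specified ORDER BY so that page n+1 starts where page n ended:

    -- page 3 of the result set, 20 rows per page
    SELECT *
      FROM search_results
     WHERE query_id = 42
     ORDER BY rank DESC, result_id
     LIMIT 20 OFFSET 40;

The server still has to generate and discard the first 40 rows, which is why the cost grows with the page number - usually acceptable when few users ever go past the first page or two.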
Re: [PERFORM] Suspending SELECTs
On Wed, Jan 18, 2006 at 09:57:50AM +0100, Alessandro Baretta wrote: > [EMAIL PROTECTED] wrote: > >On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote: > >>I understand most of these issues, and expected this kind of reply. > >>Please, allow me to insist that we reason on this problem and try to find > >>a solution. My reason for doing so is that the future software industry > >>is likely to see more and more web applications retrieving data from > >>virtually endless databases, and in such contexts, it is sensible to ask > >>the final client--the web client--to store the "cursor state", because > >>web interaction is intrinsically asynchronous, and you cannot count on > >>users logging out when they're done, releasing resources allocated to > >>them. Think of Google. > >What is wrong with LIMIT and OFFSET? I assume your results are ordered > >in some manner. > It looks like this is the only possible solution at present--and in the > future, too--but it has a tremendouse performance impact on queries > returning thousands of rows. In the case of one web user generating one query, I don't see how it would have a tremendous performance impact on large queries. You mentioned google. I don't know how you use google - but most of the people I know rarely ever search through the pages. Generally the answer we want is on the first page. If the ratio of users who search through multiple pages of results, and users who always stop on the first page, is anything significant (better than 2:1?), LIMIT and OFFSET are the desired approach. Why have complicated magic in an application, for a subset of the users? If there is to be a change to PostgreSQL to optimize for this case, I suggest it involve the caching of query plans, executor plans, query results (materialized views?), LIMIT and OFFSET. If we had all of this, you would have exactly what you want, while benefitting many more people than just you. No ugly 'persistent state cursors' or 'import/export cursor state' implementation. People would automatically benefit, without changing their applications. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Suspending SELECTs
On Wed, Jan 18, 2006 at 03:41:57PM +, Harry Jackson wrote: > There are various reason why google might want to limit the search > result returned ie to encourage people to narrow their search. Prevent > screen scrapers from hitting them really hard blah blah. Perhaps less > than 0.0001% of real users (not scrapers) actually dig down to the > 10th page so whats the point. I recall a day when google crashed, apparently due to a Windows virus that would use google to obtain email addresses. As an unsubstantiated theory - this may have involved many, many clients, all accessing search page results beyond the first page. I don't see google optimizing for the multiple page scenario. Most people (as I think you agree above), are happy with the first or second page, and they are gone. Keeping a cursor for these people as anything more than an offset into search criteria, would not be useful. Cheers, -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] LIKE query on indexes
On Tue, Feb 21, 2006 at 05:57:12PM +0200, Ibrahim Tekin wrote: > i have btree index on a text type field. i want see rows which starts with > certain characters on that field. so i write a query like this: > SELECT * FROM mytable WHERE myfield LIKE 'john%' > since this condition is from start of the field, query planner should use > index to find such elements but explain command shows me it will do a > sequential scan. > is this lack of a feature or i am wrong somewhere? Is the query fast enough? How big is your table? What does explain analyze select tell you? Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
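One detail worth checking, sketched below: if the database was initialized with a non-C locale, a plain btree index cannot be used for anchored LIKE, but an index built with the pattern ops operator class can (the index name here is arbitrary):

    -- lets 'starts with' LIKE use an index even in a non-C locale
    CREATE INDEX mytable_myfield_pattern_idx
        ON mytable (myfield text_pattern_ops);

    EXPLAIN ANALYZE
    SELECT * FROM mytable WHERE myfield LIKE 'john%';

If the locale is C, the ordinary index should already be considered, and the planner may still prefer a seq scan when the table is small or the prefix is not selective.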
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote: > Sad, cause the AMD is, on a price / performance scale, twice the > processor for the same money as the Intel. Maybe a year or two ago. Prices are all coming down. Intel more than AMD. AMD still seems better - but not X2, and it depends on the workload. X2 sounds like bigotry against Intel... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 01:42:31PM -0500, Scott Marlowe wrote: > On Tue, 2006-04-25 at 13:38, [EMAIL PROTECTED] wrote: > > On Tue, Apr 25, 2006 at 01:33:38PM -0500, Scott Marlowe wrote: > > > Sad, cause the AMD is, on a price / performance scale, twice the > > > processor for the same money as the Intel. > > Maybe a year or two ago. Prices are all coming down. Intel more > > than AMD. > > AMD still seems better - but not X2, and it depends on the workload. > > X2 sounds like biggotry against Intel... :-) > Actually, that was from an article from this last month that compared > the dual core intel to the amd. for every dollar spent on the intel, > you got about half the performance of the amd. Not bigotry. fact. > But don't believe me or the other people who've seen the difference. Go > buy the Intel box. No skin off my back. AMD Opteron vs Intel Xeon is different than AMD X2 vs Pentium D. For AMD X2 vs Pentium D - I have both - in similar price range, and similar speed. I choose to use the AMD X2 as my server, and Pentium D as my Windows desktop. They're both quite fast. I made the choice I describe based on a lot of research. I was going to go both Intel, until I noticed that the Intel prices were dropping fast. 30% price cut in 2 months. AMD didn't drop at all during the same time. There are plenty of reasons to choose one over the other. Generally the AMD comes out on top. It is *not* 2X though. Anybody who claims this is being highly selective about which benchmarks they consider. One article is nothing. There is a lot of hype these days. AMD is winning the elite market, which means that they are able to continue to sell high. Intel, losing this market, is cutting its prices to compete. And they do compete. Quite well. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 08:54:40PM -0400, [EMAIL PROTECTED] wrote: > I made the choice I describe based on a lot of research. I was going > to go both Intel, until I noticed that the Intel prices were dropping > fast. 30% price cut in 2 months. AMD didn't drop at all during the > same time. Errr.. big mistake. That was going to be - I was going to go both AMD. > There are plenty of reasons to choose one over the other. Generally > the AMD comes out on top. It is *not* 2X though. Anybody who claims > this is being highly selective about which benchmarks they consider. I have an Intel Pentium D 920, and an AMD X2 3800+. These are very close in performance. The retail price difference is: Intel Pentium D 920 is selling for $310 CDN AMD X2 3800+is selling for $347 CDN Another benefit of Pentium D over AMD X2, at least until AMD chooses to switch, is that Pentium D supports DDR2, whereas AMD only supports DDR. There are a lot of technical pros and cons to each - with claims from AMD that DDR2 can be slower than DDR - but one claim that isn't often made, but that helped me make my choice: 1) DDR2 supports higher transfer speeds. I'm using DDR2 5400 on the Intel. I think I'm at 3200 or so on the AMD X2. 2) DDR2 is cheaper. I purchased 1 Gbyte DDR2 5400 for $147 CDN. 1 Gbyte of DDR 3200 starts at around the same price, and stretches into $200 - $300 CDN. Now, granted, the Intel 920 requires more electricity to run. Running 24/7 for a year might make the difference in cost. It doesn't address point 1) though. I like my DDR2 5400. So, unfortunately, I won't be able to do a good test for you to prove that my Windows Pentium D box is not only cheaper to buy, but faster, because the specs aren't exactly equivalent. In the mean time, I'm quite enjoying my 3d games while doing other things at the same time. I imagine my desktop load approaches that of a CPU-bound database load. 3d games require significant I/O and CPU. Anybody who claims that Intel is 2X more expensive for the same performance, isn't considering all factors. No question at all - the Opteron is good, and the Xeon isn't - but the original poster didn't ask about Opeteron or Xeon, did he? For the desktop lines - X2 is not double Pentium D. Maybe 10%. Maybe not at all. Especially now that Intel is dropping it's prices due to overstock. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Tue, Apr 25, 2006 at 11:07:17PM -0400, Ron Peacetree wrote: > THROUGHPUT is better with DDR2 if and only if there is enough data > to be fetched in a serial fashion from memory. > LATENCY however is dependent on the base clock rate of the RAM > involved. So PC3200, 200MHz x2, is going to actually perform better > than PC2-5400, 166MHz x4, for almost any memory access pattern > except those that are highly sequential. I had forgotten about this. Still, it's not quite as simple as you say. DDR2 has increased latency, however, it has a greater upper limit, and when run at the same clock speed (200 Mhz for 200 Mhz), it is not going to perform worse. Add in double the pre-fetching capability, and what you get is that most benchmarks show DDR2 5400 as being slightly faster than DDR 3200. AMD is switching to DDR2, and I believe that, even after making such a big deal about latency, and why they wouldn't switch to DDR2, they are now saying that their on-chip memory controller will be able to access DDR2 memory (when they support it soon) faster than Intel can, not having an on-chip memory controller. You said that DB accesses are random. I'm not so sure. In PostgreSQL, are not the individual pages often scanned sequentially, especially because all records are variable length? You don't think PostgreSQL will regularly read 32 bytes (8 bytes x 4) at a time, in sequence? Whether for table pages, or index pages - I'm not seeing why the accesses wouldn't be sequential. You believe PostgreSQL will access the table pages and index pages randomly on a per-byte basis? What is the minimum PostgreSQL record size again? Isn't it 32 bytes or over? :-) I wish my systems were running the same OS, and I'd run a test for you. Alas, I don't think comparing Windows to Linux would be valuable. > A minor point to be noted in addition here is that most DB servers > under load are limited by their physical IO subsystem, their HDs, > and not the speed of their RAM. It seems like a pretty major point to me. :-) It's why Opteron with RAID kicks ass over HyperTransport. > All of the above comments about the relative performance of > different RAM types become insignificant when performance is gated > by the HD subsystem. Yes. Luckily - we don't all have Terrabyte databases... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Large (8M) cache vs. dual-core CPUs
On Wed, Apr 26, 2006 at 05:37:31PM -0500, Jim C. Nasby wrote: > On Wed, Apr 26, 2006 at 06:16:46PM -0400, Bruce Momjian wrote: > > AMD transfers the dirty cache line directly from cpu to cpu. I can > > imaging that helping our test-and-set shared memory usage quite a bit. > Wasn't the whole point of test-and-set that it's the recommended way to > do lightweight spinlocks according to AMD/Intel? You'd think they'd have > a way to make that performant on multiple CPUs (though if it's relying > on possibly modifying an underlying data page I can't really think of > how to do that without snaking through the cache...) It's expensive no matter what. One method might be less expensive than another. :-) AMD definitely seems to have things right for lowest absolute latency. 2X still sounds like an extreme case - but until I've actually tried a very large or thread-intensive PostgreSQL db on both, I probably shouldn't doubt the work of others too much. :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 64-bit vs 32-bit performance ... backwards?
I've been trying to track this stuff - in fact, I'll likely be switching from AMD32 to AMD64 in the next few weeks. I believe I have a handle on the + vs - of 64-bit. It makes sense that full 64-bit would be slower. At an extreme it halves the amount of available memory or doubles the required memory bandwidth, depending on the workload. Has anybody taken a look at PostgreSQL to ensure that it uses 32-bit integers instead of 64-bit integers where only 32-bit is necessary? 32-bit offsets instead of 64-bit pointers? This sort of thing? I haven't. I'm meaning to take a look. Within registers, 64-bit should be equal speed to 32-bit. Outside the registers, it would make sense to only deal with the lower 32 bits where 32 bits is all that is required. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote: > On Mon, Aug 14, 2006 at 01:09:04PM -0400, Michael Stone wrote: > > On Mon, Aug 14, 2006 at 12:05:46PM -0500, Jim C. Nasby wrote: > > >Wow, interesting. IIRC, XFS is lower performing than ext3, > > For xlog, maybe. For data, no. Both are definately slower than ext2 for > > xlog, which is another reason to have xlog on a small filesystem which > > doesn't need metadata journalling. > Are 'we' sure that such a setup can't lose any data? I'm worried about > files getting lost when they get written out before the metadata does. I've been worrying about this myself, and my current conclusion is that ext2 is bad because: a) fsck, and b) data can be lost or corrupted, which could lead to the need to trash the xlog. Even ext3 in writeback mode allows for the indirect blocks to be updated without the data underneath, allowing for blocks to point to random data, or worse, previous apparently sane data (especially if the data is from a drive only used for xlog - the chance is high that a block might look partially valid?). So, I'm sticking with ext3 in ordered mode. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 01:26:46PM -0400, Michael Stone wrote: > On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote: > >Are 'we' sure that such a setup can't lose any data? > Yes. If you check the archives, you can even find the last time this was > discussed... I looked last night (coincidence actually) and didn't find proof that you cannot lose data. How do you deal with the file system structure being updated before the data blocks are (re-)written? I don't think you can. > The bottom line is that the only reason you need a metadata journalling > filesystem is to save the fsck time when you come up. On a little > partition like xlog, that's not an issue. fsck isn't only about time to fix. fsck is needed, because the file system is broken. If the file system is broken, how can you guarantee data has not been corrupted? Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote: > On Tue, Aug 15, 2006 at 02:33:27PM -0400, [EMAIL PROTECTED] wrote: > >>>Are 'we' sure that such a setup can't lose any data? > >>Yes. If you check the archives, you can even find the last time this was > >>discussed... > >I looked last night (coincidence actually) and didn't find proof that > >you cannot lose data. > You aren't going to find proof, any more than you'll find proof that you > won't lose data if you do lose a journalling fs. (Because there isn't > any.) Unfortunately, many people misunderstand the what a metadata > journal does for you, and overstate its importance in this type of > application. Yes, many people do. :-) > >How do you deal with the file system structure being updated before the > >data blocks are (re-)written? > *That's what the postgres log is for.* If the latest xlog entries don't > make it to disk, they won't be replayed; if they didn't make it to > disk, the transaction would not have been reported as commited. An > application that understands filesystem semantics can guarantee data > integrity without metadata journaling. No. This is not true. Updating the file system structure (inodes, indirect blocks) touches a separate part of the disk than the actual data. If the file system structure is modified, say, to extend a file to allow it to contain more data, but the data itself is not written, then upon a restore, with a system such as ext2, or ext3 with writeback, or xfs, it is possible that the end of the file, even the postgres log file, will contain a random block of data from the disk. If this random block of data happens to look like a valid xlog block, it may be played back, and the database corrupted. If the file system is only used for xlog data, the chance that it looks like a valid block increases, would it not? > >>The bottom line is that the only reason you need a metadata journalling > >>filesystem is to save the fsck time when you come up. On a little > >>partition like xlog, that's not an issue. > >fsck isn't only about time to fix. fsck is needed, because the file system > >is broken. > fsck is needed to reconcile the metadata with the on-disk allocations. > To do that, it reads all the inodes and their corresponding directory > entries. The time to do that is proportional to the size of the > filesystem, hence the comment about time. fsck is not needed "because > the filesystem is broken", it's needed because the filesystem is marked > dirty. This is also wrong. fsck is needed because the file system is broken. It takes time, because it doesn't have a journal to help it, therefore it must look through the entire file system and guess what the problems are. There are classes of problems such as I describe above, for which fsck *cannot* guess how to solve the problem. There is not enough information available for it to deduce that anything is wrong at all. The probability is low, for sure - but then, the chance of a file system failure is already low. Betting on ext2 + postgresql xlog has not been confirmed to me as reliable. Telling me that journalling is misunderstood doesn't prove to me that you understand it. I don't mean to be offensive, but I won't accept what you say, as it does not make sense with my understanding of how file systems work. :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. 
|__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 02:15:05PM -0500, Jim C. Nasby wrote: > So what causes files to get 'lost' and get stuck in lost+found? > AFAIK that's because the file was written before the metadata. Now, if > fsync'ing a file also ensures that all the metadata is written, then > we're probably fine... if not, then we're at risk every time we create a > new file (every WAL segment if archiving is on, and every time a > relation passes a 1GB boundary). Only if fsync ensures that the data written to disk is ordered, which as far as I know, is not done for ext2. Dirty blocks are written in whatever order is fastest for them to be written, or sequential order, or some order that isn't based on examining the metadata. If my understanding is correct - and I've seen nothing yet to say that it isn't - ext2 is not safe, postgresql xlog or not, fsck or not. It is safer than no postgresql xlog - but there exists windows, however small, where the file system can be corrupted. The need for fsck is due to this problem. If fsck needs to do anything at all, other than replay a journal, the file system is broken. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 04:05:17PM -0400, Tom Lane wrote: > [EMAIL PROTECTED] writes: > > I've been worrying about this myself, and my current conclusion is that > > ext2 is bad because: a) fsck, and b) data can be lost or corrupted, which > > could lead to the need to trash the xlog. > > Even ext3 in writeback mode allows for the indirect blocks to be updated > > without the data underneath, allowing for blocks to point to random data, > > or worse, previous apparently sane data (especially if the data is from > > a drive only used for xlog - the chance is high that a block might look > > partially valid?). > At least for xlog, this worrying is misguided, because we zero and fsync > a WAL file before we ever put any valuable data into it. Unless the > filesystem is lying through its teeth about having done an fsync, there > should be no metadata changes happening for an active WAL file (other > than mtime of course). Hmmm... I may have missed a post about this in the archive. WAL file is never appended - only re-written? If so, then I'm wrong, and ext2 is fine. The requirement is that no file system structures change as a result of any writes that PostgreSQL does. If no file system structures change, then I take everything back as uninformed. Please confirm whichever. :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Postgresql Performance on an HP DL385 and
On Tue, Aug 15, 2006 at 04:58:59PM -0400, Michael Stone wrote: > On Tue, Aug 15, 2006 at 03:39:51PM -0400, [EMAIL PROTECTED] wrote: > >No. This is not true. Updating the file system structure (inodes, indirect > >blocks) touches a separate part of the disk than the actual data. If > >the file system structure is modified, say, to extend a file to allow > >it to contain more data, but the data itself is not written, then upon > >a restore, with a system such as ext2, or ext3 with writeback, or xfs, > >it is possible that the end of the file, even the postgres log file, > >will contain a random block of data from the disk. If this random block > >of data happens to look like a valid xlog block, it may be played back, > >and the database corrupted. > you're conflating a whole lot of different issues here. You're ignoring > the fact that postgres preallocates the xlog segment, you're ignoring > the fact that you can sync a directory entry, you're ignoring the fact > that syncing some metadata (such as atime) doesn't matter (only the > block allocation is important in this case, and the blocks are > pre-allocated). Yes, no, no, no. :-) I didn't know that the xlog segment only uses pre-allocated space. I ignore mtime/atime as they don't count as file system structure changes to me. It's updating a field in place. No change to the structure. With the pre-allocation knowledge, I agree with you. Not sure how I missed that in my reviewing of the archives... I did know it pre-allocated once upon a time... Hmm > >This is also wrong. fsck is needed because the file system is broken. > nope, the file system *may* be broken. the dirty flag simply indicates > that the filesystem needs to be checked to find out whether or not it is > broken. Ah, but if we knew it wasn't broken, then fsck wouldn't be needed, now would it? So we assume that it is broken. A little bit of a game, but it is important to me. If I assumed the file system was not broken, I wouldn't run fsck. I run fsck, because I assume it may be broken. If broken, it indicates potential corruption. The difference for me, is that if you are correct, that the xlog is safe, than for a disk that only uses xlog, fsck is not ever necessary, even after a system crash. If fsck is necessary, then there is potential for a problem. With the pre-allocation knowledge, I'm tempted to agree with you that fsck is not ever necessary for partitions that only hold a properly pre-allocated xlog. > >I don't mean to be offensive, but I won't accept what you say, as it does > >not make sense with my understanding of how file systems work. :-) > I'm not getting paid to convince you of anything. Just getting you to back up your claim a bit... As I said, no intent to offend. I learned from it. Thanks, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Q: Performance of join vs embedded query for simple queries?
942c.dd2911d5.ad56.08:00:09:fd:1b:da') and lower(name) = lower('markm-Q00855572'); QUERY PLAN --- Index Scan using sm_change_name_key on sm_change (cost=2.99..7.82 rows=1 width=80) (actual time=0.322..0.328 rows=1 loops=1) Index Cond: (lower((name)::text) = 'markm-q00855572'::text) Filter: (system_dbid = $0) InitPlan -> Seq Scan on sm_system (cost=0.00..2.99 rows=1 width=4) (actual time=0.052..0.106 rows=1 loops=1) Filter: ((uid)::text = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'::text) Total runtime: 0.419 ms (7 rows) Time: 16.494 ms neudb=> explain analyze select sm_change.uid, sm_change.name from sm_change join sm_system using (system_dbid) where sm_system.uid = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da' and lower(sm_change.name) = lower('markm-Q00855572'); QUERY PLAN - Nested Loop (cost=0.00..7.83 rows=1 width=80) (actual time=0.099..0.159 rows=1 loops=1) Join Filter: ("outer".system_dbid = "inner".system_dbid) -> Index Scan using sm_change_name_key on sm_change (cost=0.00..4.83 rows=1 width=84) (actual time=0.053..0.059 rows=1 loops=1) Index Cond: (lower((name)::text) = 'markm-q00855572'::text) -> Seq Scan on sm_system (cost=0.00..2.99 rows=1 width=4) (actual time=0.030..0.077 rows=1 loops=1) Filter: ((uid)::text = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'::text) Total runtime: 0.250 ms (7 rows) Time: 1.898 ms I'm still learning how PostgreSQL works internally. My understanding is that the above are essentially the same. The first finds the system row using a sequential scan, then looks for the change row using the index, filtering by the system value. The second finds the change rows using the same index, expecting to find one row, and finding only one row, and matches it up against the system row using a sequential scan. So why does one reliably run faster than the other? neudb=> prepare plan1 (varchar(80), varchar(80)) as select uid, name from sm_change where system_dbid = (select system_dbid from sm_system where uid = $1) and lower(name) = lower($2); neudb=> prepare plan2 (varchar(80), varchar(80)) as select sm_change.uid, sm_change.name from sm_change join sm_system using (system_dbid) where sm_system.uid = $1 and lower(sm_change.name) = lower($2); Now: neudb=> execute plan1 ('2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da', 'markm-q00855572'); uid| name --+- ff733174.6c7411d8.900c.00:06:5b:b3:db:28 | markm-Q00855572 (1 row) Time: 0.794 ms neudb=> execute plan2 ('2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da', 'markm-q00855572'); uid| name --+- ff733174.6c7411d8.900c.00:06:5b:b3:db:28 | markm-Q00855572 (1 row) Time: 0.715 ms The numbers above don't mean anything. I ran both a few dozen times, and my conclusion is that after the plan is prepared (I did explain analyze to ensure that the prepared plans were the same as the dynamically generated plans), the times are the same. Both ranged from 0.690 ms -> 0.850 ms. Timings at these resolutions are not so reliable. :-) I think this means that the planner takes longer to figure out what to do about the join, and that my writing the select out as an embedded select reduces the effort required by the planner. This makes sense to me, except that I thought PostgreSQL would convert back and forth between the two forms automatically. They are the same query, are they not? Why wouldn't they both take longer, or both take shorter? What if I invented a scenario where the difference in plans made a major difference, such as making the system table much larger, still without an index? 
Should they not both come up with the same plan - the better estimated plan? Am I expecting too much? :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
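To experiment with the scenario raised at the end (a much larger system table, still without an index on its uid column), one rough way to set it up is sketched below. The table and column names here are made up for illustration; this is not from the original thread.

-- Hypothetical test data: a large "system"-like table with no index on uid,
-- and a larger "change"-like table referencing it.
CREATE TABLE test_system AS
SELECT g AS system_dbid,
       md5(g::text) AS uid
FROM generate_series(1, 1000000) g;

CREATE TABLE test_change AS
SELECT g AS change_dbid,
       ((g - 1) % 1000000) + 1 AS system_dbid,
       'change-' || g AS name
FROM generate_series(1, 2000000) g;

ANALYZE test_system;
ANALYZE test_change;

-- "= subselect" form:
EXPLAIN ANALYZE
SELECT name FROM test_change
WHERE system_dbid = (SELECT system_dbid FROM test_system WHERE uid = md5('42'))
  AND name = 'change-42';

-- join form:
EXPLAIN ANALYZE
SELECT c.name
FROM test_change c
JOIN test_system s USING (system_dbid)
WHERE s.uid = md5('42')
  AND c.name = 'change-42';

With no index on test_system.uid, the planner's choices for the two forms (and how far the estimates drift from reality) should diverge much more visibly than in the small-table case above.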
Re: [PERFORM] Q: Performance of join vs embedded query for simple queries?
On Thu, Aug 17, 2006 at 09:21:33PM -0400, Tom Lane wrote: > [EMAIL PROTECTED] writes: > > I have two simple queries that do what I believe to be the exact same > > thing. > These are actually not equivalent per spec. > ... > This still isn't equivalent to the join: it'll return at most one copy > of any sm_change row, whereas you can get multiple copies of the same > sm_change row from the join, if there were multiple matching sm_system > rows. (Hm, given the unique index on (system_dbid, uid), I guess that > couldn't actually happen --- but you have to reason about it knowing > that that index is there, it's not obvious from the form of the query.) > Anyway: given the way that the planner works, the IN form and the join > form will probably take comparable amounts of time to plan. The "= > subselect" form is much more constrained in terms of the number of > alternative implementations we have, so it doesn't surprise me that it > takes less time to plan. That makes sense. Would it be reasonable for the planner to eliminate plan considerations based on the existence of unique indexes, or is this a fundamentally difficult thing to get right in the general case? I did the elimination in my head, which is why I considered the plans to be the same. Can the planner do it? Sub-millisecond planning/execution for simple queries on moderate hardware seems sexy... :-) Thanks, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Q: Performance of join vs embedded query for simple queries?
On Thu, Aug 17, 2006 at 09:21:33PM -0400, Tom Lane wrote: > Another related form is > > neudb=> select uid, name from sm_change where system_dbid IN (select > system_dbid from sm_system where uid = > '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da') and lower(name) = > lower('markm-Q00855572'); > ... > Anyway: given the way that the planner works, the IN form and the join > form will probably take comparable amounts of time to plan. The "= > subselect" form is much more constrained in terms of the number of > alternative implementations we have, so it doesn't surprise me that it > takes less time to plan. FYI: You are correct. The IN takes about as long as the join to plan, and does generate the same plan as the join. This restores confidence for me that PostgreSQL is able to understand the two as equivalent. With regard to that unique constraint planning - I gave you the wrong query from my log. I had already thought that through, and realized that my original query missed the type. The timings and plans are the functionally the same for all the three queries, with or without the type qualifier. This is the table: Table "public.sm_system" Column| Type |Modifiers -++- system_dbid | integer| not null default nextval('sm_system_system_dbid_seq'::regclass) type| character varying(10) | not null uid | character varying(200) | not null name| character varying(200) | not null owner | character varying(80) | not null Indexes: "sm_system_pkey" PRIMARY KEY, btree (system_dbid) CLUSTER "sm_system_type_key" UNIQUE, btree ("type", uid) Check constraints: "sm_system_type_check" CHECK ("type"::text = 'NEU'::text OR "type"::text = 'PLS'::text) And this is what the query should have been: neudb=> explain analyze select uid, name from sm_change where system_dbid IN (select system_dbid from sm_system where type = 'NEU' and uid = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da') and lower(name) = lower('markm-Q00855572'); QUERY PLAN - Nested Loop IN Join (cost=0.00..7.86 rows=1 width=80) (actual time=19.438..19.453 rows=1 loops=1) -> Index Scan using sm_change_name_key on sm_change (cost=0.00..4.83 rows=1 width=84) (actual time=0.064..0.073 rows=1 loops=1) Index Cond: (lower((name)::text) = 'markm-q00855572'::text) -> Index Scan using sm_system_pkey on sm_system (cost=0.00..3.02 rows=1 width=4) (actual time=19.358..19.358 rows=1 loops=1) Index Cond: ("outer".system_dbid = sm_system.system_dbid) Filter: ((("type")::text = 'NEU'::text) AND ((uid)::text = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'::text)) Total runtime: 19.568 ms (7 rows) Time: 21.449 ms I guess the case isn't as simple as I thought. It would need to recognize that the specification of both the 'type' and the 'uid' are static, and unique, therefore the argument to the IN, or the table that it is joining with will be either 0 rows or 1 row. Too complicated to be worth it, eh? :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Large tables (was: RAID 0 not as fast as expected)
On Mon, Sep 18, 2006 at 06:10:13PM -0700, Luke Lonergan wrote: > Also be sure to set the random_page_cost parameter in > postgresql.conf to 100 or even higher when you use indexes, as the actual > seek rate for random access ranges between 50 and 300 for modern disk > hardware. If this parameter is left at the default of 4, indexes will often > be used inappropriately. Does a tool exist yet to time this for a particular configuration? Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
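No packaged tool comes up in this thread; one crude way to experiment is simply to override the setting per session and compare the resulting plans and timings. A small sketch, with placeholder table and column names:

SHOW random_page_cost;        -- stock default is 4

SET random_page_cost = 100;   -- assume random I/O is very expensive
EXPLAIN ANALYZE SELECT * FROM some_big_table WHERE indexed_col = 42;

SET random_page_cost = 2;     -- assume data is mostly cached
EXPLAIN ANALYZE SELECT * FROM some_big_table WHERE indexed_col = 42;

RESET random_page_cost;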
Re: [PERFORM] Large tables (was: RAID 0 not as fast as
On Fri, Sep 22, 2006 at 02:52:09PM +1200, Guy Thornley wrote: > > >> I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly > > >> meant for this purpose? > > > This is a good idea - I wasn't aware that this was possible. > > This possibility was the reason for me to propose it. :-) > posix_fadvise() features in the TODO list already; I'm not sure if any work > on it has been done for pg8.2. > > Anyway, I understand that POSIX_FADV_DONTNEED on a linux 2.6 kernel allows > pages to be discarded from memory earlier than usual. This is useful, since > it means you can prevent your seqscan from nuking the OS cache. > > Of course you could argue the OS should be able to detect this, and prevent > it occuring anyway. I don't know anything about linux's behaviour in this > area. I recall either monitoring or participating in the discussion when this call was added to Linux. I don't believe the kernel can auto-detect that you do not need a page any longer. It can only prioritize pages to keep when memory is fully in use and a new page must be loaded. This is often some sort of LRU scheme. If the page is truly useless, only the application can know. I'm not convinced that PostgreSQL can know this. The case where it is useful is if a single process is sequentially scanning a large file (much larger than memory). As soon as it is more than one process, or if it is not a sequential scan, or if it is not a large file, this call hurts more than it gains. Just because I'm done with the page does not mean that *you* are done with the page. I'd advise against using this call unless it can be shown that the page will not be used in the future, or at least, that the page is less useful than all other pages currently in memory. This is what the call really means. It means, "There is no value to keeping this page in memory". Perhaps certain PostgreSQL loads fit this pattern. None of my uses fit this pattern, and I have trouble believing that a majority of PostgreSQL loads fits this pattern. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Opteron vs. Xeon "benchmark"
On Fri, Sep 22, 2006 at 11:50:47PM +0200, Arjen van der Meijden wrote: > If you're an AMD-fan, by all means, buy their products, those processors > are indeed fast and you can build decent servers with them. But don't > rule out Intel, just because with previous processors they were the > slower player ;) Yep. From what I understand, Intel is 8 to 10 times the size of AMD. It's somewhat amazing that AMD even competes, and excellent for us, the consumer, that they compete well, ensuring that we get very fast computers, for amazingly low prices. But Intel isn't crashing down any time soon. Perhaps they became a little lazy, and made a few mistakes. AMD is forcing them to clean up. May the competition continue... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hints proposal
On Sun, Oct 15, 2006 at 05:25:31PM -0700, Craig A. James wrote: > So my question is: Is there any argument that can be made to persuade those > of you who are volunteering your time on the optimizer to even consider a > HINTS proposal? Has all this discussion changed your perspective on 2(c), > and why it really matters to some of us? Are we just wasting our time, or > is this a fruitful discussion? They're waiting for an idea that captures their imagination. So far, it seems like a re-hashing of old ideas that have been previously shot down, none of which seem overly imaginative, or can be shown to provide significant improvement short term or long term... :-) Haha. That's my take on it. Sorry if it is harsh. To get very competent people to volunteer their time, you need to make them believe. They need to dream about it, and wake up the next morning filled with a desire to try out some of their ideas. You need to brain wash them... :-) Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Optimization of this SQL sentence
On Wed, Oct 18, 2006 at 11:31:44AM +0200, Mario Weilguni wrote:
> It's not a bad idea. Usually I use postal codes with 25 chars, and never had
> any problem. With text, the limit would be ~1 GB. No matter how much testing
> in the application happens, the varchar(25) as last resort is a good idea.
> And in most cases, the application itself limits the length, and thus it's
> good to reflect this in the database design.
> Feel free to use text anywhere for your application, and feel free to use
> numeric(1000) instead of numeric(4) if you want to be prepared for really
> long numbers, but don't tell other people it's bad database design - it
> isn't.

It's unnecessary design. Suggestions in this regard lead towards the user seeing a database error, instead of a nice specific message provided by the application.

I used to use varchar instead of text, but have since softened, as the number of times it has ever actually saved me is zero, and the number of times it has screwed me up (picking too small a limit too early) has been a few. It's kind of like pre-optimization before there is a problem. Sometimes it works for you, sometimes it works against.

Cheers, mark

-- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
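As a small illustration of the trade-off being argued here (a sketch with made-up names, not from the thread): the length limit can live in the column type, or in a constraint that is easier to change later.

-- limit baked into the type:
CREATE TABLE customer_a (
    postal_code varchar(25)
);

-- limit expressed as a named constraint on a text column:
CREATE TABLE customer_b (
    postal_code text,
    CONSTRAINT postal_code_len CHECK (char_length(postal_code) <= 25)
);

-- with the second form the limit can be replaced without touching the column type:
-- ALTER TABLE customer_b DROP CONSTRAINT postal_code_len;
-- ALTER TABLE customer_b ADD CONSTRAINT postal_code_len CHECK (char_length(postal_code) <= 50);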
OT: TCL vs Perl Re: [PERFORM] commit so slow program looks frozen
Perl started out fast - TCL started out slow. Perl used syntax that, although it would drive some people crazy, followed a linguistic curve that Larry Wall claimed was healthy. The English language is crazy, and yet, it has become standard worldwide as well. Designed, regular languages like Esperanto have not received much support either. Perl is designed to be practical. TCL was designed to be minimalistic.

Perl uses common idioms for UNIX programmers: // for regular expressions, $VAR for variables. Many of the statements are familiar to C programmers. ++ for increment (compare against 'incr abc' for TCL). $a=5 for assignment, compare against 'set abc 5' in TCL. TCL tries to have a reduced syntax, where 'everything is a string', which requires weirdness for people. For example, newline is end-of-line, so { must be positioned correctly. Code is a string, so in some cases you need to escape code, otherwise not.

Perl has object oriented support built-in. It's ugly, but it works. TCL has a questionable '[incr tcl]' package. Perl has a wealth of modules on CPAN to do almost anything you need to. TCL has the beginning of one (not as rich), but comes built-in with things like event loops, and graphics (Tk).

I could go on and on - but I won't, because this is the PostgreSQL mailing list. People either get Perl, or TCL, or they don't. More people 'get' Perl, because it was marketed better, its syntax is deceptively similar to other well known languages, and for the longest time, it was much faster than TCL to write (especially when using regular expressions) and faster to run. Did TCL get treated unfairly as a result? It's a language. Who cares! :-)

Cheers, mark

-- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] MVCC & indexes?
On Tue, Oct 31, 2006 at 10:55:40PM +0100, Ivan Voras wrote: > Ok, so MVCC is the best thing since a guy put a round stone on a stick > and called it "the wheel", but I've seen several references on this list > about "indexes not being under MVCC" - at least that's how I read it, > the original posts were explaining why indexes can't be used for solving > MIN()/MAX()/COUNT() aggregates. Is this correct? > In particular, I'm trying to find out is there (b)locking involved when > concurrently updating and/or inserting records in an indexed table. My > guess is that, since PG does copy+delete on updating, even updating a > non-indexed field will require fixups in the index tree (to point to the > new record) and thus (b)locking. Short bits of blocking. The PostgreSQL index 'problem', is that indexes are conservative. They only guarantee to return at least as much data as you should see. They cannot be used to limit what you see to only as much as you should see. Cheers, mark -- [EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] __ . . _ ._ . . .__. . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/|_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Linux: more cores = less concurrency.
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Scott Marlowe > Sent: Monday, April 11, 2011 1:29 PM > To: Glyn Astill > Cc: Kevin Grittner; Joshua D. Drake; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Linux: more cores = less concurrency. > > On Mon, Apr 11, 2011 at 12:23 PM, Glyn Astill > wrote: > > > > > > --- On Mon, 11/4/11, Joshua D. Drake wrote: > > > >> From: Joshua D. Drake > >> Subject: Re: [PERFORM] Linux: more cores = less concurrency. > >> To: "Kevin Grittner" > >> Cc: pgsql-performance@postgresql.org, "Glyn Astill" > > >> Date: Monday, 11 April, 2011, 19:12 > >> On Mon, 11 Apr 2011 13:09:15 -0500, > >> "Kevin Grittner" > >> > >> wrote: > >> > Glyn Astill > >> wrote: > >> > > >> >> The new server uses 4 x 8 core Xeon X7550 CPUs at > >> 2Ghz > >> > > >> > Which has hyperthreading. > >> > > >> >> our current servers are 2 x 4 core Xeon E5320 CPUs > >> at 2Ghz. > >> > > >> > Which doesn't have hyperthreading. > >> > > > > > Yep, off. If you look at the benchmarks I took, HT absoloutely killed > it. > > > >> > PostgreSQL often performs worse with hyperthreading > >> than without. > >> > Have you turned HT off on your new machine? If > >> not, I would start > >> > there. > >> > >> And then make sure you aren't running CFQ. > >> > >> JD > >> > > > > Not running CFQ, running the no-op i/o scheduler. > > Just FYI, in synthetic pgbench type benchmarks, a 48 core AMD Magny > Cours with LSI HW RAID and 34 15k6 Hard drives scales almost linearly > up to 48 or so threads, getting into the 7000+ tps range. With SW > RAID it gets into the 5500 tps range. Just wondering, which LSI card ? Was this 32 drives in Raid 1+0 with a two drive raid 1 for logs or some other config? -M > > -- > Sent via pgsql-performance mailing list (pgsql- > performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Linux: more cores = less concurrency.
> -Original Message-
> From: Scott Marlowe [mailto:scott.marl...@gmail.com]
> Sent: Monday, April 11, 2011 6:18 PM
> To: mark
> Cc: Glyn Astill; Kevin Grittner; Joshua D. Drake; pgsql-performa...@postgresql.org
> Subject: Re: [PERFORM] Linux: more cores = less concurrency.
>
> On Mon, Apr 11, 2011 at 6:05 PM, mark wrote:
> > Just wondering, which LSI card ?
> > Was this 32 drives in Raid 1+0 with a two drive raid 1 for logs or some
> > other config?
>
> We were using the LSI but I'll be switching back to Areca when we
> go back to HW RAID. The LSI only performed well if we setup 15
> RAID-1 pairs in HW and use linux SW RAID 0 on top. RAID1+0 in the
> LSI was a pretty mediocre performer. Areca 1680 OTOH, beats it in
> every test, with HW RAID10 only. Much simpler to admin.

Interesting, thanks for sharing. I guess I have never gotten to the point where I felt I needed more than 2 drives for my xlogs. Maybe I have been dismissing that as a possibility or something. (my biggest array is only 24 SFF drives tho)

I am trying to get my hands on a dual core lsi card for testing at work (either a 9265-8i or 9285-8e). I don't see any dual core 6Gbps SAS Areca cards yet... still rocking an Areca 1130 at home tho.

-M

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] rant ? check the BBWC
So sometime along the yellow brick firmware road HP changed (and maybe your vendor did too) the output of what happens when the write cache is off due to failed batteries attached to the card/cache. (and no, they don't always beep with a self test in case someone happens to be walking near your cage, and yes, you will be wondering why batteries attached to these things are not hot swappable.)

So yeah -- even if you were pretty sure some code had properly checked them with, say, nagios checks (or whatever) and you were monitoring them... if someone hasn't replaced a dead battery in a while you should probably be wondering why.

Now go forth and test your HA/DR plans, check your raid and UPS batteries, test your monitoring (again).

-M

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Query improvement
Hi, I have 3 tables: page - revision - pagecontent

CREATE TABLE mediawiki.page
(
  page_id serial NOT NULL,
  page_namespace smallint NOT NULL,
  page_title text NOT NULL,
  page_restrictions text,
  page_counter bigint NOT NULL DEFAULT 0,
  page_is_redirect smallint NOT NULL DEFAULT 0,
  page_is_new smallint NOT NULL DEFAULT 0,
  page_random numeric(15,14) NOT NULL DEFAULT random(),
  page_touched timestamp with time zone,
  page_latest integer NOT NULL,
  page_len integer NOT NULL,
  titlevector tsvector,
  page_type integer NOT NULL DEFAULT 0,
  CONSTRAINT page_pkey PRIMARY KEY (page_id)
);

CREATE TABLE mediawiki.revision
(
  rev_id serial NOT NULL,
  rev_page integer,
  rev_text_id integer,
  rev_comment text,
  rev_user integer NOT NULL,
  rev_user_text text NOT NULL,
  rev_timestamp timestamp with time zone NOT NULL,
  rev_minor_edit smallint NOT NULL DEFAULT 0,
  rev_deleted smallint NOT NULL DEFAULT 0,
  rev_len integer,
  rev_parent_id integer,
  CONSTRAINT revision_rev_page_fkey FOREIGN KEY (rev_page)
    REFERENCES mediawiki.page (page_id) MATCH SIMPLE
    ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT revision_rev_id_key UNIQUE (rev_id)
)

CREATE TABLE mediawiki.pagecontent
(
  old_id integer NOT NULL DEFAULT nextval('mediawiki.text_old_id_seq'::regclass),
  old_text text,
  old_flags text,
  textvector tsvector,
  CONSTRAINT pagecontent_pkey PRIMARY KEY (old_id)
)

where I have this query:

SELECT pa.page_id, pa.page_title,
  ts_rank(pc.textvector, (to_tsquery('fotbal'))) + ts_rank(pa.titlevector, (to_tsquery('fotbal')))*10 as totalrank
from mediawiki.page pa, mediawiki.revision re, mediawiki.pagecontent pc
WHERE pa.page_id in
  (SELECT page_id FROM mediawiki.page
   WHERE page_id IN (SELECT page_id FROM mediawiki.page
                     WHERE (titlevector @@ (to_tsquery('fotbal'))))
      OR page_id IN (SELECT p.page_id
                     from mediawiki.page p, mediawiki.revision r,
                          (SELECT old_id FROM mediawiki.pagecontent
                           WHERE (textvector @@ (to_tsquery('fotbal')))) ss
                     WHERE (p.page_id=r.rev_page AND r.rev_id=ss.old_id)))
AND (pa.page_id=re.rev_page AND re.rev_id=pc.old_id)
ORDER BY totalrank
LIMIT 100;

This query finds page titles in page and content in pagecontent by full text search (@@); afterwards I compute the relevance of the resulting ids with ts_rank.

Now the problem. When I try ANALYZE it shows:

"Limit (cost=136568.00..136568.25 rows=100 width=185)"
" -> Sort (cost=136568.00..137152.26 rows=233703 width=185)"
"Sort Key: ((ts_rank(pc.textvector, to_tsquery('fotbal'::text)) + (ts_rank(pa.titlevector, to_tsquery('fotbal'::text)) * 10::double precision)))"
"-> Hash Join (cost=61707.99..127636.04 rows=233703 width=185)"
" Hash Cond: (re.rev_id = pc.old_id)"
" -> Merge Join (cost=24098.90..71107.48 rows=233703 width=66)"
"Merge Cond: (pa.page_id = re.rev_page)"
"-> Merge Semi Join (cost=24096.98..55665.69 rows=233703 width=66)"
" Merge Cond: (pa.page_id = mediawiki.page.page_id)"
" -> Index Scan using page_btree_id on page pa (cost=0.00..13155.20 rows=311604 width=62)"
" -> Index Scan using page_btree_id on page (cost=24096.98..38810.19 rows=233703 width=4)"
"Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))"
"SubPlan 1"
" -> Bitmap Heap Scan on page (cost=10.41..900.33 rows=270 width=4)"
"Recheck Cond: (titlevector @@ to_tsquery('fotbal'::text))"
"-> Bitmap Index Scan on gin_index (cost=0.00..10.34 rows=270 width=0)"
" Index Cond: (titlevector @@ to_tsquery('fotbal'::text))"
"SubPlan 2"
" -> Nested Loop (cost=1499.29..23192.08 rows=1558 width=4)"
"-> Nested Loop (cost=1499.29..15967.11 rows=1558 width=4)"
" -> Bitmap Heap Scan on pagecontent (cost=1499.29..6448.12 rows=1558 width=4)"
"Recheck Cond: (textvector @@ to_tsquery('fotbal'::text))"
"-> Bitmap Index Scan on gin_index2 (cost=0.00..1498.90 rows=1558 width=0)"
" Index Cond: (textvector @@ to_tsquery('fotbal'::text))"
" -> Index Scan using page_btree_rev_content_id on revision r (cost=0.00..6.10 rows=1 width=8)"
"Index Cond: (r.rev_id = pagecontent.old_id)"
"->
Re: [PERFORM] Query improvement
Here is EXPLAIN ANALYZE:

"Limit (cost=136568.00..136568.25 rows=100 width=185) (actual time=1952.174..1952.215 rows=100 loops=1)"
" -> Sort (cost=136568.00..137152.26 rows=233703 width=185) (actual time=1952.172..1952.188 rows=100 loops=1)"
"Sort Key: ((ts_rank(pc.textvector, to_tsquery('fotbal'::text)) + (ts_rank(pa.titlevector, to_tsquery('fotbal'::text)) * 10::double precision)))"
"Sort Method: top-N heapsort Memory: 23kB"
"-> Hash Join (cost=61707.99..127636.04 rows=233703 width=185) (actual time=1046.838..1947.815 rows=3278 loops=1)"
" Hash Cond: (re.rev_id = pc.old_id)"
" -> Merge Join (cost=24098.90..71107.48 rows=233703 width=66) (actual time=200.884..859.453 rows=3278 loops=1)"
"Merge Cond: (pa.page_id = re.rev_page)"
"-> Merge Semi Join (cost=24096.98..55665.69 rows=233703 width=66) (actual time=200.843..629.821 rows=3278 loops=1)"
" Merge Cond: (pa.page_id = mediawiki.page.page_id)"
" -> Index Scan using page_btree_id on page pa (cost=0.00..13155.20 rows=311604 width=62) (actual time=0.027..145.989 rows=311175 loops=1)"
" -> Index Scan using page_btree_id on page (cost=24096.98..38810.19 rows=233703 width=4) (actual time=200.779..429.219 rows=3278 loops=1)"
"Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))"
"SubPlan 1"
" -> Bitmap Heap Scan on page (cost=10.41..900.33 rows=270 width=4) (actual time=0.748..9.845 rows=280 loops=1)"
"Recheck Cond: (titlevector @@ to_tsquery('fotbal'::text))"
"-> Bitmap Index Scan on gin_index (cost=0.00..10.34 rows=270 width=0) (actual time=0.586..0.586 rows=280 loops=1)"
" Index Cond: (titlevector @@ to_tsquery('fotbal'::text))"
"SubPlan 2"
" -> Nested Loop (cost=1499.29..23192.08 rows=1558 width=4) (actual time=2.032..185.743 rows=3250 loops=1)"
"-> Nested Loop (cost=1499.29..15967.11 rows=1558 width=4) (actual time=1.980..109.491 rows=3250 loops=1)"
" -> Bitmap Heap Scan on pagecontent (cost=1499.29..6448.12 rows=1558 width=4) (actual time=1.901..36.583 rows=3250 loops=1)"
"Recheck Cond: (textvector @@ to_tsquery('fotbal'::text))"
"-> Bitmap Index Scan on gin_index2 (cost=0.00..1498.90 rows=1558 width=0) (actual time=1.405..1.405 rows=3250 loops=1)"
" Index Cond: (textvector @@ to_tsquery('fotbal'::text))"
" -> Index Scan using page_btree_rev_content_id on revision r (cost=0.00..6.10 rows=1 width=8) (actual time=0.020..0.021 rows=1 loops=3250)"
"Index Cond: (r.rev_id = pagecontent.old_id)"
"-> Index Scan using page_btree_id on page p (cost=0.00..4.62 rows=1 width=4) (actual time=0.022..0.022 rows=1 loops=3250)"
" Index Cond: (p.page_id = r.rev_page)"
"-> Index Scan using page_btree_rev_page_id on revision re (cost=0.00..11850.52 rows=311604 width=8) (actual time=0.012..166.042 rows=311175 loops=1)"
" -> Hash (cost=27932.04..27932.04 rows=311604 width=127) (actual time=801.000..801.000 rows=311604 loops=1)"
"Buckets: 1024 Batches: 64 Memory Usage: 744kB"
"-> Seq Scan on pagecontent pc (cost=0.00..27932.04 rows=311604 width=127) (actual time=0.018..465.686 rows=311604 loops=1)"
"Total runtime: 1952.962 ms"

I have tried set enable_hashjoin = false; set enable_hashjoin = true; but the result has been worse than before. By the way, is there a possibility to create a better query with the same effect? I have tried more queries, but this one has the best performance yet.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4365717.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
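The later replies report that a UNION-based rewrite brought this down to about a second. That rewrite isn't shown in the archive, so the following is only a rough sketch of what such a rewrite could look like, reusing the schema and ranking idea from the first message; the exact ranking arithmetic, sort direction, and duplicate handling here are assumptions.

-- Sketch only: rank title matches and content matches separately, then combine.
SELECT page_id, page_title, max(score) AS totalrank
FROM (
    SELECT pa.page_id, pa.page_title,
           ts_rank(pa.titlevector, to_tsquery('fotbal')) * 10 AS score
    FROM mediawiki.page pa
    WHERE pa.titlevector @@ to_tsquery('fotbal')

    UNION ALL

    SELECT pa.page_id, pa.page_title,
           ts_rank(pc.textvector, to_tsquery('fotbal')) AS score
    FROM mediawiki.page pa
    JOIN mediawiki.revision re ON re.rev_page = pa.page_id
    JOIN mediawiki.pagecontent pc ON pc.old_id = re.rev_id
    WHERE pc.textvector @@ to_tsquery('fotbal')
) hits
GROUP BY page_id, page_title
ORDER BY totalrank DESC
LIMIT 100;

Each branch can use its own GIN index directly, instead of feeding one large hashed filter as in the plans above.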
Re: [PERFORM] Query improvement
Thanks for replies. Finally I have used UNION and JOINS, which helped. Mainly the UNION helped a lot. Now the query takes 1sec max. Thanks a lot. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378163.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query improvement
Thanks for reply both UNION and JOINS helped. Mainly the UNION helped a lot. Now the query takes 1sec max. Thanks a lot. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378157.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query improvement
Thanks a lot for reply. Finally I have used UNION, but thanks for your help. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378160.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] CLUSTER versus a dedicated table
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Robert James
> Sent: Wednesday, June 01, 2011 5:55 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] CLUSTER versus a dedicated table
>
> Hi. I'm interested in understanding the differences between
> CLUSTERing a table and making a dedicated one.
>
> We have a table with about 1 million records. On a given day, only
> about 1% of them are of interest. That 1% changes every day (it's
> WHERE active_date = today), and so we index and cluster on it.
>
> Even so, the planner shows a very large cost for the Index Scan: about
> 3500. If I instead do a SELECT INTO temp_table FROM big_table WHERE
> active_date = today, and then do SELECT * FROM temp_table, I get a
> planned cost of 65. Yet, the actual time for both queries is almost
> identical.
>
> Questions:
> 1. Why is there such a discrepancy between the planner's estimate and
> the actual cost?
>
> 2. In a case like this, will I in general see a performance gain by
> doing a daily SELECT INTO and then querying from that table? My ad hoc
> test doesn't indicate I would (despite the planner's prediction), and
> I'd rather avoid this if it won't help.
>
> 3. In general, does CLUSTER provide all the performance benefits of a
> dedicated table? If it doesn't, what does it lack?
>
> Thank you.

Start here: http://wiki.postgresql.org/wiki/SlowQueryQuestions

1: there could be many reasons for the planner to come up with a grossly inaccurate estimate for some values. When the table was last analyzed is usually where people start.

2: look at table partitioning; it's pretty straightforward and sounds like it might be a good fit for you. It will however involve some triggers or rules and check constraints (a rough sketch follows below). Table partitioning has some downsides though, you should be aware of what they are before you commit to it.

3: clustering, from a high level, just reorders the data on disk by a given index. Depending on your setup, keeping it close to that clustered ordering might be trivial or it might not be.

"Big table" is relative; 1M rows might be a big table or it might not be. Since you didn't post the size of the table and indexes, we can only guess. Table partitioning helps most with table maintenance, IMO, but can be very useful if constraint exclusion can eliminate a large number of child tables right off, so it doesn't have to traverse large indexes or do lots of random IO.

You will need to post a lot more specific info if you want more specific help. Read the guide to reporting slow queries (or the guide to reporting problems), start gathering specific information, and then post back to the list.

-Mark

> -- > Sent via pgsql-performance mailing list (pgsql-performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
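As a rough idea of what the partitioning mentioned in point 2 involves on this era of PostgreSQL (inheritance plus check constraints plus a routing trigger), a minimal sketch with made-up names:

-- parent table
CREATE TABLE big_table (
    id          serial,
    active_date date NOT NULL,
    payload     text
);

-- one child per day, each with a CHECK constraint the planner can use
CREATE TABLE big_table_2011_06_02 (
    CHECK (active_date = DATE '2011-06-02')
) INHERITS (big_table);

-- route inserts to the right child (a real setup would add a child and a
-- branch here for each new day, usually from a scheduled job)
CREATE OR REPLACE FUNCTION big_table_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.active_date = DATE '2011-06-02' THEN
        INSERT INTO big_table_2011_06_02 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.active_date;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_insert_trg
    BEFORE INSERT ON big_table
    FOR EACH ROW EXECUTE PROCEDURE big_table_insert();

-- with constraint_exclusion enabled, a query like
--   SELECT * FROM big_table WHERE active_date = DATE '2011-06-02';
-- only has to touch the one matching child table.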
[PERFORM] not exits slow compared to not in. (nested loops killing me)
Hi all, I am trying to speed up a query on a DB I inherited and I am falling flat on my face . I changed a query from NOT IN to use NOT EXISTS and my query time went from 19000ms to several hours (~5000 ms). this shocked me so much I pretty much had to post. This seems like a corner case of the planner not knowing that the nested-loops are going to turn out badly in this case. The planner choosing a 13hr nested loop here is basically the reason I am posting. I have played around with rewriting this query using some CTEs and a left join but thus far my results are not encouraging. Given what little I know , it seems like a LEFT JOIN where right_table.col is null gets the same performance and estimates as a NOT EXISTS. (and still picks a nested loop in this case) I can see where it all goes to hell time wise, turning off nested loops seems to keep it from running for hours for this query, but not something I am looking to do globally. The time is not really that much better than just leaving it alone with a NOT IN. two queries are at http://pgsql.privatepaste.com/a0b672bab0# the "pretty" explain versions : NOT IN (with large work mem - 1GB) http://explain.depesz.com/s/ukj NOT IN (with only 64MB for work_mem) http://explain.depesz.com/s/wT0 NOT EXISTS (with 64MB of work_mem) http://explain.depesz.com/s/EuX NOT EXISTS (with nested loop off. and 64MB of work_mem) http://explain.depesz.com/s/UXG LEFT JOIN/CTE (with nested loop off and 1GB of work_mem) http://explain.depesz.com/s/Hwm table defs, with estimated row counts (which all 100% match exact row count) http://pgsql.privatepaste.com/c2ff39b653 tried running an analyze across the whole database, no affect. I haven't gotten creative with explicit join orders yet . postgresql 9.0.2. willing to try stuff for people as I can run things on a VM for days and it is no big deal. I can't do that on production machines. thoughts ? ideas ? -Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
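The actual queries are only in the paste links above; as a generic illustration of the three shapes being compared (hypothetical tables a and b, not the real schema):

-- NOT IN: semantics change if b.a_id can be NULL, and it is often planned
-- very differently from the other two.
SELECT a.*
FROM   a
WHERE  a.id NOT IN (SELECT b.a_id FROM b);

-- NOT EXISTS: normally planned as an anti-join.
SELECT a.*
FROM   a
WHERE  NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);

-- LEFT JOIN ... IS NULL: the same anti-join idea written as an outer join.
SELECT a.*
FROM   a
LEFT JOIN b ON b.a_id = a.id
WHERE  b.a_id IS NULL;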
Re: [PERFORM] not exits slow compared to not in. (nested loops killing me)
> -Original Message- > From: Craig Ringer [mailto:cr...@postnewspapers.com.au] > Sent: Monday, June 06, 2011 5:08 PM > To: mark > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] not exits slow compared to not in. (nested loops > killing me) > > On 06/07/2011 04:38 AM, mark wrote: > > > NOT EXISTS (with 64MB of work_mem) > > http://explain.depesz.com/s/EuX > > Hash Anti Join (cost=443572.19..790776.84 rows=1 width=1560) > (actual time=16337.711..50358.487 rows=2196299 loops=1) > > Note the estimated vs actual rows. Either your stats are completely > ridiculous, or the planner is confused. I am starting to think the planner might be confused in 9.0.2. I got a reasonable query time, given resource constraints, on a very small VM on my laptop running 9.0.4. I am going to work on getting the vm I was using to test this with up to 9.0.4 and test again. There is a note in the 9.0.4 release notes " Improve planner's handling of semi-join and anti-join cases (Tom Lane)" Not sure that is the reason I got a much better outcome with a much smaller vm. But once I do some more testing I will report back. > > What are your stats target levels? Have you tried increasing the stats > levels on the table(s) or at least column(s) affected? Or tweaking > default_statistics_target if you want to use a bigger hammer? Will try that as well. Currently the default stat target is 100. Will try at 250, and 500 and report back. > > Is autovacuum being allowed to do its work and regularly ANALYZE the > database? Does an explicit 'ANALYZE' help? Auto vac is running, I have explicitly vacuum & analyzed the whole db. That didn't change anything. > > -- > Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
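For reference, the statistics knobs discussed above can be tried per column or per session before changing the cluster-wide default; a small sketch with placeholder names:

-- raise the per-column statistics target and re-analyze:
ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 500;
ANALYZE some_table;

-- or raise the default for this session and re-analyze:
SET default_statistics_target = 250;
ANALYZE;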
[PERFORM] benchmark woes and XFS options
Hello PG perf junkies, Sorry this may get a little long winded. Apologies if the formatting gets trashed. Also apologies if this double posts. (I originally set it yesterday with the wrong account and the message is stalled - so my bad there) if someone is a mod and it's still in the wait queue feel free to remove them. Short version: my zcav and dd tests look to get ->CPU bound<-. Yes CPU bound, with junk numbers. My numbers in ZCAV are flat like and SSD which is odd for 15K rpm disks. I am not sure what the point of moving further would be given these unexpected poor numbers. Well I knew 12 disks wasn't going to be something that impressed me, I am used to 24, but I was expecting about 40-50% better than what I am getting. Background: I have been setting up some new servers for PG and I am getting some odd numbers with zcav, I am hoping a second set of eyes here can point me in the right direction. (other tests like bonniee++ (1.03e) and dd also give me odd (flat and low) numbers) I will preface this with, yes I bought greg's book. Yes I read it, and it has helped me in the past, but seem to have hit an oddity. (hardware,os, and config stuff listed at the end) Long version: In the past when dealing with storage I typically see a large gain with moving from ext3 to XFS, provided I set readahead to 16384 on either filesystem. I also see typical down ward trends in the MB/s (expected) and upward trends in access times (expected) with either file system. These blades + storage-blades are giving me atypical results . I am not seeing a dramatic down turn in MB/s in zcav nor am I seeing access time really increase. (something I have only seen before when I forget to have readahead set high enough) things are just flat at about 420MB/s in zcav @ .6ms for access time with XFS and ~470MB/s @.56ms for ext3. FWIW I get worthless results with zcav and bonnie++ using 1.03 or 1.96 sometimes, which isn't something I have had happen before even though greg does mention it. Also when running zcav I will see kswapdX (0 and 1 in my two socket case) start to eat significant cpu time (~40-50% each), with dd - kswapd and pdflush become very active as well. This only happens once free mem gets low. As well zcav or dd looks to get CPU bound at 100% while i/o wait stays almost at 0.0 most of the time. (iostat -x -d shows util % at 98% though). I see this with either XFS or ext3. Also when I cat /proc/zoneinfo it looks like I am getting heavy contention for a single page in DMA while the tests are running. (see end of email for zoneinfo) Bonnie is giving me 99% cpu usage reported. Watching it while running it bounces between 100 and 99. Kswap goes nuts here as well. I am lead to believe that I may need a 2.6.32 (rhel 6.1) or higher kernel to see some of the kswapd issues go away. (testing that hopefully later this week). Maybe that will take care of everything. I don't know yet. Side note: Setting vm.swappiness to 10 (or 0) doesn't help, although others on the RHEL support site indicated it did fix kswap issues for them. Running zcav on my home system (4 disk raid 1+0 3ware controller +BBWC using ext4 ubunut 2.6.38-8 I don't see zcav near 100% and I see lots of i/o wait as expected, and my zoneinfo for DMA doesn't sit at 1) Not going to focus too much on ext3 since I am pretty sure I should be able to get better numbers from XFS. With mkfs.xfs I have done some reading and it appears that it can't automatically read the stripsize (aka stripe size to anyone other than HP) or the number of disks. 
So I have been using the following:

mkfs.xfs -b size=4k -d su=256k,sw=6,agcount=256

(256K is the default hp stripsize for raid1+0, I have 12 disks in raid 10 so I used sw=6, agcount of 256 because that is a (random) number I got from google that seemed in the ball park.)

which gives me:

meta-data=/dev/cciss/c0d0        isize=256    agcount=256, agsize=839936 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=215012774, imaxpct=25
         =                       sunit=64     swidth=384 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

(if I don't specify the agcount or su,sw stuff I get

meta-data=/dev/cciss/c0d0        isize=256    agcount=4, agsize=53753194 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=215012774, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=
Re: [PERFORM] benchmark woes and XFS options
> -Original Message- > From: Greg Smith [mailto:g...@2ndquadrant.com] > Sent: Monday, August 08, 2011 9:42 PM > To: mark > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] benchmark woes and XFS options > > I think your notion that you have an HP CCISS driver in this older > kernel that just doesn't drive your card very fast is worth exploring. > What I sometimes do in the situation you're in is boot a Linux > distribution that comes with a decent live CD, such as Debian or > Ubuntu. Just mount the suspect drive, punch up read-ahead, and re-test > performance. That should work well enough to do a simple dd test, and > probably well enough to compile and run bonnie++ too. If that gives > good performance numbers, it should narrow the list of possible causes > considerably. You really need to separate out "bad driver" from the > other possibilities here given what you've described, and that's a low > impact way to do it. > > -- > Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD > PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us Thanks Greg. I will try and give HPSA a whirl and report back. Both with single disk and the whole raid set. I am out of the office this week so I might have some delay before I can do some more detective work. I don't think I have any gear that won't require either the CCISS or HPSA driver and in SFF drives. But will try and look around. -Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] XFS options and benchmark woes
Hello PG perf junkies, Sorry this may get a little long winded. Apologies if the formatting gets trashed. Background: I have been setting up some new servers for PG and I am getting some odd numbers with zcav, I am hoping a second set of eyes here can point me in the right direction. (other tests like bonniee++ (1.03e) and dd also give me odd (flat and low) numbers) I will preface this with, yes I bought greg's book. Yes I read it, and it has helped me in the past, but seem to have hit an oddity. (hardware,os, and config stuff listed at the end) Short version: my zcav and dd tests look to get I/O bound. My numbers in ZCAV are flat like and SSD which is odd for 15K rpm disks. Long version: In the past when dealing with storage I typically see a large gain with moving from ext3 to XFS, provided I set readahead to 16384 on either filesystem. I also see typical down ward trends in the MB/s (expected) and upward trends in access times (expected) with either file system. These blades + storage-blades are giving me atypical results . I am not seeing a dramatic down turn in MB/s in zcav nor am I seeing access time really increase. (something I have only seen before when I forget to have readahead set high enough) things are just flat at about 420MB/s in zcav @ .6ms for access time with XFS and ~470MB/s @.56ms for ext3. FWIW I get worthless results with zcav and bonnie++ using 1.03 or 1.96 sometimes, which isn't something I have had happen before even though greg does mention it. Also when running zcav I will see kswapdX (0 and 1 in my two socket case) start to eat significant cpu time (~40-50% each), with dd - kswapd and pdflush become very active as well. This only happens once free mem gets low. As well zcav or dd looks to get CPU bound at 100% while i/o wait stays almost at 0.0 most of the time. (iostat -x -d shows util % at 98% though). I see this with either XFS or ext3. Also when I cat /proc/zoneinfo it looks like I am getting heavy contention for a single page in DMA while the tests are running. (see end of email for zoneinfo) Bonnie is giving me 99% cpu usage reported. Watching it while running it bounces between 100 and 99. Kswap goes nuts here as well. I am lead to believe that I may need a 2.6.32 (rhel 6.1) or higher kernel to see some of the kswapd issues go away. (testing that hopefully later this week). Maybe that will take care of everything. I don't know yet. Side note: Setting vm.swappiness to 10 (or 0) doesn't help, although others on the RHEL support site indicated it did fix kswap issues for them. Running zcav on my home system (4 disk raid 1+0 3ware controller +BBWC using ext4 ubunut 2.6.38-8 I don't see zcav near 100% and I see lots of i/o wait as expected, and my zoneinfo for DMA doesn't sit at 1) Not going to focus too much on ext3 since I am pretty sure I should be able to get better numbers from XFS. With mkfs.xfs I have done some reading and it appears that it can't automatically read the stripsize (aka stripe size to anyone other than HP) or the number of disks. So I have been using the following: mkfs.xfs -b size=4k -d su=256k,sw=6,agcount=256 (256K is the default hp stripsize for raid1+0, I have 12 disks in raid 10 so I used sw=6, agcount of 256 because that is a (random) number I got from google that seemed in the ball park.) 
which gives me: meta-data=/dev/cciss/c0d0isize=256agcount=256, agsize=839936 blks = sectsz=512 attr=2 data = bsize=4096 blocks=215012774, imaxpct=25 = sunit=64 swidth=384 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=32768, version=2 = sectsz=512 sunit=64 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 (if I don't specif the agcount or su,sw stuff I get meta-data=/dev/cciss/c0d0isize=256agcount=4, agsize=53753194 blks = sectsz=512 attr=2 data = bsize=4096 blocks=215012774, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal log bsize=4096 blocks=32768, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0) ) So it seems like I should be giving it the extra parameters at mkfs.xfs time... could someone confirm ? In the past I have never specified the su or sw or ag groups I have taken the defaults. But since I am getting odd numbers here I started playing with em. Getting little or no change. for mounting: logbufs=8,noatime,nodiratime,nobarrier,inode64,allocsize=16m (I know that noatime also means nodiratime according xfs.org, but in the past I seem to get better numbers when
Re: [PERFORM] XFS options and benchmark woes
> -Original Message- > From: mark [mailto:m...@sm-a.net] > Sent: Monday, August 08, 2011 12:15 AM > To: 'pgsql-performance@postgresql.org' > Subject: XFS options and benchmark woes > > Hello PG perf junkies, > > > Sorry this may get a little long winded. Apologies if the formatting > gets trashed. > > > > Background: > > I have been setting up some new servers for PG and I am getting some > odd numbers with zcav, I am hoping a second set of eyes here can point > me in the right direction. (other tests like bonniee++ (1.03e) and dd > also give me odd (flat and low) numbers) > > I will preface this with, yes I bought greg's book. Yes I read it, and > it has helped me in the past, but seem to have hit an oddity. > > (hardware,os, and config stuff listed at the end) > > > > > > Short version: my zcav and dd tests look to get I/O bound. My numbers > in ZCAV are flat like and SSD which is odd for 15K rpm disks. uggg, ZCAV numbers appear to be CPU bound. Not i/o . > > > > > Long version: > > > In the past when dealing with storage I typically see a large gain with > moving from ext3 to XFS, provided I set readahead to 16384 on either > filesystem. > > I also see typical down ward trends in the MB/s (expected) and upward > trends in access times (expected) with either file system. > > > These blades + storage-blades are giving me atypical results . > > > I am not seeing a dramatic down turn in MB/s in zcav nor am I seeing > access time really increase. (something I have only seen before when I > forget to have readahead set high enough) things are just flat at about > 420MB/s in zcav @ .6ms for access time with XFS and ~470MB/s @.56ms for > ext3. > > FWIW I get worthless results with zcav and bonnie++ using 1.03 or 1.96 > sometimes, which isn't something I have had happen before even though > greg does mention it. > > > Also when running zcav I will see kswapdX (0 and 1 in my two socket > case) start to eat significant cpu time (~40-50% each), with dd - > kswapd and pdflush become very active as well. This only happens once > free mem gets low. As well zcav or dd looks to get CPU bound at 100% > while i/o wait stays almost at 0.0 most of the time. (iostat -x -d > shows util % at 98% though). I see this with either XFS or ext3. Also > when I cat /proc/zoneinfo it looks like I am getting heavy contention > for a single page in DMA while the tests are running. (see end of email > for zoneinfo) > > Bonnie is giving me 99% cpu usage reported. Watching it while running > it bounces between 100 and 99. Kswap goes nuts here as well. > > > I am lead to believe that I may need a 2.6.32 (rhel 6.1) or higher > kernel to see some of the kswapd issues go away. (testing that > hopefully later this week). Maybe that will take care of everything. I > don't know yet. > > Side note: Setting vm.swappiness to 10 (or 0) doesn't help, although > others on the RHEL support site indicated it did fix kswap issues for > them. > > > > Running zcav on my home system (4 disk raid 1+0 3ware controller +BBWC > using ext4 ubunut 2.6.38-8 I don't see zcav near 100% and I see lots of > i/o wait as expected, and my zoneinfo for DMA doesn't sit at 1) > > Not going to focus too much on ext3 since I am pretty sure I should be > able to get better numbers from XFS. > > > > With mkfs.xfs I have done some reading and it appears that it can't > automatically read the stripsize (aka stripe size to anyone other than > HP) or the number of disks. 
So I have been using the following: > > mkfs.xfs -b size=4k -d su=256k,sw=6,agcount=256 > > (256K is the default hp stripsize for raid1+0, I have 12 disks in raid > 10 so I used sw=6, agcount of 256 because that is a (random) number I > got from google that seemed in the ball park.) > > > > > > > which gives me: > meta-data=/dev/cciss/c0d0isize=256agcount=256, > agsize=839936 blks > = sectsz=512 attr=2 > data = bsize=4096 blocks=215012774, > imaxpct=25 > = sunit=64 swidth=384 blks > naming =version 2 bsize=4096 ascii-ci=0 > log =internal log bsize=4096 blocks=32768, version=2 > = sectsz=512 sunit=64 blks, lazy- > count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > > > (if I don't specif the agcount or su,sw stuff I get > meta-data=/dev/cciss/c0d0isize=256agcount=4, > agsize=53753194 blks > =
Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Greg Smith > Sent: Wednesday, August 17, 2011 3:18 PM > To: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Raid 5 vs Raid 10 Benchmarks Using bonnie++ > > On 08/17/2011 02:26 PM, Ogden wrote: > > I am using bonnie++ to benchmark our current Postgres system (on RAID > > 5) with the new one we have, which I have configured with RAID 10. > The > > drives are the same (SAS 15K). I tried the new system with ext3 and > > then XFS but the results seem really outrageous as compared to the > > current system, or am I reading things wrong? > > > > The benchmark results are here: > > http://malekkoheavyindustry.com/benchmark.html > > Congratulations--you're now qualified to be a member of the "RAID5 > sucks" club. You can find other members at > http://www.miracleas.com/BAARF/BAARF2.html Reasonable read speeds and > just terrible write ones are expected if that's on your old hardware. > Your new results are what I would expect from the hardware you've > described. > > The only thing that looks weird are your ext4 "Sequential Output - > Block" results. They should be between the ext3 and the XFS results, > not far lower than either. Normally this only comes from using a bad > set of mount options. With a battery-backed write cache, you'd want to > use "nobarrier" for example; if you didn't do that, that can crush > output rates. > To clarify maybe for those new at using non-default mount options. With XFS the mount option is nobarrier. With ext4 I think it is barrier=0 Someone please correct me if I am misleading people or otherwise mistaken. -mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
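For anyone trying this at home, a rough sketch of where those mount options live; the device and mount point below are invented for illustration, and disabling barriers is only sane when the controller has a working battery- or flash-backed write cache:

# /etc/fstab sketch (hypothetical device and mount point; the two lines are alternatives)
# XFS:
/dev/cciss/c0d1  /pgdata  xfs   noatime,nobarrier   0 0
# ext4 spells the same thing barrier=0 (many kernels also accept nobarrier):
/dev/cciss/c0d1  /pgdata  ext4  noatime,barrier=0   0 0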
Re: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
>From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Anthony Presley
>Sent: Sunday, September 11, 2011 4:45 PM
>To: pgsql-performance@postgresql.org
>Subject: [PERFORM] RAID Controller (HP P400) beat by SW-RAID?
>
>We've currently got PG 8.4.4 running on a whitebox hardware set up, with (2) 5410 Xeon's, and 16GB of RAM. It's also got (4) 7200RPM SATA drives, using the onboard IDE controller and ext3.
>
>A few weeks back, we purchased two refurb'd HP DL360 G5's, and were hoping to set them up with PG 9.0.2, running replicated. These machines have (2) 5410 Xeon's, 36GB of RAM, (6) 10k SAS drives, and are using the HP SA P400i with 512MB of BBWC. PG is running on an ext4 (noatime) partition, and the drives are configured as RAID 1+0 (seems with this controller, I cannot do JBOD). I've spent a few hours going back and forth benchmarking the new systems, and have set up the DWC, and the accelerator cache using hpacucli. I've tried accelerator caches of 25/75, 50/50, and 75/25.
>

I would start off by recommending a more current version of 9.0... like 9.0.4, since you are building a new box. The rumor mill says 9.0.5 and 9.1.0 might be out soon (days?), but that is just the rumor mill. Don't bank on it.

What kernel are you on?

Long-time HP user here, for better and worse... so here are a few other little things I recommend.

Check the BIOS power management. Make sure it is set where you want it. (IIRC the G5s have this; I know G6s and G7s do.) This can help with nasty latency problems if the box has been idle for a while and then needs to start doing work.

The P400i is not a great card compared to more modern ones, but you should be able to beat the old setup with what you have: faster-clocked CPUs, more spindles, faster-RPM spindles. Assuming the battery is working, with XFS or ext4 you can use the nobarrier mount option and you should see some improvement.

Make sure the RAID card's firmware is current. I can't stress this enough. HP fixed a nasty bug with RAID 1+0 a few months ago where you could eat your data... They also seem to be fixing a lot of other bugs along the way as well. So do yourself a big favor and make sure that firmware is current. It might just head off a headache down the road.

Also make sure you have an 8.10.? (IIRC the version number right) or better version of hpacucli... there have been some fixes to that utility as well. IIRC most of the fixes in this have been around recognizing newer cards (812s and 410s), but some interface bugs have been fixed as well. You may need new packages for HP health (I don't recall the official name, but new versions of hpacucli might not play well with old versions of HP health). It's HP, so they have a new version about every month for firmware and their CLI utility... that's HP for us.

Anyways, that is my fast input.

Best of luck,

-Mark

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
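As a rough illustration of the firmware/cache checks described above (the slot number is a guess; adjust for the box in question), something along these lines with hpacucli:

hpacucli ctrl all show config    # arrays, logical drives, and controller state
hpacucli ctrl slot=0 show        # controller detail, including cache ratio and battery status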
Re: [PERFORM] pg9 replication over WAN ?
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ben Ciceron
> Sent: Wednesday, October 05, 2011 3:53 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] pg9 replication over WAN ?
>
> Hello,
>
> Has any performance or evaluation done for pg9.x streaming replication
> over WAN ?
> How adequate is the protocol to push WALs over long distance ?
> Any best practice tuning wal_* for WAN ?
>
> Cheers,
> Ben-

Works for us between the middle of the US and the UK. This is on some databases that do a few gig of log files per day.

I keep a large number of wal_keep_segments because I have the disk capacity for it and the initial rsync takes a while at ~2-4MB/s.

You may find you need to tune the TCP keepalive settings, but I am not sure if that was really a change in the setting or a fix to our load balancers that fixed an issue I was seeing.

Overall I am extremely pleased with streaming replication + read-only hot standby.

-Mark

> -- > Sent via pgsql-performance mailing list (pgsql-performa...@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
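A minimal sketch of the knobs mentioned above, with made-up values (wal_keep_segments is the 9.0 master-side setting; the tcp_keepalives_* parameters are the server-side keepalive controls, which can also be tuned at the OS level):

# postgresql.conf (illustrative values only)
wal_keep_segments = 1024        # keep extra WAL around for the slow initial sync / WAN hiccups
tcp_keepalives_idle = 60        # 0 means "use the OS default"
tcp_keepalives_interval = 10
tcp_keepalives_count = 6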
[PERFORM] Deferred constraints performance impact ?
Hi all, We have put some deferred constraints (some initially immediate, some initially deferred) into our database for testing with our applications. I haven't seen any noticeable loss in performance, but I am not sure I can properly simulate our full production environment load levels in my tests. I was wondering if I am turning on something that has known significant negative impacts to performance, just from having them there. I understand a lot more may have to be tracked through a transaction and there could be some impact from that. Similar to an after update trigger? Or are the two not comparable in terms of impact from what is tracked and then checked. Anyways, just looking for feedback if anyone has any. -Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
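For readers who haven't used the feature, a small sketch of what is being tested; table and constraint names here are invented:

-- a foreign key that can be deferred
ALTER TABLE order_line
  ADD CONSTRAINT order_line_order_fk
  FOREIGN KEY (order_id) REFERENCES orders (id)
  DEFERRABLE INITIALLY IMMEDIATE;

-- defer the check until commit inside one transaction
BEGIN;
SET CONSTRAINTS order_line_order_fk DEFERRED;
-- ... statements that are temporarily inconsistent ...
COMMIT;  -- the constraint is checked here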
Re: [PERFORM] Delete query takes exorbitant amount of time
Tom Lane Wrote: > Hmm ... I wonder how hard it would be to teach EXPLAIN ANALYZE to show > the runtime expended in each trigger when the statement is of a kind > that has triggers. We couldn't break down the time *within* the > triggers, but even this info would help a lot in terms of finger > pointing ... > > Seq Scan on ... (nn.nnn ms) > Trigger foo: nn.mmm ms > Trigger bar: nn.mmm ms > Total time: nn.mmm ms So I got the latest from CVS on Friday night to see how hard it would be to implement this, but it turns out that Tom has already committed the improvement, so I'm in Tom's fan club today. I imported my test dataset and was almost immediately able to track down the cause of my performance problem. Thanks! Mark Lewis ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Correcting Hash Join Estimates
Hello!

I posted a similar question about a month ago, but for some reason it never seemed to be broadcast even though it ended up in the archives. So, since I'm still struggling with this, I thought I'd repost...

I'm trying to optimize a query, and the EXPLAIN ANALYZE (see link below) shows that some hash join row estimates are wrong by a factor of 2-3, and upwards of 7-8. There is a corresponding mis-estimation of the amount of time taken for these steps. The database is vacuum analyzed nightly by a cron job. How would I go about tightening up these errors? I suspect that I need to SET STATISTICS on some columns, but how can I tell which columns? Any help would be appreciated.

WinXP (dual Xeon, 1.2GB RAM) PgSQL 8.0.1
Explain Analyze: <http://www.indeq.com/EA.txt>
View Definition: <http://www.indeq.com/VGAUA.txt>

The largest table contains about 10,000 rows. All tables have indexes on their foreign keys.

Thanks!

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
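For anyone hitting the same thing, the per-column knob being referred to looks roughly like this (names are placeholders; the join columns with the worst estimates are the usual candidates):

ALTER TABLE awards ALTER COLUMN person_id SET STATISTICS 500;
ANALYZE awards;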
Re: [PERFORM] Correcting Hash Join Estimates
On Apr 4, 2005, at 12:54 AM, Tom Lane wrote: [EMAIL PROTECTED] writes: I'm trying to optimize a query and the EXPLAIN ANALYZE (see link below) shows that some hash join row estimates are wrong by a factor of 2-3, and upwards of 7-8. I doubt that improving those estimates would lead to markedly better results. You need to think about improving the view design instead. What context is this view used in --- do you just do "select * from view_get_all_user_award2", or are there conditions added to it, or perhaps it gets joined with other things? Yes. I forgot to show how the query is executed... select * from view_get_all_user_award2 where person_id = 1; Do you really need the DISTINCT constraint? Yes. Do you really need the ORDER BY? The customer wants an initial ordering in the displayed data. Can you simplify the WHERE clause at all? I originally had a bunch of LEFT JOINs. After reading Tow's "SQL Tuning", I was hoping to steer the planner into a more "optimal" plan by using a large where clause instead and doing the joins there (I think they're called implicit joins). I was able to shave a couple of hundred milliseconds off the execution time by doing this. Half a second sounds pretty decent to me for a ten-way join with a WHERE clause as unstructured as that. If you really need it to execute in way less time, you're probably going to have to rethink your data representation to make the query simpler. Unfortunately, I'm not sure I can restructure the data. I did consider materialized views. However, they couldn't be lazy and that seemed like a lot of extra work for the backend for very little improvement. If this sounds like decent performance to you... I guess I can just tell the complainers that it's as good as it's going to get (barring a major hardware upgrade...). Thanks! Mark ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] [NOVICE] Many connections lingering
If there are potentially hundreds of clients at a time, then you may be running into the maximum connection limit. In postgresql.conf, there is a max_connections setting which IIRC defaults to 100. If you try to open more concurrent connections to the backend than that, you will get a connection refused. If your DB is fairly gnarly and your performance needs are minimal it should be safe to increase max_connections. An alternative approach would be to add some kind of database broker program. Instead of each agent connecting directly to the database, they could pass their data to a broker, which could then implement connection pooling. -- Mark Lewis On Tue, 2005-04-12 at 22:09, Slavisa Garic wrote: > This is a serious problem for me as there are multiple users using our > software on our server and I would want to avoid having connections > open for a long time. In the scenario mentioned below I haven't > explained the magnitute of the communications happening between Agents > and DBServer. There could possibly be 100 or more Agents per > experiment, per user running on remote machines at the same time, > hence we need short transactions/pgsql connections. Agents need a > reliable connection because failure to connect could mean a loss of > computation results that were gathered over long periods of time. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
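For reference, the setting lives in postgresql.conf and takes effect on a server restart; an illustrative value:

# postgresql.conf
max_connections = 300   # default is typically 100; each slot costs some shared memory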
[PERFORM] PLM pulling from CVS nightly for testing in STP
Hi all,

Just wanted everyone to know that we're pulling CVS HEAD nightly so it can be tested in STP now. Let me know if you have any questions. Tests are not automatically run yet, but I hope to remedy that shortly.

For those not familiar with STP and PLM, here are a couple of links:

STP http://www.osdl.org/stp/
PLM http://www.osdl.org/plm-cgi/plm

Mark

---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] PLM pulling from CVS nightly for testing in STP
On Wed, Apr 13, 2005 at 11:35:36AM -0700, Josh Berkus wrote:
> Mark,
>
> > Just wanted everyone to know that we're pulling CVS HEAD nightly so it
> > can be tested in STP now. Let me know if you have any questions.
>
> Way cool. How do I find the PLM number? How are you naming these?

The naming convention I'm using is postgresql-YYYYMMDD, for example postgresql-20050413 for the anonymous cvs export from today (April 13). I have a cronjob that'll do the export at 1AM PST8PDT.

The search page for the PLM numbers is here: https://www.osdl.org/plm-cgi/plm?module=search

or you can use the stpbot on linuxnet.mit.edu#osdl.

Mark

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [HACKERS] PLM pulling from CVS nightly for testing in STP
I have dbt-2 tests automatically running against each pull from CVS and have started to automatically compile results here: http://developer.osdl.org/markw/postgrescvs/ I did start with a bit of a minimalistic approach, so I'm open for any comments, feedback, etc. Mark ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Kernel Resources and max_connections
Chris Hebrard wrote:

kern.ipc.shmmax and kern.ipc.shmmin will not stay to what I set them to. What am I doing wrong or not doing at all?

These need to go in /etc/sysctl.conf. You might need to set shmall as well.

(This not-very-clear distinction between what is sysctl'able and what is a kernel tunable is a bit of a downer.)

cheers

Mark

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Kernel Resources and max_connections
Chris Hebrard wrote:

I set the values in /etc/sysctl.conf:

# $FreeBSD: src/etc/sysctl.conf,v 1.1.2.3 2002/04/15 00:44:13 dougb Exp $
#
# This file is read when going to multi-user and its contents piped thru
# ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
#
# Added by IMP 2005-05-04
net.inet.tcp.rfc1323=1
kern.ipc.somaxconn=1024
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968
kern.ipc.semmns="272"
kern.ipc.semmni="256"
kern.ipc.shmmax="66099200"
kern.ipc.shmmin="256"

After a restart both shmmax and shmmin are now 0 and postgres failed to start.

Hmmm - puzzling. One point to check: did you take them out of /boot/loader.conf? Assuming so, maybe don't quote 'em (see below). Finally, you need to set shmall, otherwise it will over(under)ride the shmmax setting. So try:

net.inet.tcp.rfc1323=1
kern.ipc.somaxconn=1024
kern.ipc.maxsockbuf=8388608
net.inet.tcp.sendspace=3217968
net.inet.tcp.recvspace=3217968
kern.ipc.semmns=272
kern.ipc.semmni=256
kern.ipc.shmmax=66099200
kern.ipc.shmmin=256
kern.ipc.shmall=32768

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL
If the original paper was published in 1984, then it's been more than 20 years. Any potential patents would already have expired, no? -- Mark Lewis On Tue, 2005-05-10 at 14:35, Mischa Sandberg wrote: > Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > > > Well, in a hash-join right now you normally end up feeding at least > > one > > side of the join with a seqscan. Wouldn't it speed things up > > considerably if you could look up hashes in the hash index instead? > > You might want to google on "grace hash" and "hybrid hash". > > The PG hash join is the simplest possible: build a hash table in memory, > and match an input stream against it. > > *Hybrid hash* is where you spill the hash to disk in a well-designed > way. Instead of thinking of it as building a hash table in memory, think > of it as partitioning one input; if some or all of it fits in memory, > all the better. The boundary condition is the same. > > The real wizard of hybrid hash has to be Goetz Graefe, who sadly has now > joined the MS Borg. He demonstrated that for entire-table joins, hybrid > hash completely dominates sort-merge. MSSQL now uses what he developed > as an academic, but I don't know what the patent state is. > > "Grace hash" is the original implementation of hybrid hash: > Kitsuregawa, M., Tanaka, H., and Moto-oka, T. (1984). > Architecture and Performance of Relational Algebra Machine Grace. > > > > ---(end of broadcast)--- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PERFORM] Select performance vs. mssql
Hi, I have some experience with MSSQL and am examining PostgreSQL. I'm running under Windows. I like what I see so far, but I'm hoping for some performance advice: 1. My test database has 7 million records. 2. There are two columns - an integer and a char column called Day which has a random value of Mon or Tues, etc. in it. 3. I made an index on Day. My query is: select count(*) from mtable where day='Mon' Results: 1. P3 600 512MB RAM MSSQL. It takes about 4-5 secs to run. If I run a few queries and everything is cached, it is sometimes just 1 second. 2. Athlon 1.3 Ghz 1GB RAM. PostgreSQL takes 7 seconds. I have played with the buffers setting and currently have it at 7500. At 2 it took over 20 seconds to run. 5 seconds vs 7 isn't that big of a deal, but 1 second vs 7 seconds is. Also, the slower performance is with much lesser hardware. Any ideas to try? Thanks much, Mark __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
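A rough reconstruction of the test described above, with guessed names and types (the original DDL was not posted):

CREATE TABLE mtable (
    id   integer,
    day  char(5)        -- 'Mon', 'Tues', etc.
);
CREATE INDEX mtable_day_idx ON mtable (day);

-- the timed query
SELECT count(*) FROM mtable WHERE day = 'Mon';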
Re: [PERFORM] Select performance vs. mssql
> Post the result of this for us:
>
> explain analyze select count(*) from mtable where day='Mon';
>
> On both machines.

Hi Chris --

PostgreSQL Machine:

"Aggregate (cost=140122.56..140122.56 rows=1 width=0) (actual time=24516.000..24516.000 rows=1 loops=1)"
"  -> Index Scan using "day" on mtable (cost=0.00..140035.06 rows=35000 width=0) (actual time=47.000..21841.000 rows=1166025 loops=1)"
"        Index Cond: ("day" = 'Mon'::bpchar)"
"Total runtime: 24516.000 ms"

(Note this took 24 seconds after fresh reboot, next execution was 11, and execution without explain analyze was 6.7 seconds)

MSSQL Machine:

That "Explain Analyze" command doesn't work for MSSQL, but I did view the Query plan. 97% of it was "Scanning a particular range of rows from a nonclustered index"

Thanks for your help --Mark

__ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Select performance vs. mssql
First, thanks for all the helpful replies. I've listened to the suggestions and done some more digging and have results: I did show_plan_all in MSSQL and found that it was doing an Index Scan. I've read someplace that if the data you need is all in the index, then MSSQL has a feature/hack where it does not have to go to the table, it can do my COUNT using the index alone. I think this explains the 1 second query performance. I changed the query to also include the other column which is not indexed. The results were MSSQL now used a TableScan and was MUCH slower than PostgreSQL. I clustered the index on MSSQL and PostgreSQL and increased buffers to 15000 on PGSQL. I saw a noticeable performance increase on both. On the more complicated query, PostgreSQL is now 3.5 seconds. MSSQL is faster again doing an index scan and is at 2 seconds. Remember the MSSQL machine has a slower CPU as well. My interpretations: --Given having to do a table scan, PostgreSQL seems to be faster. The hardware on my PostrgreSQL machine is nicer than the MSSQL one, so perhaps they are just about the same speed with speed determined by the disk. --Tuning helps. Clustered index cut my query time down. More buffers helped. --As Chris pointed out, how real-world is this test? His point is valid. The database we're planning will have a lot of rows and require a lot of summarization (hence my attempt at a "test"), but we shouldn't be pulling a million rows at a time. --MSSQL's ability to hit the index only and not having to go to the table itself results in a _big_ performance/efficiency gain. If someone who's in development wants to pass this along, it would be a nice addition to PostgreSQL sometime in the future. I'd suspect that as well as making one query faster, it would make everything else faster/more scalable as the server load is so much less. Thanks again, Mark __ Do you Yahoo!? Yahoo! Small Business - Try our new Resources site http://smallbusiness.yahoo.com/resources/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Select performance vs. mssql
I'm far from an expert, so this may be off-base... but perhaps a suggestion would be to allow a hint to be sent to the optimizer if the user doesn't care that the result is "approximate" maybe then this wouldn't require adding more overhead to the indexes. MSSQL has something like this with (nolock) i.e. select count(*) from customers (nolock) where name like 'Mark%' Regardless, I'm very impressed with PostgreSQL and I think we're moving ahead with it. Mark --- Bruno Wolff III <[EMAIL PROTECTED]> wrote: > On Tue, May 24, 2005 at 08:36:36 -0700, > mark durrant <[EMAIL PROTECTED]> wrote: > > > > --MSSQL's ability to hit the index only and not > having > > to go to the table itself results in a _big_ > > performance/efficiency gain. If someone who's in > > development wants to pass this along, it would be > a > > nice addition to PostgreSQL sometime in the > future. > > I'd suspect that as well as making one query > faster, > > it would make everything else faster/more scalable > as > > the server load is so much less. > > This gets brought up a lot. The problem is that the > index doesn't include > information about whether the current transaction > can see the referenced > row. Putting this information in the index will add > significant overhead > to every update and the opinion of the developers is > that this would be > a net loss overall. __ Do you Yahoo!? Yahoo! Small Business - Try our new Resources site http://smallbusiness.yahoo.com/resources/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] tuning
list wrote:

hi- i would like to see if someone could recommend something to make my query run faster.

Values in postgresql.conf:
shared_buffers = 1000
sort_mem is commented out
effective_cache_size is commented out
random_page_cost is commented out

I would increase shared_buffers (say 5000 - 1), and also effective_cache_size (say around 2 - 5 - but work out how much memory this box has free or cached and adjust accordingly). From your explain output, it looks like sorting is not too much of a problem - so you can leave it unchanged (for this query anyway).

Here is the query in question:

select * from productvendorview where (productlistid=3 or productlistid=5 or productlistid=4) and (lower(item) like '9229%' or lower(descrip) like 'toner%') order by vendorname,item limit 100;

You might want to break this into 2 queries and union them, so you can (potentially) use the indexes on productlistid,lower(item) and productlistid,lower(descrip) separately.

This query scans 412,457 records. Here is the EXPLAIN ANALYZE for the query:

Limit (cost=45718.83..45719.08 rows=100 width=108) (actual time=39093.636..39093.708 rows=100 loops=1)
  -> Sort (cost=45718.83..45727.48 rows=3458 width=108) (actual time=39093.629..39093.655 rows=100 loops=1)
        Sort Key: v.vendorname, p.item
        -> Hash Join (cost=22.50..45515.57 rows=3458 width=108) (actual time=95.490..39062.927 rows=2440 loops=1)
              Hash Cond: ("outer".vendorid = "inner".id)
              -> Seq Scan on test p (cost=0.00..45432.57 rows=3457 width=62) (actual time=89.066..39041.654 rows=2444 loops=1)
                    Filter: (((productlistid = 3) OR (productlistid = 5) OR (productlistid = 4)) AND ((lower((item)::text) ~~ '9229%'::text) OR (lower((descrip)::text) ~~ 'toner%'::text)))
              -> Hash (cost=20.00..20.00 rows=1000 width=54) (actual time=6.289..6.289 rows=0 loops=1)
                    -> Seq Scan on vendor v (cost=0.00..20.00 rows=1000 width=54) (actual time=0.060..3.653 rows=2797 loops=1)
Total runtime: 39094.713 ms
(10 rows)

I guess the relation 'test' is a copy of product (?)

Cheers

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
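A sketch of the suggested two-query rewrite (untested; it assumes indexes on (productlistid, lower(item)) and (productlistid, lower(descrip)) exist or get added):

select * from productvendorview
 where (productlistid=3 or productlistid=5 or productlistid=4)
   and lower(item) like '9229%'
union
select * from productvendorview
 where (productlistid=3 or productlistid=5 or productlistid=4)
   and lower(descrip) like 'toner%'
order by vendorname, item
limit 100;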
Re: [PERFORM] postgresql-8.0.1 performance tuning
Cosimo Streppone wrote:

# Config /etc/sysctl.conf:
kernel.shmall = 786432000
kernel.shmmax = 786432000

I think you have a problem here. kernel.shmmax should *not* be set to an amount of RAM, but to maximum number of shared memory pages, which on a typical linux system is 4kb. Google around:

This is somewhat confusing:
kernel.shmmax is in bytes (max single segment size)
kernel.shmall is in (4k) pages (max system wide allocated segment pages)

cheers

Mark

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
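To make the units concrete, one consistent way to write the figure from that config (assuming 4 kB pages) is:

kernel.shmmax = 786432000   # bytes  (a 750 MB single-segment ceiling)
kernel.shmall = 192000      # pages  (786432000 / 4096)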