Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
Trevor Talbot wrote:
> On 10/20/07, Rainer Bauer <[EMAIL PROTECTED]> wrote:
>
>> Anyway, the problem is the number of semaphores created by Postgres:
>> every backend creates at least 4*max_connections semaphores. Just
>> increase max_connections to an unusual high value (say 1) and
>> start creating new connections while monitoring the handle count.
>
> Hmm, they're actually the same semaphores, so the only cost is for
> slots in each process's handle table, which comes from kernel paged
> pool. Testing shows I can easily create about 30 million handles to a
> given object on this machine. This is under win2003 with 1.25GB RAM,
> which gives it a paged pool limit of 352MB.
>
> I tried going up to 2 max_connections, and still blew postmaster's
> VM space long before paged pool was exhausted. I couldn't test any
> higher values, as there's some interaction between max_connections and
> shared_buffers that prevents it from mapping the buffer contiguously.
>
> Something's missing though, since I'm not hitting the same issue you
> are. How are you generating the connections? I just have an app
> calling PQconnectdb() in a loop, but I guess that's not good enough.

Yeah, something is obviously missing. Are you guys on exactly the same Windows versions, with respect to both version and service pack? Anybody on x64 Windows?

Another thing worth testing: check whether the amount of shared memory used makes a noticeable difference. Try both very small and very large values.

I don't think the paged pool is the problem - I think it's the nonpaged pool. It would be interesting to track that one in the failing case (using Performance Monitor, up to the point where it fails). And the nonpaged one is smaller...

If that looks like it's the problem, it could be helpful to do a pooltag trace on it (see for example http://blogs.msdn.com/ntdebugging/archive/2006/12/18/Understanding-Pool-Consumption-and-Event-ID_3A00_--2020-or-2019.aspx)

//Magnus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
Trevor Talbot wrote:
> On 10/17/07, Magnus Hagander <[EMAIL PROTECTED]> wrote:
>> On Wed, Oct 17, 2007 at 02:40:14AM -0400, Tom Lane wrote:
>
>>> Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
>>> to something we know the platform can stand? It'd be more comfortable
>>> if we understood exactly where the limit was, but I think I'd rather
>>> have an "I'm sorry Dave, I can't do that" than random-seeming crashes.
>
>> Yeah, that's probably a good idea - except we never managed to figure out
>> where the limit is. It appears to vary pretty wildly between different
>> machines, for reasons we don't really understand (total RAM has some effect
>> on it, but that's not the only factor, for example).
>
> I tried generating idle connections in an effort to reproduce
> Laurent's problem, but I ran into a local limit instead: for each
> backend, postmaster creates a thread and burns 4MB of its 2GB address
> space. It fails around 490.

Oh, that's interesting. That's actually a side effect of us increasing the stack size for the postgres.exe executable in order to work on other things. By default, it burns 1MB/thread, but ours will do 4MB. We never really thought of the problem that it'll run out of address space. Unfortunately, that size can't be changed in the CreateThread() call - only the initially committed size can be changed there.

There are two ways to get around it. One is not using a thread for each backend, but a single thread that handles them all and then some sync objects around it. We originally considered this but said we wouldn't bother changing it because the current way is simpler, and the overhead of a thread is tiny compared to a process. I don't think anybody even thought about the fact that it'd run you out of address space...

The other way is to finish off win64 support :-) Which I plan to look at, but I don't think that alone should be considered a solution.

The question is whether it's worth fixing that part, if it will just fall down for other reasons before we reach these 500 connections anyway. Can you try having your program actually run some queries and so on, and not just do a PQconnect? To see if it falls over then, because it's been doing more?

> Laurent's issue must depend on other load characteristics. It's
> possible to get a trace of DLL loads, but I haven't found a
> noninvasive way of doing that. It seems to require a debugger be
> attached.

AFAIK, it does require that, yes.

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] keeping an index in memory
Rajarshi Guha <[EMAIL PROTECTED]> wrote: > > Hi, relating to my previous queries on doing spatial searches on 10M > rows, it seems that most of my queries return within 2 minutes. > Generally this is not too bad, though faster is always better. > > Interestingly, it appears that the CUBE index for the table in > question is about 3GB (the table itself is about 14GB). Not knowing > the details of the postgres internals, I assume that when a query > tries to use the index, it will need to read a 3GB file. Is this a > correct assumption? > > In such a situation, is there a way to keep the index in memory? My > machine has 8GB installed and currently has about 7.4GB free RAM (64 > bit linux 2.6.9) Free or cached/buffered? Your OS should be using most of that to buffer disk blocks. > A side effect of the size of the index is that if I do a query that > performs a seq scan (say using cube_distance) it takes longer than > when an index is used, but not significantly longer. And this is on a > 10M row table. > > What strategies do people follow when the index becomes very big? What version of PG are you using and what is your shared_buffers setting? With 8G of RAM, you should start with shared_buffers around 2 - 3G, if you're using a modern version of PG. With that much shared memory, a large portion of that index should stay in RAM, as long as it's being used often enough that PG doesn't swap it for other data. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] looking for some real world performance numbers
snacktime <[EMAIL PROTECTED]> wrote: > > I'm working through the architecture design for a new product. We > have a small group working on this. It's a web app that will be using > ruby on rails. The challenge I'm running into is that the latest > conventional wisdom seems to be that since obviously databases don't > scale on the web, you should just not use them at all. Who are the people saying this? It doesn't sound very wise to me. Where are they proposing to put the data, if not in a database? That's what I'd like to know. > I have a group > of otherwise very bright people trying to convince me that a rdbms is > not a good place to store relational data because eventually it won't > scale. What is _their_ evidence? > And of course we don't even have version 1 of our product out > of the door. E.S. Raymond's "The Art of UNIX Programming": Rule #15: Write a prototype before you optimize. Nothing is funnier than watching people try to performance optimize software that hasn't even been written yet. Very few people are smart enough to know where the performance bottlenecks will be before they've coded anything. If they insist on doing it wrong, at least you'll have a good laugh. > I'll admit we do have a very good chance of actually > getting tons of traffic, but my position is to use a rdbms for > relational data, and then if and when it won't scale any more, deal > with it then. That's sane. > So what would really help me is some real world numbers on how > postgresql is doing in the wild under pressure. If anyone cares to > throw some out I would really appreciate it. http://people.freebsd.org/~kris/scaling/ Lots of interesting graphs on that page ... most of them seem to indicate that RDBMS scale rather nicely. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] looking for some real world performance numbers
I'll agree with Bill's response... If they don't want a rdbms, what do they want? If they know of something that scales better and is faster, I'll bet they can make a lot of money. Lots of high-traffic sites would love to hear what they think.

> conventional wisdom seems to be that since obviously databases don't
> scale on the web,

Conventional? No, I don't think so. If you have 200 Gig of data, what's going to search it faster than a rdbms? If you have 200 Gig of data, with very intensive database queries, what scales better than having one web server round-robin requests to 10 database servers?

I think the conventional wisdom is that non-database people cannot set up a database to run quickly to save their life. And then blame the database.

-Andy

snacktime wrote:
I'm working through the architecture design for a new product. We have a small group working on this. It's a web app that will be using ruby on rails. The challenge I'm running into is that the latest conventional wisdom seems to be that since obviously databases don't scale on the web, you should just not use them at all. I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data because eventually it won't scale. And of course we don't even have version 1 of our product out of the door. I'll admit we do have a very good chance of actually getting tons of traffic, but my position is to use a rdbms for relational data, and then if and when it won't scale any more, deal with it then. So what would really help me is some real world numbers on how postgresql is doing in the wild under pressure. If anyone cares to throw some out I would really appreciate it.

Chris

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Inheritance foreign key unexpected behaviour
On 10/20/07, M. van Egmond <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm trying to use table inheritance in my database. I need it because I want
> to be able to link any object in the database to another. So I created a
> table my_object which has a serial, nothing more. All the other tables in
> the system are inherited from this my_object table. I'm having difficulties
> adding foreign keys to the tables. This is my test setup:

From the inheritance docs at http://www.postgresql.org/docs/8.2/static/ddl-inherit.html

    A serious limitation of the inheritance feature is that indexes (including
    unique constraints) and foreign key constraints only apply to single tables,
    not to their inheritance children. This is true on both the referencing and
    referenced sides of a foreign key constraint.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
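A minimal sketch of that limitation (all table and column names here are invented, not taken from the original post):

    -- Parent table and a child that inherits from it
    CREATE TABLE my_object (id serial PRIMARY KEY);
    CREATE TABLE device (name text) INHERITS (my_object);

    -- A referencing table whose FK points at the parent
    CREATE TABLE attachment (obj_id integer REFERENCES my_object (id));

    INSERT INTO device (name) VALUES ('pump');  -- gets id = 1 from the inherited sequence
    SELECT id FROM my_object;                   -- returns 1: inherited rows are visible here
    INSERT INTO attachment (obj_id) VALUES (1); -- fails with a foreign-key violation, because
                                                -- the row is physically stored in device and
                                                -- the FK check looks only at my_object itself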
Re: [GENERAL] keeping an index in memory
On Sun, Oct 21, 2007 at 07:36:00AM -0400, Bill Moran wrote:
> What version of PG are you using and what is your shared_buffers setting?
>
> With 8G of RAM, you should start with shared_buffers around 2 - 3G, if
> you're using a modern version of PG. With that much shared memory, a
> large portion of that index should stay in RAM, as long as it's being
> used often enough that PG doesn't swap it for other data.

With that much memory, the index is likely to remain in memory no matter what size shared_memory he has. Anything in shared_memory is going to be in the system cache anyway.

I wonder if there's something else we haven't been told, like how big the actual table is and whether there are any other large tables/indexes.

Have a nice day,
--
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [GENERAL] keeping an index in memory
On Oct 21, 2007, at 7:36 AM, Bill Moran wrote: Rajarshi Guha <[EMAIL PROTECTED]> wrote: Hi, relating to my previous queries on doing spatial searches on 10M rows, it seems that most of my queries return within 2 minutes. Generally this is not too bad, though faster is always better. Interestingly, it appears that the CUBE index for the table in question is about 3GB (the table itself is about 14GB). Not knowing the details of the postgres internals, I assume that when a query tries to use the index, it will need to read a 3GB file. Is this a correct assumption? In such a situation, is there a way to keep the index in memory? My machine has 8GB installed and currently has about 7.4GB free RAM (64 bit linux 2.6.9) Free or cached/buffered? Your OS should be using most of that to buffer disk blocks. Aah, correct. Yes they are cached/buffered A side effect of the size of the index is that if I do a query that performs a seq scan (say using cube_distance) it takes longer than when an index is used, but not significantly longer. And this is on a 10M row table. What strategies do people follow when the index becomes very big? What version of PG are you using and what is your shared_buffers setting? 8.2.5 My original shared_buffers setting was 128MB. With 8G of RAM, you should start with shared_buffers around 2 - 3G, if you're using a modern version of PG. I can do that but I'm a little confused. Earlier postings on the list indicate that shared_buffers should be about 10% of the system RAM and that effective_cache_size can be a large fraction of RAM. As a result I had effective_cache_size set to 2500MB Thanks for the pointers --- Rajarshi Guha <[EMAIL PROTECTED]> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE --- How I wish I were what I was when I wished I were what I am. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] keeping an index in memory
On Oct 21, 2007, at 10:40 AM, Martijn van Oosterhout wrote: On Sun, Oct 21, 2007 at 07:36:00AM -0400, Bill Moran wrote: What version of PG are you using and what is your shared_buffers setting? With 8G of RAM, you should start with shared_buffers around 2 - 3G, if you're using a modern version of PG. With that much shared memory, a large portion of that index should stay in RAM, as long as it's being used often enough that PG doesn't swap it for other data. With that much memory, the index is likely to remain in memory no matter what size shared_memory he has. Anything in shared_memory is going to be in the system cache anyway. I wonder if there's something else we haven't been told, like how big the actual table is and whether there are any other large tables/indexes.

The table itself is about 10M rows corresponding to 14GB. The only other index on this table is a btree index whose size is ~300MB. The machine is not running anything else.

Now, it might just be the case that given the size of the index, I cannot make bounding box queries (which will use the CUBE index) go any faster. But I am surprised that the other type of query (using cube_distance which by definition must use a seq scan) is only slightly longer. If nothing else, scanning through 14GB of data should be 3 times slower than scanning through 3GB of data.

---
Rajarshi Guha <[EMAIL PROTECTED]> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
---
All laws are simulations of reality. -- John C. Lilly

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] keeping an index in memory
Rajarshi Guha <[EMAIL PROTECTED]> writes: > Now, it might just be the case that given the size of the index, I > cannot make bounding box queries (which will use the CUBE index) go > any faster. But I am surprised that that the other type of query > (using cube_distance which by definition must use a seq scan) is only > slightly longer. If nothing else, scanning through 14GB of data > should be 3 times slower than scanning through 3GB of data. A single index probe should not touch anything like all of the index --- unless your data is such that the index is very non-optimally laid out. GiST should work well if there are lots of subsets of the data that have bounding boxes disjoint from other subsets'. If not, maybe you need to reconsider your data representation. Have you done any examination of how much of the index gets touched during a typical query? I'd try turning on stats_block_level and see how the delta in pg_statio_all_indexes.idx_blks_read compares to the index size. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
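For example (a rough sketch only; the index name below is invented), with stats_block_level = on the delta Tom describes could be read off like this:

    SELECT idx_blks_read, idx_blks_hit
      FROM pg_statio_all_indexes
     WHERE indexrelname = 'mytable_cube_idx';   -- hypothetical index name

    -- ... run a typical bounding-box query ...

    SELECT idx_blks_read, idx_blks_hit
      FROM pg_statio_all_indexes
     WHERE indexrelname = 'mytable_cube_idx';
    -- the increase in idx_blks_read, times the (typically 8kB) block size, compared
    -- against the 3GB index size gives a feel for how much of the index one query touches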
[GENERAL] Photos from the PostgreSQL Conference Fall 2007
The PostgreSQL Conference Fall 2007 was informative, fun, and well-executed. Thanks to Selena Deckelmann, Joshua Drake, and everyone else who made it happen. Here are my photos of the event: http://db.endpoint.com/pgcon07/ -- Daniel Browning End Point Corporation http://www.endpoint.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] keeping an index in memory
"Rajarshi Guha" <[EMAIL PROTECTED]> writes: > The table itself is about 10M rows corresponding to 14GB. Each row is on average 1.4kB ? Perhaps you should send more details of the table definition and the typical size of each column. It's possible you have the columns you're selecting on being stored out of line ("toasted") which would hurt performance if you're often accessing many of those columns. If it's not true then you may have a lot of dead space in your table which would decrease performance. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] keeping an index in memory
On 10/21/07, Rajarshi Guha <[EMAIL PROTECTED]> wrote: > > > With 8G of RAM, you should start with shared_buffers around 2 - 3G, if > > you're using a modern version of PG. > > I can do that but I'm a little confused. Earlier postings on the list > indicate that shared_buffers should be about 10% of the system RAM > and that effective_cache_size can be a large fraction of RAM. That was true with 7.4 and before because their cache management wasn't very efficient. With 8.0 and above, PostgreSQL can handle much larger shared_buffer sizes. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
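For illustration only (the values below are invented starting points for an 8GB machine, not tuned recommendations):

    -- current values, from psql:
    SHOW shared_buffers;         -- 128MB in the original poster's setup
    SHOW effective_cache_size;

    -- then in postgresql.conf (shared_buffers needs a server restart, and on
    -- Linux may also require raising the kernel's SHMMAX):
    --   shared_buffers = 2GB
    --   effective_cache_size = 6GB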
[GENERAL] Explicit Named Indexes for Constraints
I'm new to PostgreSQL, a veteran with Informix. We are considering a migration.

In Informix, it is recommended to create explicit named indexes on columns for primary and foreign keys prior to creating the constraints. Otherwise, the server creates the indexes for you with meaningless names. This is not generally a problem, except when you dump the schema, you get all the constraint indexes in the DDL, exported as if they were explicitly created, but with the server-generated names. It's a mess to sort through.

What's the recommended procedure in PG? At first glance it appears that PG hides the implicit indexes from you at all times, including pg_dump. So it appears that explicit index creation can be skipped without leaving you with a mess later. Is this just a non-issue in PG?

Jeff

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] looking for some real world performance numbers
snacktime wrote: I'm working through the architecture design for a new product. We have a small group working on this. It's a web app that will be using ruby on rails. The challenge I'm running into is that the latest conventional wisdom seems to be that since obviously databases don't scale on the web, you should just not use them at all. I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data because eventually it won't scale. And of course we don't even have version 1 of our product out of the door. I'll admit we do have a very good chance of actually getting tons of traffic, but my position is to use a rdbms for relational data, and then if and when it won't scale any more, deal with it then. So what would really help me is some real world numbers on how postgresql is doing in the wild under pressure. If anyone cares to throw some out I would really appreciate it. I've got a client doing 18M page views/ day and postgresql isn't really sweating. Dave Chris ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander wrote:
> Trevor Talbot wrote:
>> On 10/20/07, Rainer Bauer <[EMAIL PROTECTED]> wrote:
>>
>>> Anyway, the problem is the number of semaphores created by Postgres:
>>> every backend creates at least 4*max_connections semaphores. Just
>>> increase max_connections to an unusual high value (say 1) and
>>> start creating new connections while monitoring the handle count.
>>
>> Hmm, they're actually the same semaphores, so the only cost is for
>> slots in each process's handle table, which comes from kernel paged
>> pool. Testing shows I can easily create about 30 million handles to a
>> given object on this machine. This is under win2003 with 1.25GB RAM,
>> which gives it a paged pool limit of 352MB.

On my system I can only create about 4 million semaphores.

>> I tried going up to 2 max_connections, and still blew postmaster's
>> VM space long before paged pool was exhausted. I couldn't test any
>> higher values, as there's some interaction between max_connections and
>> shared_buffers that prevents it from mapping the buffer contiguously.
>>
>> Something's missing though, since I'm not hitting the same issue you
>> are. How are you generating the connections? I just have an app
>> calling PQconnectdb() in a loop, but I guess that's not good enough.

I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to establish the test connections.

> Yeah, something is obviously missing. Are you guys on exactly the same
> Windows versions, with respect to both version and service pack? Anybody
> on x64 Windows?

No, I am using WinXP SP2 32 bit with 2GB RAM.

These are my altered settings from the default 8.2.5 Postgres installation:

ssl = on
shared_buffers = 512MB
work_mem = 16MB
maintenance_work_mem = 256MB
wal_sync_method = fsync_writethrough
checkpoint_segments = 15
checkpoint_timeout = 30min
random_page_cost = 3.0
effective_cache_size = 1GB
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.05

> Another thing worth testing - check if the amount of shared memory used
> makes a noticeable difference. Try both very small and very large values.

Well, I tried different shared_buffers settings, but the result was consistent: with max_connections set to 1, I can create 150 database connections. However, I checked the handle count at the moment the last connection fails and it is only at 1.5 million. So it seems the handles are not the primary problem.

Let me know if you want any other tests performed on this machine. I also have VS2005 installed, but until now I haven't compiled Postgres here (I was waiting for 8.3, which fully supports building with VS).

Rainer

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] Explicit Named Indexes for Constraints
"Jeff Larsen" <[EMAIL PROTECTED]> writes: > In Informix, it is recommended to create explicit named indexes on > columns for primary and foreign keys prior to creating the > constraints. Otherwise, the server create the indexes for you with > meaningless names. This is not generally a problem, except when you > dump the schema, you get all the constraint indexes in the DDL, > exported as if they were explicitly created, but with the server > generated names. It's a mess to sort through. Ugh. In PG, you can specify the names for server-generated indexes; they're just the same names given to the constraints: CREATE TABLE foo (f1 int constraint foo_primary_key primary key); The index underlying this constraint will be named foo_primary_key. If you leave off the "constraint name" clause then you get an autogenerated name, but it's not so meaningless that there's a strong need to override it --- in this example it'd be "foo_pkey". Manual creation of indexes duplicating a constraint is definitely *not* the thing to do in PG; you'll end up with redundant indexes. > What's the recommended procedure in PG? At first glance it appears > that PG hides the implicit indexes from you at all times, including > pg_dump. I wouldn't say they are "hidden", you just don't need to mention them separately in the DDL commands. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] looking for some real world performance numbers
snacktime wrote:
I'm working through the architecture design for a new product. We have a small group working on this. It's a web app that will be using ruby on rails. The challenge I'm running into is that the latest conventional wisdom seems to be that since obviously databases don't scale on the web, you should just not use them at all. I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data because eventually it won't scale. And of course we don't even have version 1 of our product out of the door. I'll admit we do have a very good chance of actually getting tons of traffic, but my position is to use a rdbms for relational data, and then if and when it won't scale any more, deal with it then. So what would really help me is some real world numbers on how postgresql is doing in the wild under pressure. If anyone cares to throw some out I would really appreciate it.

It sounds like the RoR people are talking about any relational database, and not just Postgres. Many very busy sites do use relational databases successfully. So it can work. Many others have failed. So it can fail, if the situation is exceptionally unusual, or IMHO more likely, it's poorly implemented.

What's the main argument of their "won't scale" stance? Why not set up a test case to prove or disprove it?

I don't think anything we can suggest based on what we know of your project will help, unless someone happens to throw out a nearly identical case. I would be surprised if avoiding a database is a better solution. But regardless, I would be more worried about using a technology when most of the core group doesn't believe in it. That often leads to bad results, regardless of whether it should.

Paul

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org/
Re: [GENERAL] looking for some real world performance numbers
snacktime wrote on 21.10.2007 08:11: I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data Hmm. Those bright people say that a /relational/ database management system is not a good place to store /relational/ data? I can't understand that reasoning... Where else do they want to store relational data than in a RDBMS? Thomas ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] looking for some real world performance numbers
Thomas Kellerer <[EMAIL PROTECTED]> writes: > Where else do they want to store relational data than in a RDBMS? Indeed. It seems like we can hardly answer the OP's question without asking "compared to what?" If they're afraid an RDBMS won't scale, what have they got in mind that they are so certain will scale? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] SQL spec/implementation question: UPDATE
Hullo list,

A perhaps esoteric question:

Short version: What do the specs say (if anything) about returning information from UPDATE commands? Or about handling update requests that don't effectively do anything?

Longer version:

CREATE TABLE test (
    id      SERIAL NOT NULL,
    name    TEXT NOT NULL,
    passion TEXT NOT NULL,
    PRIMARY KEY( id )
);

INSERT INTO test (name, passion) VALUES ('colin', 'contra-dancing');
INSERT INTO test (name, passion) VALUES ('alex', 'contemplating');
INSERT INTO test (name, passion) VALUES ('kevin', 'soccer');
INSERT INTO test (name, passion) VALUES ('toby', 'biking');

BEGIN;
UPDATE test SET name = 'kevin' WHERE passion = 'soccer';
-- previous statement 5 times (or whatever)
COMMIT;

Even though the last 5 statements effectively do nothing, every UPDATE returns "UPDATE 1". If I do the same thing in MySQL, I get "Rows matched: 1 Changed: 0 Warnings: 0". (I used the INNODB engine in MySQL.) In PHP, the {pg,mysql}_affected_rows functions return the same results: 1 from Postgres and 0 from MySQL.

So, two questions: which behavior is correct, or is it even defined? If Postgres' behavior is correct, why does it need to write to disk (since the tuple isn't actually changing in value)?

Experience tells me that Postgres is probably doing the correct thing, but it almost seems that it could be a corner case, doesn't matter either way, and could be just a consequence of the MVCC guarantees, etc.

TIA,

Kevin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
[GENERAL] Select Command
I have a column with data structured as follows.

32TT - 0002
32LT- 0004
32PT-0005

Is there a way of selecting all of the rows containing LT in that column?

I have attempted variations of ' *LT* ' without success.

Bob Pawley
Re: [GENERAL] Select Command
Bob Pawley wrote:
> I have a column with data structured as follows.
>
> 32TT - 0002
> 32LT- 0004
> 32PT-0005
>
> Is there a way of selecting all of the rows containing LT in that column?
>
> I have attempted variations of ' *LT* ' without success.

LIKE '%LT%' perhaps? Or ~ 'LT' (unanchored regex)

--
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"The number of UNIX installations has grown to 10, and this number is expected to increase" (UPM, 1972)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
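Spelled out as runnable queries (the table and column names below are invented for illustration):

    SELECT * FROM devices WHERE tag LIKE '%LT%';   -- simple substring pattern
    SELECT * FROM devices WHERE tag ~ 'LT';        -- the unanchored regex form
    -- case-insensitive variants: ILIKE '%lt%' or ~* 'lt'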
Re: [GENERAL] SQL spec/implementation question: UPDATE
Kevin Hunter wrote:
Hullo list, A perhaps esoteric question: Short version: What do the specs say (if anything) about returning information from UPDATE commands? Or about handling update requests that don't effectively do anything? Longer version: CREATE TABLE test ( id SERIAL NOT NULL, name TEXT NOT NULL, passion TEXT NOT NULL, PRIMARY KEY( id ) ); INSERT INTO test (name, passion) VALUES ('colin', 'contra-dancing'); INSERT INTO test (name, passion) VALUES ('alex', 'contemplating'); INSERT INTO test (name, passion) VALUES ('kevin', 'soccer'); INSERT INTO test (name, passion) VALUES ('toby', 'biking'); BEGIN; UPDATE test SET name = 'kevin' WHERE passion = 'soccer'; Previous statement 5 times (or whatever) COMMIT; Even though the last 5 statements effectively do nothing, every UPDATE returns "UPDATE 1". If I do the same thing in MySQL, I get "Rows matched: 1 Changed: 0 Warnings: 0". (I used the INNODB engine in MySQL.) In PHP, the {pg,mysql}_affected_rows functions return the same results: 1 from Postgres and 0 from MySQL. So, two questions: which behavior is correct, or is it even defined? If Postgres behavior is correct, why does it need to write to disk, (since the tuple isn't actually changing in value)? Experience tells me that Postgres is probably doing the correct thing, but it almost seems that it could be corner case, doesn't matter either way, and could be just a consequence of the MVCC guarantees, etc. TIA, Kevin

I think you're comparing apples and oranges. I'll bet that mysql is taking a shortcut and testing the value before updating it.

The update is probably closer to:

update test set name = 'kevin' where passion = 'soccer' and name <> 'kevin';

In this case, PG, too, would only update once.

-Andy

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Select Command
On Sunday 21 October 2007 2:32 pm, Bob Pawley wrote:
> I have a column with data structured as follows.
>
> 32TT - 0002
> 32LT- 0004
> 32PT-0005
>
> Is there a way of selecting all of the rows containing LT in that column?
>
> I have attempted variations of ' *LT* ' without success.
>
> Bob Pawley

SELECT col FROM the_table WHERE col LIKE '%LT%';

See also: http://www.postgresql.org/docs/8.2/interactive/functions-matching.html#FUNCTIONS-LIKE

-- Adrian Klaver [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] SQL spec/implementation question: UPDATE
andy <[EMAIL PROTECTED]> writes: > I think your comparing apples and oranges. I'll bet that mysql is > taking a shortcut and testing the value before updating it. > The update is probably more close to: > update test set name = 'kevin' where passion = 'soccer' and name <> 'kevin'; Yeah, that seems to be what they're doing. PG does not bother to make such a test, on the grounds that it would waste more net cycles than it would save. Most people are not in the habit of issuing lots of no-op updates. Also, if you have a case where you think that is what will happen, you can add the WHERE-condition for yourself; whereas there is no way in mysql to get rid of the useless test even if you know it's useless. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
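A quick way to see, using Kevin's test table from upthread, that PostgreSQL really does write a new row version even when the values don't change, and that the extra WHERE condition Tom mentions is what avoids it:

    SELECT ctid, xmin, name FROM test WHERE passion = 'soccer';
    UPDATE test SET name = 'kevin' WHERE passion = 'soccer';
    SELECT ctid, xmin, name FROM test WHERE passion = 'soccer';
    -- ctid and xmin change after every such UPDATE, whether or not the value differed;
    -- adding "AND name <> 'kevin'" to the WHERE clause is what skips the write.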
Re: [GENERAL] looking for some real world performance numbers
Dave Cramer wrote: snacktime wrote: I'm working through the architecture design for a new product. We have a small group working on this. It's a web app that will be using ruby on rails. The challenge I'm running into is that the latest conventional wisdom seems to be that since obviously databases don't scale on the web, you should just not use them at all. I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data because eventually it won't scale. And of course we don't even have version 1 of our product out of the door. I'll admit we do have a very good chance of actually getting tons of traffic, but my position is to use a rdbms for relational data, and then if and when it won't scale any more, deal with it then. So what would really help me is some real world numbers on how postgresql is doing in the wild under pressure. If anyone cares to throw some out I would really appreciate it. I missed the original post on this, so I'm replying to Dave's response. To the OP, I don't know where you obtain your conventional wisdom from, but I'd look for another source. Just about any site you might visit that handles lots of data has a DBMS of some sort behind it; given that IMS and Adabase have been out of favor for 20 years, most of those DBMSs are relational. So if it can work for your bank, E*Trade and eBay, chances are it can work for you. As far as real world numbers, we have a data-intensive app (network data collection for a telecom company) that is currently inserting about 16 million rows a day. I benchmarked PG for that app and with some tweaking, PG could handle it. The current app uses stored procedures for all inserts, and PG didn't do well with that approach; substituting embedded inserts fixed that problem. So PG can definitely "handle" very large transaction volumes. As with any DBMS and any application, you may encounter challenges (like the one I point out with using stored procs for high-volume inserts) that require you to address with some thought. -- Guy Rouillier ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] looking for some real world performance numbers
On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote:
> The current app uses stored procedures
> for all inserts, and PG didn't do well with that approach; substituting
> embedded inserts fixed that problem. So PG can definitely "handle" very

Can you explain what embedded inserts are?

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] SQL spec/implementation question: UPDATE
On 10/21/07 17:52, Tom Lane wrote:
> andy <[EMAIL PROTECTED]> writes:
>> I think you're comparing apples and oranges. I'll bet that mysql is
>> taking a shortcut and testing the value before updating it.
>
>> The update is probably closer to:
>> update test set name = 'kevin' where passion = 'soccer' and name <> 'kevin';
>
> Yeah, that seems to be what they're doing. PG does not bother to make
> such a test, on the grounds that it would waste more net cycles than it
> would save. Most people are not in the habit of issuing lots of no-op
> updates.
>
> Also, if you have a case where you think that is what will happen, you
> can add the WHERE-condition for yourself; whereas there is no way in
> mysql to get rid of the useless test even if you know it's useless.

Not to bash MySQL (much...) but ISTM that this is another example of MySQL playing fast and loose with SQL. IOW, the RDBMS shouldn't try to out-think me even if I seem to be doing something odd.

--
Ron Johnson, Jr. Jefferson LA USA
Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org/
[GENERAL] command to view the tables
Hi all,

I'm new to postgresql. I'm running postgresql database on Linux platform. I just wanna know the command to view all the tables inside a specific database. Can anyone tell me? Thanks!

~Shwe~
Re: [GENERAL] command to view the tables
On Sun, 2007-10-21 at 20:35 -0700, Shwe Yee Than wrote: > I'm new to postgresql. I'm running postgresql database on Linux > platform. I just wanna know the command to view all the tables inside > a specific database. Can anyone tell me? \d would be what you use in psql or just do a \? for help.. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
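For example, in psql (the schema filter in the SQL version below is just one common choice):

    -- psql backslash commands:
    --   \dt       lists the tables in your search_path (\d lists all relations)
    --   \dt *.*   lists tables in all schemas
    -- The same information is available through plain SQL:
    SELECT schemaname, tablename
      FROM pg_tables
     WHERE schemaname NOT IN ('pg_catalog', 'information_schema');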
[GENERAL] Indexes & Primary Keys (based on the same columns)
I'm wondering if what I'm doing is redundant. I have a primary key on columns (A,B,C,D) and I've also defined an index based on the same columns (A,B,C,D) and sometimes in the query explain, I see the pkey being used for the scan instead of the index. So.. That made me think perhaps the additional index on the _same_ parameter is redundant. Appreciate comments. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Indexes & Primary Keys (based on the same columns)
Ow Mun Heng wrote: I'm wondering if what I'm doing is redundant. I have a primary key on columns (A,B,C,D) and I've also defined an index based on the same columns (A,B,C,D) and sometimes in the query explain, I see the pkey being used for the scan instead of the index. So.. That made me think perhaps the additional index on the _same_ parameter is redundant. A primary key creates an index so having a second index with the same definition is redundant. Appreciate comments. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Indexes & Primary Keys (based on the same columns)
On Sun, 2007-10-21 at 20:49 -0700, Joshua D. Drake wrote: > Ow Mun Heng wrote: > > I'm wondering if what I'm doing is redundant. > > > > I have a primary key on columns (A,B,C,D) > > and I've also defined an index based on the same columns (A,B,C,D) > > > > and sometimes in the query explain, I see the pkey being used for the > > scan instead of the index. > > > > So.. That made me think perhaps the additional index on the _same_ > > parameter is redundant. > > A primary key creates an index so having a second index with the same > definition is redundant. Many thanks for the confirmation. I'm dropping them... ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Indexes & Primary Keys (based on the same columns)
Ow Mun Heng <[EMAIL PROTECTED]> writes: > I'm wondering if what I'm doing is redundant. > I have a primary key on columns (A,B,C,D) > and I've also defined an index based on the same columns (A,B,C,D) Yup, 100% redundant. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
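A small sketch of the situation (names invented): the primary key already creates an index, so the extra one can simply be dropped.

    CREATE TABLE t (a int, b int, c int, d int, PRIMARY KEY (a, b, c, d));
    CREATE INDEX t_abcd_idx ON t (a, b, c, d);   -- redundant: duplicates t_pkey

    SELECT indexname FROM pg_indexes WHERE tablename = 't';  -- shows t_pkey and t_abcd_idx
    DROP INDEX t_abcd_idx;   -- safe: t_pkey keeps serving the same scans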
Re: [GENERAL] looking for some real world performance numbers
Dave Cramer wrote: snacktime wrote: I'm working through the architecture design for a new product. We have a small group working on this. It's a web app that will be using ruby on rails. The challenge I'm running into is that the latest conventional wisdom seems to be that since obviously databases don't scale on the web, you should just not use them at all. I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data because eventually it won't scale. And of course we don't even have version 1 of our product out of the door. I'll admit we do have a very good chance of actually getting tons of traffic, but my position is to use a rdbms for relational data, and then if and when it won't scale any more, deal with it then. So what would really help me is some real world numbers on how postgresql is doing in the wild under pressure. If anyone cares to throw some out I would really appreciate it. I've got a client doing 18M page views/ day and postgresql isn't really sweating. We have a client doing 15k/tps via a website. I would say these, "very bright people" are "very bright but excessively ignorant". Joshua D. Drake Dave Chris ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] looking for some real world performance numbers
> As far as real world numbers, we have a data-intensive app
> (network data collection for a telecom company) that is currently
> inserting about 16 million rows a day. I benchmarked PG for that app
> and with some tweaking, PG could handle it.

Me too, not telco though. 5.5 million per day across 2240 tables and 4 databases with 50 days on-line. Tweaking of autovacuum to get it to keep up with the daily deletes, and fsync off; slow disks, not PG's fault, but we have a UPS.

Allan

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] looking for some real world performance numbers
Ow Mun Heng wrote: On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote: The current app uses stored procedures for all inserts, and PG didn't do well with that approach; substituting embedded inserts fixed that problem. So PG can definitely "handle" very Can you explain what is embedded inserts? Insert via embedded SQL insert statements in our Java code, as opposed to embedded SQL stored proc invocations, which in turn do the inserts. The existing code base used the latter approach, which didn't work well with PG. I suspect it has to do with PG's stored proc overload capability. The short of it is that *any* DBMS you use will have its own quirks that you become acquainted with and learn to work around. -- Guy Rouillier ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] looking for some real world performance numbers
On Mon, 2007-10-22 at 01:23 -0400, Guy Rouillier wrote:
> Ow Mun Heng wrote:
>> On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote:
>>> The current app uses stored procedures
>>> for all inserts, and PG didn't do well with that approach; substituting
>>> embedded inserts fixed that problem. So PG can definitely "handle" very
>>
>> Can you explain what embedded inserts are?
>
> Insert via embedded SQL insert statements in our Java code, as opposed
> to embedded SQL stored proc invocations, which in turn do the inserts.

Ah, so you sort of hard-code

    insert into table values($x,$y,$z,$w)

instead of

    execute sp_insert($x,$y,$z,$w)

> The existing code base used the latter approach, which didn't work well
> with PG. I suspect it has to do with PG's stored proc overload
> capability. The short of it is that *any* DBMS you use will have its
> own quirks that you become acquainted with and learn to work around.

Yeah. I found out that PG doesn't like this statement:

    where audit_key_dtime >= (select last_refreshed from denorm_log
                              where tablename = 'zon')
      and audit_key_dtime < (select last_refreshed + refresh_interval
                             from denorm_log where tablename = 'zon')

    Index Cond: ((audit_key_dtime >= $0) AND (audit_key_dtime < $1))

but prefers this:

    where audit_key_dtime >= '2007-08-08 18:00:00'
      and audit_key_dtime < '2007-08-08 18:01:00'

    Index Cond: ((audit_key_dtime >= '2007-08-08 18:00:00'::timestamp without time zone)
                 AND (audit_key_dtime < '2007-08-08 18:01:00'::timestamp without time zone))

even though they are of the same 1 min interval

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org/
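One possible workaround, sketched only (it reuses the denorm_log names from the message above; whether it helps depends on why the plans differ): fetch the refresh window first, then pass the two timestamps into the big query as literals so the planner sees the actual values.

    SELECT last_refreshed                        AS window_start,
           last_refreshed + refresh_interval     AS window_end
      FROM denorm_log
     WHERE tablename = 'zon';
    -- ...then substitute the two returned timestamps directly into the
    -- WHERE clause of the main query, as in the second example above.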
Re: [GENERAL] SQL spec/implementation question: UPDATE
At 6:00p -0400 on 21 Oct 2007, andy wrote: > I think your comparing apples and oranges. That's why I ask the list! To learn when I'm doing that. ;-) > I'll bet that mysql is > taking a shortcut and testing the value before updating it. Heh. And as Tom points out downthread, that "shortcut" probably doesn't gain anything in the long run. Kevin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] SQL spec/implementation question: UPDATE
At 6:52p -0400 on 21 Oct 2007, Tom Lane wrote: > andy <[EMAIL PROTECTED]> writes: >> I think your comparing apples and oranges. I'll bet that mysql is >> taking a shortcut and testing the value before updating it. > >> The update is probably more close to: >> update test set name = 'kevin' where passion = 'soccer' and name <> 'kevin'; > > Yeah, that seems to be what they're doing. PG does not bother to make > such a test, on the grounds that it would waste more net cycles than it > would save. Most people are not in the habit of issuing lots of no-op > updates. Makes sense. In this particular case, it's a moot point as it's guaranteed to update a single row only (or less), but I was idly curious. In fact, for the application in question, having the behavior of Postgres would make it possible to clean up the application logic a bit, but eh. I'm stuck with MySQL for this project. :-( Kevin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] SQL spec/implementation question: UPDATE
At 11:33p -0400 on 21 Oct 2007, Ron Johnson wrote:
> Not to bash MySQL (much...) but ISTM that this is another example of
> MySQL playing fast and loose with SQL.

I don't have handy a spec guide. Does this mean that MySQL is indeed showing incorrect behavior?

I like what's been said upthread:

The query MySQL apparently sees:

    UPDATE test SET name = 'kevin' WHERE passion = 'soccer' AND name <> 'kevin';

The query as I wrote:

    UPDATE test SET name = 'kevin' WHERE passion = 'soccer';

Even though it is, in fact, the same, it should still be updated because that's what I said. Is that what you're saying? Is that the spec?

Thanks,

Kevin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] SQL spec/implementation question: UPDATE
Kevin Hunter <[EMAIL PROTECTED]> writes:
> I don't have handy a spec guide. Does this mean that MySQL is indeed
> showing incorrect behavior?

I think this is really outside the spec. The relevant sections of SQL92 seem to be in 13.10 <update statement: searched>:

     <update statement: searched> ::=
          UPDATE <table name>
            SET <set clause list>
            [ WHERE <search condition> ]
     ...
     b) If a <search condition> is specified, then it is applied to
        each row of T with the <search condition> bound to that row,
        and the object rows are those rows for which the result of
        the <search condition> is true. The <search condition> is
        effectively evaluated for each row of T before updating any
        row of T.
     ...
     8) Each object row is updated as specified by each <set clause>.

There is not anything I can see addressing whether an "update" should or should not be considered to occur if a target column happens to not change as a result of a commanded update. There is certainly not anything specifically requiring mysql's behavior, but I don't see anything specifically rejecting it either.

There is

     4) If the set of object rows is empty, then a completion
        condition is raised: no data.

but this refers to the case where the given WHERE condition selects no rows; I see no argument for claiming that it involves whether the new field values match the old ones.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend