Re: [GENERAL] maintenance_work_mem and CREATE INDEX time
On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote wrote: > Hello, > > While understanding the effect of maintenance_work_mem on time taken > by CREATE INDEX, I observed that for the values of > maintenance_work_mem less than the value for which an internal sort is > performed, the time taken by CREATE INDEX increases as > maintenance_work_increases. > > My guess is that for all those values an external sort is chosen at > some point and larger the value of maintenance_work_mem, later the > switch to external sort would be made causing CREATE INDEX to take > longer. That is a smaller value of maintenance_work_mem would be > preferred for when external sort is performed anyway. Does that make > sense? > Upon further investigation, I found that the delay in switching to external sort caused by a larger value of maintenance_work_mem is small compared to the total time of CREATE INDEX. So, plotting the results for a number of maintenance_work_mem values shows that this effect is negligible. Are there any other parts of the external sort logic that might make it slower with increasing values of maintenance_work_mem? It seems the merge order and the number of tapes are related to state->allowedMem. Does that mean external sort is affected by the value of maintenance_work_mem in a way roughly similar to the above? -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
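For anyone who wants to reproduce the effect being described, a minimal psql sketch along these lines should be enough; the table, column and index names here are made up for illustration, and the absolute timings will of course depend on the data and the machine:

-- a hypothetical test table; any reasonably large table will do
CREATE TABLE sort_test AS
    SELECT (random() * 1000000)::int AS val
    FROM generate_series(1, 5000000);

\timing on

-- repeat with different settings and compare the reported times
SET maintenance_work_mem = '16MB';
CREATE INDEX sort_test_idx_16mb ON sort_test (val);

SET maintenance_work_mem = '256MB';
CREATE INDEX sort_test_idx_256mb ON sort_test (val);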
Re: [GENERAL] Different transaction log for database/schema
On Mon, Jul 22, 2013 at 10:09 AM, Ondrej Chaloupka wrote: > Hello, > > I would like kindly ask for an advice whether and how the transaction log > behavior could be configured. > > I would like to have possibility to differentiate transaction logs for two > databases or schema. I would need such configuration for my testing > environment where I need to run simultaneously different tests using > transactions. And I need to separate those tests for being sure that they do > not interfere with each other. I'm using XA transactions in Java. It is not clear to me which logs you are referring to: the commit logs or the WAL? Anyway, transactions are numbered in a unique way across the cluster, so it does not matter on which database you run a transaction; it will not interfere with other databases within the same cluster. I don't see what advantage separate logs for each database/schema would give you, considering that it is not clear what kind of maintenance you are going to do on such logs. Can you please explain your aim and use case in more detail? Luca -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Viewing another role's search path?
Is there some simple way of viewing the search path (or other role-specific setting) for a role different to the current role? Apart from querying 'pg_db_role_setting' directly? Just wondering if I'm missing something obvious. Regards Ian Barwick -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Different transaction log for database/schema
Hi, I'm talking about the transaction log for XA/2PC transactions. My tests crash the application in some phase of the transaction. After a restart I would like to check how the recovery proceeded, and I would like to verify that all the transaction records (I mean the info about prepared transactions) are gone on the DB side. Nevertheless, what I took from the discussion so far is that for my use case I need more PostgreSQL installations, or to synchronize the test runs so that they are not run in parallel. Thank you for all the reactions Ondra -- Original message -- From: Luca Ferrari Date: 23. 7. 2013 Subject: Re: [GENERAL] Different transaction log for database/schema "On Mon, Jul 22, 2013 at 10:09 AM, Ondrej Chaloupka wrote: > Hello, > > I would like kindly ask for an advice whether and how the transaction log > behavior could be configured. > > I would like to have possibility to differentiate transaction logs for two > databases or schema. I would need such configuration for my testing > environment where I need to run simultaneously different tests using > transactions. And I need to separate those tests for being sure that they do > not interfere with each other. I'm using XA transactions in Java. It is not clear to me to which logs you are referring: the commit logs or the WALs? Anyway, transaction are numbered in an unique way around the cluster, and therefore it does not matter on which database you are going to run the transaction, it will not interfere with other databases within the same cluster. I don't get what advantage will give you having logs different for each database/schema, considering that it is not clear what kind of maintenance you are going to do on such logs. Can you please better explain what is your aim and use case? Luca -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general"
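For reference, the check Ondra describes — verifying that no prepared transactions are left over after recovery — can be done against the built-in pg_prepared_xacts view; a minimal sketch, with a hypothetical gid in the cleanup statement:

-- lists all transactions currently prepared for two-phase commit;
-- after a clean recovery this should return no rows for the test database
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- a leftover prepared transaction can be cleaned up manually, e.g.:
ROLLBACK PREPARED 'my-test-gid';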
Re: [GENERAL] Viewing another role's search path?
On Tue, Jul 23, 2013 at 9:15 PM, Ian Lawrence Barwick wrote: > Is there some simple way of viewing the search path (or other > role-specific setting) for a role different to the current role? Apart from > querying 'pg_db_role_setting' directly? > This one perhaps? select rolname, rolconfig from pg_roles; rolconfig contains all the user-specific settings. -- Michael
Re: [GENERAL] Viewing another role's search path?
2013/7/23 Michael Paquier : > > On Tue, Jul 23, 2013 at 9:15 PM, Ian Lawrence Barwick > wrote: >> >> Is there some simple way of viewing the search path (or other >> role-specific setting) for a role different to the current role? Apart >> from >> querying 'pg_db_role_setting' directly? > > This one perhaps? > select rolname, rolconfig from pg_roles; > rolconfig contains all the user-specific settings. Thanks, but what I'm looking for is a more elegant (built-in?) way of extracting the contents of the configuration array which is useable for people who don't want to poke around in the system catalogues. I've created a view which does this, but I want to confirm if there's anything along the lines of "SHOW search_path FOR ROLE foo" which I've missed and would save the trouble of maintaining the view. Regards Ian Barwick -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
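There doesn't appear to be a built-in "SHOW ... FOR ROLE", but for the record, a view of the kind Ian mentions could look roughly like this — a sketch only, which ignores per-database settings and assumes the setting values themselves never contain '=':

CREATE VIEW role_settings AS
SELECT r.rolname,
       split_part(s.cfg, '=', 1) AS name,
       split_part(s.cfg, '=', 2) AS value
FROM (SELECT setrole, unnest(setconfig) AS cfg
      FROM pg_db_role_setting) s
JOIN pg_roles r ON r.oid = s.setrole;

-- e.g. the search_path of role "foo" (role name is hypothetical)
SELECT value FROM role_settings
WHERE rolname = 'foo' AND name = 'search_path';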
Re: [GENERAL] Viewing another role's search path?
On 07/23/2013 05:15 AM, Ian Lawrence Barwick wrote: Is there some simple way of viewing the search path (or other role-specific setting) for a role different to the current role? Apart from querying 'pg_db_role_setting' directly? Just wondering if I'm missing something obvious. http://www.postgresql.org/docs/9.2/interactive/app-psql.html \drds [ role-pattern [ database-pattern ] ] Lists defined configuration settings. These settings can be role-specific, database-specific, or both. role-pattern and database-pattern are used to select specific roles and databases to list, respectively. If omitted, or if * is specified, all settings are listed, including those not role-specific or database-specific, respectively. The ALTER ROLE and ALTER DATABASE commands are used to define per-role and per-database configuration settings. Regards Ian Barwick -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
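For example, in psql (both patterns are optional; the role name here is hypothetical):

-- settings defined for role "ian", in any database
\drds ian

-- all per-role and per-database settings
\drds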
Re: [GENERAL] [HACKERS] maintenance_work_mem and CREATE INDEX time
On Mon, Jul 22, 2013 at 9:11 PM, Amit Langote wrote: > Hello, > > While understanding the effect of maintenance_work_mem on time taken > by CREATE INDEX, I observed that for the values of > maintenance_work_mem less than the value for which an internal sort is > performed, the time taken by CREATE INDEX increases as > maintenance_work_increases. > > My guess is that for all those values an external sort is chosen at > some point and larger the value of maintenance_work_mem, later the > switch to external sort would be made causing CREATE INDEX to take > longer. That is a smaller value of maintenance_work_mem would be > preferred for when external sort is performed anyway. Does that make > sense? The heap structure used in external sorts is cache-unfriendly. The bigger the heap used, the more this unfriendliness becomes apparent. And the bigger maintenance_work_mem, the bigger the heap used. The bigger heap also means you have fewer "runs" to merge in the external sort. However, as long as the number of runs still fits in the same number of merge passes, this is generally not a meaningful difference. Ideally the planner (or something) would figure out how much memory would be needed to complete an external sort in just one external pass, and then lower the effective maintenance_work_mem to that amount. But that would be hard to do with complete precision, and the consequences of getting it wrong are asymmetric. (More thoroughly, it would figure out the number of passes needed for the given maintenance_work_mem, and then lower the effective maintenance_work_mem to the smallest value that gives the same number of passes. But for nearly all practical situations, I think the number of passes for an index build is going to be 0 or 1.) Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Fwd: [GENERAL] odd locking behaviour
Thank you Jeff and others for the responses. One concern that I have is that even in cases where there is no deadlock, it is still acquiring stronger locks than necessary. I only discovered it because of the deadlock issue but I presume that there are many cases where it is acquiring a lock on the foreign table and really doesn't need to. That would seem to lead to higher lock contention in general, which, although it may not cause a deadlock in every case, does affect the overall performance. In my opinion this should be a case where it ought not to acquire any locks on the foreign table at all. It's not as though the columns relevant to the foreign key reference have changed. Anyway, at least it's good to understand it more. I'll have to find some way to work around this in my application. Thank you! On Mon, Jul 22, 2013 at 12:48 PM, Jeff Janes wrote: > On Sun, Jul 21, 2013 at 9:15 PM, Pavel Stehule > wrote: > > hello > > > > It can be artefact of RI implementation. > > > > see > http://michael.otacoo.com/postgresql-2/postgres-9-3-feature-highlight-for-key-share-and-for-no-key-update/ > > > > Try to 9.3, please, where RI uses more gently locks > > It still behaves this way in 9.4dev. > > >> > >> On Mon, Jul 8, 2013 at 9:54 AM, pg noob wrote: > >>> > >>> > >>> Thank you for the responses. Is it a bug? > > I don't think so. While PostgreSQL of course strives for maximum > concurrency, it makes no guarantee that it uses the weakest > theoretically possible locking in all possible cases. But it is kind > of unfortunate that updating the same row twice causes a lock > escalation when it is not obvious it should do so, because as you > found that makes avoiding deadlocks quite difficult. > > I'm rather surprised it doesn't block at the first update of the 2nd > session, rather than waiting for the 2nd update of that session. > > Anyway, when the 2nd session re-updates the same row in the same > transaction, it uses a 'multixact' to record this. Doing that > apparently defeats some locking optimization that takes place under > simpler cases. > > Sorry, that probably isn't the definitive answer you were hoping for. > > Cheers, > > Jeff >
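For anyone who wants to see for themselves which locks the referenced row actually gets, a small sketch along these lines can be used; the schema is invented, and, as the quoted replies note, the exact behaviour depends on the server version:

CREATE TABLE parent (id int PRIMARY KEY, note text);
CREATE TABLE child  (id int PRIMARY KEY,
                     parent_id int NOT NULL REFERENCES parent (id),
                     val int);
INSERT INTO parent VALUES (1, 'p');
INSERT INTO child VALUES (1, 1, 0);

-- session 1: update the same child row twice without touching the FK column
BEGIN;
UPDATE child SET val = 1 WHERE id = 1;
UPDATE child SET val = 2 WHERE id = 1;

-- session 2: inspect what session 1 is holding while its transaction is open
SELECT locktype, relation::regclass, mode, granted
FROM pg_locks
WHERE pid <> pg_backend_pid();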
[GENERAL] process deadlocking on its own transactionid?
We're seeing a problem with some of our processes hanging on locks. The select below makes it look like it's *waiting* for a ShareLock on transactionid, but it *has* an ExclusiveLock on the same value in virtualxid. That makes it look like the process has deadlocked on its own transactionid. Or are we reading the results of this query wrong, and this is expected behavior, and our problem lies elsewhere? (Yes, the process is doing a "select for update" on this context_objects table according to pg_stat_activity)

production=> select locktype, database, relname, relation, virtualxid, virtualtransaction, pid, mode, granted
             from pg_locks left outer join pg_class on pg_class.oid = pg_locks.relation
             where pid = 2288;

   locktype    | database |       relname        | relation |  virtualxid  | virtualtransaction | pid  |      mode       | granted
---------------+----------+----------------------+----------+--------------+--------------------+------+-----------------+---------
 relation      |    41194 | context_objects_pkey | 95318843 |              | 123/45694692       | 2288 | AccessShareLock | t
 relation      |    41194 | context_objects      |    41553 |              | 123/45694692       | 2288 | RowShareLock    | t
 virtualxid    |          |                      |          | 123/45694692 | 123/45694692       | 2288 | ExclusiveLock   | t
 transactionid |          |                      |          |              | 123/45694692       | 2288 | ShareLock       | f
 tuple         |    41194 | context_objects      |    41553 |              | 123/45694692       | 2288 | ExclusiveLock   | t
(5 rows)

-- Kevin M. Goess Software Engineer Berkeley Electronic Press kgo...@bepress.com 510-665-1200 x179 www.bepress.com bepress: sustainable scholarly publishing
Re: [GENERAL] odd locking behaviour
Moshe Jacobson wrote: > I wish one of the PG developers would respond to this... Have you seen the thread in pgsql-bugs? http://www.postgresql.org/message-id/e1uwamw-vh...@wrigleys.postgresql.org -- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] maintenance_work_mem and CREATE INDEX time
On Tue, Jul 23, 2013 at 1:23 AM, Amit Langote wrote: > On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote wrote: >> Hello, >> >> While understanding the effect of maintenance_work_mem on time taken >> by CREATE INDEX, I observed that for the values of >> maintenance_work_mem less than the value for which an internal sort is >> performed, the time taken by CREATE INDEX increases as >> maintenance_work_increases. >> >> My guess is that for all those values an external sort is chosen at >> some point and larger the value of maintenance_work_mem, later the >> switch to external sort would be made causing CREATE INDEX to take >> longer. That is a smaller value of maintenance_work_mem would be >> preferred for when external sort is performed anyway. Does that make >> sense? >> > > Upon further investigation, it is found that the delay to switch to > external sort caused by a larger value of maintenance_work_mem is > small compared to the total time of CREATE INDEX. If you are using trace_sort to report that, it reports the switch as happening as soon as it runs out of memory. At that point, all we have been doing is reading tuples into memory. The time it takes to do that will depend on maintenance_work_mem, because that affects how many tuples fit in memory. But all the rest of the tuples need to be read sooner or later anyway, so pushing more of them to later doesn't improve things overall; it just shifts timing around. After it reports the switch, it still needs to heapify the existing in-memory tuples before the tapesort proper can begin. This is where the true lost opportunities start to arise, as the large heap starts driving cache misses which would not happen at all under different settings. Once the existing tuples are heapified, it then continues to use the heap to pop tuples from it to write out to "tape", and to push newly read tuples onto it. This also suffers lost opportunities. Once all the tuples are written out and it starts merging, then the large maintenance_work_mem is no longer a penalty as the new heap is limited by the number of tapes, which is almost always much smaller. In fact this stage will actually be faster, but not by enough to make up for the earlier slow down. So it is not surprising that the time before the switch is reported is a small part of the overall time difference. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
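For completeness, the trace_sort reporting mentioned above can be enabled per session to watch these phases; a short sketch (the table and index names are made up, and trace_sort is a developer option):

SET trace_sort = on;
SET client_min_messages = log;   -- so the LOG lines are sent to the client

SET maintenance_work_mem = '64MB';
CREATE INDEX sort_test_idx ON sort_test (val);
-- the server then reports when the sort switches to an external sort
-- and how the run-building and merge phases progress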
Re: [GENERAL] process deadlocking on its own transactionid?
On Tue, Jul 23, 2013 at 12:54 PM, Kevin Goess wrote: > We're seeing a problem with some of our processes hanging on locks. The > select below makes it look like it's *waiting* for a ShareLock on > transactionid, but it *has* an ExclusiveLock on the same value in > virtualxid. It has an ExclusiveLock on itself, but that is independent of the ShareLock it is waiting for. The transaction it is waiting for is in the transactionid column, which is not in your select list. The virtualxid column seems pretty useless to me, I don't really know why it is there. Also, since you are filtering for only the blocked pid, you will not see the blocking pid in your results, which is probably what you really want to see. > That makes it look like the process has deadlocked on its own transactionid. > Or are we reading the results of this query wrong, and this is expected > behavior, and our problem lies elsewhere? You are reading the results wrong, which is very easy to do. For this type of lock, you need to join the table to itself on the transactionid column. http://wiki.postgresql.org/wiki/Lock_Monitoring Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
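The self-join Jeff refers to looks roughly like this — a sketch based on the wiki page's approach, covering only waits on transactionid locks:

SELECT waiting.pid  AS waiting_pid,
       waiting.mode AS waited_mode,
       holder.pid   AS blocking_pid,
       holder.mode  AS held_mode
FROM pg_locks waiting
JOIN pg_locks holder
  ON holder.transactionid = waiting.transactionid
 AND holder.granted
 AND holder.pid <> waiting.pid
WHERE waiting.locktype = 'transactionid'
  AND NOT waiting.granted;

-- joining blocking_pid against pg_stat_activity then shows
-- what the blocking session is currently running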
Re: [GENERAL] process deadlocking on its own transactionid?
Jeff Janes wrote: > The transaction it is waiting for is in the transactionid column, > which is not in your select list. The virtualxid column seems pretty > useless to me, I don't really know why it is there. If you do CREATE INDEX CONCURRENTLY and it has to wait for other processes to finish their current transactions (in order for them to release their snapshots, which is what it needs to ensure the index can be enabled), it will use the virtualxid. -- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Why are stored procedures looked on so negatively?
I've done quite a bit of reading on stored procedures recently and the consensus seems to be that you shouldn't use them unless you really must. I don't understand this argument. If you implement all of your logic in the application then you need to make a network request to the database server, return the required data from the database to the app server, do the processing and then return the results. A stored procedure is going to be a lot faster than that even if you just take away network latency / transfer time. I'm in the middle of building a database and was going to make extensive use of stored procedures and trigger functions because it makes more sense for the actions to happen at the database layer rather than in the app layer. Should I use them or not? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why are stored procedures looked on so negatively?
I presume you're referring to triggers. Triggers often do something automagically :) and that sometimes makes it hard for developers to debug when something goes wrong, since they are not aware that triggers exist in the database. Stored procedures are OK. CMIIW Sent from my BlackBerry® powered by Sinyal Kuat INDOSAT -Original Message- From: Some Developer Sender: pgsql-general-owner@postgresql.org Date: Wed, 24 Jul 2013 01:29:14 To: Subject: [GENERAL] Why are stored procedures looked on so negatively? I've done quite a bit of reading on stored procedures recently and the consensus seems to be that you shouldn't use them unless you really must. I don't understand this argument. If you implement all of your logic in the application then you need to make a network request to the database server, return the required data from the database to the app server, do the processing and then return the results. A stored procedure is going to be a lot faster than that even if you just take away network latency / transfer time. I'm in the middle of building a database and was going to make extensive use of stored procedures and trigger functions because it makes more sense for the actions to happen at the database layer rather than in the app layer. Should I use them or not? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why are stored procedures looked on so negatively?
On 07/23/2013 05:29 PM, Some Developer wrote: I've done quite a bit of reading on stored procedures recently and the consensus seems to be that you shouldn't use them unless you really must. I don't understand this argument. If you implement all of your logic in the application then you need to make a network request to the database server, return the required data from the database to the app server, do the processing and then return the results. A stored procedure is going to be a lot faster than that even if you just take away network latency / transfer time. I'm in the middle of building a database and was going to make extensive use of stored procedures and trigger functions because it makes more sense for the actions to happen at the database layer rather than in the app layer. Should I use them or not? Personally I figure the arguments for and against are closely correlated with where on the development chain you are, and are tied in with job security. If you are an app developer then it is in your interest to have code in the app; if you are a database developer, in the database. Me, I tend to go with your argument about keeping procedures, where appropriate, in the database for the reasons you state. In other words an API in the database. -- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why are stored procedures looked on so negatively?
On Tue, Jul 23, 2013 at 5:40 PM, Adrian Klaver wrote: > On 07/23/2013 05:29 PM, Some Developer wrote: > >> I'm in the middle of building a database and was going to make extensive >> use of stored procedures and trigger functions because it makes more >> sense for the actions to happen at the database layer rather than in the >> app layer. >> >> Should I use them or not? >> > > Personally I figure the arguments for and against are closely correlated > with where on the development chain you are, and are tied in with job > security. If you are an app developer than it is in your interest to have > code in the app, if you are a database developer in the database. What he says is very true. But make sure to think about things that may already be set up to manage the application code: versioning, testing, unit testing, packaging, release process, and documentation--how much of that is in place for your stored procedures and triggers? If a developer makes a change to application code, it gets checked in to source control, unit tested, code reviewed, goes through some QA, and is staged for the next roll to production--will that all happen for your stored procedures? And consider, there is already logic in the application, now some of the logic will be in the application and some of it will be in the database--does it make sense to have it in two places? I think those are the kind of concerns that make people shy about putting too much logic in the database. None of them are insurmountable, but you should at least think about them.
Re: [GENERAL] Why are stored procedures looked on so negatively?
Taking an absolutist position either way is pretty blind. What is the purpose of the procedure? Is it enforcing business rules? Are these rules that must be enforced against already existing data or are they more akin to validation of a credit card? How many people are accessing your database at one time? And most importantly, what are you best at? Adrian Klaver wrote: On 07/23/2013 05:29 PM, Some Developer wrote: I've done quite a bit of reading on stored procedures recently and the consensus seems to be that you shouldn't use them unless you really must. I don't understand this argument. If you implement all of your logic in the application then you need to make a network request to the database server, return the required data from the database to the app server, do the processing and then return the results. A stored procedure is going to be a lot faster than that even if you just take away network latency / transfer time. I'm in the middle of building a database and was going to make extensive use of stored procedures and trigger functions because it makes more sense for the actions to happen at the database layer rather than in the app layer. Should I use them or not? Personally I figure the arguments for and against are closely correlated with where on the development chain you are, and are tied in with job security. If you are an app developer than it is in your interest to have code in the app, if you are a database developer in the database. Me, I am tend to go with your argument about keeping procedures, where appropriate, in the database for the reasons you state. In other words an API in the database. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why are stored procedures looked on so negatively?
On 24/07/13 01:55, John Meyer wrote: Taking an absolutist position either way is pretty blind. What is the purpose of the procedure? Is it enforcing business rules? Are these rules that must be enforced against already existing data or are they more akin to validation of a credit card. How many people are accessing your database at one time? And most importantly, what are you best at? Basically what happens is an object is created in the application and saved to the database. When the insert has completed I need to start a process immediately based on the information in the object on another server (Amazon Simple Message Queue to be precise). So basically I'll have a trigger function that fires on INSERTs and does this work. That way the action will only be performed on INSERTs that have successfully completed and I can be sure that the trigger will always fire. On top of that there are a few common data structures that need to be written to the database that would be perfect for stored procedures since there is a little logic involved in saving them which shouldn't really be exposed to the application developers. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
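A skeleton of the kind of trigger being described might look like the following; the table, column and function names are invented, and the body is only a placeholder, since actually pushing to an external queue from inside the database would need an untrusted procedural language or an extension:

CREATE OR REPLACE FUNCTION queue_new_object() RETURNS trigger AS $$
BEGIN
    -- placeholder: hand the freshly inserted row (NEW) off to the
    -- external queue here
    RAISE NOTICE 'object % inserted', NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER objects_after_insert
    AFTER INSERT ON objects
    FOR EACH ROW
    EXECUTE PROCEDURE queue_new_object();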
Re: [GENERAL] Why are stored procedures looked on so negatively?
On Tue, Jul 23, 2013 at 06:55:56PM -0600, John Meyer wrote: > are accessing your database at one time? And most importantly, what > are you best at?

That is one of the most important questions, for sure, but there's a close second that I'd suggest: what are the scaling properties? For practical purposes, if you're going to do complicated data validation and business logic in the application, you have any significant degree of contention, and you need to write some data, the use pattern is going to look something like this (A is application, D is database):

A: get some data
D: here you go, optimistic lock value L
A: do some work
A: given this value, get some more data
D: here you go, optimistic lock value L2
A: INS/UPD/DEL data, optimistic lock value L, RETURNING data
D: ok, here you go, optimistic lock value L3
A: do some work
A: INS/UPD/DEL data, optimistic lock value L3
D: ok

And that's if none of the optimistic locks fails. That's a lot of round trips. If you have 20 transactions a minute, this is just fine. If you have 2000 transactions per second, it totally sucks: you're buried in round trips. In my experience, if you want your application to scale to large numbers of users, you need to avoid application<->database round trips.

Best, A -- Andrew Sullivan a...@crankycanuck.ca -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
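To make the contrast concrete, the same kind of read-validate-write cycle can be collapsed into one round trip by moving it server-side; a rough sketch only, against an invented accounts table:

CREATE OR REPLACE FUNCTION apply_adjustment(p_account int, p_amount numeric)
RETURNS numeric AS $$
DECLARE
    new_balance numeric;
BEGIN
    -- read, validate and write inside one server-side statement:
    -- no optimistic-lock ping-pong with the application
    UPDATE accounts
       SET balance = balance + p_amount
     WHERE id = p_account
    RETURNING balance INTO new_balance;

    IF new_balance IS NULL THEN
        RAISE EXCEPTION 'no such account %', p_account;
    ELSIF new_balance < 0 THEN
        RAISE EXCEPTION 'insufficient funds on account %', p_account;
    END IF;

    RETURN new_balance;
END;
$$ LANGUAGE plpgsql;

-- the application now issues a single call:
--   SELECT apply_adjustment(42, -100.00);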
Re: [GENERAL] [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 6:02 AM, Jeff Janes wrote: > On Tue, Jul 23, 2013 at 1:23 AM, Amit Langote wrote: >> On Tue, Jul 23, 2013 at 1:11 PM, Amit Langote >> wrote: >>> Hello, >>> >>> While understanding the effect of maintenance_work_mem on time taken >>> by CREATE INDEX, I observed that for the values of >>> maintenance_work_mem less than the value for which an internal sort is >>> performed, the time taken by CREATE INDEX increases as >>> maintenance_work_increases. >>> >>> My guess is that for all those values an external sort is chosen at >>> some point and larger the value of maintenance_work_mem, later the >>> switch to external sort would be made causing CREATE INDEX to take >>> longer. That is a smaller value of maintenance_work_mem would be >>> preferred for when external sort is performed anyway. Does that make >>> sense? >>> >> >> Upon further investigation, it is found that the delay to switch to >> external sort caused by a larger value of maintenance_work_mem is >> small compared to the total time of CREATE INDEX. > > If you are using trace_sort to report that, it reports the switch as > happening as soon as it runs out of memory. > > At point, all we have been doing is reading tuples into memory. The > time it takes to do that will depend on maintenance_work_mem, because > that affects how many tuples fit in memory. But all the rest of the > tuples need to be read sooner or later anyway, so pushing more of them > to later doesn't improve things overall it just shifts timing around. > > After it reports the switch, it still needs to heapify the existing > in-memory tuples before the tapesort proper can begin. This is where > the true lost opportunities start to arise, as the large heap starts > driving cache misses which would not happen at all under different > settings. > > Once the existing tuples are heapified, it then continues to use the > heap to pop tuples from it to write out to "tape", and to push newly > read tuples onto it. This also suffers lost opportunities. > > Once all the tuples are written out and it starts merging, then the > large maintenance_work_mem is no longer a penalty as the new heap is > limited by the number of tapes, which is almost always much smaller. > In fact this stage will actually be faster, but not by enough to make > up for the earlier slow down. > > So it is not surprising that the time before the switch is reported is > a small part of the overall time difference. > So, is it the actual sorting (before merging) that suffers with larger maintenance_work_mem? I am sorry but I can not grasp the complexity of external sort code at this point, so all I can say is that during an external sort a smaller value of maintenance_work_mem is beneficial (based on my observations in tests). But how that follows from what is going on in the implementation of external sort is still something I am working on understanding. -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 11:30 AM, Amit Langote wrote: > On Wed, Jul 24, 2013 at 6:02 AM, Jeff Janes wrote: >> If you are using trace_sort to report that, it reports the switch as >> happening as soon as it runs out of memory. >> >> At point, all we have been doing is reading tuples into memory. The >> time it takes to do that will depend on maintenance_work_mem, because >> that affects how many tuples fit in memory. But all the rest of the >> tuples need to be read sooner or later anyway, so pushing more of them >> to later doesn't improve things overall it just shifts timing around. >> >> After it reports the switch, it still needs to heapify the existing >> in-memory tuples before the tapesort proper can begin. This is where >> the true lost opportunities start to arise, as the large heap starts >> driving cache misses which would not happen at all under different >> settings. >> >> Once the existing tuples are heapified, it then continues to use the >> heap to pop tuples from it to write out to "tape", and to push newly >> read tuples onto it. This also suffers lost opportunities. >> >> Once all the tuples are written out and it starts merging, then the >> large maintenance_work_mem is no longer a penalty as the new heap is >> limited by the number of tapes, which is almost always much smaller. >> In fact this stage will actually be faster, but not by enough to make >> up for the earlier slow down. >> >> So it is not surprising that the time before the switch is reported is >> a small part of the overall time difference. >> > > So, is it the actual sorting (before merging) that suffers with larger > maintenance_work_mem? I am sorry but I can not grasp the complexity of > external sort code at this point, so all I can say is that during an > external sort a smaller value of maintenance_work_mem is beneficial > (based on my observations in tests). But how that follows from what is > going on in the implementation of external sort is still something I > am working on understanding. > Or whether the increased CREATE INDEX time follows from something else altogether (not any part of the external sort) may still be another question. Since we have to relate it to maintenance_work_mem, the first thing I could think of was to look at the sorting part of it. -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why are stored procedures looked on so negatively?
On 24/07/13 02:56, Andrew Sullivan wrote: On Tue, Jul 23, 2013 at 06:55:56PM -0600, John Meyer wrote: are accessing your database at one time? And most importantly, what are you best at? That is one of the most important questions, for sure, but there's a close second that I'd suggest: what are the scaling properties? For practical purposes, if you're going to do complicated data validation and business logic in the application, you have any significant degree of contention, and you need to write some data, the use pattern is going to look something like this (A is application, D is database): A: get some data D: here you go, optimistic lock value L A: do some work A: given this value, get some more data D: here you go, optimistic lock value L2 A: INS/UPD/DEL data, optimistic lock value L, RETURNING data D: ok, here you go, optimistic lock value L3 A: do some work A: INS/UPD/DEL data, optimistic lock value L3 D: ok And that's if none of the optimistic locks fails. That's a lot of round trips. If you have 20 transactions a minute, this is just fine. If you have 2000 transactions per second, it totally sucks: you're buried in round trips. In my experience, if you want your application to scale to large numbers of users, you need to avoid application<->database round trips. Best, A Thanks for the response. Obviously since I am still in the development stage I have no idea of the number of transactions I will need to handle but the business has the potential to be quite popular so I'd rather be safe than sorry and be able to handle large amounts of traffic from day one. I think ultimately it'll be simpler this way because the system I am developing is a quasi distributed system with lots of independent parts that need to be able to communicate and to share data with each other. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why are stored procedures looked on so negatively?
Some Developer wrote: > On 24/07/13 01:55, John Meyer wrote: > > Taking an absolutist position either way is pretty blind. What is the > > purpose of the procedure? Is it enforcing business rules? Are these > > rules that must be enforced against already existing data or are they > > more akin to validation of a credit card. How many people are accessing > > your database at one time? And most importantly, what are you best at? > > Basically what happens is an object is created in the application and > saved to the database. When the insert has completed I need to start a > process immediately based on the information in the object on another > server (Amazon Simple Message Queue to be precise). > > So basically I'll have a trigger function that fires on INSERTs and does > this work. That way the action will only be performed on INSERTs that > have successfully completed and I can be sure that the trigger will > always fire.

If you want to write a (trigger) function that starts a process on a remote machine, there are a few points to think about:

- Should the INSERT fail if the remote process cannot be started? If yes, then a trigger is a good idea.
- If you code it as a trigger, be aware that the transaction is not complete until the remote process has been started. That might be a noticeable delay and might affect concurrency negatively.

Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
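If the second point becomes a problem in practice, one common workaround — sketched here as an alternative, not something proposed in this thread — is to have the trigger only queue a notification and let a separate worker make the slow remote call after commit; the table, column and channel names are invented:

CREATE OR REPLACE FUNCTION notify_new_object() RETURNS trigger AS $$
BEGIN
    -- cheap and transactional: the notification is delivered only
    -- if and when the INSERT commits
    PERFORM pg_notify('new_objects', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER objects_notify
    AFTER INSERT ON objects
    FOR EACH ROW
    EXECUTE PROCEDURE notify_new_object();

-- a separate worker process runs LISTEN new_objects; and pushes each
-- reported id to the external queue outside the inserting transaction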
Re: [GENERAL] [HACKERS] maintenance_work_mem and CREATE INDEX time
On Wed, Jul 24, 2013 at 3:20 AM, Jeff Janes wrote: > On Mon, Jul 22, 2013 at 9:11 PM, Amit Langote wrote: >> Hello, >> >> While understanding the effect of maintenance_work_mem on time taken >> by CREATE INDEX, I observed that for the values of >> maintenance_work_mem less than the value for which an internal sort is >> performed, the time taken by CREATE INDEX increases as >> maintenance_work_increases. >> >> My guess is that for all those values an external sort is chosen at >> some point and larger the value of maintenance_work_mem, later the >> switch to external sort would be made causing CREATE INDEX to take >> longer. That is a smaller value of maintenance_work_mem would be >> preferred for when external sort is performed anyway. Does that make >> sense? > > The heap structure used in external sorts is cache-unfriendly. The > bigger the heap used, the more this unfriendliness becomes apparent. > And the bigger maintenance_work_mem, the bigger the heap used. > > The bigger heap also means you have fewer "runs" to merge in the > external sort. However, as long as the number of runs still fits in > the same number of merge passes, this is generally not a meaningful > difference. Do fewer runs mean more time (in whichever phase of the external sort)? -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general