On Fri, Jan 10, 2014 at 11:09 AM, Steve Atkins <st...@blighty.com> wrote:
> > On Jan 10, 2014, at 8:35 AM, Preston Hagar <prest...@gmail.com> wrote: > > > tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors > despite the server now having 32 GB instead of 4 GB of RAM and the workload > and number of clients remaining the same. > > > > > > Details: > > > > We have been using Postgresql for some time internally with much > success. Recently, we completed a migration off of an older server running > 8.3 to a new server running 9.3. The older server had 4GB of RAM, the new > server has 32 GB. > > > > For some reason, since migrating we are getting lots of "out of memory" > and "cannot allocate memory" errors on the new server when the server gets > under a decent load. We have upped shmmax to 17179869184 and shmall to > 4194304. > > What are the exact error messages you’re getting, and where are you seeing > them? > > Thanks for the reply. I'm seeing them in the main postgresql log (/var/log/postgresql/postgresql-9.3-main.log) Here are a couple of examples from the incident we had this morning: 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for connection: Cannot allocate memory 2014-01-10 06:14:40 CST 30176 LOG: could not fork new process for connection: Cannot allocate memory TopMemoryContext: 160408 total in 19 blocks; 12984 free (41 chunks); 147424 used TopTransactionContext: 8192 total in 1 blocks; 7392 free (0 chunks); 800 used Btree proof lookup cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used Type information cache: 24240 total in 2 blocks; 3744 free (0 chunks); 20496 used Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used MessageContext: 524288 total in 7 blocks; 225240 free (3 chunks); 299048 used Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used smgr relation table: 57344 total in 3 blocks; 34320 free (10 chunks); 23024 used TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used PortalHeapMemory: 1024 total in 1 blocks; 768 free (0 chunks); 256 used ExecutorState: 329080 total in 8 blocks; 105944 free (4 chunks); 223136 used TupleSort: 32816 total in 2 blocks; 176 free (2 chunks); 32640 used printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 8192 total in 1 blocks; 8128 free (3 chunks); 64 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 8192 total in 1 blocks; 7896 free (0 chunks); 296 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used Relcache by OID: 24576 total in 2 blocks; 11792 free (3 chunks); 12784 used CacheMemoryContext: 9815680 total in 25 blocks; 8143416 free (1 chunks); 1672264 used pg_toast_3598032_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used mls_office_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used staff_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used staff_desk_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used person_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used person_person_workphone_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used person_person_mobilephone_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used person_person_lastname_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used person_person_homephone_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used person_person_homeofficephone_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used person_person_firstname_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used agent_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used agent_membertype_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used agent_floydtraining_id: 1024 total in 1 blocks; 152 free (0 chunks); 872 used agent_agent_sync_realtorid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used agent_agent_sync_oagentid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used agent_agent_personid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used agent_agent_license_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_toast_2619_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used mls_agent_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_uidprp_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_streetnum_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_streetname: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_sqfttotal_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_ownername_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_officelist_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_modified: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_mlsnumber_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_liststatus_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_listprice_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_city_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_beds_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_res_bathsfull_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_subdivision: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_zipcode: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_uidprp: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_priority: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_officelist: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_modified: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_mlsnum: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_longitude_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_liststatus: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_listprice: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_latitude_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_ishud_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_city_upcase_idx: 3072 total in 2 blocks; 1848 free (0 chunks); 1224 used mls_prp_city: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_agentlist2: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_prp_agentlist: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_media_uid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_media_tableuid: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_media_mediasource_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used mls_media_custompic_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_sync_ilresid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_streetnumber_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_streetname_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_mlsnumber_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_longitude_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_listingid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_latitude_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_brokerageid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listing_agentid_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used listing_listdate_idx: 1024 total in 1 blocks; 152 free (0 chunks); 872 used code_primary: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used class_primary: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_index_indrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_enum_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_class_relname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used pg_foreign_server_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used pg_cast_source_target_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used pg_language_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_collation_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_amop_fam_strat_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used pg_index_indexrelid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_ts_template_tmplname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_ts_config_map_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used pg_opclass_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_event_trigger_evtname_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_ts_dict_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_event_trigger_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_conversion_default_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used pg_enum_typid_label_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_ts_config_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_user_mapping_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used pg_foreign_table_relid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_type_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_aggregate_fnoid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_constraint_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_ts_parser_prsname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_ts_config_cfgname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_ts_parser_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_operator_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_namespace_nspname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_ts_template_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_amop_opr_fam_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1784 free (2 chunks); 1288 used pg_range_rngtypid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_ts_dict_dictname_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_type_typname_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used pg_opfamily_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_class_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used pg_proc_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_language_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_namespace_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1736 free (2 chunks); 1336 used pg_foreign_server_name_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_conversion_oid_index: 1024 total in 1 blocks; 200 free (0 chunks); 824 used pg_user_mapping_user_server_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_conversion_name_nsp_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_authid_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_auth_members_member_role_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_tablespace_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_database_datname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_auth_members_role_member_index: 1024 total in 1 blocks; 64 free (0 chunks); 960 used pg_database_oid_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_authid_rolname_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used MdSmgr: 8192 total in 1 blocks; 5440 free (0 chunks); 2752 used ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used hba parser context: 15360 total in 4 blocks; 6544 free (2 chunks); 8816 used LOCALLOCK hash: 24576 total in 2 blocks; 13920 free (4 chunks); 10656 used Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production 10.1.1.6(36680)ERROR: out of memory 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production 10.1.1.6(36680)DETAIL: Failed on request of size 500. After some of the connections cleared up, (we load a lot of data around that time), things got "back to normal". Our Nagios monitoring showed a 30 minute window or so when no new connections could be made and the log is filled with more or less the same lines. > > > > We had originally copied our shared_buffers, work_mem, wal_buffers and > other similar settings from our old config, but after getting the memory > errors have tweaked them to the following: > > > > shared_buffers = 7680MB > > temp_buffers = 12MB > > max_prepared_transactions = 0 > > work_mem = 80MB > > maintenance_work_mem = 1GB > > wal_buffers = 8MB > > max_connections = 350 > > > > The current settings seem to have helped, but we are still occasionally > getting the errors. > > > > The weird thing is that our old server had 1/8th the RAM, was set to > max_connections = 600 and had the same clients connecting in the same way > to the same databases and we never saw any errors like this in the several > years we have been using it. > > > > One issue I could see is that one of our main applications that connects > to the database, opens a connection on startup, holds it open the entire > time it is running, and doesn't close it until the app is closed. In daily > usage, for much of our staff it is opened first thing in the morning and > left open all day (meaning the connection is held open for 8+ hours). This > was never an issue with 8.3, but I know it isn't a "best practice" in > general. > > That’s probably not related to the problems you’re seeing - I have apps > that hold a connection to the database open for years. As long as it > doesn’t keep a transaction open for a long time, you’re fine. > > Good to know, thanks. > > > > We are working to update our application to be able to use pgbouncer > with transaction connections to try to alleviate the long held connections, > but it will take some time. > > Using pgbouncer is probably a good idea - to reduce the number of > concurrent connections, rather than the length of connections, though. > > I'm testing a updated version of our application against pgbouncer now so hopefully that will help some. > > > > In the meantime, is there some other major difference or setting in 9.3 > that we should look out for that could be causing this? Like I said, the > same database with the same load and number of clients has been running on > a 8.3 install for years (pretty much since 2008 when 8.3 was released) with > lesser hardware with no issues. > > > > Let me know if any other information would help out or if anyone has > suggestions of things to check. > > Cheers, > Steve > > > Thanks for your help. Preston