Hi, Yes, I agree, 8.3 is out of support for a long time and this is the reason we are trying to migrate to 9.3 using SLONY to minimize downtime.
I eliminated the possibility of data corruption as the limit/offset technique indicated different rows each time it was executed. Actually, the failure is still happening and as it is running in a virtual machine, memory size configuration for this virtual machine was increased from 48GB to 64GB and we have scheduled a server shutdown/restart for the next coming weekend in order to try to get rid of the failure. The replication activity was aborted: SLONY triggers removed, SLONY processes terminated and SLONY schemas removed. Ulimit output was appended at the end of this note. Memory statistics dump from postmaster log resulted from a select * from "8147_spunico"."sincdc"; command: Thank you! TopMemoryContext: 80800 total in 9 blocks; 4088 free (10 chunks); 76712 used Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used TopTransactionContext: 8192 total in 1 blocks; 7648 free (1 chunks); 544 used MessageContext: 57344 total in 3 blocks; 40760 free (6 chunks); 16584 used smgr relation table: 24576 total in 2 blocks; 13904 free (4 chunks); 10672 used TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used PortalHeapMemory: 1024 total in 1 blocks; 720 free (0 chunks); 304 used ExecutorState: 381096528 total in 6 blocks; 49856 free (30 chunks); 381046672 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used CacheMemoryContext: 817392 total in 20 blocks; 230456 free (3 chunks); 586936 used pg_toast_729119_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used idx_sincdc_situacao: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used idx_sincdc_esqtab: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used idx_sincdc_datahoraexp: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used pk_sincdc: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_cast_source_target_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used pg_namespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_operator_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_amop_opr_fam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used pg_proc_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used MdSmgr: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used Ulimit output: ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 385725 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 1024 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited On Thu, Dec 11, 2014 at 1:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Carlos Henrique Reimer <carlos.rei...@opendb.com.br> writes: > > I've facing an out of memory condition after running SLONY several hours > to > > get a 1TB database with about 23,000 tables replicated. The error occurs > > after about 50% of the tables were replicated. > > I'd try bringing this up with the Slony crew. > > > I guess postgresql is trying to perform an atomic allocation (those which > > cannot wait for reclaim) to get a continues memory area and is failing > due > > to memory fragmentation. > > This theory has nothing to do with reality. More likely it's just a > garden variety memory leak. If it was an out-of-memory error reported > by Postgres, there should have been a memory statistics dump written in > the postmaster log --- can you find that and post it? > > Another possible theory is that you're just looking at lots of memory > needed to hold relcache entries for all 23000 tables :-(. If so there > may not be any easy way around it, except perhaps replicating subsets > of the tables. Unless you can boost the memory available to the backend > --- since this is a 64 bit build, the only reason I can see for > out-of-memory failures would be a restrictive ulimit setting. > > > After SLONY gets the out of memory condition, select * of the table also > > does not work: > > FiscalWeb=# select * from "8147_spunico"."sincdc"; > > ERROR: out of memory > > DETAIL: Failed on request of size 268435456. > > That's odd ... looks more like data corruption than anything else. > Does this happen even in a fresh session? What do you have to do > to get rid of the failure? > > > PostgreSQL 8.3.21 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) > > 4.4.6 20120305 (Red Hat 4.4.6-4) > > You realize of course that this version is years out of support, and that > even if this problem traces to a bug in Postgres, 8.3 is not going to get > fixed. > > regards, tom lane > -- Reimer 47-3347-1724 47-9183-0547 msn: carlos.rei...@opendb.com.br