Hi,

Yes, I agree, 8.3 is out of support for a long time and this is the reason
we are trying to migrate to 9.3 using SLONY to minimize downtime.

I eliminated the possibility of data corruption as the limit/offset
technique indicated different rows each time it was executed. Actually, the
failure is still happening and as it is running in a virtual machine,
memory size configuration for this virtual machine was increased from 48GB
to 64GB and we have scheduled a server shutdown/restart for the next coming
weekend in order to try to get rid of the failure.

The replication activity was aborted: SLONY triggers removed, SLONY
processes terminated and SLONY schemas removed.

Ulimit output was appended at the end of this note.

Memory statistics dump from postmaster log resulted from a select * from
"8147_spunico"."sincdc"; command:

Thank you!

TopMemoryContext: 80800 total in 9 blocks; 4088 free (10 chunks); 76712 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512
used
  TopTransactionContext: 8192 total in 1 blocks; 7648 free (1 chunks); 544
used
  MessageContext: 57344 total in 3 blocks; 40760 free (6 chunks); 16584 used
  smgr relation table: 24576 total in 2 blocks; 13904 free (4 chunks);
10672 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks);
32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 1024 total in 1 blocks; 720 free (0 chunks); 304 used
      ExecutorState: 381096528 total in 6 blocks; 49856 free (30 chunks);
381046672 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used
  CacheMemoryContext: 817392 total in 20 blocks; 230456 free (3 chunks);
586936 used
    pg_toast_729119_index: 2048 total in 1 blocks; 608 free (0 chunks);
1440 used
    idx_sincdc_situacao: 2048 total in 1 blocks; 752 free (0 chunks); 1296
used
    idx_sincdc_esqtab: 2048 total in 1 blocks; 656 free (0 chunks); 1392
used
    idx_sincdc_datahoraexp: 2048 total in 1 blocks; 752 free (0 chunks);
1296 used
    pk_sincdc: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks);
1344 used
    pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0
chunks); 1440 used
    pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2
chunks); 1472 used
    pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2
chunks); 1376 used
    pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks);
1376 used
    pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks);
1688 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3
chunks); 1448 used
    pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2
chunks); 1472 used
    pg_cast_source_target_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2
chunks); 1472 used
    pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376
used
    pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3
chunks); 1640 used
    pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328
used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1624 free (3
chunks); 1448 used
    pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks);
1376 used
    pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2
chunks); 1472 used
    pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376
used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks);
1688 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1432 free (3
chunks); 1640 used
    pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2
chunks); 1376 used
    pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376
used
    pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks);
1376 used
    pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks);
1448 used
    pg_namespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2
chunks); 1424 used
    pg_operator_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1648 free (2 chunks);
1424 used
    pg_proc_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328
used
    pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3
chunks); 1448 used
    pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks);
1328 used
  MdSmgr: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used
  LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
  Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used


Ulimit output:
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 385725
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


On Thu, Dec 11, 2014 at 1:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Carlos Henrique Reimer <carlos.rei...@opendb.com.br> writes:
> > I've facing an out of memory condition after running SLONY several hours
> to
> > get a 1TB database with about 23,000 tables replicated. The error occurs
> > after about 50% of the tables were replicated.
>
> I'd try bringing this up with the Slony crew.
>
> > I guess postgresql is trying to perform an atomic allocation (those which
> > cannot wait for reclaim) to get a continues memory area and is failing
> due
> > to memory fragmentation.
>
> This theory has nothing to do with reality.  More likely it's just a
> garden variety memory leak.  If it was an out-of-memory error reported
> by Postgres, there should have been a memory statistics dump written in
> the postmaster log --- can you find that and post it?
>
> Another possible theory is that you're just looking at lots of memory
> needed to hold relcache entries for all 23000 tables :-(.  If so there
> may not be any easy way around it, except perhaps replicating subsets
> of the tables.  Unless you can boost the memory available to the backend
> --- since this is a 64 bit build, the only reason I can see for
> out-of-memory failures would be a restrictive ulimit setting.
>
> > After SLONY gets the out of memory condition, select * of the table also
> > does not work:
> > FiscalWeb=# select * from "8147_spunico"."sincdc";
> > ERROR:  out of memory
> > DETAIL:  Failed on request of size 268435456.
>
> That's odd ... looks more like data corruption than anything else.
> Does this happen even in a fresh session?  What do you have to do
> to get rid of the failure?
>
> >  PostgreSQL 8.3.21 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> > 4.4.6 20120305 (Red Hat 4.4.6-4)
>
> You realize of course that this version is years out of support, and that
> even if this problem traces to a bug in Postgres, 8.3 is not going to get
> fixed.
>
>                         regards, tom lane
>



-- 
Reimer
47-3347-1724 47-9183-0547 msn: carlos.rei...@opendb.com.br

Reply via email to