Two more questions: Will the patch for the mmap calls go into mainline development for all archs? And would there be any problem if I send these patches to the Debian project?
2011/6/3 Steven Dake <sd...@redhat.com>:
> On 06/02/2011 08:16 PM, william felipe_welter wrote:
>> Well,
>>
>> Now with this patch, the pacemakerd process starts and brings up its
>> other processes (crmd, lrmd, pengine, ...), but after pacemakerd
>> forks, the forked pacemakerd process dies with "signal 10, Bus error".
>> And in the log, the Pacemaker processes (crmd, lrmd, pengine, ...)
>> can't connect to the openais plugin (possibly because of the death of
>> the pacemakerd process).
>> But this time, when the forked pacemakerd dies, it generates a core dump.
>>
>> gdb -c "/usr/var/lib/heartbeat/cores/root/ pacemakerd 7986" -se /usr/sbin/pacemakerd :
>> GNU gdb (GDB) 7.0.1-debian
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "sparc-linux-gnu".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/sbin/pacemakerd...done.
>> Reading symbols from /usr/lib64/libuuid.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /usr/lib64/libuuid.so.1
>> Reading symbols from /usr/lib/libcoroipcc.so.4...done.
>> Loaded symbols for /usr/lib/libcoroipcc.so.4
>> Reading symbols from /usr/lib/libcpg.so.4...done.
>> Loaded symbols for /usr/lib/libcpg.so.4
>> Reading symbols from /usr/lib/libquorum.so.4...done.
>> Loaded symbols for /usr/lib/libquorum.so.4
>> Reading symbols from /usr/lib64/libcrmcommon.so.2...done.
>> Loaded symbols for /usr/lib64/libcrmcommon.so.2
>> Reading symbols from /usr/lib/libcfg.so.4...done.
>> Loaded symbols for /usr/lib/libcfg.so.4
>> Reading symbols from /usr/lib/libconfdb.so.4...done.
>> Loaded symbols for /usr/lib/libconfdb.so.4
>> Reading symbols from /usr/lib64/libplumb.so.2...done.
>> Loaded symbols for /usr/lib64/libplumb.so.2
>> Reading symbols from /usr/lib64/libpils.so.2...done.
>> Loaded symbols for /usr/lib64/libpils.so.2
>> Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libbz2.so.1.0
>> Reading symbols from /usr/lib/libxslt.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /usr/lib/libxslt.so.1
>> Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /usr/lib/libxml2.so.2
>> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libc.so.6
>> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/librt.so.1
>> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libdl.so.2
>> Reading symbols from /lib/libglib-2.0.so.0...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libglib-2.0.so.0
>> Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols found)...done.
>> Loaded symbols for /usr/lib/libltdl.so.7
>> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/ld-linux.so.2
>> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libpthread.so.0
>> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libm.so.6
>> Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /usr/lib/libz.so.1
>> Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libpcre.so.3
>> Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libnss_compat.so.2
>> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libnsl.so.1
>> Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libnss_nis.so.2
>> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libnss_files.so.2
>> Core was generated by `pacemakerd'.
>> Program terminated with signal 10, Bus error.
>> #0  cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
>> 339             switch (dispatch_data->id) {
>> (gdb) bt
>> #0  cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
>> #1  0xf6f100f0 in ?? ()
>> #2  0xf6f100f4 in ?? ()
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>
>> I took a look at cpg.c and saw that dispatch_data is acquired by the
>> coroipcc_dispatch_get function (defined in lib/coroipcc.c):
>>
>>        do {
>>                error = coroipcc_dispatch_get (
>>                        cpg_inst->handle,
>>                        (void **)&dispatch_data,
>>                        timeout);
>>
>
> Try the recent patch sent to fix alignment.
>
> Regards
> -steve
>
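The "Bus error" at `switch (dispatch_data->id)` is the classic strict-alignment failure: SPARC traps on a load through a pointer that is not aligned to the member's natural boundary, while x86 silently tolerates it. Below is a minimal standalone sketch of that failure mode and the usual memcpy workaround. It is illustrative only: the struct and variable names are made up for the demo, this is not corosync code, and it is not the contents of Steve's patch.

/* sigbus_demo.c - illustration only, not corosync code.
 * Build and run: cc sigbus_demo.c && ./a.out */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct header {
        uint32_t id;    /* needs 4-byte alignment on SPARC */
        uint32_t size;
};

int main(void)
{
        char buf[64];
        memset(buf, 1, sizeof buf);

        /* +1 breaks the 4-byte alignment required for uint32_t. */
        struct header *h = (struct header *)(buf + 1);

        /* On SPARC this load traps with SIGBUS ("Bus error");
         * on x86 it merely runs slower.  Uncomment to test:
         * printf("id=%u\n", h->id);
         */
        (void)h;

        /* Portable fix: copy into a properly aligned object first. */
        struct header aligned;
        memcpy(&aligned, buf + 1, sizeof aligned);
        printf("id=%u\n", aligned.id);
        return 0;
}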
>> Resumed log:
>> ...
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering f to 10
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 10 to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including f
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 10
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child: Forked child 7991 for process lrmd
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: update_node_processes: Node xxxxxxxxxx now has process list: 00000000000000000000000000100112 (was 00000000000000000000000000100102)
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 10 to 11
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 11 to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 11
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child: Forked child 7992 for process attrd
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: update_node_processes: Node xxxxxxxxxx now has process list: 00000000000000000000000000101112 (was 00000000000000000000000000100112)
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 11 to 12
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 12 to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 12
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child: Forked child 7993 for process pengine
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: update_node_processes: Node xxxxxxxxxx now has process list: 00000000000000000000000000111112 (was 00000000000000000000000000101112)
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 12 to 13
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 13 to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 13
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child: Forked child 7994 for process crmd
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: update_node_processes: Node xxxxxxxxxx now has process list: 00000000000000000000000000111312 (was 00000000000000000000000000111112)
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: main: Starting mainloop
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 13 to 14
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 14 to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 14
>> Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
>> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering 14 to 15
>> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 15 to pending delivery queue
>> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 15
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: Invoked: /usr/lib64/heartbeat/stonithd
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/root
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: get_cluster_type: Cluster type is: 'openais'.
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: retrieveCib: Reading cluster configuration from: /usr/var/lib/heartbeat/crm/cib.xml (digest: /usr/var/lib/heartbeat/crm/cib.xml.sig)
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: retrieveCib: Cluster configuration not found: /usr/var/lib/heartbeat/crm/cib.xml
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: get_last_sequence: Series file /usr/var/lib/heartbeat/crm/cib.last does not exist
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile: Backup file /usr/var/lib/heartbeat/crm/cib-99.raw not found
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Continuing with an empty configuration.
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <cib epoch="0" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" >
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <configuration >
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <crm_config />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <nodes />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <resources />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <constraints />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </configuration>
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <status />
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </cib>
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: validate_with_relaxng: Creating RNG parser context
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: Doesn't exist (12)
>> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: CRIT: main: Cannot sign in to the cluster... terminating
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: Invoked: /usr/lib64/heartbeat/crmd
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: Invoked: /usr/lib64/heartbeat/pengine
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: main: CRM Hg Version: e872eeb39a5f6e1fdb57c3108551a5353648c4f4
>>
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Checking for old instances of pengine
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/pengine
>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: enabling coredumps
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crmd_init: Starting crmd
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/pengine
>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: debug: main: run the loop...
>> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: Started.
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Init server comms
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: s_crmd_fsa: Processing I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action: actions:trace: // A_LOG
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action: actions:trace: // A_STARTUP
>> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: main: Starting pengine
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Registering Signal Handlers
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Creating CIB and LRM objects
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action: actions:trace: // A_CIB_START
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_rw
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_rw
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw: Connection to command channel failed
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_callback
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_callback
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw: Connection to callback channel failed
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw: Connection to CIB failed: connection failed
>> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signoff: Signing out of the CIB Service
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: activateCibXml: Triggering CIB write for start op
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: startCib: CIB Initialization completed successfully
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: get_cluster_type: Cluster type is: 'openais'.
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: Doesn't exist (12)
>> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: CRIT: cib_init: Cannot sign in to the cluster... terminating
>> Jun 02 23:12:21 corosync [CPG   ] exit_fn for conn=0x62500
>> Jun 02 23:12:21 corosync [TOTEM ] mcasted message added to pending queue
>> Jun 02 23:12:21 corosync [TOTEM ] Delivering 15 to 16
>> Jun 02 23:12:21 corosync [TOTEM ] Delivering MCAST message with seq 16 to pending delivery queue
>> Jun 02 23:12:21 corosync [CPG   ] got procleave message from cluster node 1377289226
>> Jun 02 23:12:21 corosync [TOTEM ] releasing messages up to and including 16
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: Invoked: /usr/lib64/heartbeat/attrd
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Starting up
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: get_cluster_type: Cluster type is: 'openais'.
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: Doesn't exist (12)
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: HA Signon failed
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Cluster connection active
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Accepting attribute updates
>> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: Aborting startup
>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_rw
>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_rw
>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw: Connection to command channel failed
>> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_callback
>> ...
>>
>>
>> 2011/6/2 Steven Dake <sd...@redhat.com>:
>>> On 06/01/2011 11:05 PM, william felipe_welter wrote:
>>>> I recompiled my kernel without hugetlb, and the result is the same.
>>>>
>>>> My test program still results in:
>>>> PATH=/dev/shm/teste123XXXXXX
>>>> page size=20000
>>>> fd=3
>>>> ADDR_ORIG:0xe000a000 ADDR:0xffffffff
>>>> Erro
>>>>
>>>> And Pacemaker still fails because of the mmap error:
>>>> Could not initialize Cluster Configuration Database API instance error 2
>>>>
>>>
>>> Give the patch I posted recently a spin - corosync WFM with this patch
>>> on sparc64 with hugetlb set.  Please report back results.
>>>
>>> Regards
>>> -steve
>>>
>>>> To make sure that I have disabled hugetlb, here is my /proc/meminfo:
>>>> MemTotal: 33093488 kB
>>>> MemFree: 32855616 kB
>>>> Buffers: 5600 kB
>>>> Cached: 53480 kB
>>>> SwapCached: 0 kB
>>>> Active: 45768 kB
>>>> Inactive: 28104 kB
>>>> Active(anon): 18024 kB
>>>> Inactive(anon): 1560 kB
>>>> Active(file): 27744 kB
>>>> Inactive(file): 26544 kB
>>>> Unevictable: 0 kB
>>>> Mlocked: 0 kB
>>>> SwapTotal: 6104680 kB
>>>> SwapFree: 6104680 kB
>>>> Dirty: 0 kB
>>>> Writeback: 0 kB
>>>> AnonPages: 14936 kB
>>>> Mapped: 7736 kB
>>>> Shmem: 4624 kB
>>>> Slab: 39184 kB
>>>> SReclaimable: 10088 kB
>>>> SUnreclaim: 29096 kB
>>>> KernelStack: 7088 kB
>>>> PageTables: 1160 kB
>>>> Quicklists: 17664 kB
>>>> NFS_Unstable: 0 kB
>>>> Bounce: 0 kB
>>>> WritebackTmp: 0 kB
>>>> CommitLimit: 22651424 kB
>>>> Committed_AS: 519368 kB
>>>> VmallocTotal: 1069547520 kB
>>>> VmallocUsed: 11064 kB
>>>> VmallocChunk: 1069529616 kB
>>>>
>>>> 2011/6/1 Steven Dake <sd...@redhat.com>:
>>>>> On 06/01/2011 07:42 AM, william felipe_welter wrote:
>>>>>> Steven,
>>>>>>
>>>>>> cat /proc/meminfo
>>>>>> ...
>>>>>> HugePages_Total: 0
>>>>>> HugePages_Free: 0
>>>>>> HugePages_Rsvd: 0
>>>>>> HugePages_Surp: 0
>>>>>> Hugepagesize: 4096 kB
>>>>>> ...
>>>>>>
>>>>>
>>>>> It definitely requires a kernel compile and setting the config option to
>>>>> off.  I don't know the Debian way of doing this.
>>>>>
>>>>> The only reason you may need this option is if you have very large
>>>>> memory sizes, such as 48GB or more.
>>>>>
>>>>> Regards
>>>>> -steve
>>>>>
>>>>>> It's 4MB...
>>>>>>
>>>>>> How can I disable hugetlb? (By passing CONFIG_HUGETLBFS=n at boot to the kernel?)
>>>>>>
>>>>>> 2011/6/1 Steven Dake <sd...@redhat.com>
>>>>>>
>>>>>> On 06/01/2011 01:05 AM, Steven Dake wrote:
>>>>>> > On 05/31/2011 09:44 PM, Angus Salkeld wrote:
>>>>>> >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote:
>>>>>> >>> Angus,
>>>>>> >>>
>>>>>> >>> I made a test program (based on the code in coroipcc.c), and now I am
>>>>>> >>> sure that there are problems with the mmap system call on sparc.
>>>>>> >>>
>>>>>> >>> Source code of my test program:
>>>>>> >>>
>>>>>> >>> #include <stdlib.h>
>>>>>> >>> #include <sys/mman.h>
>>>>>> >>> #include <stdio.h>
>>>>>> >>>
>>>>>> >>> #define PATH_MAX 36
>>>>>> >>>
>>>>>> >>> int main()
>>>>>> >>> {
>>>>>> >>>
>>>>>> >>> int32_t fd;
>>>>>> >>> void *addr_orig;
>>>>>> >>> void *addr;
>>>>>> >>> char path[PATH_MAX];
>>>>>> >>> const char *file = "teste123XXXXXX";
>>>>>> >>> size_t bytes=10024;
>>>>>> >>>
>>>>>> >>> snprintf (path, PATH_MAX, "/dev/shm/%s", file);
>>>>>> >>> printf("PATH=%s\n",path);
>>>>>> >>>
>>>>>> >>> fd = mkstemp (path);
>>>>>> >>> printf("fd=%d \n",fd);
>>>>>> >>>
>>>>>> >>> addr_orig = mmap (NULL, bytes, PROT_NONE,
>>>>>> >>>                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>>> >>>
>>>>>> >>> addr = mmap (addr_orig, bytes, PROT_READ | PROT_WRITE,
>>>>>> >>>              MAP_FIXED | MAP_SHARED, fd, 0);
>>>>>> >>>
>>>>>> >>> printf("ADDR_ORIG:%p ADDR:%p\n",addr_orig,addr);
>>>>>> >>>
>>>>>> >>> if (addr != addr_orig) {
>>>>>> >>>     printf("Erro");
>>>>>> >>> }
>>>>>> >>> }
>>>>>> >>>
>>>>>> >>> Results on x86:
>>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>>> >>> fd=3
>>>>>> >>> ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000
>>>>>> >>>
>>>>>> >>> Results on sparc:
>>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>>> >>> fd=3
>>>>>> >>> ADDR_ORIG:0xf7f72000 ADDR:0xffffffff
>>>>>> >>
>>>>>> >> Note: 0xffffffff == MAP_FAILED
>>>>>> >>
>>>>>> >> (from man mmap)
>>>>>> >> RETURN VALUE
>>>>>> >>        On success, mmap() returns a pointer to the mapped area.  On
>>>>>> >>        error, the value MAP_FAILED (that is, (void *) -1) is returned,
>>>>>> >>        and errno is set appropriately.
>>>>>> >>
>>>>>> >>>
>>>>>> >>> But I'm wondering, is it really necessary to call mmap twice? What is
>>>>>> >>> the reason for calling mmap a second time, using the address from the
>>>>>> >>> first call?
>>>>>> >>>
>>>>>> >> Well, there are 3 calls to mmap():
>>>>>> >> 1) one to allocate 2 * what you need (in pages)
>>>>>> >> 2) maps the first half of the mem to a real file
>>>>>> >> 3) maps the second half of the mem to the same file
>>>>>> >>
>>>>>> >> The point is when you write to an address over the end of the
>>>>>> >> first half of memory it is taken care of by the third mmap, which
>>>>>> >> maps the address back to the top of the file for you.  This means you
>>>>>> >> don't have to worry about ringbuffer wrapping, which can be a headache.
>>>>>> >>
>>>>>> >> -Angus
>>>>>> >>
>>>>>> >
>>>>>> > interesting this mmap operation doesn't work on sparc linux.
>>>>>> >
>>>>>> > Not sure how I can help here - Next step would be a follow up with the
>>>>>> > sparc linux mailing list.  I'll do that and cc you on the message - see
>>>>>> > if we get any response.
>>>>>> >
>>>>>> > http://vger.kernel.org/vger-lists.html
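Angus's three-step description maps directly onto code. Below is a condensed, self-contained sketch of the mirrored ring buffer technique (the same idea as the "Exemplary POSIX Implementation" on the Wikipedia page linked further down the thread); the helper name ring_map is mine, error handling is abbreviated, and bytes is assumed to be page-aligned. Step 3 is the mmap that keeps failing on the reporter's sparc kernel. The PROT_NONE reservation in step 1 exists to guarantee that the two MAP_FIXED mappings land in adjacent, otherwise-unused address space.

/* ring_demo.c - sketch of the double-mapping trick, illustrative only;
 * this is not the corosync source.  Build: cc ring_demo.c && ./a.out */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *ring_map(int fd, size_t bytes)
{
        if (ftruncate(fd, bytes) < 0)
                return MAP_FAILED;

        /* 1) Reserve one contiguous region of twice the size. */
        char *base = mmap(NULL, bytes * 2, PROT_NONE,
                          MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (base == MAP_FAILED)
                return MAP_FAILED;

        /* 2) Map the file over the first half. */
        if (mmap(base, bytes, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_SHARED, fd, 0) == MAP_FAILED)
                return MAP_FAILED;

        /* 3) Map the same file over the second half, so byte bytes+i
         *    aliases byte i and the buffer wraps transparently.
         *    This is the call that returns MAP_FAILED on sparc here. */
        if (mmap(base + bytes, bytes, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_SHARED, fd, 0) == MAP_FAILED)
                return MAP_FAILED;

        return base;
}

int main(void)
{
        char path[] = "/dev/shm/ringXXXXXX";
        int fd = mkstemp(path);
        if (fd < 0)
                return 1;
        unlink(path);                           /* keep /dev/shm clean */

        size_t bytes = (size_t)sysconf(_SC_PAGESIZE);   /* one page */
        char *ring = ring_map(fd, bytes);
        if (ring == MAP_FAILED) {
                perror("ring_map");
                return 1;
        }

        /* A write that crosses the seam is readable from the start:
         * the last 3 bytes of "hello" land at offsets 0..2. */
        strcpy(ring + bytes - 3, "hello");
        printf("wrapped tail: %s\n", ring);     /* prints "lo" */
        return 0;
}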
>>>>>> >
>>>>>> >>>
>>>>>> >>> 2011/5/31 Angus Salkeld <asalk...@redhat.com>
>>>>>> >>>
>>>>>> >>>> On Tue, May 31, 2011 at 06:25:56PM -0300, william felipe_welter wrote:
>>>>>> >>>>> Thanks Steven,
>>>>>> >>>>>
>>>>>> >>>>> Now I am trying to run on the MCP:
>>>>>> >>>>> - Uninstall pacemaker 1.0
>>>>>> >>>>> - Compile and install 1.1
>>>>>> >>>>>
>>>>>> >>>>> But now I have problems initializing pacemakerd: Could not initialize
>>>>>> >>>>> Cluster Configuration Database API instance error 2
>>>>>> >>>>> Debugging with gdb, I see that the error is in confdb; more
>>>>>> >>>>> specifically, the errors start in coroipcc.c at line:
>>>>>> >>>>>
>>>>>> >>>>> 448         if (addr != addr_orig) {
>>>>>> >>>>> 449                 goto error_close_unlink;   <- enters here
>>>>>> >>>>> 450         }
>>>>>> >>>>>
>>>>>> >>>>> Any idea what could cause this?
>>>>>> >>>>>
>>>>>> >>>>
>>>>>> >>>> I tried porting a ringbuffer (www.libqb.org) to sparc and had the
>>>>>> >>>> same failure.
>>>>>> >>>> There are 3 mmap() calls, and on sparc the third one keeps failing.
>>>>>> >>>>
>>>>>> >>>> This is a common way of creating a ring buffer, see:
>>>>>> >>>> http://en.wikipedia.org/wiki/Circular_buffer#Exemplary_POSIX_Implementation
>>>>>> >>>>
>>>>>> >>>> I couldn't get it working in the short time I tried.  It's probably
>>>>>> >>>> worth looking at the clib implementation to see why it's failing
>>>>>> >>>> (I didn't get to that).
>>>>>> >>>>
>>>>>> >>>> -Angus
>>>>>> >>>>
>>>>>>
>>>>>> Note, we sorted this out we believe.  Your kernel has hugetlb enabled,
>>>>>> probably with 4MB pages.  This requires corosync to allocate 4MB pages.
>>>>>>
>>>>>> Can you verify your hugetlb settings?
>>>>>>
>>>>>> If you can turn this option off, you should have at least a working
>>>>>> corosync.
>>>>>>
>>>>>> Regards
>>>>>> -steve
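To put Steve's diagnosis in concrete terms: with hugetlb enabled and Hugepagesize reported as 4096 kB (the meminfo output above), the allocations he describes would have to come in 4 MB units, so a request like the test program's 10024 bytes gets rounded up to a full hugepage. A tiny sketch of that rounding arithmetic; the helper name round_to_page is mine, the 4 MB constant is taken from the reported Hugepagesize, and this illustrates only the arithmetic, not what Steve's patch actually does.

/* round_demo.c - page-granularity rounding, illustration only. */
#include <stdio.h>
#include <stddef.h>

/* Round a mapping length up to a multiple of the page size. */
static size_t round_to_page(size_t bytes, size_t page)
{
        return (bytes + page - 1) / page * page;
}

int main(void)
{
        size_t huge = 4UL * 1024 * 1024;   /* Hugepagesize reported above */

        /* The 10024-byte request from the test program earlier in the
         * thread would consume a full 4 MB hugepage: */
        printf("%zu -> %zu\n", (size_t)10024, round_to_page(10024, huge));
        return 0;
}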
--
William Felipe Welter
------------------------------
Consultor em Tecnologias Livres
william.wel...@4linux.com.br
www.4linux.com.br

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker