On 8 Nov 2013, at 10:27 am, Andrew Beekhof <and...@beekhof.net> wrote:
>
> On 7 Oct 2013, at 5:52 pm, Mailing List SVR <li...@svrinformatica.it> wrote:
>
>> On 07/10/2013 04:16, Andrew Beekhof wrote:
>>> On 05/10/2013, at 7:11 AM, Mailing List SVR <li...@svrinformatica.it>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a pacemaker cluster that has been running fine for 2 months. I
>>>> noticed that in the folder /var/lib/pacemaker/cores/root I have about
>>>> 1.5 GB of core.xxxx files. Who is responsible for cleaning up these files?
>>>>
>>> Ideally they would have been reported upstream so the underlying problem
>>> that caused them could be fixed.
>>
>> if you are interested here are some core dumps:
>>
>> http://195.250.34.59/temp/cores.tar.bz2
>>
>
> dammit, we're not correctly collecting metadata for the 'service' class.
> these core files are produced when we try to parse the result as xml.

The others are borking on control characters in the xml string.

    lrmd_rsc_output=\"Stopping postgresql service: \033[60G[\033[0;32m OK \033[0; ..."

which was fixed by https://github.com/beekhof/pacemaker/commit/c351934 and is
included in https://rhn.redhat.com/errata/RHEA-2013-1493.html

>
> at least
>
> [root@pcmk-5 ~]# crm_resource --show-metadata service:nfs
> Usage: nfs {start|stop|status|restart|reload|force-reload|condrestart|try-restart|condstop}
>
> vs.
>
> [root@pcmk-5 ~]# crm_resource --show-metadata lsb:nfs
> <?xml version='1.0'?>
> <!DOCTYPE resource-agent SYSTEM 'ra-api-1.dtd'>
> <resource-agent name='nfs' version='0.1'>
>   <version>1.0</version>
>   <longdesc lang='en'>
> NFS is a popular protocol for file sharing across networks.
> This service provides NFS server functionality, which is \
> configured via the /etc/exports file.
>
>   </longdesc>
>   <shortdesc lang='en'>nfs</shortdesc>
>   <parameters>
>   </parameters>
>   <actions>
>     <action name='meta-data' timeout='5' />
>     <action name='start' timeout='15' />
>     <action name='stop' timeout='15' />
>     <action name='status' timeout='15' />
>     <action name='restart' timeout='15' />
>     <action name='force-reload' timeout='15' />
>     <action name='monitor' timeout='15' interval='15' />
>   </actions>
>   <special tag='LSB'>
>     <Provides></Provides>
>     <Required-Start></Required-Start>
>     <Required-Stop></Required-Stop>
>     <Should-Start></Should-Start>
>     <Should-Stop></Should-Stop>
>     <Default-Start></Default-Start>
>     <Default-Stop></Default-Stop>
>   </special>
> </resource-agent>
>
>
> Fixed in https://github.com/beekhof/pacemaker/commit/644752e
>
>> this is a pacemaker/cman cluster on centos 6.4
>>
>> pacemaker-libs-1.1.8-7.el6.x86_64
>> pacemaker-cluster-libs-1.1.8-7.el6.x86_64
>> pacemaker-1.1.8-7.el6.x86_64
>> pacemaker-cli-1.1.8-7.el6.x86_64
>> cman-3.0.12.1-49.el6_4.2.x86_64
>>
>> pcs config
>> Corosync Nodes:
>>
>> Pacemaker Nodes:
>>  server3.<domain.com> server4.<domain.com>
>>
>> Resources:
>>  Master: DatiClone
>>   Resource: Dati (provider=linbit type=drbd class=ocf)
>>    Attributes: drbd_resource=dati
>>    Operations: monitor interval=120s
>>  Resource: DatiFs (provider=heartbeat type=Filesystem class=ocf)
>>   Attributes: device=/dev/drbd/by-res/dati directory=/srv/dati fstype=ext4
>>    options=noatime,nodiratime,nodev run_fsck=force
>>  Resource: ClusterIp (provider=heartbeat type=IPaddr2 class=ocf)
>>   Attributes: ip=172.16.20.9 cidr_netmask=32
>>   Operations: monitor interval=60s
>>  Resource: Smb (type=smb class=service)
>>   Operations: monitor interval=1min
>>  Resource: Nmb (type=nmb class=service)
>>   Operations: monitor interval=1min
>>  Resource: PgSQL (type=postgresql class=service)
>>   Operations: monitor interval=1min
>>  Resource: SmbManager (type=smbmanager class=service)
>>   Operations: monitor interval=5min
>>  Resource: ipmi-fencing3 (type=fence_ipmilan class=stonith)
>>   Attributes: pcmk_host_list=server3.<domain.com>.com ipaddr=172.16.20.6
>>    login=root passwd=pwd123 lanplus=1
>>   Operations: monitor interval=60s
>>  Resource: ipmi-fencing4 (type=fence_ipmilan class=stonith)
>>   Attributes: pcmk_host_list=server4.<domain.com> ipaddr=172.16.20.7
>>    login=root passwd=pwd123 lanplus=1
>>   Operations: monitor interval=60s
>>
>> Location Constraints:
>>   Resource: ipmi-fencing4
>>     Disabled on: server4.<domain.com>
>>   Resource: ipmi-fencing3
>>     Disabled on: server3.<domain.com>
>> Ordering Constraints:
>>   start ClusterIp then start Smb
>>   start Nmb then start Smb
>>   promote DatiClone then start DatiFs
>>   start DatiFs then start Nmb
>>   start DatiFs then start PgSQL
>>   start PgSQL then start SmbManager
>> Colocation Constraints:
>>   ClusterIp with Smb
>>   Smb with Nmb
>>   Smb with DatiFs
>>   DatiFs with DatiClone (with-rsc-role:Master)
>>   PgSQL with DatiFs
>>   SmbManager with DatiFs
>>
>> Cluster Properties:
>>  dc-version: 1.1.8-7.el6-394e906
>>  cluster-infrastructure: cman
>>  no-quorum-policy: ignore
>>  stonith-enabled: true
>>
>>>
>>>> is it safe to remove the files older than a month with a cron script?
>>>>
>>> Yes
>>
>> ok thanks,
>> Nicola
>>
>>>
>>>> thanks
>>>> Nicola
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list:
>>>> Pacemaker@oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home:
>>>> http://www.clusterlabs.org
>>>>
>>>> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>
>>>> Bugs:
>>>> http://bugs.clusterlabs.org
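
For the cron cleanup that was asked about in this thread, something along these lines might do. It is only a sketch, not anything shipped with pacemaker: the function name `remove_old_cores`, the 30-day default and the `dry_run` switch are my own choices, and it assumes the dumps are plain files named `core.*`, as described in the original question.

```python
import sys
import time
from pathlib import Path


def remove_old_cores(core_dir, max_age_days=30, now=None, dry_run=False):
    """Delete regular files named core.* in core_dir that are older than
    max_age_days (by modification time) and return the affected paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in sorted(Path(core_dir).glob("core.*")):
        # Skip directories and symlinks; only plain core files are touched.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # The directory is taken from the command line so that nothing is
    # deleted unless a path is given explicitly.
    if len(sys.argv) == 2:
        for path in remove_old_cores(sys.argv[1]):
            print("removed", path)
```

Run from cron with /var/lib/pacemaker/cores/root as the only argument. Passing `dry_run=True` first shows what would go without deleting anything. If the dumps have not been reported upstream yet, it is probably worth keeping a copy of the recent ones before pruning.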
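
To illustrate the control-character problem mentioned above: init scripts colour their "[ OK ]" with terminal escape sequences, and the ESC byte (\033) is not a legal character in XML 1.0, so a parser rejects any document that embeds the raw output. The sketch below is not the code from commit c351934, just a small Python reproduction of the failure mode. The sample string, the `<op>` element and the helper names are made up for the example.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Terminal colour/cursor sequences such as ESC[60G or ESC[0;32m.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# Remaining C0 control characters that XML 1.0 does not allow
# (tab, newline and carriage return are legal and left alone).
XML_INVALID = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def sanitize(text):
    """Drop ANSI escape sequences, then any other XML-illegal control chars."""
    text = ANSI_ESCAPE.sub("", text)
    return XML_INVALID.sub("", text)


def to_xml(output):
    """Wrap script output in a hypothetical <op> element as an attribute."""
    return "<op output=%s/>" % quoteattr(output)


# Invented sample in the style of an init script's coloured status line.
raw = "Stopping demo service: \033[60G[\033[0;32m  OK  \033[0m]"

try:
    ET.fromstring(to_xml(raw))
except ET.ParseError as err:
    print("raw output is rejected:", err)

clean = ET.fromstring(to_xml(sanitize(raw)))
print("sanitized output parses:", clean.get("output"))
```

Whether to strip, replace or escape the offending bytes is a design choice; stripping is simply the shortest thing to show here.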