Thanks guys, I have already disabled all cron jobs and everything (I did not disable logrotate though...) - will share my findings.
Thanks a lot for the hint.

On 23 February 2015 at 17:30, Simon Weller <swel...@ena.com> wrote:

> I agree with Marcus. I suggest you start monitoring everything that's
> going on around this time frame.
> Maybe dump available memory, and IO (both disk and network) to a file
> every minute or so, and see if you can correlate it to something in
> particular that might be happening on the underlying server, or the network
> connectivity to that server. Maybe slowly move VMs one at a time to a
> different host and see if the issue follows a particular VM.
>
> In the meantime, to reduce the effect of this problem, you could
> use a process monitor like Monit to watch the PID and restart
> cloudstack-agent if a failure is detected.
>
> - Si
>
> ________________________________________
> From: Marcus <shadow...@gmail.com>
> Sent: Monday, February 23, 2015 10:21 AM
> To: dev@cloudstack.apache.org
> Cc: us...@cloudstack.apache.org
> Subject: Re: Agent dies every night/morning.... memory violation
>
> It doesn't really sound like an agent problem, but some other root
> problem that is causing issues for the agent. Perhaps it is specific
> to the host simply because there is a particular VM that always runs
> on that host and the VM itself is triggering the issue. Perhaps a
> heavy logrotate or cron job on the VM causes issues for librados. Just
> grasping at straws here. From the output provided it does seem that
> the libvirt bindings that include ceph code are terminating the agent
> execution. My guess is that if you focus on "why this host" as
> opposed to "what's going on", you'll find the answer to both. Sorry, I
> know that's not much help.
>
> On Mon, Feb 23, 2015 at 7:29 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
> > Anybody? Before I start to cry :(
> >
> > On 21 February 2015 at 21:18, Andrija Panic <andrija.pa...@gmail.com> wrote:
> >
> >> Hi Simon,
> >>
> >> selinux is disabled, I have just double checked.
> >>
> >> BTW, this is what I can see in the cloudstack-agent.err log - seems like
> >> some CEPH related issues, but not sure why the agent would die...
> >> If I recall correctly, this might be happening since the CEPH update from
> >> 0.80.3? to 0.87 - and this seems like some crash in librados....
> >>
> >>
> >> libust[1907/2046]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
> >> libvirt: error : name in virDomainLookupByName must not be NULL
> >> libvirt: error : name in virDomainLookupByName must not be NULL
> >> libvirt: error : name in virDomainLookupByName must not be NULL
> >> libvirt: error : name in virDomainLookupByName must not be NULL
> >> libvirt: Storage Driver error : failed to remove volume 'cloudstack/bd751250-de35-4d2e-a4e3-3ee4b636c2a7': Device or resource busy
> >> ./log/SubsystemMap.h: In function 'bool ceph::log::SubsystemMap::should_gather(unsigned int, int)' thread 7f04427fc700 time 2015-02-21 06:39:38.839210
> >> ./log/SubsystemMap.h: 62: FAILED assert(sub < m_subsys.size())
> >> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
> >> 1: (()+0x1fe223) [0x7f060c932223]
> >> 2: (ObjectCacher::flusher_entry()+0x155) [0x7f060c9866e5]
> >> 3: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f060c9976cd]
> >> 4: (()+0x79d1) [0x7f06605ee9d1]
> >> 5: (clone()+0x6d) [0x7f066033bb5d]
> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >> terminate called after throwing an instance of 'ceph::FailedAssertion'
> >> 21/02/2015 06:39:38 1905 jsvc.exec error: Service did not exit cleanly
> >>
> >> On 20 February 2015 at 21:56, Simon Weller <swel...@ena.com> wrote:
> >>
> >>> Andrija,
> >>>
> >>> What is SELinux set to on this host?
> >>>
> >>> - SI
> >>>
> >>> ________________________________________
> >>> From: Andrija Panic <andrija.pa...@gmail.com>
> >>> Sent: Friday, February 20, 2015 6:06 AM
> >>> To: dev@cloudstack.apache.org; us...@cloudstack.apache.org
> >>> Subject: Agent dies every night/morning.... memory violation
> >>>
> >>> Hi,
> >>>
> >>> I have a crazy agent on one of the hosts that is being killed each morning,
> >>> and I found this in /var/log/audit.log:
> >>>
> >>> type=ANOM_ABEND msg=audit(1424321463.930:430678): auid=0 uid=0 gid=0 ses=68891 pid=10831 comm="jsvc" reason="memory violation" sig=6
> >>>
> >>> I don't remember changing anything on the system, but this keeps happening
> >>> each morning around the same time, 5.20am-5.40am.
> >>>
> >>> I'm wondering what the heck is happening; any suggestions on where to
> >>> troubleshoot?
> >>> Will check the logs in detail anyway...
> >>>
> >>> --
> >>>
> >>> Andrija Panić
> >>
> >> --
> >>
> >> Andrija Panić
> >
> > --
> >
> > Andrija Panić

--

Andrija Panić
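
For the per-minute sampling Simon suggests in the quoted thread (dump memory plus disk and network IO to a file, then correlate with the crash time), a minimal sketch could look like the one below. It assumes a Linux host exposing /proc/meminfo, /proc/diskstats and /proc/net/dev; the function names, the JSON line format and the log path in the usage note are hypothetical and not part of CloudStack or Ceph.

```python
import json
import time


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo content (values mostly in kB)."""
    out = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            out[name.strip()] = int(parts[0])
    return out


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats."""
    out = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 10:
            # f[2] is the device name, f[5] sectors read, f[9] sectors written.
            out[f[2]] = (int(f[5]), int(f[9]))
    return out


def parse_netdev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev."""
    out = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, rest = line.partition(":")
        f = rest.split()
        if sep and len(f) >= 9:
            out[name.strip()] = (int(f[0]), int(f[8]))
    return out


def read_file(path):
    with open(path) as fh:
        return fh.read()


def sample_line():
    """Build one JSON line holding the current cumulative counters."""
    mem = parse_meminfo(read_file("/proc/meminfo"))
    return json.dumps({
        "ts": time.strftime("%Y-%m-%d %H:%M:%S"),
        "mem_free_kb": mem.get("MemFree"),
        "disk_sectors": parse_diskstats(read_file("/proc/diskstats")),
        "net_bytes": parse_netdev(read_file("/proc/net/dev")),
    })


def run(log_path, interval=60, iterations=None):
    """Append a sample every `interval` seconds; loop forever if iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        with open(log_path, "a") as fh:
            fh.write(sample_line() + "\n")
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

Calling run("/tmp/agent-watch.log") in a screen session (the path is only an example) would leave one line per minute. The disk and network values are cumulative counters, so the interesting signal is the difference between consecutive lines in the minutes before the agent dies.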