Maybe you could try running the monitoring probes manually on the node:

1. SSH to the node
2. Go to /var/tmp/one/im
3. Execute run_probes kvm-probes

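The three steps above can be sketched as a small script. This is only an illustration: the variable names PROBE_HOST, REMOTES_DIR and DRY_RUN are made up for the example, and invoking the script as ./run_probes is an assumption (the steps only say "run_probes kvm-probes"). It is probably best run as the same user OpenNebula uses to reach the node, typically oneadmin.

```shell
#!/bin/sh
# Sketch only: run the KVM monitoring probes by hand, as in steps 1-3 above.
# PROBE_HOST, REMOTES_DIR and DRY_RUN are illustrative variable names.
PROBE_HOST="${PROBE_HOST:-fgtest14}"
REMOTES_DIR="${REMOTES_DIR:-/var/tmp/one/im}"
DRY_RUN="${DRY_RUN:-1}"

# Build the command executed on the node. Calling the script as
# ./run_probes is an assumption; the thread only says "run_probes kvm-probes".
probe_command() {
    printf 'cd %s && ./run_probes kvm-probes' "$REMOTES_DIR"
}

if [ "$DRY_RUN" = "1" ]; then
    # Default: only show what would be run.
    echo "ssh $PROBE_HOST \"$(probe_command)\""
else
    # Run the probes on the node and print their output locally.
    ssh "$PROBE_HOST" "$(probe_command)"
fi
```

With DRY_RUN=0 the probe output is printed locally, so it can be compared with what "onehost show" reports for the host.
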
Make sure you do not have a host using the same hostname fgtest14 and running a collectd process.

On Jul 29, 2014 4:35 PM, "Steven Timm" <[email protected]> wrote:
>
> I am still trying to debug a nasty monitoring inconsistency.
>
> -bash-4.1$ onevm list | grep fgtest14
>     26 oneadmin oneadmin fgt6x4-26    runn    6     4G fgtest14 117d 19h50
>     27 oneadmin oneadmin fgt5x4-27    runn   10     4G fgtest14 117d 17h57
>     28 oneadmin oneadmin fgt1x1-28    runn   10   4.1G fgtest14 117d 16h59
>     30 oneadmin oneadmin fgt5x1-30    runn    0     4G fgtest14 116d 23h50
>     33 oneadmin oneadmin ip6sl5vda-33 runn    6     4G fgtest14 116d 19h57
> -bash-4.1$ onehost list
>   ID NAME      CLUSTER  RVM  ALLOCATED_CPU     ALLOCATED_MEM       STAT
>    3 fgtest11  ipv6       0  0 / 400 (0%)      0K / 15.7G (0%)     on
>    4 fgtest12  ipv6       0  0 / 400 (0%)      0K / 15.7G (0%)     on
>    7 fgtest13  ipv6       0  0 / 800 (0%)      0K / 23.6G (0%)     on
>    8 fgtest14  ipv6       5  0 / 800 (0%)      0K / 23.6G (0%)     on
>    9 fgtest20  ipv6       3  300 / 800 (37%)   12G / 31.4G (38%)   on
>   11 fgtest19  ipv6       0  0 / 800 (0%)      0K / 31.5G (0%)     on
> -bash-4.1$ onehost show 8
> HOST 8 INFORMATION
> ID                    : 8
> NAME                  : fgtest14
> CLUSTER               : ipv6
> STATE                 : MONITORED
> IM_MAD                : kvm
> VM_MAD                : kvm
> VN_MAD                : dummy
> LAST MONITORING TIME  : 07/29 09:25:45
>
> HOST SHARES
> TOTAL MEM             : 23.6G
> USED MEM (REAL)       : 876.4M
> USED MEM (ALLOCATED)  : 0K
> TOTAL CPU             : 800
> USED CPU (REAL)       : 0
> USED CPU (ALLOCATED)  : 0
> RUNNING VMS           : 5
>
> LOCAL SYSTEM DATASTORE #102 CAPACITY
> TOTAL:                : 548.8G
> USED:                 : 175.3G
> FREE:                 : 345.6G
>
> MONITORING INFORMATION
> ARCH="x86_64"
> CPUSPEED="2992"
> HOSTNAME="fgtest14.fnal.gov"
> HYPERVISOR="kvm"
> MODELNAME="Intel(R) Xeon(R) CPU E5450 @ 3.00GHz"
> NETRX="234844577"
> NETTX="21553126"
> RESERVED_CPU=""
> RESERVED_MEM=""
> VERSION="4.6.0"
>
> VIRTUAL MACHINES
>
>     ID USER     GROUP    NAME         STAT UCPU   UMEM HOST     TIME
>     26 oneadmin oneadmin fgt6x4-26    runn    6     4G fgtest14 117d 19h50
>     27 oneadmin oneadmin fgt5x4-27    runn   10     4G fgtest14 117d 17h57
>     28 oneadmin oneadmin fgt1x1-28    runn   10   4.1G fgtest14 117d 17h00
>     30 oneadmin oneadmin fgt5x1-30    runn    0     4G fgtest14 116d 23h50
>     33 oneadmin oneadmin ip6sl5vda-33 runn    6     4G fgtest14 116d 19h57
> -----------------------------------------------------------------------------------
>
> All of this looks great, right?
> Just one problem: There are no VM's running on fgtest14 and
> haven't been for 4 days.
>
> [root@fgtest14 ~]# virsh list
>  Id    Name                           State
> ----------------------------------------------------
>
> [root@fgtest14 ~]#
>
> -------------------------------------------------------------------------
> Yet the monitoring reports no errors.
>
> Tue Jul 29 09:28:10 2014 [InM][D]: Host fgtest14 (8) successfully monitored.
>
> -----------------------------------------------------------------------------
> At the same time, there is no evidence that ONE is actually trying to or
> succeeding to monitor these five vm's yet they are still stuck in "runn"
> which means I can't do a onevm restart to restart them.
> (the vm images of these 5 vm's are still out there on the VM host and
> I would like to save and restart them if I can).
>
> What is the remotes command that ONE4.6 would use to monitor this host?
> Can I do it manually and see what output I get?
>
> Are we dealing with some kind of a bug, or just a very confused system?
> Any help is appreciated. I have to get this sorted out before
> I dare deploy one4.x in production.
>
> Steve Timm
>
>
> ------------------------------------------------------------------
> Steven C. Timm, Ph.D  (630) 840-8525
> [email protected]  http://home.fnal.gov/~timm/
> Fermilab Scientific Computing Division, Scientific Computing Services Quad.
> Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
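
The mismatch described in the quoted message (VMs listed as "runn" in onevm list while virsh list on the node is empty) can be checked with a short script. The following is a rough sketch, not an OpenNebula tool: the function names are hypothetical, the column positions are taken from the onevm list output quoted above, and it assumes the libvirt domains are named one-<VM ID>.

```shell
#!/bin/sh
# Sketch: list VMs that OpenNebula reports as running on a host but that
# libvirt on that host does not know about. Function names are hypothetical.

# Read `onevm list` output on stdin; print IDs of VMs in state "runn"
# on host $1 (columns: ID USER GROUP NAME STAT UCPU UMEM HOST TIME).
one_vm_ids() {
    awk -v h="$1" '$5 == "runn" && $8 == h { print $1 }'
}

# Read `virsh list` output on stdin; skip the two header lines and print
# the VM ID taken from each domain name (assumes names like one-26).
libvirt_vm_ids() {
    awk 'NR > 2 && NF >= 3 { sub(/^one-/, "", $2); print $2 }'
}

# $1: file with `onevm list` output, $2: file with `virsh list` output,
# $3: host name. Prints one missing VM ID per line.
missing_vms() {
    one_vm_ids "$3" < "$1" | while read -r id; do
        libvirt_vm_ids < "$2" | grep -qx "$id" || echo "$id"
    done
}

# Example use (commented out; file paths are placeholders):
#   onevm list > /tmp/onevm.txt
#   ssh fgtest14 virsh list > /tmp/virsh.txt
#   missing_vms /tmp/onevm.txt /tmp/virsh.txt fgtest14
```

For the output quoted above, where virsh list shows no domains, this would print all five IDs reported for fgtest14.
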
_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
