I can see the asok files are there:

root@ceph1:/var/run/ceph# ls -la
total 0
drwxrwx---  2 ceph ceph  80 Feb  1 10:51 .
drwxr-xr-x 18 root root 640 Feb  1 10:52 ..
srwxr-xr-x  1 ceph ceph   0 Feb  1 10:51 ceph-mon.ceph1.asok
srwxr-xr-x  1 root root   0 Jan 27 15:08 ceph-osd.0.asok
root@ceph1:/var/run/ceph#
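Before touching the collector itself, it may be worth confirming that the daemons actually answer on those sockets, since the admin sockets are where the Diamond ceph collector pulls its stats from. A quick manual check on the node (paths taken from the listing above) should print a JSON blob for each daemon:

ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok perf dump | head
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump | head

If either command hangs or errors out, the problem is on the daemon/socket side rather than in Diamond.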
Running diamond in debug shows the below:

[2016-02-01 10:55:23,774] [Thread-1] Collecting data from: NetworkCollector
[2016-02-01 10:56:23,484] [Thread-1] Collecting data from: CPUCollector
[2016-02-01 10:56:23,487] [Thread-6] Collecting data from: MemoryCollector
[2016-02-01 10:56:23,489] [Thread-7] Collecting data from: SockstatCollector
[2016-02-01 10:56:23,768] [Thread-1] Collecting data from: CephCollector
[2016-02-01 10:56:23,768] [Thread-1] gathering service stats for /var/run/ceph/ceph-mon.ceph1.asok
[2016-02-01 10:56:24,094] [Thread-1] Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/diamond/collector.py", line 412, in _run
    self.collect()
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 464, in collect
    self._collect_service_stats(path)
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 450, in _collect_service_stats
    self._publish_stats(counter_prefix, stats, schema, GlobalName)
  File "/usr/share/diamond/collectors/ceph/ceph.py", line 305, in _publish_stats
    assert path[-1] == 'type'
AssertionError
[2016-02-01 10:56:24,096] [Thread-8] Collecting data from: LoadAverageCollector
[2016-02-01 10:56:24,098] [Thread-1] Collecting data from: VMStatCollector
[2016-02-01 10:56:24,099] [Thread-1] Collecting data from: DiskUsageCollector
[2016-02-01 10:56:24,104] [Thread-9] Collecting data from: DiskSpaceCollector

Checking the md5 on the file returns the below:

root@ceph1:/var/run/ceph# md5sum /usr/share/diamond/collectors/ceph/ceph.py
aeb3915f8ac7fdea61495805d2c99f33  /usr/share/diamond/collectors/ceph/ceph.py
root@ceph1:/var/run/ceph#

I've found that replacing the ceph.py file with the one below stops the diamond error:

https://raw.githubusercontent.com/BrightcoveOS/Diamond/master/src/collectors/ceph/ceph.py

root@ceph1:/usr/share/diamond/collectors/ceph# md5sum ceph.py
13ac74ce0df39a5def879cb5fc530015  ceph.py

[2016-02-01 11:14:33,116] [Thread-42] Collecting data from: MemoryCollector
[2016-02-01 11:14:33,117] [Thread-1] Collecting data from: CPUCollector
[2016-02-01 11:14:33,123] [Thread-43] Collecting data from: SockstatCollector
[2016-02-01 11:14:35,453] [Thread-1] Collecting data from: CephCollector
[2016-02-01 11:14:35,454] [Thread-1] checking /var/run/ceph/ceph-mon.ceph1.asok
[2016-02-01 11:14:35,552] [Thread-1] checking /var/run/ceph/ceph-osd.0.asok
[2016-02-01 11:14:35,685] [Thread-44] Collecting data from: LoadAverageCollector
[2016-02-01 11:14:35,686] [Thread-1] Collecting data from: VMStatCollector
[2016-02-01 11:14:35,687] [Thread-1] Collecting data from: DiskUsageCollector
[2016-02-01 11:14:35,692] [Thread-45] Collecting data from: DiskSpaceCollector

But after all that it's still NOT working.

What Diamond version are you running?
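For what it's worth, the failing line (assert path[-1] == 'type') suggests the packaged collector expects every counter it walks in the daemon's perf schema to end in a 'type' field. One way to eyeball what the monitor is actually handing back is something like:

ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok perf schema | python -m json.tool | less

Each counter should appear as a nested object with at least a "type" key; a schema entry shaped differently from what that ceph.py expects would presumably trip the assert, which would also fit with the newer BrightcoveOS version of the collector coping with it.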
I'm running Diamond version 3.4.67

On Mon, Feb 1, 2016 at 11:01 PM, Daniel Rolfe <daniel.rolfe...@gmail.com> wrote:
> I can see the asok files are there
> [...]
>
> On Mon, Feb 1, 2016 at 12:24 PM, hnuzhoulin <hnuzhoul...@gmail.com> wrote:
>> Yes, in my environment I fixed it.
>> BTW, I checked the md5 of the ceph collector file. It is correct.
>>
>> On Sun, 31 Jan 2016 22:46:42 +0800, Daniel Rolfe <daniel.rolfe...@gmail.com> wrote:
>>
>> Hi, thanks for the reply.
>>
>> Just to confirm, did you manage to fix this issue?
>>
>> I've restarted the whole ceph cluster a few times.
>>
>> Sent from my iPhone
>>
>> On 1 Feb 2016, at 1:26 AM, hnuzhoulin <hnuzhoul...@gmail.com> wrote:
>>
>> I just faced the same problem.
>>
>> The problem was that my cluster was missing the asok files of the mons, although the cluster works well.
>>
>> So killing the mon process and restarting it may fix it (using the service command to restart the mon daemon may not work).
>>
>> On Sun, 31 Jan 2016 10:35:25 +0800, Daniel Rolfe <daniel.rolfe...@gmail.com> wrote:
>>
>> Seem to be having an issue with global ceph stats getting back to calamari.
>>
>> Individual node and osd stats are working.
>>
>> If anyone can point me in the right direction that would be great.
>>
>> https://github.com/ceph/calamari/issues/384
>>
>> --
>> -------------------------
>> hnuzhoul...@gmail.com
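For anyone wanting to try the manual restart hnuzhoulin suggests above, a minimal sketch looks like this, assuming a mon id of ceph1 and the default cluster name (adjust both to your deployment, and on a multi-mon production cluster restart the mons one at a time):

pgrep -l ceph-mon                    # confirm the monitor is running and note its pid
pkill ceph-mon                       # stop it; a plain service restart reportedly did not recreate the socket
ceph-mon --cluster ceph -i ceph1     # start it again directly
ls -la /var/run/ceph/                # the mon asok should be back
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok mon_status

If the socket reappears and mon_status answers, Diamond/Calamari should be able to pick the mon stats up again on the next collection run.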