Hi Erik,
Is your mon still running properly?
Mark
On 08/01/2013 05:06 PM, Erik Logtenberg wrote:
Hi,
I think the high CPU usage was due to the system time not being right. I
activated ntp and it had to make quite a big adjustment, and after that
the high CPU usage was gone.
Anyway, I immediately ran into another issue. I ran a simple benchmark:
# rados bench --pool benchmark 300 write --no-cleanup
During the benchmark, one of my osd's went down. I checked the logs and
apparently there was no hardware failure (the disk is still nicely
mounted and the osd is still running), but the logfile fills up rapidly
with these messages:
2013-08-02 00:03:40.014982 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >>
192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36884 s=2 pgs=86874
cs=173547 l=0).fault, initiating reconnect
2013-08-02 00:03:40.016682 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >>
192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36885 s=2 pgs=86875
cs=173549 l=0).fault, initiating reconnect
2013-08-02 00:03:40.019241 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >>
192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36886 s=2 pgs=86876
cs=173551 l=0).fault, initiating reconnect
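To judge how fast these reconnect faults pile up and which peer they involve, the messages can be tallied with a short script. The following is only a minimal sketch: it assumes each wrapped log entry has been joined back onto a single line, and the names `FAULT_RE` and `summarize_faults` are hypothetical helpers, not part of Ceph.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout of a messenger fault line once the wrapped lines are joined:
# <date> <time> <thread> <level> -- <local addr> >> <peer addr> pipe(...).fault, initiating reconnect
FAULT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ +\d+ -- "
    r"(?P<local>\S+) >> (?P<peer>\S+) pipe\(.*\)\.fault, initiating reconnect"
)


def summarize_faults(lines):
    """Count reconnect faults per (local, peer) pair and return the time span in seconds."""
    counts = Counter()
    stamps = []
    for line in lines:
        match = FAULT_RE.match(line.strip())
        if match is None:
            continue  # not a fault line, skip it
        counts[(match.group("local"), match.group("peer"))] += 1
        stamps.append(datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    span = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return counts, span


# The three entries quoted above, joined onto one line each.
SAMPLE = [
    "2013-08-02 00:03:40.014982 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> "
    "192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36884 s=2 pgs=86874 "
    "cs=173547 l=0).fault, initiating reconnect",
    "2013-08-02 00:03:40.016682 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> "
    "192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36885 s=2 pgs=86875 "
    "cs=173549 l=0).fault, initiating reconnect",
    "2013-08-02 00:03:40.019241 7fe7336fd700 0 -- 192.168.1.15:6801/1229 >> "
    "192.168.1.16:6801/3001 pipe(0x39e9680 sd=28 :36886 s=2 pgs=86876 "
    "cs=173551 l=0).fault, initiating reconnect",
]

counts, span = summarize_faults(SAMPLE)
for (local, peer), n in counts.items():
    print(f"{local} -> {peer}: {n} faults within {span:.6f} s")
```

Run against the full logfile, a count that keeps growing for a single peer address would suggest the problem sits on the link to that one osd rather than on the whole node.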
What could be wrong here?
Kind regards,
Erik.
On 08/01/2013 08:00 AM, Dan Mick wrote:
Logging might well help.
http://ceph.com/docs/master/rados/troubleshooting/log-and-debug/
On 07/31/2013 03:51 PM, Erik Logtenberg wrote:
Hi,
I just added a second node to my ceph test platform. The first node has
a mon and three osd's, the second node only has three osd's. Adding the
osd's was pretty painless, and ceph distributed the data from the first
node evenly over both nodes so everything seems to be fine. The monitor
also thinks everything is fine:
2013-08-01 00:41:12.719640 mon.0 [INF] pgmap v1283: 292 pgs: 292
active+clean; 9264 MB data, 24826 MB used, 5541 GB / 5578 GB avail
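A status line like this can also be checked mechanically, for instance when watching the cluster from a script. What follows is a rough sketch, assuming the pgmap line has been joined onto one line and that multiple pg states, when present, are separated by commas; `pg_states` and `all_active_clean` are hypothetical names.

```python
import re

# Matches the "pgmap vN: T pgs: <count> <state>[, <count> <state>...];" part.
PGMAP_RE = re.compile(
    r"pgmap v(?P<version>\d+): (?P<total>\d+) pgs: (?P<states>[^;]+);"
)


def pg_states(line):
    """Return (total_pgs, {state: count}) or None if the line is not a pgmap line."""
    match = PGMAP_RE.search(line)
    if match is None:
        return None
    states = {}
    for part in match.group("states").split(","):
        count, state = part.strip().split(" ", 1)
        states[state] = int(count)
    return int(match.group("total")), states


def all_active_clean(line):
    """True only if every pg reported on the line is active+clean."""
    parsed = pg_states(line)
    if parsed is None:
        return False
    total, states = parsed
    return states.get("active+clean", 0) == total


# The monitor line quoted above, joined onto one line.
SAMPLE = (
    "2013-08-01 00:41:12.719640 mon.0 [INF] pgmap v1283: 292 pgs: 292 "
    "active+clean; 9264 MB data, 24826 MB used, 5541 GB / 5578 GB avail"
)

print(pg_states(SAMPLE))
print(all_active_clean(SAMPLE))
```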
Unfortunately, the three osd's on the second node keep eating a lot of
cpu, while there is no activity whatsoever:
  PID USER   VIRT    RES  SHR S %CPU %MEM   TIME+ COMMAND
21272 root 441440  34632 7848 S 61.8  0.9 4:08.62 ceph-osd
21145 root 439852  29316 8360 S 60.4  0.7 4:04.31 ceph-osd
21036 root 443828  31324 8336 S 60.1  0.8 4:07.55 ceph-osd
Any idea why that is and how I can even ask an osd what it's doing?
There is no corresponding hdd activity, it's only cpu and hardly any
memory usage.
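To keep an eye on this over time, the process listing can be filtered for daemons above some CPU level. This is a minimal sketch only: it assumes whitespace-separated columns with a header row as in the listing above, the 50% threshold is an arbitrary choice, and `busy_processes` is a hypothetical helper.

```python
def busy_processes(top_text, threshold=50.0):
    """Return (pid, command, cpu) for every row at or above the CPU threshold.

    Column positions are looked up from the header row, so the function
    does not depend on a fixed column order.
    """
    rows = [ln.split() for ln in top_text.strip().splitlines() if ln.strip()]
    header, body = rows[0], rows[1:]
    pid_idx = header.index("PID")
    cpu_idx = header.index("%CPU")
    cmd_idx = header.index("COMMAND")
    return [
        (int(row[pid_idx]), row[cmd_idx], float(row[cpu_idx]))
        for row in body
        if float(row[cpu_idx]) >= threshold
    ]


# The osd listing quoted above.
SAMPLE = """
PID USER VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21272 root 441440 34632 7848 S 61.8 0.9 4:08.62 ceph-osd
21145 root 439852 29316 8360 S 60.4 0.7 4:04.31 ceph-osd
21036 root 443828 31324 8336 S 60.1 0.8 4:07.55 ceph-osd
"""

for pid, command, cpu in busy_processes(SAMPLE):
    print(f"{command} (pid {pid}) is using {cpu}% cpu")
```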
Also the monitor on the first node is doing the same thing:
  PID USER   VIRT    RES  SHR S  %CPU  %MEM   TIME+ COMMAND
12825 root 186900  23492 5540 S 141.1 0.590 9:47.64 ceph-mon
I tried stopping the three osd's: that makes the monitor calm down, but
after restarting the osd's, the monitor resumes its cpu usage. I also
tried stopping the monitor, which makes the three osd's calm down, but
they start eating cpu again as soon as the monitor is back online.
In the meantime, the first three osd's, the ones on the same machine as
the monitor, don't behave like this at all. Currently, as there is no
activity, they are just idling on low cpu usage, as expected.
Kind regards,
Erik.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com