Re: [ceph-users] Good way to monitor detailed latency/throughput

2014-09-06 Thread Christian Balzer
On Fri, 05 Sep 2014 16:23:13 +0200 Josef Johansson wrote: > Hi, > > How do you guys monitor the cluster to find disks that behave badly, or > VMs that impact the Ceph cluster? > > I'm looking for something where I could get a good bird's-eye view of > latency/throughput, that uses something easy like SN
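One low-effort way to spot misbehaving disks is the per-OSD commit/apply latency that `ceph osd perf` reports. The sketch below is not from this thread; it filters that report for slow OSDs. It assumes the JSON layout printed by `ceph osd perf --format json` on Firefly-era releases (`osd_perf_infos`, then `perf_stats` with `commit_latency_ms` and `apply_latency_ms`), so check the field names against your own version. The function name, the 100 ms threshold and the sample numbers are illustrative.

```python
import json


def slow_osds(perf_json, threshold_ms=100):
    """Return (osd_id, commit_ms, apply_ms) for OSDs whose commit or apply
    latency exceeds threshold_ms, slowest first.

    Assumed input layout (from `ceph osd perf --format json`):
    {"osd_perf_infos": [{"id": N, "perf_stats":
        {"commit_latency_ms": X, "apply_latency_ms": Y}}, ...]}
    """
    data = json.loads(perf_json)
    flagged = []
    for info in data.get("osd_perf_infos", []):
        stats = info.get("perf_stats", {})
        commit = stats.get("commit_latency_ms", 0)
        apply_ms = stats.get("apply_latency_ms", 0)
        if max(commit, apply_ms) > threshold_ms:
            flagged.append((info["id"], commit, apply_ms))
    # Worst latency first, so the most suspicious disk is at the top.
    flagged.sort(key=lambda row: max(row[1], row[2]), reverse=True)
    return flagged


if __name__ == "__main__":
    # Made-up sample data, not taken from any cluster in this thread.
    sample = json.dumps({"osd_perf_infos": [
        {"id": 0, "perf_stats": {"commit_latency_ms": 12, "apply_latency_ms": 15}},
        {"id": 7, "perf_stats": {"commit_latency_ms": 480, "apply_latency_ms": 510}},
        {"id": 3, "perf_stats": {"commit_latency_ms": 150, "apply_latency_ms": 90}},
    ]})
    print(slow_osds(sample))  # [(7, 480, 510), (3, 150, 90)]
```

To feed it live data, the JSON text could come from `subprocess.check_output(["ceph", "osd", "perf", "--format", "json"])` on a node with an admin keyring; polling that periodically and graphing the result gives a rough per-disk latency view.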

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Christian Balzer
On Fri, 5 Sep 2014 09:42:02 + Dan Van Der Ster wrote: > > > On 05 Sep 2014, at 11:04, Christian Balzer wrote: > > > > On Fri, 5 Sep 2014 07:46:12 + Dan Van Der Ster wrote: > >> > >>> On 05 Sep 2014, at 03:09, Christian Balzer wrote: > >>> > >>> On Thu, 4 Sep 2014 14:49:39 -0700 Craig

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Also putting this on the list. On 06 Sep 2014, at 13:36, Josef Johansson wrote: > Hi, > > Same issues again, but I think we found the drive that causes the problems. > > But this is causing problems as it’s trying to do a recovery to that OSD at > the moment. > > So we’re left with the status

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote: > Also putting this on the list. > > On 06 Sep 2014, at 13:36, Josef Johansson wrote: > > > Hi, > > > > Same issues again, but I think we found the drive that causes the > > problems. > > > > But this is causing problems as it’

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 13:53, Christian Balzer wrote: > > Hello, > > On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote: > >> Also putting this on the list. >> >> On 06 Sep 2014, at 13:36, Josef Johansson wrote: >> >>> Hi, >>> >>> Same issues again, but I think we found the drive tha

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Actually, restarting only got the recovery process going for a period of time. I can’t get past the 21k object mark. I’m uncertain whether the disk really is what’s messing this up right now, so I’m not keen to start moving 300k objects around. Regards, Josef On 06 Sep 2014, a

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Dan van der Ster
Hi Christian, Let's keep debating until a dev corrects us ;) September 6 2014 1:27 PM, "Christian Balzer" wrote: > On Fri, 5 Sep 2014 09:42:02 + Dan Van Der Ster wrote: > >>> On 05 Sep 2014, at 11:04, Christian Balzer wrote: >>> >>> On Fri, 5 Sep 2014 07:46:12 + Dan Van Der Ster wrot

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
FWIW, I kept restarting OSDs until I found a server that was having an impact. Until that server stopped having an impact, the number of degraded objects didn’t go down. After a while it finished recovering that OSD and happily moved on to the others. I guess I will be seeing the same behaviour when

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Christian Balzer
On Sat, 6 Sep 2014 13:07:27 + Dan van der Ster wrote: > Hi Christian, > > Let's keep debating until a dev corrects us ;) > For the time being, I give the recent: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg12203.html And not so recent: http://www.spinics.net/lists/ceph-users/

Re: [ceph-users] ceph osd unexpected error

2014-09-06 Thread Haomai Wang
Hi, Could you give some more details, such as the operations performed before the error occurred? And what's your Ceph version? On Fri, Sep 5, 2014 at 3:16 PM, 廖建锋 wrote: > Dear Ceph, > Urgent question: I hit a "FAILED assert(0 == "unexpected error")" > yesterday. Now I have no way to start

Re: [ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-06 Thread Haomai Wang
Sorry for the late reply; I'm back from a short vacation. I would like to try it this weekend. Thanks for your patience :-) On Wed, Sep 3, 2014 at 9:16 PM, Kenneth Waegeman wrote: > I also can reproduce it on a new slightly different set up (also EC on KV > and Cache) by running ceph pg scrub o

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Dan van der Ster
September 6 2014 4:01 PM, "Christian Balzer" wrote: > On Sat, 6 Sep 2014 13:07:27 + Dan van der Ster wrote: > >> Hi Christian, >> >> Let's keep debating until a dev corrects us ;) > > For the time being, I give the recent: > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg1220

Re: [ceph-users] resizing the OSD

2014-09-06 Thread Christian Balzer
Hello, On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote: > Hello Cephers, > > We created a ceph cluster with 100 OSD, 5 MON and 1 MDS and most of the > stuff seems to be working fine but we are seeing some degrading on the > osd's due to lack of space on the osd's. Please elaborate on that
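For a space problem like this, a quick first check is how evenly the OSDs are filled, since the fullest OSD is what trips the warnings. A minimal sketch, not from this thread: it takes a hypothetical `{osd_id: (used, total)}` mapping (gathered however you like, for example from `df` on each OSD's data mount) and compares each OSD against the default `mon_osd_nearfull_ratio` (0.85) and `mon_osd_full_ratio` (0.95). The `>=` boundary is an approximation of what the monitors do, and the function names are made up.

```python
# Default thresholds; a cluster may override them in ceph.conf.
NEARFULL_RATIO = 0.85  # mon_osd_nearfull_ratio
FULL_RATIO = 0.95      # mon_osd_full_ratio


def classify_osds(usage):
    """Map {osd_id: (used, total)} to {osd_id: (fill_ratio, state)}.

    The input mapping is hypothetical; fill it from df or similar.
    """
    result = {}
    for osd_id, (used, total) in usage.items():
        ratio = used / total
        if ratio >= FULL_RATIO:
            state = "full"
        elif ratio >= NEARFULL_RATIO:
            state = "nearfull"
        else:
            state = "ok"
        result[osd_id] = (ratio, state)
    return result


def fill_spread(usage):
    """Difference between the fullest and the emptiest OSD, as a fraction."""
    ratios = [used / total for used, total in usage.values()]
    return max(ratios) - min(ratios)


if __name__ == "__main__":
    # Made-up numbers in GB: three 1000 GB OSDs with uneven placement.
    usage = {0: (600, 1000), 1: (880, 1000), 2: (970, 1000)}
    for osd_id, (ratio, state) in sorted(classify_osds(usage).items()):
        print(f"osd.{osd_id}: {ratio:.0%} {state}")
    print(f"spread: {fill_spread(usage):.0%}")
```

A large spread often points at uneven data placement (for example too few PGs or uneven CRUSH weights) rather than at the cluster as a whole running out of space.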

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
We managed to get through the restore, but the performance degradation is still there. I’m looking through the OSDs to pinpoint a source of the degradation, hoping the current load will be lowered. I’m a bit afraid of setting the weight of an OSD to 0; wouldn’t it be tough if the degradation is sti

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: > We manage to go through the restore, but the performance degradation is > still there. > Manifesting itself how? > Looking through the OSDs to pinpoint a source of the degradation and > hoping the current load will be lowered. >

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Christian Balzer
On Sat, 6 Sep 2014 14:50:20 + Dan van der Ster wrote: > September 6 2014 4:01 PM, "Christian Balzer" wrote: > > On Sat, 6 Sep 2014 13:07:27 + Dan van der Ster wrote: > > > >> Hi Christian, > >> > >> Let's keep debating until a dev corrects us ;) > > > > For the time being, I give the

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 17:27, Christian Balzer wrote: > > Hello, > > On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: > >> We manage to go through the restore, but the performance degradation is >> still there. >> > Manifesting itself how? > Awfully slow IO on the VMs, and iowait, it’

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, Just realised that it could also be a popularity bug, with lots of small traffic: seeing that it’s fast, it gets popular until it hits the curb. I think I’m seeing this in the stats. Linux 3.13-0.bpo.1-amd64 (osd1) 09/06/2014 _x86_64_ (24 CPU) 09/06/2014 05
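When reading stats like the iostat output pasted here, a small filter helps pick out saturated disks. A minimal sketch, assuming `iostat -x` text output from sysstat: it reads column positions from each `Device` header line (so the differing column sets between sysstat versions don't matter, as long as a `%util` column is present) and reports devices at or above a utilisation threshold. The function name, threshold and sample numbers are illustrative, not taken from this thread.

```python
def busy_devices(iostat_text, util_threshold=80.0):
    """Return {device: %util} for devices at or above util_threshold.

    Later reports in the same text overwrite earlier ones, so the result
    reflects the last sample for each device.
    """
    busy = {}
    columns = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current report block
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields if "%util" in fields else None
            continue
        if columns is None or len(fields) != len(columns):
            continue  # CPU block, banner line, or a row we cannot map
        # Some locales print "97,50"; normalise the decimal separator.
        util = float(fields[columns.index("%util")].replace(",", "."))
        if util >= util_threshold:
            busy[fields[0]] = util
    return busy


# Made-up iostat -x excerpt for illustration.
SAMPLE = """avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.00    0.00    3.00   20.00    0.00   72.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     1.00    2.00   10.00    16.00   400.00    69.33     0.05    4.00   1.00   1.20
sdb               0.00     5.00   40.00  180.00   320.00  9000.00    84.73    12.00   55.00   4.40  97.50
"""

if __name__ == "__main__":
    print(busy_devices(SAMPLE))  # {'sdb': 97.5}
```

Running it over `iostat -x 5` output captured on each OSD host makes it easy to see whether one disk stays pinned while the others idle.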

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote: > Hi, > > On 06 Sep 2014, at 17:27, Christian Balzer wrote: > > > > > Hello, > > > > On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: > > > >> We manage to go through the restore, but the performance degradation > >> i

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Christian Balzer
Hello, On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote: > Hi, > > Just realised that it could also be with a popularity bug as well and > lots a small traffic. And seeing that it’s fast it gets popular until it > hits the curb. > I don't think I ever heard the term "popularity bug" bef

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Scott Laird
Backing up slightly, have you considered RAID 5 over your SSDs? Practically speaking, there's no performance downside to RAID 5 when your devices aren't IOPS-bound. On Sat Sep 06 2014 at 8:37:56 AM Christian Balzer wrote: > On Sat, 6 Sep 2014 14:50:20 + Dan van der Ster wrote: > > > Septemb

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 18:05, Christian Balzer wrote: > > Hello, > > On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote: > >> Hi, >> >> Just realised that it could also be with a popularity bug as well and >> lots a small traffic. And seeing that it’s fast it gets popular until it >> hi

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, On 06 Sep 2014, at 17:59, Christian Balzer wrote: > > Hello, > > On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote: > >> Hi, >> >> On 06 Sep 2014, at 17:27, Christian Balzer wrote: >> >>> >>> Hello, >>> >>> On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote: >>> We m

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Christian Balzer
On Sat, 06 Sep 2014 16:06:56 + Scott Laird wrote: > Backing up slightly, have you considered RAID 5 over your SSDs? > Practically speaking, there's no performance downside to RAID 5 when > your devices aren't IOPS-bound. > Well... for starters, with RAID5 you would lose 25% throughput in bo

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Dan Van Der Ster
RAID5... Hadn't considered it due to the IOPS penalty (it would get 1/4 of the IOPS of separate journal devices, according to an online RAID calculator). Compared to RAID10, I guess we'd get 50% more capacity, but lower performance. After the anecdotes that the DCS3700 very rarely fails, and
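The percentages traded in this sub-thread follow from the textbook RAID formulas. A back-of-the-envelope sketch, assuming n identical drives and the classic write penalties (4 device I/Os per small random write on RAID5, 2 on RAID10). Scott argues that real SSD IOPS depend heavily on request size, so treat these as rule-of-thumb numbers only. The names and per-drive figures are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    capacity: float      # usable capacity, same unit as per-drive capacity
    write_iops: float    # small random writes, after the RAID write penalty
    seq_write_bw: float  # large full-stripe sequential writes


def compare_layouts(n, drive_capacity, drive_write_iops, drive_write_bw):
    """Rule-of-thumb comparison for n identical journal SSDs."""
    if n < 4 or n % 2:
        raise ValueError("need an even number of drives, at least 4")
    return {
        # Each drive used as its own journal device: nothing lost.
        "separate": Layout("separate", n * drive_capacity,
                           n * drive_write_iops, n * drive_write_bw),
        # Mirrored pairs: half the capacity, every write hits two drives.
        "raid10": Layout("raid10", n / 2 * drive_capacity,
                         n * drive_write_iops / 2, n / 2 * drive_write_bw),
        # One drive's worth of parity; a small write costs 4 I/Os
        # (read data, read parity, write data, write parity).
        "raid5": Layout("raid5", (n - 1) * drive_capacity,
                        n * drive_write_iops / 4, (n - 1) * drive_write_bw),
    }


if __name__ == "__main__":
    # Hypothetical per-drive figures; only the ratios matter here.
    layouts = compare_layouts(4, drive_capacity=100, drive_write_iops=1000,
                              drive_write_bw=100)
    sep, r10, r5 = layouts["separate"], layouts["raid10"], layouts["raid5"]
    print("RAID5 vs RAID10 capacity:", r5.capacity / r10.capacity)
    print("RAID5 vs separate write IOPS:", r5.write_iops / sep.write_iops)
    print("RAID5 vs separate seq. bandwidth:", r5.seq_write_bw / sep.seq_write_bw)
```

With four drives this reproduces the figures quoted in the thread: RAID5 has 1.5x the capacity of RAID10 (the "50% more"), 1/4 of the small-write IOPS of separate devices, and 3/4 of their sequential bandwidth, which matches the 25% throughput loss mentioned above.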

Re: [ceph-users] ceph osd unexpected error

2014-09-06 Thread Somnath Roy
Have you set the open file descriptor limit on the OSD node? Try setting it with "ulimit -n 65536". -----Original Message----- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Haomai Wang Sent: Saturday, September 06, 2014 7:44 AM To: 廖建锋 Cc: ceph-user
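`ulimit -n` only affects the shell it runs in and the processes started from it, so it has to be in effect in whatever starts the OSD, and an already-running OSD keeps the limit it was started with. A minimal sketch of the same idea from inside a process, using Python's standard `resource` module (Unix only): it raises the soft RLIMIT_NOFILE towards a target, capped at the hard limit, since an unprivileged process cannot go past that. This illustrates how the limit works, not how Ceph itself sets it; the function name is made up.

```python
import resource


def raise_nofile_limit(target=65536):
    """Raise this process's soft open-file limit towards target.

    Returns the resulting (soft, hard) pair. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard  # already high enough
    # Unprivileged processes may only raise the soft limit up to the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_limit())
```

If the returned soft limit is still well below the target, the hard limit is the bottleneck and has to be raised as root (for example in the init script or limits configuration that starts the OSDs).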

Re: [ceph-users] resizing the OSD

2014-09-06 Thread JIten Shah
Thanks Christian. Replies inline. On Sep 6, 2014, at 8:04 AM, Christian Balzer wrote: > > Hello, > > On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote: > >> Hello Cephers, >> >> We created a ceph cluster with 100 OSD, 5 MON and 1 MDS and most of the >> stuff seems to be working fine but we

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
Hi, Unfortunately the journal tuning did not do much. That’s odd, because I don’t see much utilisation on OSDs themselves. Now this leads to a network issue between the OSDs, right? On 06 Sep 2014, at 18:17, Josef Johansson wrote: > Hi, > > On 06 Sep 2014, at 17:59, Christian Balzer wrote: >

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
On 06 Sep 2014, at 19:37, Josef Johansson wrote: > Hi, > > Unfortunately the journal tuning did not do much. That’s odd, because I don’t > see much utilisation on OSDs themselves. Now this leads to a network issue > between the OSDs, right? > To answer my own question: I restarted a bond and it

Re: [ceph-users] SSD journal deployment experiences

2014-09-06 Thread Scott Laird
IOPS are weird things with SSDs. In theory, you'd see 25% of the write IOPS when writing to a 4-way RAID5 device, since you write to all 4 devices in parallel. Except that's not actually true--unlike HDs where an IOP is an IOP, SSD IOPS limits are really just a function of request size. Because

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Christian Balzer
On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote: > > On 06 Sep 2014, at 19:37, Josef Johansson wrote: > > > Hi, > > > > Unfortunately the journal tuning did not do much. That’s odd, because I > > don’t see much utilisation on OSDs themselves. Now this leads to a > > network issue betwee

[ceph-users] RE: ceph osd unexpected error

2014-09-06 Thread 廖建锋
I use the latest version, 0.80.6. I am setting the limit now, and watching. From: Somnath Roy [somnath@sandisk.com] Sent: September 7, 2014 1:12 To: Haomai Wang; 廖建锋 Cc: ceph-users; ceph-devel Subject: RE: [ceph-users] ceph osd unexpected error Have you set the open

Re: [ceph-users] resizing the OSD

2014-09-06 Thread Christian Balzer
Hello, On Sat, 06 Sep 2014 10:28:19 -0700 JIten Shah wrote: > Thanks Christian. Replies inline. > On Sep 6, 2014, at 8:04 AM, Christian Balzer wrote: > > > > > Hello, > > > > On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote: > > > >> Hello Cephers, > >> > >> We created a ceph cluster w

Re: [ceph-users] RE: ceph osd unexpected error

2014-09-06 Thread Haomai Wang
Yes. If you still hit this error, please add "debug_keyvaluestore=20/20" to your config and capture the debug output. On Sun, Sep 7, 2014 at 11:11 AM, 廖建锋 wrote: > I use the latest version, 0.80.6. > I am setting the limit now, and watching. > > > > From: Som

[ceph-users] RE: RE: ceph osd unexpected error

2014-09-06 Thread 廖建锋
It happened this morning and I could not wait, so I removed and re-added the OSD. Next time it happens I will raise the debug level. Thanks very much. From: Haomai Wang [haomaiw...@gmail.com] Sent: September 7, 2014 12:08 To: 廖建锋 Cc: Somnath Roy; ceph-users; ceph-devel

Re: [ceph-users] Huge issues with slow requests

2014-09-06 Thread Josef Johansson
On 07 Sep 2014, at 04:47, Christian Balzer wrote: > On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote: > >> >> On 06 Sep 2014, at 19:37, Josef Johansson wrote: >> >>> Hi, >>> >>> Unfortunately the journal tuning did not do much. That’s odd, because I >>> don’t see much utilisation on O