On Fri, 05 Sep 2014 16:23:13 +0200 Josef Johansson wrote:
> Hi,
>
> How do you guys monitor the cluster to find disks that behave badly, or
> VMs that impact the Ceph cluster?
>
> I'm looking for something where I could get a good bird's-eye view of
> latency/throughput, that uses something easy like SN
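(A quick sketch of the built-in way to get a first bird's-eye view, assuming a
reasonably recent release; the alert thresholds are up to you:)

  ceph osd perf        # per-OSD fs_commit/fs_apply latency; a constant outlier is a suspect disk
  ceph osd tree        # map the outlier OSD id back to its host
  iostat -x 5          # on that host: await/%util of the underlying device

Any of these can be fed into whatever graphing/SNMP frontend you already use.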
On Fri, 5 Sep 2014 09:42:02 +0000 Dan Van Der Ster wrote:
>
> > On 05 Sep 2014, at 11:04, Christian Balzer wrote:
> >
> > On Fri, 5 Sep 2014 07:46:12 +0000 Dan Van Der Ster wrote:
> >>
> >>> On 05 Sep 2014, at 03:09, Christian Balzer wrote:
> >>>
> >>> On Thu, 4 Sep 2014 14:49:39 -0700 Craig
Also putting this on the list.
On 06 Sep 2014, at 13:36, Josef Johansson wrote:
> Hi,
>
> Same issues again, but I think we found the drive that causes the problems.
>
> But this is causing problems as it’s trying to recover to that OSD at the
> moment.
>
> So we’re left with the status
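(If it really is that one drive, a minimal sketch of the usual way out; osd.12
is a hypothetical id, substitute the real one:)

  ceph osd set norecover     # pause recovery while deciding
  ceph osd out 12            # mark the suspect OSD out so data maps away from it, not onto it
  ceph osd unset norecover   # let recovery resume towards the remaining OSDs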
Hello,
On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote:
> Also putting this on the list.
>
> On 06 Sep 2014, at 13:36, Josef Johansson wrote:
>
> > Hi,
> >
> > Same issues again, but I think we found the drive that causes the
> > problems.
> >
> > But this is causing problems as it’s trying to recover to that OSD at the
> > moment.
Hi,
On 06 Sep 2014, at 13:53, Christian Balzer wrote:
>
> Hello,
>
> On Sat, 6 Sep 2014 13:37:25 +0200 Josef Johansson wrote:
>
>> Also putting this on the list.
>>
>> On 06 Sep 2014, at 13:36, Josef Johansson wrote:
>>
>>> Hi,
>>>
>>> Same issues again, but I think we found the drive that causes the problems.
Actually, restarting only got the recovery process going for a period of time.
I can’t get past the 21k object mark.
I’m uncertain whether the disk really is messing this up right now as well, so
I’m not keen to start moving 300k objects around.
Regards,
Josef
On 06 Sep 2014, a
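(To see where recovery is actually stuck rather than restarting blindly, a
sketch; <pgid> is a placeholder for a PG taken from the first command:)

  ceph pg dump_stuck unclean   # PGs that are not making progress
  ceph pg <pgid> query         # shows which OSDs the PG is waiting on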
Hi Christian,
Let's keep debating until a dev corrects us ;)
September 6 2014 1:27 PM, "Christian Balzer" wrote:
> On Fri, 5 Sep 2014 09:42:02 +0000 Dan Van Der Ster wrote:
>
>>> On 05 Sep 2014, at 11:04, Christian Balzer wrote:
>>>
>>> On Fri, 5 Sep 2014 07:46:12 +0000 Dan Van Der Ster wrote:
FWIW, I did restart the OSDs until I saw a server that made an impact. Until
that server stopped having an impact, the number of degraded objects didn’t
get any lower.
After a while it was done with recovering that OSD and happily started with
others.
I guess I will be seeing the same behaviour when
On Sat, 6 Sep 2014 13:07:27 +0000 Dan van der Ster wrote:
> Hi Christian,
>
> Let's keep debating until a dev corrects us ;)
>
For the time being, I give the recent:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg12203.html
And not so recent:
http://www.spinics.net/lists/ceph-users/
Hi,
Could you give some more detailed info, such as what operations ran before the
errors occurred? And what's your Ceph version?
On Fri, Sep 5, 2014 at 3:16 PM, 廖建锋 wrote:
> Dear CEPH ,
> Urgent question: I hit a "FAILED assert(0 == "unexpected error")"
> yesterday, and now I have no way to start this OSD.
Sorry for the late reply, I'm back from a short vacation. I would
like to try it this weekend. Thanks for your patience :-)
On Wed, Sep 3, 2014 at 9:16 PM, Kenneth Waegeman
wrote:
> I can also reproduce it on a new, slightly different setup (also EC on KV
> and Cache) by running ceph pg scrub o
September 6 2014 4:01 PM, "Christian Balzer" wrote:
> On Sat, 6 Sep 2014 13:07:27 +0000 Dan van der Ster wrote:
>
>> Hi Christian,
>>
>> Let's keep debating until a dev corrects us ;)
>
> For the time being, I give the recent:
>
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg12203.html
Hello,
On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote:
> Hello Cephers,
>
> We created a ceph cluster with 100 OSDs, 5 MONs and 1 MDS, and most of the
> stuff seems to be working fine, but we are seeing some degradation on the
> OSDs due to lack of space on them.
Please elaborate on that
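(For the space question, a quick sketch of how to see where it is going; the
mount-point pattern below assumes the default layout:)

  ceph df                            # global and per-pool usage
  df -h /var/lib/ceph/osd/ceph-*     # per-OSD filesystem usage, run on each OSD node
  ceph osd tree                      # check the weights roughly match the disk sizes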
We managed to get through the restore, but the performance degradation is still
there.
I’m looking through the OSDs to pinpoint the source of the degradation, hoping
the current load will be lowered.
I’m a bit afraid of reweighting an OSD to 0; wouldn’t it be tough if
the degradation is sti
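(For reference, a sketch of the weight-to-0 approach being considered; osd.12
is a hypothetical id:)

  ceph osd crush reweight osd.12 0   # drain the OSD: its PGs get remapped to the rest of the cluster
  # or, to take it out while leaving the CRUSH weight untouched:
  ceph osd out 12

Either way the data movement itself adds load, which is presumably the worry above.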
Hello,
On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote:
> We managed to get through the restore, but the performance degradation is
> still there.
>
Manifesting itself how?
> Looking through the OSDs to pinpoint a source of the degradation and
> hoping the current load will be lowered.
>
On Sat, 6 Sep 2014 14:50:20 +0000 Dan van der Ster wrote:
> September 6 2014 4:01 PM, "Christian Balzer" wrote:
> > On Sat, 6 Sep 2014 13:07:27 +0000 Dan van der Ster wrote:
> >
> >> Hi Christian,
> >>
> >> Let's keep debating until a dev corrects us ;)
> >
> > For the time being, I give the
Hi,
On 06 Sep 2014, at 17:27, Christian Balzer wrote:
>
> Hello,
>
> On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote:
>
>> We managed to get through the restore, but the performance degradation is
>> still there.
>>
> Manifesting itself how?
>
Awfully slow IO on the VMs, and iowait; it’
Hi,
Just realised that it could also be a popularity bug combined with lots of
small traffic: seeing that it’s fast, it gets popular until it hits the curb.
I’m seeing this in the stats, I think.
Linux 3.13-0.bpo.1-amd64 (osd1)  09/06/2014  _x86_64_  (24 CPU)
09/06/2014 05
Hello,
On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote:
> Hi,
>
> On 06 Sep 2014, at 17:27, Christian Balzer wrote:
>
> >
> > Hello,
> >
> > On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote:
> >
> >> We managed to get through the restore, but the performance degradation
> >> is
Hello,
On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote:
> Hi,
>
> Just realised that it could also be a popularity bug combined with lots of
> small traffic: seeing that it’s fast, it gets popular until it hits the
> curb.
>
I don't think I ever heard the term "popularity bug" bef
Backing up slightly, have you considered RAID 5 over your SSDs?
Practically speaking, there's no performance downside to RAID 5 when your
devices aren't IOPS-bound.
On Sat Sep 06 2014 at 8:37:56 AM Christian Balzer wrote:
> On Sat, 6 Sep 2014 14:50:20 +0000 Dan van der Ster wrote:
>
> > Septemb
Hi,
On 06 Sep 2014, at 18:05, Christian Balzer wrote:
>
> Hello,
>
> On Sat, 6 Sep 2014 17:52:59 +0200 Josef Johansson wrote:
>
>> Hi,
>>
>> Just realised that it could also be a popularity bug combined with lots of
>> small traffic: seeing that it’s fast, it gets popular until it
>> hi
Hi,
On 06 Sep 2014, at 17:59, Christian Balzer wrote:
>
> Hello,
>
> On Sat, 6 Sep 2014 17:41:02 +0200 Josef Johansson wrote:
>
>> Hi,
>>
>> On 06 Sep 2014, at 17:27, Christian Balzer wrote:
>>
>>>
>>> Hello,
>>>
>>> On Sat, 6 Sep 2014 17:10:11 +0200 Josef Johansson wrote:
>>>
We m
On Sat, 06 Sep 2014 16:06:56 +0000 Scott Laird wrote:
> Backing up slightly, have you considered RAID 5 over your SSDs?
> Practically speaking, there's no performance downside to RAID 5 when
> your devices aren't IOPS-bound.
>
Well...
For starters with RAID5 you would lose 25% throughput in bo
RAID5... Hadn't considered it due to the IOPS penalty (it would get 1/4th of
the IOPS of separate journal devices, according to some online RAID calc).
Compared to RAID10, I guess we'd get 50% more capacity, but lower performance.
After the anecdotes that the DCS3700 is very rarely failing, and
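(Back-of-the-envelope behind those numbers, with illustrative figures only:
take 4 SSDs at ~X write IOPS each. A small random write on RAID5 costs about
4 device I/Os (read old data + old parity, write new data + new parity), so:

  4 independent journal devices: ~4 * X write IOPS
  4-disk RAID5:                  ~(4 * X) / 4 = X write IOPS  (the 1/4 above)

  usable capacity with 4 disks of size C:
  RAID10: 2C    RAID5: 3C   -> 50% more with RAID5

Whether that 1/4 matters is exactly the "IOPS-bound or not" question.)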
Have you set the open file descriptor limit on the OSD node?
Try setting it with 'ulimit -n 65536'.
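(A sketch of making that persistent rather than per-shell; the values are
illustrative, pick whichever knob fits:)

  # /etc/security/limits.conf (adjust the user the OSDs run as)
  root  soft  nofile  65536
  root  hard  nofile  65536

  # or let the daemons raise it themselves, in /etc/ceph/ceph.conf:
  [global]
      max open files = 65536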
-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Haomai Wang
Sent: Saturday, September 06, 2014 7:44 AM
To: 廖建锋
Cc: ceph-user
Thanks Christian. Replies inline.
On Sep 6, 2014, at 8:04 AM, Christian Balzer wrote:
>
> Hello,
>
> On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote:
>
>> Hello Cephers,
>>
>> We created a ceph cluster with 100 OSDs, 5 MONs and 1 MDS, and most of the
>> stuff seems to be working fine, but we
Hi,
Unfortunately the journal tuning did not do much. That’s odd, because I don’t
see much utilisation on the OSDs themselves. So this points to a network issue
between the OSDs, right?
On 06 Sep 2014, at 18:17, Josef Johansson wrote:
> Hi,
>
> On 06 Sep 2014, at 17:59, Christian Balzer wrote:
>
On 06 Sep 2014, at 19:37, Josef Johansson wrote:
> Hi,
>
> Unfortunately the journal tuning did not do much. That’s odd, because I don’t
> see much utilisation on the OSDs themselves. So this points to a network issue
> between the OSDs, right?
>
To answer my own question: I restarted a bond and it
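(For anyone checking their own bonds, a quick sketch; bond0/eth2 are
placeholder interface names:)

  cat /proc/net/bonding/bond0     # slave state and link failure counters
  ip -s link show bond0           # RX/TX error and drop counters
  ethtool eth2 | grep -E 'Speed|Duplex|Link detected'   # did each slave negotiate correctly?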
IOPS are weird things with SSDs. In theory, you'd see 25% of the write
IOPS when writing to a 4-way RAID5 device, since you write to all 4 devices
in parallel. Except that's not actually true--unlike HDs where an IOP is
an IOP, SSD IOPS limits are really just a function of request size.
Because
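(Since the limit really is a function of request size, the honest answer is to
measure the actual array; a fio sketch, where /dev/md0 is a placeholder and the
test writes to it, so only use a scratch device:)

  fio --name=raid5-4k --filename=/dev/md0 --direct=1 --ioengine=libaio \
      --rw=randwrite --bs=4k --iodepth=32 --runtime=30 --time_based --group_reporting
  # repeat with --bs=64k and compare IOPS vs MB/s to see where the knee is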
On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote:
>
> On 06 Sep 2014, at 19:37, Josef Johansson wrote:
>
> > Hi,
> >
> > Unfortunately the journal tuning did not do much. That’s odd, because I
> > don’t see much utilisation on the OSDs themselves. So this points to a
> > network issue betwee
I use the latest version, 0.80.6.
I am setting the limit now, and watching.
From: Somnath Roy [somnath@sandisk.com]
Sent: September 7, 2014 1:12
To: Haomai Wang; 廖建锋
Cc: ceph-users; ceph-devel
Subject: RE: [ceph-users] ceph osd unexpected error
Have you set the open
Hello,
On Sat, 06 Sep 2014 10:28:19 -0700 JIten Shah wrote:
> Thanks Christian. Replies inline.
> On Sep 6, 2014, at 8:04 AM, Christian Balzer wrote:
>
> >
> > Hello,
> >
> > On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote:
> >
> >> Hello Cephers,
> >>
> >> We created a ceph cluster w
Yes, if you still hit this error, please add
"debug_keyvaluestore=20/20" to your config and capture the debug output.
On Sun, Sep 7, 2014 at 11:11 AM, 廖建锋 wrote:
> I use the latest version, 0.80.6.
> I am setting the limit now, and watching.
>
>
>
From: Som
It happened this morning and I could not wait, so I removed and re-added the OSD.
Next time I will turn the debug level up when it happens again.
Thanks very much.
From: Haomai Wang [haomaiw...@gmail.com]
Sent: September 7, 2014 12:08
To: 廖建锋
Cc: Somnath Roy; ceph-users; ceph-devel
On 07 Sep 2014, at 04:47, Christian Balzer wrote:
> On Sat, 6 Sep 2014 19:47:13 +0200 Josef Johansson wrote:
>
>>
>> On 06 Sep 2014, at 19:37, Josef Johansson wrote:
>>
>>> Hi,
>>>
>>> Unfortunately the journal tuning did not do much. That’s odd, because I
>>> don’t see much utilisation on O